The Center for Data Innovation spoke with Shaun Bierweiler, vice president of U.S. public sector at Hortonworks, a software company based in Santa Clara, California. Bierweiler discussed the benefits of using enterprise open source software for government agencies and how Hortonworks is helping to modernize the 2020 Census in the United States.
Joshua New: Can you explain what the benefits of using open source software would be for a government agency?
Shaun Bierweiler: Before we dive into open source, it’s important that we distinguish between “open source” and “enterprise open source” software. Open source software is technology whose source code is open to the community for anyone to view, modify, or improve upon. Open source allows for collaboration among contributors that can greatly enhance the speed at which members of the community can access new technologies and resources, and create innovations in their own IT ecosystem.
Enterprise open source leverages all of the benefits of open source (enhanced capability due to supported collaboration and contribution, unparalleled flexibility from a selection of the “best” choices, and improved cost efficiency that comes from avoiding vendor lock-in) but provides the security, reliability, and interoperability of a commercial off-the-shelf (COTS) product. This is the key distinction that makes enterprise open source a viable solution for government agencies.
In short, enterprise open source software is a fast, functional, and future-oriented IT infrastructure whose innumerable benefits could help any government agency accomplish its tasks.
New: What are some of the biggest differences you’ve noticed in how agencies at different levels of government manage data? Beyond size, why might a local government agency use a different approach than a federal agency?
Bierweiler: Every government agency, whether federal, state, or local, has a desire and a need to leverage its ever-growing data sets to more effectively accomplish its mission and serve constituents. However, the actionable intelligence an agency hopes to gain varies depending on its mission and the data it is collecting. Another key consideration is the data management tools and resources the agency has at its disposal.
Local and state government agencies are challenged with establishing and managing an IT infrastructure but often have even tighter budgets than federal agencies, which inhibits their ability to keep pace with technological advancements while sustaining their legacy systems.
Whatever the case, these agencies require a tailored approach that enables them to periodically update their infrastructure without incurring significant costs. This is what makes enterprise open source data management solutions ideal. Rather than dedicating funds to develop and build solutions “in house,” agencies can harness the contribution and support of the large open source communities and leverage enterprise-ready solutions that will reduce integration cost and risk while improving the operational effectiveness and efficiency of their infrastructure.
New: As agencies work to publish open data, they often run into issues of competing priorities: open data initiatives on the federal level do not provide funding to agencies to publish their data, but their budget is already stretched to meet other responsibilities. What can agencies do from a data management perspective to make it easier to commit to treating all of their data as open by default?
Bierweiler: Agencies are forced to juggle a handful of priorities at any given time to serve citizens, so even a flawless data system is not a silver bullet to streamline operations. That being said, making IT a priority can help fix all of the other operational pieces of an organization. While there will always be competing priorities, investing in a future-oriented solution like enterprise open source will save money in the long run that can be reallocated elsewhere.
In order to enable openness in the public sector, data managers must make data management cost effective and time efficient. A great place to start in achieving cost savings is modernizing an agency’s central data repository, or enterprise data warehouse (EDW). Optimizing the data warehouse for the larger data system is a lot like maintaining a healthy heart for the benefit of the body. If you can ensure that the core repository of information is cycling data through in an efficient manner, you can improve the health of the rest of the system.
New: The value proposition for a city to deploy smart city technologies lies in the data these applications generate, rather than just the physical devices themselves. How does Hortonworks help cities take advantage of this data?
Bierweiler: You’re right in saying that the true value of smart city technologies lies in the data. However, it must be properly leveraged into insights and actionable intelligence in order to produce real value for agencies. State and local governments are running into challenges finding enterprise-ready technologies that can deliver the benefits of this data while staying within their budgets.
Hortonworks attempts to help cities take advantage of this data by providing solutions that can slice through the massive amounts of data to deliver the right intelligence to the right people in real time, without the astronomical costs of working with a proprietary vendor. These solutions have strong enough processing capabilities to sift through unstructured data quickly or evaluate archived data with predictive analytics.
A good example of this in action is our work with Metro Transit of St. Louis (MTL). MTL faced challenges when it came to the maintenance of its bus system. Due to a lack of data, it often found itself replacing bus components or entire buses when that may not have been necessary. Its solution to this problem was to develop a Smart Bus that generated machine data during its operation to help predict and schedule its own maintenance. We employed one of our data solutions to compress and store the data, which allowed it to be analyzed for impending maintenance requests and will support MTL’s mission in the future.
New: Can you describe how Hortonworks is helping the Census Bureau prepare for the 2020 Census? What kinds of new challenges do its efforts to modernize pose to data collection and management?
Bierweiler: The Census Bureau is aiming for the 2020 Census to be the most technologically advanced and automated data collection event to date. In particular, an unprecedented change has been made to the way that the Census is conducted; for the first time, the headcount of citizens will be taken largely online as opposed to the traditional paper mail system. As you suggest, this attempt at modernization creates its own set of challenges for data collection and management.
For starters, the rapid pace of change of technology itself makes it difficult to plan for and test the technology before new solutions are introduced. Additionally, modernization requires funding that is more or less unavailable in its entirety in the current constrained fiscal environment. Finally, with the massive amounts of data soon to be collected, more effort is required to collect, organize, and store that data in an efficient way while also ensuring accuracy when compared to traditional methods.
Hortonworks intends to help the Census Bureau with several of these challenges by providing the data platform for the 2020 Census. Hadoop services, as provided through Hortonworks’ solutions, allow for the collection and evaluation of data in real time, both in motion and at rest, from any number of sources and in a variety of formats. This provides an innovative, flexible, and cost-efficient way to mine, process, and extract insights from collected data. The result will be high-quality, insightful information that will be converted into the groundwork for new Congressional districts, state budgetary allocations, and more.