The Center for Data Innovation spoke with Ben Balter, GitHub’s government evangelist and former White House Presidential Innovation Fellow. Balter discussed the evolution of the White House’s Project Open Data and his vision for the collaborative future of open data.
This interview has been lightly edited.
Travis Korte: What does a government evangelist at GitHub actually do? What are some projects you’ve been working on recently?
Ben Balter: At GitHub, the government team educates and inspires government employees and civic hackers to get more involved with open source, open data, and open government efforts across all levels of government. Through a combination of community engagement, white-glove support, and customer advocacy, we seek to increase the awareness of and appreciation for open source in the public sector, be it through intra-agency, inter-government, or public/private collaboration. Think about it this way: the geeks in government already love open source, yet often their suit-and-tie wearing superiors assume open source is limited to a community of hippie hobbyists with tie-dye laptops. We help make the case that open source is an integral aspect of any enterprise solution.
In terms of projects, there’s a small, semi-private community at github.com/government, open only to government employees, that curates best practices around open source in government. It has been a great resource for agencies wishing to dip their toes in the water. We’re hoping to take that idea further by holding technical training sessions and executive briefings, both online and in DC, so that when innovators and other change agents go to their boss, the groundwork is already laid to do things in the open.
TK: You were one of the primary architects of the White House Open Data Policy. Since you’ve left the project, what do you think has gone well in terms of implementation and what could have been done better?
BB: When we first sat down to sketch out the Open Data Policy, the default workflow at the White House was to sit down at a desktop word processor, accept comments via email (tracked by hand, in a spreadsheet), and then once “perfect,” publish as a PDF, to be set in stone forever and ever. Two things have changed since then:
First, what shipped was seen as a version 0.1, not as a 1.0. The policy document itself (the thing with the CIO and CTO’s signatures) simply provided a framework and pointed agencies at Project Open Data, a living document. There have been nearly 500 changes since its initial release, and in what it called “Open and Iterative Policy Development” two weeks ago, the White House is seeking to iterate not only on the technical guidance and best-practices documentation, but on the policy itself, a first.
Second, unlike traditional policymaking, none of this is happening in a vacuum. There have been nearly 50 contributors to the policy since its release, including lay people and subject-matter experts both inside and outside of government. There have been highly technical, highly detailed discussions on the merits of proposed changes, some as small as a single character. And all this is happening outside the White House firewall, in the open, and in a way that anyone, from K Street lobbyists to the 18-year-old who just completed her high-school civics class, can participate and have an equal say. Every change, whether just proposed or actually realized, is tracked, giving the policy an audit trail of who made what change when. Best of all, each of these elements has its own unique URL, further facilitating a broader discussion within the community.
TK: You’ve also been a major proponent for government adopting agile development methodologies, open source, and other aspects of software community culture. What else do software engineers know that government doesn’t?
BB: The private sector, strongly influenced by the success of open source, seems to have gravitated toward several ideals across process, product, and people:
Process: Open source is bound by certain constraints. Rarely are two people working on the same thing at the same time nor are they in the same place at the same time, yet open source often produces better outcomes than its proprietary counterparts. Think Encyclopedia Britannica vs. Wikipedia. These constraints are that everything must be electronic (meaning discussions are captured in a high-fidelity medium like chat or issues), available (meaning every step of the process, every discussion has a URL), asynchronous (meaning you’re not requiring facetime or interrupting flow) and lock-free (meaning rapid experimentation and prototyping is encouraged).
Product: The types of products open source produces tend to look different from purpose-built alternatives. This is often due to the fact that open source projects must be abstracted and modular enough to be generally applicable across different use cases. The software produced by open source projects tends to be lean (meaning if there’s a less heavyweight solution that relies more heavily on existing tools, services, or standards, it’s been over-engineered), iterative (meaning if you’re not embarrassed by the first version of your product, you’ve waited too long to ship), decentralized (meaning it avoids single points of failure, both in systems and in people), and open (meaning barriers to the free flow of information are seen as a liability, not an asset).
People: Open source projects attract, and thus are shaped by, a culture vastly different from the one found in most government contractors’ offices. It’s a culture strongly influenced by the hacker ethic: a commitment to sharing and an eagerness to solve challenging technical problems in elegant, scalable ways. Open source culture seeks to minimize information asymmetry (meaning those inside and outside the firewall have access to the same information and shape the big picture), grow communities around shared challenges (meaning contributors are encouraged, empowered, and acknowledged), trust fellow contributors (meaning social constraints are preferred to technical constraints), and minimize friction (meaning the time it takes for a potential contributor to go from “I might want to contribute” to “I have contributed”). Above all, open source optimizes for happiness, something we don’t often talk about within risk-averse government.
TK: You had an excellent blog post a while back titled Treat Data as Code. In it, you made a couple of suggestions about how to make open government data more accessible and usable. Could you expand a little on that vision, and what you see as the next steps for the open data community?
BB: There are a lot of parallels between the open data community today and the open source community 10-20 years ago. The tooling isn’t quite there yet, it’s still fighting for legitimacy among more conservative technologists, and by and large you need technical knowledge to really dig in or appreciate the nuances. When open source started out, source code was posted to FTP servers and if you had any questions you emailed the author. Then purpose-built publication platforms like SourceForge emerged to make publication easier, but it was still very much just that: publication. That’s where open data is today. Some agencies FTP Excel files to a public-facing server, others use data publishing platforms like Socrata, but it’s still a snapshot and a one-way broadcast. Two things should change about how we treat data.
First, data should be multi-dimensional. The interesting story, especially with regulatory data, isn’t where the data is today, but how it’s changed since a policy decision or over time. That’s one thing that open source has built tooling to do extremely well, and with relatively little friction. Go to any software project, and not only can you get version 1.0 from a few years back, but you can also see who made what change when for every subsequent iteration since. We don’t have that in the open data community, not because the technology’s not available, but because the culture’s not there yet. We’re still fighting to get data outside the firewall, let alone to expose process alongside it.
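To make the "how it's changed" idea concrete, here is a minimal sketch of comparing two published snapshots of the same dataset, row by row. The column names, station IDs, and values are hypothetical and invented purely for illustration; a version control system records this kind of difference automatically for every revision, along with who made the change and when.

```python
import csv
import io


def load_rows(csv_text, key):
    """Index the rows of a CSV snapshot by the value of the `key` column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_snapshots(old_text, new_text, key):
    """Report which records were added, removed, or changed between snapshots."""
    old, new = load_rows(old_text, key), load_rows(new_text, key)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for record_id in sorted(old.keys() & new.keys()):
        # Compare every field that appears in either version of the record.
        fields = set(old[record_id]) | set(new[record_id])
        delta = {
            f: (old[record_id].get(f), new[record_id].get(f))
            for f in fields
            if old[record_id].get(f) != new[record_id].get(f)
        }
        if delta:
            changed[record_id] = delta
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of a small regulatory dataset (illustrative data only).
OLD = "station_id,licensee,power_kw\nA1,Acme Radio,7\nB2,Bay FM,5\n"
NEW = "station_id,licensee,power_kw\nA1,Acme Radio,9\nC3,City Sound,3\n"

result = diff_snapshots(OLD, NEW, key="station_id")
print(result)
```

Keying on a stable identifier rather than on row position keeps the comparison meaningful when rows are reordered between releases.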
Second, data should be collaborative. We’re still transacting everything through email. “What’s this column heading mean? What format is this date in? Oh hey, by the way, this should be a seven, not a nine.” Everything hits the data publisher’s inbox and requires a one-off response (or more frequently, gets lost among competing priorities). Agencies should be facilitating conversations. They should be creating the vehicle for subject matter and technical experts to connect. “You know Python? Great. I know about FCC broadcast regulations and have a killer idea.” We should push those conversations to agency-curated forums where I can not only see the questions other data consumers are asking (which have presumably already been asked), but answer them myself, or better yet, submit a proposed fix. We need to push the conversation outside the firewall and make it agency-curated, community-owned data, not the other way around. Open data shouldn’t just be open, it should be collaborative.
TK: Version control is, in some fundamental sense, about accountability. Do you know of any interesting examples where people have used government GitHub activity as data per se to analyze and derive insight from? If not, do you see this as a useful possibility?
BB: Imagine if every piece of legislation, from conception to law, was tracked in an industry standard, open source version control system. Now when that highly paid lobbyist walks in to influence legislation, the congressperson can respond with the standard open source mantra, “It’s on GitHub, pull requests welcome.” All of a sudden (assuming we get the tooling right), any citizen with access to the internet is on equal footing. Not to mention, ten years after the fact, we can go back and review that pull request and subsequent discussion, with no more squinting to guess the author’s intent. Two recent examples that stand out in my mind as first steps:
@FISACourt, run by Eric Mill, is a super-simple bot that checks the Foreign Intelligence Surveillance Act court website on a regular basis for opinions, and if found, stores a copy in a GitHub repository, where differences can be easily tracked and linked to.
@SCOTUS_servo, run by David Zvenyach, takes that same idea and applies it to Supreme Court slip opinions, which often differ from the official opinions published by a select set of private publishers. The bot versions the text on the Supreme Court’s behalf, and can even compare changes in the resulting PDFs using a nice bit of code from Joshua Tauberer.
Both of these are examples of civic hackers outside of government showing the government how, with literally a few lines of code, open data coupled with version control can provide a wide array of souped-up citizen services and operationalize a level of transparency previously only talked about.
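The general pattern behind bots like these can be sketched in a few lines. The following is not the actual code of @FISACourt or @SCOTUS_servo; it is a minimal illustration that assumes a hypothetical `fetch` callable returning the current document bytes and a local directory that is already a Git repository. The function names and commit message are likewise made up for the example.

```python
import subprocess
from pathlib import Path


def snapshot_if_changed(content: bytes, target: Path) -> bool:
    """Write `content` to `target` only if it differs from what is stored.

    Returns True when the stored copy was created or updated.
    """
    if target.exists() and target.read_bytes() == content:
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return True


def commit_snapshot(repo: Path, filename: str, message: str) -> None:
    """Record the updated file in the repository's history (assumes `git` is installed)."""
    subprocess.run(["git", "add", filename], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)


def check_once(fetch, repo: Path, filename: str) -> bool:
    """Fetch the document once and commit it if it changed.

    `fetch` is a hypothetical zero-argument callable supplied by the caller,
    for example a function that downloads a court opinion page.
    """
    changed = snapshot_if_changed(fetch(), repo / filename)
    if changed:
        commit_snapshot(repo, filename, f"Update {filename}")
    return changed
```

Run on a schedule, `check_once` leaves behind a commit for every detected change, which is what lets differences between versions be tracked and linked to later.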