The Center for Data Innovation spoke with Benn Stancil, the chief analyst at Mode, a software company that creates tools for analyzing and visualizing data collaboratively. Stancil talked about the big opportunity for collaborative tools in data journalism and how Mode’s approach to version control differs in key ways from GitHub’s.
Travis Korte: What’s broken with the current way people collaboratively work on data, and how does Mode help fix it?
Benn Stancil: Great analysis on public data is great not just because of the questions that it answers, but because of the questions it inspires. Unfortunately, if I discover something that inspires a question, I usually have to spend a great deal of time partially recreating the original author’s work. I have to track down their data sources, piece together their analysis, and figure out their assumptions. In the best case of well-documented work, this takes a lot of time and I rarely get everything right. In the worst case, the burden is too great and my question goes unexplored.
Individually, the inability to dig into others’ work is frustrating—but aggregated together, the burden slows how we learn from data. People around the world are working with similar data sets and on similar problems, but aren’t connected enough to push their collective knowledge further, faster.
Mode aims to fix these problems by allowing everyone to immediately explore and build on anyone else’s analysis. Mode is a web-based tool that combines SQL (R and Python to come!) and visualization editors with publishing and discovery tools. It executes analysis, displays results, and renders visualizations in one place. By gluing every step of the analytical workflow together, Mode ensures that insights are never separated from the analysis and data that produced them. This makes the underlying work more shareable, discoverable, and reproducible. If analysis is transparent, others can immediately build on it and start finding new insights right away.
TK: I notice you and other Mode analysts have posted some sample use cases on the launch page, spanning a wide range of fields from sports to venture capital to business analytics. I realize Mode has a broad focus, but is there one field or sector where you think Mode is most likely to make its first big impact?
BS: We want Mode to be a place for anyone working with data, regardless of field or sector. We see a huge opportunity, however, to better connect media outlets to their readers through data. While new publications like FiveThirtyEight and the New York Times’ The Upshot are often pointed to as examples of data’s increasing popularity in journalism, the trend has much wider and deeper roots.
Many outlets want to be transparent, but simply lack tools designed for sharing data and analysis in an effective manner. There’s also an increasing emphasis on reader engagement. Mode offers a great platform for both. Not only can journalists share their data and analysis, but they can do so in a way that’s easy for readers to replicate and extend. We think this can be a powerful tool: sharing through Mode encourages further engagement, and it does so on a platform that allows a community to share additional insights in one place. It’s also a win for readers, who can now access the data in a way that enables them to immediately explore questions the original reporting inspired. Perhaps most importantly, it’s a win for society at large: transparency both encourages scrutiny of existing work and helps uncover new findings.
TK: Mode has been called “GitHub for Data.” How does Mode’s approach to version control differ from GitHub’s?
BS: A number of people have compared what we’re doing to what GitHub did for software engineers. I think we share some philosophical similarities with GitHub, but our product is actually quite different.
Philosophically, we—like GitHub—believe in the power of transparency. Work done in the open tends to produce better results, faster. We also believe that great ideas can come from anywhere, not just from the most famous or most connected people. GitHub is transformational because it built a process—pull requests—that allows anyone in the world to contribute to open source software without an explicit invitation. At Mode, we are applying the same idea to analysis.
Mode is purpose-built to bring transparency and collaboration into an analyst’s unique workflow. While it’s valuable to share versioned code that underlies analysis, it’s more powerful if this code is shared with the data it’s run on and the results it produced. Mode organizes SQL code and results in a simple package that’s directly connected to the data and can be updated with one click. This way, whenever analysts return to a previous version of their work or readers discover work published in a newspaper or on a blog, they can see the full context—data, code, results—on which it was built.
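To make the idea of keeping data, code, and results together concrete, here is a minimal Python sketch. It is not Mode’s implementation: the `AnalysisPackage` class, its `refresh` method, and the toy `games` table are all hypothetical names chosen for illustration, and an in-memory SQLite database stands in for a real data source.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class AnalysisPackage:
    """Hypothetical bundle of a query, its data connection, and its latest results."""
    connection: sqlite3.Connection
    sql: str
    columns: list = field(default_factory=list)
    rows: list = field(default_factory=list)

    def refresh(self):
        """Re-run the stored SQL against the connected data and keep the results."""
        cursor = self.connection.execute(self.sql)
        self.columns = [description[0] for description in cursor.description]
        self.rows = cursor.fetchall()
        return self.rows


# Toy data standing in for a real database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [("A", 10), ("B", 7), ("A", 3)],
)

package = AnalysisPackage(
    connection=conn,
    sql="SELECT team, SUM(points) AS total FROM games GROUP BY team ORDER BY team",
)
package.refresh()
print(package.columns, package.rows)

# When the underlying data changes, one call brings the results up to date.
conn.execute("INSERT INTO games VALUES ('B', 9)")
package.refresh()
print(package.columns, package.rows)
```

Because the query text, the connection, and the results live in one object, anyone holding the package can see how a number was produced and re-run it against current data.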
TK: Right now you support data analysis in SQL. What’s the logic behind that, and do you plan to expand to some of the newer languages data scientists use?
BS: We do; we have plans to add support for R and Python. We started with a focus on SQL because, for most analysts, it’s a foundational tool. It’s the language required to work with most databases. Even analysts working with Hadoop often use SQL through technologies like Hive. Because of this, even the most advanced data scientists spend a lot of time working with SQL, and many analysts work almost exclusively in SQL. If analysts’ workflow is built on SQL, Mode should be, too.
There are some additional benefits to a SQL-first approach. Tying back to our goal of making analysis more open and transparent, SQL is relatively easy to understand and learn. (To this end, we’ve invested in writing, in plain English, a free SQL tutorial for anyone interested in learning.) Additionally, despite SQL being crucial to so many people, SQL users are underserved. Because many new analytics tools focus on the newest technologies, older tools like SQL are often overlooked.
That said, we plan to add support for tools like R and Python. As we add analytical functionality, we see Mode being less about the specific technologies it includes and more about how it ties them together. There are a lot of great analytical languages available today (we have the power of the open source community to thank for this), and there’s no doubt many more will emerge. We don’t want Mode to dictate which of those tools analysts can use. Instead, we want Mode to be the glue that connects your work in one technology to your work in the next. The more of these connections we can maintain (for example, by seeing how data is extracted using SQL, then manipulated in R, then visualized with D3), the more completely we can understand an analysis, and the faster others can expand on it.
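As a rough illustration of what connecting steps across tools could look like, the Python sketch below records each stage of a small analysis in order. It is an assumption-laden toy, not a description of Mode: the `lineage` list, the `record` helper, and the `visits` table are hypothetical, and Python stands in for the manipulation step that the answer above attributes to R.

```python
import sqlite3
from statistics import mean

# Ordered record of every step in the analysis: (tool, description).
lineage = []


def record(tool, description):
    """Append one step to the lineage so later readers can retrace the work."""
    lineage.append((tool, description))


# Toy data standing in for a real database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (day TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("mon", 120), ("tue", 80), ("wed", 100)],
)

# Step 1: extract with SQL, and remember exactly which query was run.
extract_sql = "SELECT day, count FROM visits ORDER BY rowid"
rows = conn.execute(extract_sql).fetchall()
record("sql", extract_sql)

# Step 2: manipulate the extracted rows in a second tool (Python here).
average = mean(count for _, count in rows)
relative = [(day, count / average) for day, count in rows]
record("python", "divide each daily count by the mean daily count")

# Anyone picking up the analysis can read the steps in the order they happened.
for tool, description in lineage:
    print(f"{tool}: {description}")
```

Keeping the steps in one ordered record is the design choice being sketched: the extraction query and the later transformation stay linked, so a reader does not have to guess how the final numbers were derived.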
TK: Mode also just announced a major round of venture funding. What are some of the things your team will be working on with the new funding?
BS: We’re making the current product as useful as we can, with a few initial focus areas. First, we’re making it easier to connect more data to Mode. This includes adding more tools for privately connecting Mode to existing databases and building a robust data API that allows data providers to make public data more accessible.
Second, we want to continue adding support for more analytical tools. That not only includes languages like R and Python, but also includes better charting tools and better ways for people to present data. If you want to show an entire array of maps based on a complex model built in Python, we want to support that.
Finally, we’re working on ways for people to better discover analysis. If you’re working on a dataset—or if you’ve provided a dataset—we want you to be able to quickly see who else is using the data and what they’re doing with it. By connecting people through the data and the work, we hope to make the open data community much more valuable for everyone.