The Center for Data Innovation spoke with Hicham Oudghiri, co-founder and CEO of data search company Enigma. Oudghiri talked about using Enigma in the underwriting and insurance sector and why Delaware corporate registrations are at the top of his data wish list.
This interview has been lightly edited.
Travis Korte: Can you introduce Enigma, what it can do, and who uses it?
Hicham Oudghiri: Enigma is a large-scale search and discovery platform sitting on top of the broadest collection of public data on the market. We provide a new way to access and search all this data. The first goal was to get as much data as we could and then provide an experience around jumping from dataset to dataset. Everyone from insurance companies and hedge funds to investigative journalists, students, and researchers uses Enigma as a portal. The second thing we’ve done is to interpret data ourselves. We asked the question: what if transparency isn’t enough? What if we took the responsibility of interpreting difficult data problems?
One place this applies is in real-time company verification and underwriting. We have a suite of analytics products on top of Enigma powered by the Enigma graph, a representation of data that’s entirely linked and entity-based. That’s been the bulk of our technological output over the last year. Our strategy is to partner with people who have impact in consuming data. One place this is useful is with alternative lending market companies. Small and medium-sized businesses looking for access to credit have no real profile when they go to an American Express or a Capital One or JP Morgan, and they’re basically denied opportunities for underwriting, so they go to these alternative companies that charge extremely high interest because they’re willing to do deeper due diligence work. There’s a lot of automation we can do for underwriting. We know for any given business what their 401(k) filings are, whether they’ve been visited by the Department of Labor, how many employees they have. We have a hundred indicators giving us insight into companies that haven’t been productized. The goal has been to take all the data, link it together, and think about which analytics will make it valuable for real-time decision making.
TK: A big part of Enigma’s job is cleaning data sets so that they can be combined with other data. Is this problem going to be solved as more government agencies release their data in open and machine-readable formats? If not, what advice would you give to government agencies opening up their data so that they can make your job easier?
HO: We’re still at the early stages of this stuff, where people aren’t even releasing catalogs of their datasets. We’re still seeing agencies that know what datasets they have but don’t understand what exactly they should release. These problems are deeply ingrained because of laws, providers, etc., and we see this in the United States and abroad as well. The problem of data access is still there. But in the future, people will be releasing more and more.
Take the U.S. Census: we have the whole community trying to make sense of this data. We’re still a ways away from data releases coming close to following some sort of standards. That said, we think this data will eventually become more accessible. Governments are still unsure of the value of their data and how much money to invest in opening it up. The most important job, though, is linking, not cleaning. When data sets are released, they may be clean, but that doesn’t mean there’s a common syntax for relating them to one another. We see that as our biggest mission going forward: forcing ourselves to think of data as a whole rather than multiple data sets. The problem for us is that the more data becomes open, the greater the risk that it won’t be able to communicate with other data. So we need standards not just in formats but also in semantics. If there’s no identifier, there’s no common way to transact with open data by entity. People are still unsure how to manage interpolation of specific data points. We think these will be the problems of the future.
TK: What data sets are at the top of your wish list? This can include public data that isn’t yet ingested into Enigma, as well as government data that hasn’t yet been released.
HO: Delaware corporate registrations is a huge one. We’re closer to getting Swiss corporate registrations than we will ever be to getting Delaware corporate registrations, which is astounding. The price for buying these is prohibitive: it’s in the hundreds of thousands of dollars. Most states provide it practically free.
We’re also extremely interested in more real-time sensor data becoming public: traffic sensors, bridges, tunnels. Even though many real-time systems are available, the data is often aggregated for release. People parse the Twitter firehose, but that’s it.
TK: It must have been exciting when President Obama tweeted your government shutdown visualization last year. Talk a little about how government works with Enigma—not just as data providers but as users of the platform.
HO: We have excellent relationships with the government. We’ve been to the White House and interacted with multiple agencies. We feel our job is to be involved in the coordination of all of these open data efforts and help with the standards. The biggest value we can bring to the table here is helping figure out what helps companies and individuals create value from data. Government has a huge job on its hands to measure the value of this stuff. A couple of months ago I was invited by the French prime minister to give a speech about this with players like [former New York City director of analytics] Mike Flowers and a couple of folks on the American side, educating them around this issue: it’s not just transparency, it’s about putting data to use. Stories need to be brought forth not only on the government side but also on the private side.
It took years to get the Human Genome Project right, but it has now become the foundation for so much science. So having the first public use cases and explaining to people why this work is so valuable is a role we’ve taken on very seriously.
TK: I first saw Enigma demoed in a journalistic context. Can you walk me through a use case of an investigative reporter or auditor working with Enigma?
HO: We work with any journalistic institution. Our attitude is, “Hey, this stuff is now free for you to use. Come and use it, and if you’ve got a big problem call us and we’ll help.” We have close relationships with the New York Times and a couple of folks at smaller institutions like Quartz. Here’s a use case: an article by Ian Urbina at the New York Times. The U.S. government has banned manufacturing in certain factories in Bangladesh, so companies like Wal-Mart are not allowed to use them because they’re so bad. The journalist used Enigma to link shipping reports, company reports, etc., and found that the government was subcontracting in these factories without even knowing it. We use stories like that to get feedback about the best way to present data and clean it. Journalists expose something amazing with the data and at the same time we get this added signal as to how to make the data and the product better. We feel like journalism has carried the torch.
That’s why we call it Enigma: it’s a mystery what we’ll find in this data. And journalists have the hunch.
Photo: Flickr user Etalab