The Center for Data Innovation spoke with Robert Goldman, director of product at FindTheBest, a Santa Barbara, California-based research and comparison tool that takes in a wide range of data sources and structures them to encourage information discovery. Goldman discussed some of the data sets the company has found unexpectedly popular, as well as how to integrate humans and machines into a complex hybrid product.
This interview has been lightly edited.
Travis Korte: Can you briefly describe FindTheBest, what you make, and who uses it?
Robert Goldman: FindTheBest is a research engine focused on collecting, structuring, and connecting the world’s data (59 billion facts to date) to give people all the information they need to research with confidence. It includes information on over 2,000 topics, from smartphones and companies to dog breeds and members of Congress, and most recently we expanded to real estate. Twenty-three million people visit our site each month when they have an important topic to research. Our users are basically anyone who’s ever been overwhelmed by searching for information online.
TK: In particular, I’m curious whether you’re seeing any business uses, or whether there are any you’re looking to explore in the future that you can mention.
RG: Several of our topics help small businesses, particularly in the software and business resource categories. For software, we cover topics like project management software, CRM software, accounting software, and a dozen others. For business resources, we offer topics like advertising agencies, payroll service companies, web design firms, and many more.
We’ve also gotten a lot of feedback from journalists who find our company topic to be incredibly helpful. We’re able to pull data from a variety of sources to give a holistic picture of a company—a standard company listing includes general information, SEC filings, Form Ds, product information, executive information, government contracts, and more.
Finally, we’ve all been surprised by the success of our Section 8 housing comparison, which became a hit early on and has only grown since. For individuals investing in real estate, it’s the easiest way to research, evaluate, and compare Section 8 housing complexes.
TK: FindTheBest draws from a wide range of open data sources. Can you talk about some of the ways you use open government data to build a better product?
RG: Government data powers hundreds of topics across our site, from education (e.g., colleges and public schools) to health (e.g., health care coverage and medical services) to business (e.g., SEC filings). What’s more, we can integrate this content into other topics to make the research experience even better. So for instance, if you’re researching a home purchase, you can explore nearby schools: How well do the students perform? What are the demographics? What are the associated costs? Much of this supplementary data comes from the government.
TK: One thing I find fascinating about FindTheBest is the smooth integration of both machine-produced and human-produced components. What aspects of the product do you need humans to help out with, and what aspects can be most easily automated?
RG: Building a platform of this scale, with 59 billion facts (e.g., Harvard’s tuition) and 726 million things (e.g., Harvard University), requires great technology. Our platform can ingest just about any kind of data, a process that requires a sophisticated back-end. However, machines aren’t sophisticated enough, and data isn’t standardized enough, to allow for a completely automated approach. Machines alone can’t organize and present that data in intuitive, human-readable ways.
Our process always starts with humans. When we expand into a new topic, our product team defines all the attributes someone would need when researching that topic. The team is organized by category, with experts assigned to each one, so that the person managing the content has a deep understanding of that particular topic or industry. They then get to work gathering all this data, whether that means filing a FOIA request, licensing the data from a third party, or sending a team of people out to collect every individual data point. Humans are critical to defining and collecting all the data we need; our platform then instantly ingests this data and makes it meaningful.
TK: By merging the various data sources and adding contextual information and structure, FindTheBest is adding quite a bit of value to that raw data. Do you sell the post-processed data on its own, or do you have any plans to do so in the future?
RG: We have done some limited data sharing in the past and we’re exploring future possibilities. The challenge here is that a significant portion of the value we add is actually at the presentation layer. Sitting behind any one of our comparisons might be hundreds or thousands of data tables that, if downloaded on their own, would be useless to most people. Yes, we spend a ton of time merging data sources and adding contextual information. Some of this is done with the raw data, but a lot of it is done through our proprietary technology platform, which packages everything together in an easily digestible format on the front end.