The Center for Data Innovation spoke with Colin Gounden, CEO of Cambridge, Massachusetts-based Via Science, which built a software platform that automatically generates predictive algorithms from vast datasets. Gounden discussed some manufacturing applications for the company’s new streaming analytics capabilities and a recent initiative to comb through unmanned aerial vehicle (UAV) camera data.
This interview has been lightly edited.
Travis Korte: Can you introduce your Reverse Engineering/Forward Simulation (REFS™) software and briefly explain what you mean when you describe Via Science as a “big math” company?
Colin Gounden: Math is everywhere. If you have a list of customers and their previous purchases and want to predict what they might buy next, you would create some kind of mathematical algorithm or formula to do so. If you’re Amazon and you want to predict at an individual level what a customer might buy next across millions of customers and millions of products in real-time, that’s Big Math.
REFS™ is our software platform that automatically generates Big Math algorithms directly from data with limited human input across a wide variety of problems.
As another example, if you are trying to figure out profit, you could look at price (what each unit sells for) and volume (how many units are sold). As either price or volume goes up, the absolute profit you make also goes up. The tricky bit is that as price goes up, volume may go down. That is, price and volume are not independent of each other.
When variables are not independent of each other, the math can become very hard. We use a form of mathematics that’s good at figuring out the right algorithms when there are dependent variables. This is hard enough with three obvious things like profit, price, and volume. In a big data world with hundreds of variables, the number of possible dependencies can be astronomical. Finding the right algorithm in that world is Big Math, and it is what our platform discovers in an automated way.
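To make the price and volume dependency concrete, here is a minimal sketch that assumes a hypothetical linear demand curve (volume falls by a fixed number of units for every $1 of price) and a hypothetical unit cost. The constants `BASE_VOLUME`, `SLOPE`, and `UNIT_COST` are illustrative assumptions, not figures from Via Science, and the brute-force search is only a toy stand-in for the kind of dependency the interview describes.

```python
# Hypothetical numbers for illustration only; none of these come from Via Science.
BASE_VOLUME = 1000.0  # units sold if the price were $0
SLOPE = 20.0          # units of volume lost per $1 of price increase
UNIT_COST = 10.0      # assumed cost to produce one unit


def volume_at(price: float) -> float:
    """Units sold at a given price under an assumed linear demand curve."""
    return max(BASE_VOLUME - SLOPE * price, 0.0)


def profit_at(price: float) -> float:
    """Profit = (price - unit cost) * volume, where volume itself depends on price."""
    return (price - UNIT_COST) * volume_at(price)


def best_price(candidates: list[float]) -> float:
    """Brute-force search for the candidate price with the highest profit."""
    return max(candidates, key=profit_at)


if __name__ == "__main__":
    prices = [step * 0.5 for step in range(101)]  # $0.00 to $50.00 in 50-cent steps
    p = best_price(prices)
    print(f"best price ${p:.2f}: volume {volume_at(p):.0f}, profit ${profit_at(p):.2f}")
    # Raising the price further cuts volume faster than it raises the margin.
    print(f"at $40.00: volume {volume_at(40.0):.0f}, profit ${profit_at(40.0):.2f}")
```

Under these assumed numbers, profit peaks at a price of $30 (400 units, $8,000 profit); raising the price to $40 cuts volume to 200 units and profit to $6,000. With only one dependent pair a brute-force search is enough; the difficulty Gounden describes arises when hundreds of such dependencies interact.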
TK: Via Science strives to capture causation in addition to correlations within data. Causal inference is still an exotic topic even for many professional data scientists; do you have rules of thumb about what kinds of data lend themselves to causal statements and what kinds of data resist that kind of analysis?
CG: To find cause and effect directly from data, you need what we call “interventions.” For example, if you are trying to figure out whether sales at a retailer are going up because of a new TV ad for the retailer or because of a new product it has introduced, you’d like to compare sales in markets with and without the TV ad, and in markets with and without the new product. Running an ad or introducing a product is called an intervention.
Wherever there are interventions, there’s the ability to find cause and effect relationships.
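One simple way to use interventions is to compare average sales in markets that received an intervention with markets that did not. The sketch below uses four hypothetical markets in a balanced 2×2 design (ad or no ad, new product or not), so each effect can be read off separately. The market data and the `Market` and `intervention_effect` names are illustrative assumptions; this is a plain difference-in-means estimate, not Via Science’s method, and it only supports a causal reading if the interventions were assigned independently of other factors affecting sales (for example, at random).

```python
from statistics import mean
from typing import NamedTuple


class Market(NamedTuple):
    """One market's weekly sales and which interventions it received (hypothetical)."""
    name: str
    tv_ad: bool
    new_product: bool
    sales: float


# Illustrative 2x2 design: every combination of the two interventions appears once.
MARKETS = [
    Market("A", tv_ad=True, new_product=True, sales=130.0),
    Market("B", tv_ad=True, new_product=False, sales=115.0),
    Market("C", tv_ad=False, new_product=True, sales=112.0),
    Market("D", tv_ad=False, new_product=False, sales=100.0),
]


def intervention_effect(markets: list[Market], intervention: str) -> float:
    """Average sales with the intervention minus average sales without it."""
    treated = [m.sales for m in markets if getattr(m, intervention)]
    control = [m.sales for m in markets if not getattr(m, intervention)]
    return mean(treated) - mean(control)


if __name__ == "__main__":
    print("TV ad effect:", intervention_effect(MARKETS, "tv_ad"))
    print("new product effect:", intervention_effect(MARKETS, "new_product"))
```

Because each intervention appears equally often with and without the other, the two effects (16.5 and 13.5 sales units here) are not confounded with each other. Without that variation, for example if the ad only ever ran where the new product was stocked, the two causes could not be separated.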
TK: The Via Science website lists work in finance, health care, consumer packaged goods and retail, energy, and telecommunications. That’s quite a broad range: are these simply markets with high data availability? What do you look for in an industry when deciding if Via Science can make an impact in it?
CG: We don’t have an algorithm for a specific problem or industry. Rather, we have a software platform that generates algorithms from data, so it lends itself to a wide variety of problems.
In general, we look for high-value questions to answer. We often tell customers that if they can solve their problem with Excel or SPSS, they should go do that. The people we work with tend to demand better answers than they could get with any other approach on the market. As a result, we work across industries, but usually with companies that are, or want to be, leaders in their market.
TK: You were recently featured in an article about your work in predictive analytics for sensor-equipped UAVs. Can you talk a little about that project specifically and what you’ve concluded?
CG: This initiative came out of a number of conversations with both energy companies and domestic security agencies. What we found is that gathering video data and other sensor data through UAVs or fixed cameras and sensors is becoming increasingly commonplace. The issue is that there aren’t enough people to look at all of that data.
Tom Davenport, who wrote Competing on Analytics and a number of other analytics books and is an advisor to Via Science, recently told me that the US flies drones over Afghanistan but doesn’t have enough people to look at the thousands of hours of footage from them.
We saw this as an opportunity for our software. What we’ve concluded is that there is an important first step: converting the video footage to features. If you are looking at cars, for example, features would be things like the size, color, speed, direction, and location of those cars. Once features have been extracted from the video, REFS™ is a good tool to find patterns in those features, or anomalies that deviate from those patterns.
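As a rough illustration of the second step (finding anomalies once features exist), the sketch below learns a per-feature mean and standard deviation from a hypothetical baseline of vehicle features and flags any feature of a new observation that lies more than three standard deviations away. The feature names, baseline values, threshold, and function names are assumptions for illustration; a per-feature z-score is a deliberately simple substitute technique and is not how REFS™ works.

```python
from statistics import mean, stdev

# Hypothetical per-vehicle features, assumed to have already been extracted
# from video. Feature names and values are illustrative, not real data.
BASELINE = [
    {"speed_kmh": 48.0, "length_m": 4.5},
    {"speed_kmh": 52.0, "length_m": 4.7},
    {"speed_kmh": 50.0, "length_m": 4.6},
    {"speed_kmh": 49.0, "length_m": 4.4},
    {"speed_kmh": 51.0, "length_m": 4.8},
]


def fit_profile(baseline: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Learn the 'normal pattern' as a mean and standard deviation per feature."""
    profile = {}
    for name in baseline[0]:
        values = [obs[name] for obs in baseline]
        profile[name] = (mean(values), stdev(values))
    return profile


def anomalous_features(obs: dict[str, float],
                       profile: dict[str, tuple[float, float]],
                       threshold: float = 3.0) -> list[str]:
    """Return the features of obs lying more than `threshold` std devs from the baseline mean."""
    flagged = []
    for name, (mu, sigma) in profile.items():
        if sigma == 0:
            # No variation in the baseline: any change at all is unusual.
            if obs[name] != mu:
                flagged.append(name)
        elif abs(obs[name] - mu) / sigma > threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    profile = fit_profile(BASELINE)
    print(anomalous_features({"speed_kmh": 95.0, "length_m": 4.6}, profile))
    print(anomalous_features({"speed_kmh": 51.0, "length_m": 4.5}, profile))
```

A vehicle travelling at 95 km/h stands out on speed alone, while one at 51 km/h does not. Note that this treats each feature independently, which is exactly the simplification that breaks down when features depend on one another, as discussed earlier in the interview.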
TK: Speaking of sensors, you recently wrote in a blog post that REFS™ now has the ability to make predictions on streaming data. What are some of the use cases you’re most excited about for this upgrade?
CG: I think the two big use cases that we are excited about are predicting failure of devices and machines and optimizing preventative maintenance schedules for devices.
We’re excited because companies stand to gain huge efficiencies by reducing machine downtime and making maintenance schedules more efficient.
This goes beyond just better profits for companies (although that’s a good outcome too). We all share an interest in predicting whether a power plant, a jet engine, or a part on an oil rig is going to fail before it happens.
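Predicting failure on streaming data means scoring each sensor reading as it arrives, without storing the whole history. A minimal sketch of that idea, assuming a single numeric sensor (say, a bearing temperature) and a simple rule that alerts when a reading sits far outside the running distribution, uses Welford’s online algorithm to keep the mean and variance up to date in constant memory. The `StreamingMonitor` class, its threshold, and its warm-up length are hypothetical choices for illustration; a deviation alarm like this is a common early-warning baseline, not REFS™’s predictive model.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingMonitor:
    """Flags sensor readings far from the running mean (hypothetical early-warning rule)."""
    threshold: float = 4.0  # alert when a reading is this many std devs from the mean
    warmup: int = 30        # readings to absorb before any alert can fire
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0         # running sum of squared deviations (Welford)

    def update(self, x: float) -> bool:
        """Score x against readings seen so far, then fold x into the running stats."""
        alert = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                alert = True
        # Welford's online update: constant memory, numerically stable.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return alert


if __name__ == "__main__":
    monitor = StreamingMonitor()
    for i in range(100):
        monitor.update(70.0 if i % 2 == 0 else 72.0)  # normal operating readings
    print("alert on 90.0:", monitor.update(90.0))
```

For simplicity every reading, including ones that trigger an alert, is folded into the running statistics; a production monitor might exclude flagged readings or use a sliding window so that slow drift in a healthy machine does not mask a developing fault.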
Making maintenance schedules more efficient also has positive social outcomes. I think maintenance scheduling will go through an efficiency drive just as logistics has at companies like UPS and FedEx. Thanks to analysis of all the data UPS collects on traffic, its drivers, packages, and so on, each driver now delivers 50% more packages a day than 10 years ago. At the same time, UPS drivers now earn twice as much as they did 10 years ago because of the efficiency gains.
We see a day when all the sensor data being gathered will make maintenance scheduling for jet engines and power plants more efficient and provide better wages for maintenance personnel as well.