The Center for Data Innovation spoke with Greg Corrado, a senior research scientist on Google’s machine learning team. Corrado discussed how machine learning improves Google’s search results, as well as how Google benefits from making valuable machine learning software publicly available as open source.
This transcription has been edited for clarity.
Joshua New: You helped develop RankBrain, an artificial intelligence (AI) system that helps Google prioritize search results based on how it interprets what users really want to know, rather than just literal interpretations of their queries. How does RankBrain “read between the lines,” so to speak, and why is this approach preferable to traditional methods?
Greg Corrado: It’s not that it’s preferable, but it is complementary. Language is imprecise and subtle and there are innumerable ways of saying the same thing, or approximately the same thing. RankBrain adds a layer of additional context to a search query. A large fraction of the queries we get every day are actually brand new—we’ve never seen them before. So part of what this system does is it allows us to interpret these never-before-seen questions and give a reasonable guess about what the user is asking about.
New: Before that, you worked on an artificial neural network that analyzed millions of YouTube videos and was famously able to learn, without human input, what a cat was. Both this and RankBrain take advantage of machine learning techniques. Are there any similarities between how you program a system to learn what cats are and how you program a system like RankBrain?
Corrado: They are very different. They are both machine learning, and they both use the same kind of machine learning technology called deep neural networks, but other than that, there really aren’t very many commonalities. For example, the cat discovery system was really a science experiment in what’s called unsupervised learning to see if, without any human guidance, the system could discover basic concepts—in this case, the kinds of things that show up in YouTube videos. The answer to that scientific question was “yes,” but that doesn’t naturally turn into any practical application of the technology. For a system like RankBrain on the other hand, we had a very specific, practical objective that we wanted to tackle directly.
New: In 2015, Google made its machine learning platform TensorFlow freely available as open source. Google initially developed TensorFlow as a proprietary platform to power a wide variety of its own machine learning applications—why then make it open source? How does Google benefit by giving it away for free after investing so much in it?
Corrado: The first version of a deep learning system we built at Google was entirely internal and there was simply no way of open-sourcing it. When we made the decision to start over fresh with a new version, the decision was made very early on to make something that we could eventually publish as open source. We did this because we think that it’s valuable for the community overall to establish standards in this space. Machine learning will be a new fundamental technology, so the sooner the engineering community agrees on standards for how we build these kinds of systems, the better it is for everyone. It helps Google, but it also helps external educators, other companies, and the technology as a whole.
New: You developed an updated version of TensorFlow designed to run on “heterogeneous distributed systems,” meaning that an application using TensorFlow could draw on the computing power of multiple different devices. Why was this necessary? What does this mean for TensorFlow applications?
Corrado: This directly addresses a very real and practical problem within Google. When you build a machine learning system, you don’t want to be tied too tightly to the specific hardware that you’re running your system on. For example, if you’re a researcher you might want to start by developing or sketching out your ideas on a desktop using a GPU, or something like that. But then, say you want to scale your system up and see if it really works in a much bigger sense, you might want to run it on a cluster of many machines. Then if it really works and you have a product, you might want to release it as an app for a smartphone. The problem here is that we had to rewrite the machine learning system for each of these various stages.
Now, the idea behind TensorFlow is that it allows you to write a slightly higher-level description of your machine learning system so that you write it once, and then are able to apply it across all these different heterogeneous platforms.
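To make the "write it once" idea concrete, here is a minimal sketch of our own (not from the interview) using the current TensorFlow 2 Python API. The tiny model, its weights, and the inputs are purely illustrative. The computation is defined once, and the same definition is then run on each CPU or GPU that TensorFlow reports on the local machine. It covers only single-machine device placement; scaling out to a cluster or packaging a model for a smartphone involves additional tooling not shown here.

```python
import tensorflow as tf

# The model is described once, with no reference to any particular device.
@tf.function
def tiny_model(weights, inputs):
    # Illustrative only: one linear layer followed by a ReLU.
    return tf.nn.relu(tf.linalg.matvec(weights, inputs))

weights = tf.constant([[1.0, -2.0], [3.0, 4.0]])
inputs = tf.constant([1.0, 1.0])

# Run the same description on every CPU and GPU TensorFlow can see.
# On a laptop this may be a single CPU; on a workstation it may include GPUs.
devices = tf.config.list_logical_devices("CPU") + tf.config.list_logical_devices("GPU")

results = {}
for device in devices:
    with tf.device(device.name):
        results[device.name] = tiny_model(weights, inputs).numpy().tolist()

print(results)
```

The point of the design is that device choice lives outside the model definition, so moving from one kind of hardware to another changes only the placement, not the model code.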
New: Whether it’s helping to autocomplete emails or helping cybersecurity teams better monitor attacks and other threats, machine learning applications can make people substantially more productive when you add them up. What can a Google employee or just a regular person accomplish today thanks to machine learning that would have been impossible just five years ago without a huge amount of effort?
Corrado: The things that we’ve developed aren’t just for Google employees—they’re general systems, and almost every system we have is available to the public. I particularly love the automatic email responding feature, which I use all the time because I’m always on the go and it lets me respond quickly. A lot of people are passionate about a lot of the other tools that we’ve released. For example, one of my colleagues takes a lot of photos and always intended to organize them, but she never found the time to do that. But now, one of our machine learning tools in Google Photos takes that entirely off the to-do list and removes all the guilt from putting it off! The tool can recognize what’s in your photos and let you search through them based on their content.