The Center for Data Innovation spoke with Derik Pridmore, president of Osaro, a machine learning startup based in San Francisco. Pridmore discussed how machines can use deep reinforcement learning to solve new problems and what a factory using learning machines can accomplish.
This interview has been lightly edited.
Joshua New: Osaro is a new company, but it has already made the news several times for using novel methods for machine learning. If these techniques are better, why do other companies developing artificial intelligence not do the same?
Derik Pridmore: First, deep reinforcement learning, which is the process we use, is very much a “bleeding edge” machine learning technique, so most companies don’t have the capability to implement it. Furthermore, the few companies that do use deep reinforcement learning, such as Google, typically learn from scratch using algorithms that are slow to train. In some cases, such as Sergey Levine’s recent work on grasping, they are able to overcome this limitation by running learning algorithms on many robots for many hours. But there will always be a more complex motion or problem where this approach will be infeasible. Though we can’t discuss much about this yet, Osaro’s key innovation is in the realm of accelerating this type of deep reinforcement learning.
New: How does deep reinforcement learning differ from deep learning, or machine learning in general?
Pridmore: Deep learning and deep reinforcement learning are two techniques that fall under the broad heading of machine learning, which means allowing algorithms to learn from data. Deep learning is a supervised learning technique that performs classification of data. Given a large amount of labelled training data (images, for instance), deep learning algorithms can automatically learn to recognize a general class of objects, such as cats. Reinforcement learning is an entirely different technique that is focused on actions. Reinforcement learning algorithms attempt to learn optimal control policies using trial and error and feedback in the form of positive or negative “rewards.” Rather than telling the algorithm exactly how to behave, you simply tell it when it’s done a good job, and it figures out how to behave. This is especially useful when an environment may change over time or when hand-coding control policies is difficult.
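The trial-and-error loop described above can be illustrated with a small sketch. It is not Osaro's method, which the interview does not disclose, and it is not deep: it uses plain tabular Q-learning, with a lookup table standing in for the neural network that deep reinforcement learning would use. The five-cell corridor environment, the reward values, and every name here (`step`, `train`, `greedy_policy`) are assumptions made up for illustration.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, with the goal at cell 4
ACTIONS = (-1, +1)  # action index 0 moves left, action index 1 moves right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    # The only feedback is a reward: positive at the goal, slightly negative otherwise.
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate the value of each action by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on a tie); otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for each non-goal cell."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nothing in the code tells the agent to move right. It is only rewarded on reaching the goal, and under these assumptions the learned policy should settle on moving right from every cell.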
New: Artificial intelligence has many different beneficial applications. Why did Osaro decide to start with industrial applications?
Pridmore: We are extremely optimistic about the power of AI to solve problems now, as opposed to in some distant future, and provide real value to people’s lives. We wanted to start with markets that are huge today, rather than markets which are still developing, like drones and household robotics. After we tackle those markets, we’ll move on to other applications by leveraging the generality and broad applicability of deep reinforcement learning.
New: What does a factory fully reliant on machines that can learn and adapt look like? What are the benefits?
Pridmore: A factory fully reliant on machines will be incredibly safe, efficient, and cheap. Humans will set high-level goals, while machines will handle repetitive tasks with incredible accuracy. And with Osaro’s technology, those machines will also be adaptable and robust to changes in their environment. This will also open up automation in less structured environments like construction or the household. Product cycles for manufacturing will be shorter due to faster setup times, and innovation and refinement will happen at a faster pace. It will be possible to automate the production of small batches of products, allowing consumers more choices. Products could potentially be individually customized. Some tedious jobs will be automated, resulting in the need to retrain individuals. But the net benefit to society will be quite large as the prices of products continue to fall.
New: You plan on deploying your technology with industrial robotics manufacturers in 2017. What obstacles need to be overcome before that can happen?
Pridmore: Osaro is currently engaged with a number of potential partners as we evaluate industrial use cases. At the same time, we continue to research ways to make deep reinforcement learning faster and more scalable. We are growing our team this year and are always on the lookout for outstanding researchers and engineers to join us. As with any new product or technology, especially one involving machine learning, a key challenge is to build products that are game changers, not just incremental improvements. Another key challenge is building comfort with next-generation, data-driven control techniques, since companies are accustomed to thinking in terms of hand-coded solutions, which are easier to understand but more brittle.