The Center for Data Innovation spoke with Onno Zoeter, a research scientist at Xerox Research Center Europe. Zoeter spoke about how the Internet of Things can help Los Angeles’ traffic problems and how the field of data science has evolved over the years.
This interview is lightly edited.
Josh New: In your work with machine learning, you say you have a particular focus on modeling human behavior. How are the two related?
Onno Zoeter: Machine learning is often used to predict human behavior, such as clicks on advertisements, product selections, the quality at which an expert performs a task, and so on. But human behavior has an important special characteristic: it can change if the machine learning system is used to optimize what human users experience. For example, it is very natural to use “machine-learned” models to optimize which advertisements are displayed, which products are offered at what price, and who is chosen for a particular assignment. When human participants realize their actions have an impact on what they will experience later, they will strategically change their behavior to maximize their return. Current machine learning algorithms ignore this aspect completely, which can lead to systems running poorly or breaking down completely when they are put into practice. I work on designing the full cycle: machine learning methods, optimized data-driven decisions, and the strategic response of human users. This involves predicting and controlling the equilibrium toward which the whole system will move. Well-designed systems will learn automatically to identify which advertisement is relevant in which context, which customer is interested in which product, which expert is suitable for which task, and more, all while leaving advertisers, readers, experts, and outsourcers happy to participate. The idea is to design the system such that the incentives of the participants are aligned with the goals of the system; in other words, so that there are no loopholes that incentivize strategic deviations from optimal participation. These problems form an exciting and very challenging mix of machine learning and economics.
New: Can you talk about some of the work you have done with e-commerce and workforce management?
Zoeter: A few years ago I developed a system that predicted the interest of web searchers in online advertisements. That system won a competition and was put into use by a major search engine. I am currently working on solutions where home-shore and offshore workers can meaningfully specialize in outsourcing markets and are incentivized to do so. This has the benefit of enabling them to earn more, while allowing the outsourcing companies to optimize their jobs along more than just the price dimension. They can leverage the fact that some workers find it hard to work at a lower rate but, with some training, can work at a higher quality level that pays their acceptable rate. The outsourcing system can, for instance, learn that jobs that used to require two to three quality assurance layers can be performed by a single, more experienced worker. Typical benefits of such markets are higher quality outcomes, or the same quality outcome at a lower price. The opportunity to tap into quality does not exist in today’s typical markets, which select on price only.
New: At Xerox Labs you work on a team devoted to reducing traffic in Los Angeles with the use of networked sensors. What has your team accomplished?
Zoeter: We designed the smart pricing algorithms behind the Xerox Merge smart parking system that runs in downtown LA as part of the LA Express Park project. The idea behind the project is simple: when any publicly owned utility, for example on-street parking, is priced far below market rates, it will be used very inefficiently. In busy locations, for instance right in front of a shop, early-arriving long-term parkers will take prime spots. This means customers, who often stay for short periods, have to walk from side streets and back streets. Even worse, when whole areas have underpriced parking, zones are always full and drivers continuously circle blocks looking for what are essentially subsidized spaces. This so-called “cruising for parking” is notorious in many cities. Academic studies have shown that it is not uncommon to find that 30 percent of downtown traffic is actually drivers looking for somewhere to park. This cruising contributes significantly to congestion and pollution.
Our algorithms use data from more than 6,000 sensors to predict parking behavior, and each quarter rates are updated to align them optimally with demand. Instead of the traditional hand-set rates for entire zones, parking is now priced per street, which, for downtown LA, has resulted in a more heterogeneous rate map. Previously you could only walk to a discount at the edge of an expensive zone. Now we see that nearly everywhere there is a cheaper alternative within walking distance. Since the system was introduced, the use of available parking spaces has been more evenly spread: both the fraction of the time that streets are congested (more than 90 percent full) and the fraction of the time that streets are underused (less than 70 percent full) have gone down.
The solution has received several prestigious awards, among them the Organization for Economic Cooperation and Development’s International Transport Forum Innovation Award and the International Parking Institute Award of Excellence.
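As an illustration of the two occupancy measures mentioned above, the following is a minimal sketch of how the fraction of time a street is congested or underused could be computed from sensor readings. It is not the Merge algorithm, whose details are not given in this interview. Only the 90 percent and 70 percent thresholds come from the text; the function names, the sample data, and the rate-adjustment rule with its `step` parameter are hypothetical.

```python
from typing import Sequence, Tuple

CONGESTED_ABOVE = 0.90  # "more than 90 percent full" (from the interview)
UNDERUSED_BELOW = 0.70  # "less than 70 percent full" (from the interview)


def occupancy_fractions(occupancy: Sequence[float]) -> Tuple[float, float]:
    """Return (fraction of samples congested, fraction of samples underused).

    `occupancy` holds one street's occupancy readings as values in [0, 1],
    assumed to be taken at regular intervals.
    """
    if not occupancy:
        raise ValueError("need at least one occupancy sample")
    congested = sum(1 for o in occupancy if o > CONGESTED_ABOVE)
    underused = sum(1 for o in occupancy if o < UNDERUSED_BELOW)
    n = len(occupancy)
    return congested / n, underused / n


def suggest_rate_change(occupancy: Sequence[float], step: float = 0.50) -> float:
    """Hypothetical rule, for illustration only: nudge the hourly rate up
    when a street is congested more often than it is underused, and down
    in the opposite case."""
    congested, underused = occupancy_fractions(occupancy)
    if congested > underused:
        return step
    if underused > congested:
        return -step
    return 0.0


if __name__ == "__main__":
    samples = [0.95, 0.92, 0.80, 0.50]  # made-up readings for one street
    print(occupancy_fractions(samples))
    print(suggest_rate_change(samples))
```

In this sketch a street that is congested in half of its readings and underused in a quarter of them would get a rate increase. A production system would presumably rely on demand predictions rather than a fixed step.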
New: Can insights and successes from the LA project be transferred to other cities, or even problems unrelated to traffic?
Zoeter: The underlying economic principles, developed by Nobel Prize winner William Vickrey in the 1950s, are very general. Our algorithms have succeeded in making these fundamental principles practical. They can make a difference in parking in many cities, but they can also be extended to reduce more general road congestion or applied to public transport. Because the economic principles are so general, there are many other areas where they may be applied. The consumption of electricity is a relevant example.
An important insight is that providing optimally placed cheaper parking options is one thing, but making sure users know about them is another. The methods need to go hand in hand with easy-to-use information systems. Our Merge parking platform, for instance, pushes out real-time parking availability and pricing information to smartphone applications. The more such apps are used, and the easier it is to use them, the bigger the impact of the system as a whole.
New: You have been involved with data science for a while now. How has the field changed, and what aspects of the future of data science are you most looking forward to?
Zoeter: The last two decades have been a very exciting time to work in data science and machine learning, from both a business and theoretical point of view. In terms of business we have come from a time where applications were rare to a time where, in many domains, it is no longer possible to try and compete without good machine learning techniques. Search, online advertising, and product recommendation are important examples.
Most of these successes are in fully digital domains. For these applications data is easy and cheap to capture and models can be put to use in online platforms with just a few relatively simple changes. Any proposed improvement can easily be tested using a part of internet traffic, sometimes even by a single person in a single day. For applications of machine learning in non-digital domains it’s usually a bit more complicated: dedicated sensors need to be installed to get the required data, solutions need to be sold to clients first, or a user client base needs to be informed and engaged in using the new system. This means that these applications have been fairly rare in the first wave of success stories. We are only now beginning to see the results in off-line domains, our on-street parking system being one of them, and we will see many more soon.
From a methods perspective, it has been very interesting to see the impact of scale. Machine learning, the challenge of capturing underlying regularities from data, is particularly difficult if you have very little data. Conversely, if you have a lot, it can even become trivial. Online interactions with search engines, photo sites, voice assistants, and so on have resulted in extremely large datasets. Applying simple methods to these sets gave pretty good results. Now we see that by scaling up the computer hardware, more complicated methods can be used on the same amount of data. The outcome is quite impressive: in the last few years the performance of image understanding, machine translation, and voice recognition has greatly improved.
I remember very clearly tinkering with my first computer when I was a kid. Both the memory and the computing power were so small that they severely limited what you could do with it. It is impressive how rapidly the technology has evolved, both in hardware and in software. I’m looking forward to seeing how far we can push it. At the same time I’m hoping that we can put it to good use. It takes not just technical skills but also a lot of care and attention to design systems that realize benefits without sacrificing essential elements such as privacy. I see a great many opportunities for machine learning applications in the near future, like reducing city congestion and pollution even more, improving health through systematic, data-driven personalized medicine, or providing tailor-made study programs in education.