The Center for Data Innovation spoke to Marc Warner, chief executive officer of ASI Data Science, a London-based AI firm working with the British government to identify terrorist propaganda online. Warner discussed the balance between AI and human moderation of online content, and how AI can tackle the old British problem of waiting around for a bus, only for three to come along all at once.
Nick Wallace: How does ASI identify terrorist content? What exactly does it look for?
Marc Warner: There’s a limit to how much we can say, because the details of the algorithm have to be kept confidential for fairly obvious reasons. Basically, the algorithm uses pattern-finding techniques to spot patterns that distinguish Daesh content from everything else that’s available on the Internet.
I think it’s probably worth saying that this algorithm is only focussed on official Daesh content. It can’t be used for anything else. It doesn’t identify propaganda from different terrorist groups—it focuses solely on Daesh content. Now why aim at such a limited range? The answer is that a lot of the terrorist attacks in the Western world are coming from lone-wolf terrorists. These are people who are being radicalized in their bedrooms while watching content online, ultimately going and renting a vehicle and driving it into crowds of people. There’s very little that a conventional intelligence agency can do to intercept that person once they’re radicalized—they can’t stop people renting vans. So the obvious place to cut it off is at the source: prevent them from seeing the radicalizing content in the first place. It was the Home Office’s educated opinion that this particular Daesh content was associated with the largest number of attacks.
I don’t know if you’ve ever seen any of this stuff, but it’s horrible. It’s very well produced and it’s very high quality, but it’s extremely poisonous. It works very hard to justify very violent acts. So the work was focused on making sure that this material was blocked, or could be blocked. I should also say that we are now working with the Home Office to roll this out into real-world platforms, but the initial phase of the work was to demonstrate that it would work if it were in the real world. It has to be put into a platform’s video upload stream before it can start doing that.
Wallace: A layperson might sometimes struggle to distinguish between illegal jihadist content that incites terrorism, and legal, albeit extreme Islamist content that does not incite terrorism. How does your algorithm avoid false positives?
Warner: We went out of our way to create a dataset that contained the closest examples that we could find of non-Daesh content. The Home Office gave us a list of the videos that they most wanted to spot, then we went out and found a load of news coverage of these terrorist videos, and things like that. We made sure to test the algorithms on the most difficult circumstances. In this case, there was a very clean line that we knew made the difference between whether something was official Daesh content or not. That made it a good problem to solve with AI very effectively.
It is certainly true that over time, society is going to have to think carefully about the right ways of solving exactly these challenges. The truth is though, of course, exactly the same challenges face human moderators. I think the way to think about this is as the same philosophical question we’ve been worrying about with free speech for thousands of years, rather than something that is specifically new right now. In that context, I think this is a totally relevant question, but these are exactly the debates people would have had a hundred years ago around the same kinds of issues.
Wallace: How do algorithms compare to human moderators in weeding out this kind of content?
Warner: It’s hard to say, because algorithms vary in quality, and I’m sure human moderators vary in quality as well. At a very, very high level, and this is not referring to the spotting of videos or anything like that, algorithms tend to be worse than humans in any specific individual task that requires quite a lot of context, but they can do things at a much greater scale. That’s generally the trade-off that you look for. The same is true in fraud detection. If you were to lay two transactions in front of a police officer, they would be able to bring so much more context to the transaction and ultimately do a better job of deciding which of those two was fraud. But if they have to do three million per second, suddenly this is totally impossible. In general, that’s where things tend to sit. But in specific cases it depends on the quality of the algorithm and what data it’s been trained on before you can make any specific statements.
Wallace: What other kinds of AI projects does ASI work on?
Warner: We like to think of ASI’s speciality as delivering the value of artificial intelligence to the real world. So this tool is one example, and we think it’s a pretty cool application. But it’s very different to our other projects.
One project that I’m particularly excited about is with a website called Isaac Physics, which is run by Cambridge University, and helps kids get exposure to really great physics teaching. They do it by asking lots of physics questions and helping them understand where they’re weak and where they’re strong.
We’ve been working with them on and off over the last few years to help them take the techniques that make Candy Crush addictive, like tweaking the level of difficulty to keep players maximally engaged, and move those into the education space. You want to get a few right, then you want to be challenged, and then you want to get a couple right again—there are balances of difficulties of questions that will just keep you going for a bit longer. We’re quite excited to see where that goes. They have something like 50,000 kids answering 100,000 questions per day, so it’s getting to a fairly sizeable scale.
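As a rough illustration of the "get a few right, then be challenged" idea, here is a minimal sketch of a staircase-style difficulty rule. It is not Isaac Physics' or ASI's actual method, which the interview does not describe; the function names, the 1 to 10 level range, and the "two correct in a row" threshold are all assumptions made for the example.

```python
def next_difficulty(level, recent_results, min_level=1, max_level=10):
    """Return the difficulty level for the next question.

    recent_results is a list of booleans (True = correct), most recent last.
    The thresholds here are illustrative assumptions, not a published rule.
    """
    if recent_results and not recent_results[-1]:
        # The latest answer was wrong: ease off by one level.
        return max(min_level, level - 1)
    if len(recent_results) >= 2 and all(recent_results[-2:]):
        # The last two answers were both correct: step up one level.
        return min(max_level, level + 1)
    # Otherwise stay at the current level.
    return level


def run_session(answers, start_level=3):
    """Replay a sequence of answers and return the level served for each one."""
    level = start_level
    history = []
    levels = []
    for correct in answers:
        levels.append(level)
        history.append(correct)
        level = next_difficulty(level, history)
    return levels


print(run_session([True, True, False, True, True]))  # [3, 3, 4, 3, 3]
```

In this toy session the learner gets two questions right, is served a harder one, misses it, and drops back to the earlier level.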
Another project is trying to stop London buses from all arriving at the same time. There are very natural reasons why buses cluster together. The first one goes through and has to pick up a load of people, so it takes time at each stop, and then the next one comes to that stop and there’s nobody there, so it just goes straight through. There are these forces that push buses closer and closer together.
So TfL (Transport for London) incentivizes the buses to keep them spread apart. They have controllers who tell the buses things like, “wait here a minute, slow down a bit, speed up a bit.” But of course, a controller has—I forget exactly the number—but let’s say 10 bus routes that they’re monitoring at any particular time. These guys are experienced, they’re thoughtful, but there’s a limit. If you have 10 bus routes, your ability to take in all that information and make good decisions is harmed. So we built them a predictive model that predicts when buses are going to arrive at the next stop. Instead of having to make all their decisions in a vacuum, they can make even better decisions. It could save them a few hundred thousand a year in fines from TfL, but more importantly it could give everyone in London a better experience of the bus service.
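The clustering mechanism described in this answer can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions and is unrelated to the predictive model ASI built: two buses serve a line of stops, the time spent at each stop is proportional to how long it has been since the previous bus left, and the lead bus starts one minute late. All parameter names and values are hypothetical.

```python
def simulate_headways(num_stops=10, headway=10.0, travel=2.0,
                      boarding_factor=0.1, lead_delay=1.0):
    """Return the gap in minutes between two buses' arrivals at each stop.

    boarding_factor is the dwell time added per minute elapsed since the
    previous bus left the stop (passenger arrival rate times boarding time).
    """
    # Dwell time per stop when every bus is exactly `headway` apart.
    steady_dwell = boarding_factor * headway / (1 + boarding_factor)
    leg = travel + steady_dwell  # on-schedule time between arrivals at adjacent stops

    # Pretend an on-schedule bus ran one headway ahead of the lead bus.
    last_departure = [k * leg + steady_dwell - headway for k in range(num_stops)]

    arrivals = []
    for start in (lead_delay, headway):  # lead bus (late), then the follower (on time)
        t = start
        times = []
        for k in range(num_stops):
            times.append(t)
            # A longer wait since the last bus means more passengers to board.
            # max() guards against the follower catching up; overtaking is not modelled.
            dwell = boarding_factor * max(0.0, t - last_departure[k])
            last_departure[k] = t + dwell
            t = last_departure[k] + travel
        arrivals.append(times)

    lead, follower = arrivals
    return [f - l for l, f in zip(lead, follower)]


gaps = simulate_headways()
print([round(g, 2) for g in gaps])
```

With these made-up parameters, a one-minute delay to the lead bus shrinks the gap from 9 minutes at the first stop to roughly 5.5 minutes by the tenth, while setting `lead_delay=0.0` keeps the buses a steady 10 minutes apart.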
Wallace: Where do you think AI is going to have the biggest or most interesting impact in the near future? What current projections for future uses of AI interest you the most?
Warner: I’ve got a slightly boring answer to this, unfortunately. So let me tell you my boring answer and then I’ll try and make it a bit more interesting.
We were thinking reasonably hard about what the biggest impacts of AI are going to be in five years’ time. And the truth is we can say that with pretty much certainty, because—and I like this quote—“the future is already here, it’s just not evenly distributed.” What we’re doing right now for organizations is going to be absolutely core to big organizations. Truthfully, the boring answer is that the biggest impact is going to be the marginal gains across a huge variety of industries that we’re not really going to notice from the outside. It’s just that suddenly our payments will get processed 1 percent faster, 1 percent more fraud gets detected, and aeroplane parts fail 1 percent less, because sensors detect when they’re going to fail and they get replaced earlier. It won’t work every time, but it will just move the needle of productivity a little bit.
I think of AI as a very general-purpose technology. There’s no particular industry or vertical focus; it’s not like it’s only going to affect banking or it’s only going to affect manufacturing. It’s going to take all of these things and incrementally improve them. Fundamentally, AI is better software, so it’s about tweaking that forward.
So that’s the boring answer. But I think there’s a slightly different question, which is what are going to be the slightly mind-blowing ways in which it suddenly spikes out and people see AI in their day-to-day life, in a way that’s different and unexpected. I consider Alexa the recent example of this. Five years ago I never would’ve thought that you were going to be able to talk to something like Alexa and that it would be pretty reasonable at understanding stuff.
We’ve talked a bit internally about the things we think might happen next. Probably more and more autonomous car driving. The right way to think about this is not autonomous “on” or “off,” it’s going to be more in a Tesla-type way, it’s going to be autonomous autopiloting for stretches of road, and the amount of roads that can be covered by autopilot, and the amount of weather conditions that it can work under, will just grow and grow and grow. But at some point, it’s going to be significant enough that you see it regularly. I think that will be one of those moments when people sit up and notice.
Then it’s possible that household robotics-type stuff might take a step forward in the next five, or certainly ten years. We’ll start seeing not just Roomba-type robots, doing a fairly limited set of tasks, but somewhat more useful ones.
In five years we’ll pretty much have interactions nailed with audio, so Google Assistant or Alexa will really be able to understand—not understand at a deep level, but you’ll be able to phrase a question that they know how to answer any way you want in any noise conditions you want, and they’ll just be able to deal with that. Speech detection will be essentially perfect. That has a few cool applications, like moving away from keyboards as an input device.