The Center for Data Innovation spoke to Dhruv Ghulati, founder and chief executive officer of Factmata, a London-based startup using AI to tackle fake news. Ghulati talked about how algorithms can help people identify spurious stories, and what the future might hold for online misinformation and propaganda.
Nick Wallace: Factmata has two main functions, one of which is to help journalists and the public check facts. How does AI help them do that?
Dhruv Ghulati: Our premise is to build a platform for journalists, fact checkers, social scientists, and financial analysts—people who are checking facts as part of their day-to-day reading of online content. Some people go through articles, look at the claims and assertions made, and question them. Then they start to debate what’s being said, they have discussions with other people about that content, they refute things and support things with evidence. The idea of Factmata is to make that process very easy on the web. That’s the purpose of our news platform: to assist the fact-checking process.
Some of the things we think about include detecting claims in a body of text, which is what we have built our algorithms to do, and assisting the fact checker in knowing what they should focus on, and the key points of argument they should think about when debating this content. Another example is automatically sourcing evidence for claims that are made, which maybe the journalist has not had time to do, or doesn’t have space to put in the article, but may be quite interesting context for a journalist or a reader to access when viewing our platform.
Fundamentally, our platform is built upon machine learning and natural language processing, which allows us to automatically highlight claims and automatically bring together evidence from other sources on the web, or statistical databases, facts from Wikipedia—all sorts of context that helps you check and think about what’s being said. The goal is to provide context so readers can be assisted in fact-checking.
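The claim-detection step described above can be sketched in miniature. This is purely an illustrative toy, not Factmata's actual system: it flags sentences containing numbers or statistical verbs as "checkable," whereas a real system would use trained NLP models. All names and patterns here are the editor's own.

```python
import re

# Toy heuristic for spotting "checkable" claims: sentences that assert a
# number, a percentage, or a statistical change. A real claim-detection
# system would use trained machine-learning models, not a regex.
CLAIM_PATTERN = re.compile(
    r"\b(\d+(\.\d+)?%?|doubled|halved|rose|fell|increased|decreased)\b",
    re.IGNORECASE,
)

def detect_claims(text: str) -> list[str]:
    """Return sentences that look like factual, checkable claims."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if CLAIM_PATTERN.search(s)]

article = (
    "Unemployment fell to 4.1% last quarter. "
    "The weather was lovely. "
    "Crime increased in three major cities."
)
print(detect_claims(article))
```

Running this flags the first and third sentences while passing over the purely descriptive one, which is the shape of the triage a human fact checker would then take over.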
Wallace: Factmata also helps advertisers avoid inadvertently supporting fake news or hate speech. How does that side of Factmata work, and what does it mean for advertisers and for peddlers of fake news and hate speech?
Ghulati: When we think about the media ecosystem, and why this kind of content really exists, why fake news has been a phenomenon and why we are worried about it today, we think it is about the structure of how content is monetized on the Internet, and how it’s incentivized. When you think about fake news, there are different types of incentives for producing it. For example, Craig Silverman has done a lot of work in mapping out the drivers of people creating that content. One of them might just be to have some fun and cause some disruption, to make people laugh, or to confuse people. Another reason is pure propaganda that’s government-motivated, trying to influence elections, or people’s viewpoints on a specific issue.
But another key reason is monetization. Fake news and outrageous content, controversial content in particular, they all do very well on the Internet, as do memes, as do short videos. This is about building a structure in which that type of content, and that incentive structure, is disincentivized and defunded. The first step in doing that is being able to detect that type of content. What we’ve realized is that advertising platforms and brand safety companies working to protect brands from things like pornography or swear words on the Internet are struggling to detect these more subjective forms of content, which are very unsafe, but very hard to detect because there are subtleties involved, nuances, differences of opinion. That’s what Factmata is trying to focus on: the detection of these more subtle forms of content, and providing flags for programmatic advertisers who sometimes have very little knowledge of where their ads are being placed, so they can effectively defund it.
When you look at most existing players that work on detecting problematic content on the web, it’s on a publisher level most of the time. It’s a whitelisting or blacklisting process of specific domains. So if a domain has published some problematic content once in a while, they ban the entire domain and remove it from their list. We think this is neither very effective nor fair: it defunds and censors content wholesale. What we are trying to do is be very point-specific and atomic in the way we detect content. So for example, our technology for hate speech can go through and find the specific paragraphs or sentences on the page that are hateful, rather than assuming “that whole website is a hate-producing website.”
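The contrast between domain-level blacklisting and the sentence-level ("atomic") flagging described here can be sketched as follows. The classifier is a stand-in keyword check invented for illustration; a production system like the one Ghulati describes would use a trained model, and the domain and terms here are hypothetical.

```python
import re

# Domain-level approach: all-or-nothing banning of an entire publisher.
BLACKLIST = {"badsite.example"}

# Toy stand-in for a hate-speech classifier (illustrative terms only).
FLAGGED_TERMS = {"vermin", "subhuman"}

def domain_blocked(domain: str) -> bool:
    """Blacklisting: one bad page condemns the whole domain."""
    return domain in BLACKLIST

def flag_sentences(page_text: str) -> list[str]:
    """Atomic flagging: return only the specific sentences flagged."""
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    return [
        s for s in sentences
        if any(term in s.lower() for term in FLAGGED_TERMS)
    ]

page = "Local council approves new park. Those people are vermin."
print(flag_sentences(page))  # only the second sentence is flagged
```

The design difference is the unit of judgment: the blacklist operates on a publisher's identity, while the sentence-level flagger operates on what is actually said, which is the fairness argument made in the interview.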
We think this is a much fairer and more equitable way of banning content. You shouldn’t be banned based on your identity as a platform or a publisher—what matters is the actual things that you say.
Wallace: It is a truism that there is nothing new about false stories that spread very quickly—there are obvious examples that go back centuries. The axiom of not believing everything you read on the Internet is not new either. But the term “fake news” started to appear fairly recently. Do you think there is at least something new going on?
Ghulati: Yeah. I think fake news has become a phenomenon because the technology to create new content is sometimes automated. Now you have technology that can micro-generate advertising to create exactly the right message for you to react to that content. This is the specific profit motive of some companies. If you go on some ad-tech providers’ websites, it’s literally about, “hey, we can give you the perfect message to get people to buy this washing machine, or this shampoo.” So naturally, that technology will be deployed to create other reactions in people. For example, electing certain presidents, or voting in a specific way, or potentially inciting violence, or inciting protests. I think it’s effectively a weaponization of technology that is used for one specific purpose but can easily be repurposed for other means.
I think the term “fake news” has developed because there are fears about the speed with which this can spread and the ease with which it can be generated on a scale that’s never been seen before. And I only see it getting worse. We are seeing tools now where you can manipulate video content very easily, you can change what someone appears to be saying in a video. This is a problem that could veer out of control very quickly, unless technology can come in and assist the verification process. It’s not enough to say, “don’t trust what you read on the Internet,” because at the moment, I’d argue that’s very difficult advice to follow with the barrage that you see on a daily basis and the lack of tools to alert readers and give them context to what they’re seeing.
If you think about the format in which content is being delivered to people, it’s got smaller and smaller and smaller. It’s on your mobile phone, and delivery units on social networks often have very small numbers of characters. The incentive is to have a headline or a phrase that really captures your attention. People don’t have time to go through and understand what the context is and what’s being said. We’re trying to deploy an additional layer that makes it easier for people to dig into and question things.
Right now if you read a piece of content on one of these networks, you’ll have to go and search and do your research and Googling and understand if it might be correct or not. We can make that process a bit easier for you, and potentially help you filter that content out, if you don’t want to be exposed to it.
Wallace: As you noted, it is also becoming possible to use AI to create fake news, such as by manipulating people’s features and voices in a video to falsely attribute statements to them. Do you think we are going to see an AI arms-race between those creating fake news and those trying to fight it, and how do you account for legitimate uses, like satire?
Ghulati: Absolutely, yes I do. I see the fake news phenomenon as being very similar to the e-mail spam arms race. Even today, new techniques are continuously being developed to get spam into your inbox and get you to click and buy products, or to download malware. Mainly, those things are driven by a profit incentive too.
So what we want to do at Factmata is work with the advertising industry to build technologies and help shape policy and regulation that can make this type of content difficult to monetize. Part of that is an education process to educate publishers, advertisers, ad networks, and everyone in that chain, and to say “yes, viral content like fake news can do very well, but what are the long-term consequences for your brand? What are the long-term consequences for democracy? Is this technology that we want to keep developing, or do we want to think about quality you can actually pay for in the ecosystem, where bad content can be defunded?” Our technology is effectively defunding fake news.
We also have a satire-detection system. Satire detection is a field within natural language processing, and we’ve developed classifiers to distinguish satirical from non-satirical content. We’re also building humor detection. The funny thing is, these fields of research were never deployed into the wild, because they couldn’t really make money. There wasn’t a financial incentive for this kind of research. The financial incentive for ad technology was always, “how do we best personalize and target the right message to people?”
From a technology standpoint, all we’re trying to do at Factmata is classify content into these very niche categorizations, which wasn’t done before, so we can get to the heart of what actually is being said. Another thing that we’re working on is quote detection: how do you distinguish between, say, an opinion piece that’s actually inciting hateful opinion and language that is merely being quoted in an ordinary news report?
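The quote-detection distinction can be illustrated with a deliberately minimal sketch. This is not Factmata's method: it only separates text inside quotation marks from the author's own words, whereas a real system would also use attribution verbs ("said," "claimed") and syntactic parsing. The example sentence is invented.

```python
import re

# Toy quote detection: separate language a reporter is quoting from
# language the author asserts directly. Only straight double quotes are
# handled; real systems need attribution parsing and smart quotes too.
QUOTE_RE = re.compile(r'"([^"]+)"')

def split_quoted(text: str) -> tuple[list[str], str]:
    """Return (quoted spans, text with the quoted spans removed)."""
    quoted = QUOTE_RE.findall(text)
    authorial = QUOTE_RE.sub("", text)
    return quoted, authorial

text = 'The minister said "taxes will fall" but offered no timetable.'
quoted, rest = split_quoted(text)
print(quoted)  # ['taxes will fall']
```

Even this crude split shows why the distinction matters for flagging: the claim "taxes will fall" belongs to the minister, not to the reporter quoting it, so a classifier that ignores quoting would misattribute the assertion.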
Wallace: Where do you see all of this going in the next five to ten years? You said you see it getting worse—do you think the tide can be pushed back against fake news online, or do you think we’ll have to learn to live with it?
Ghulati: I think we’re always going to have to learn to live with problematic or manipulated content on the web. And really, there are grades of this even in the content that we write. The piece you write after this interview today may not be an accurate representation of what I said. The best representation you could give is to quote me verbatim, like you said you would. But as we know, with all news, this is not how it usually works.
What I find interesting is that you might be speaking to your friends in the pub and discussing politics, you have a knowledge based on what you read, and the person who wrote that is basing that on some other source. Really, this problem of fake news is about how we as readers and producers of content can be most fair and accurate in representing information, and how we can be critical in the way that we do it, because we realize the long-term consequences of representing information in the wrong way, or of trying to write content that may be biased.
The reason I’m building this company is because, when I think about major issues in the world that we’re dealing with—climate change being an example, or our attitude towards gun violence in the United States—we have certain inbuilt biases and preconceptions and misconceptions about how things are, because we don’t think about original source research or think about things factually.
So to answer the question of where I see fake news going, it’s going to be a cat-and-mouse game where content producers will try and find ways of producing content to get through filters and get to the top of your news feed or search engine, and cause disruption or generate ad dollars. This will always be the case. When it comes to manipulated content or biased content, this will always be the case as long as we have a news cycle like it is right now, where the incentive mechanism is really to produce fast content that incites outrage, or anger, or sadness, because this does well on the Internet.
I want Factmata to build not just the algorithms, but almost the policy and the regulation and the architecture so that we strike at the heart of this problem, which is how we move the advertising industry in a different direction, and how we move search engines and social networks to think about content in a different way. And they are starting to do this, by thinking about quality scoring, de-ranking certain types of content, or initiatives like The Trust Project, which puts badges on publishers to show that they’re accredited, and so on. This is all work that’s underway, and it’s what we’ve been thinking about for a while at Factmata.