The Center for Data Innovation spoke to Tom Felle, senior lecturer in print and digital journalism at City, University of London. Felle leads a team of researchers building DMINR, an AI tool to help journalists verify information. Felle talked about how AI can help tackle fake news, and how the technology will impact journalism in the coming years.
Nick Wallace: How can data and AI help us tackle fake news?
Tom Felle: One of the interesting things about data journalism is that we’re really only at the beginning of digging into its power. So much data now is held electronically—by governments, police, health care services, you name it. And if you think about it, there are answers to all kinds of questions there, waiting to be found—if we have the power and the knowledge and the digital tools to go and look for them.
The thing about fake news is that it means different things to different people. You’ve got propaganda, you’ve got people just telling lies, you’ve got companies based out of Eastern Europe producing content for profit. You’ve got all kinds of different versions of fake news. So if we pare it back to verification: if somebody says something, how can we verify whether or not it’s true? Can we debunk it, or prove that it’s true? Can we use data to do that? I think we can, and that’s the genesis of this project.
My day job is as a lecturer in journalism, but I used to be a real journalist, and for quite a while I’ve been looking at the proliferation of software tools for social media that are on the market now. Everybody is selling something that can harvest and make sense of social media, whether it be trending topics, or a TV show like Strictly Come Dancing, or user-generated content showing natural disasters or terrorist attacks, or just the latest video of a cat playing the piano. Or maybe a cat with a violin—who knows?
I was thinking to myself for quite a while that if we could take all that public data, then there could be something in this. That’s where we’ve come from with this idea. We wanted to take two fairly common software applications from computer science that are now being used in social media. One is API harvesting, which is just taking APIs and putting them together to find connections, and the other is machine learning.
We use that to take any alleged fact, and verify it. It’s not going to do anything to stop people producing fake news. It’s not going to do anything to stop Donald Trump tweeting at four o’clock in the morning. But it will allow journalists to verify and fact-check, once information is presented to us.
Obviously, a lot of fake news is almost impossible to prove or disprove—but you can say there is no evidence to back the story up. A journalist can use a keyword search, very similar to you doing a Google search, and set some parameters to look at, for example, previous news articles, or different varieties of publicly available datasets. You could plug in as much detail as you can, and do as many keyword searches as you can, and the tool will search the archives and public data, and find connections—assuming there are any.
One example would be something on corruption. There are many fake stories alleging corruption. You could access, for example, planning records, local government spending, or Companies House data. You can access all of that data on your own, of course, but if you put a number of APIs together, you can then start to find connections. Are there connections between a certain politician, for example, and a certain lawyer in another state? Are there connections between this particular lawyer, and other cases like this, where there were allegations of corruption?
Very quickly, you can start to build leads using the API aggregator. The machine learning algorithm—the AI element—will go one step further and say “here are interesting things, here are connections, here are correlations” and it’ll learn from that, and start to pull in more data itself and find more connections from your first keyword search.
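To make the idea of "finding connections" across aggregated records a little more concrete, here is a minimal sketch, assuming records from sources such as planning files or a company register have already been reduced to simple links between two named entities. The entity names, sources, and the functions `build_graph` and `find_connection` are hypothetical and invented for illustration; this is not DMINR's actual method. It covers only the aggregation step, not the machine learning element, and simply runs a breadth-first search to find the shortest chain of links between two entities, keeping a note of which source each link came from.

```python
from collections import defaultdict, deque


def build_graph(records):
    """Build an undirected graph from hypothetical (entity_a, entity_b, source) links."""
    graph = defaultdict(list)
    for a, b, source in records:
        graph[a].append((b, source))
        graph[b].append((a, source))
    return graph


def find_connection(graph, start, goal):
    """Breadth-first search: return the shortest chain of (from, to, source) hops, or None."""
    queue = deque([start])
    previous = {start: None}  # entity -> (parent entity, source of the linking record)
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back from the goal to rebuild the chain of links.
            path = []
            while previous[current] is not None:
                parent, source = previous[current]
                path.append((parent, current, source))
                current = parent
            return list(reversed(path))
        for neighbour, source in graph[current]:
            if neighbour not in previous:
                previous[neighbour] = (current, source)
                queue.append(neighbour)
    return None  # no chain of records links the two entities


# Hypothetical example records; real ones would come from public data APIs.
records = [
    ("Councillor A", "Acme Ltd", "company register"),
    ("Acme Ltd", "Solicitor B", "company register"),
    ("Solicitor B", "Planning application 123", "planning records"),
]
graph = build_graph(records)
chain = find_connection(graph, "Councillor A", "Planning application 123")
for a, b, source in chain:
    print(f"{a} -> {b} (via {source})")
```

Returning the source of each link, rather than a bare yes or no, matters here: a chain like this is only a lead, and a journalist still has to check each record it rests on.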
There’s another project that a company called Full Fact is working on here in the UK, and it’s something we’d be very interested in working on as well: live fact-checking. We already have voice-recognition technology that can, as I speak, recognize what I’m saying and put text on screen for deaf or hard-of-hearing people. We could take that text, of let’s say an election debate, and live feed that into a program like DMINR, and actually fact-check an election campaign as it was going out, and use that as a way of holding candidates and parties to account. If I’m making claims about health care improving or spending going down, or “we did more to fight crime,” they could potentially be checked in real time, and the audience would get that information in real time, using something like this. We’re one of a number of different projects working in this area, but I think that would be a fantastic way of contributing to an election campaign.
Wallace: If I check a story on Snopes, that’s probably because I already suspect it’s fake anyway. Can AI help us spot false reporting that’s less obvious? You teach journalism and you’ve been a journalist—have you ever been surprised to discover a story wasn’t true?
Felle: In my journalism career, that happened once a week! You’d start on a Tuesday—working for a Sunday newspaper, you wouldn’t work Mondays—and you’d pick up some leads early on in the week and by the end of the week, they may or may not have turned out to be true. Everybody has an agenda, everybody has a vested interest. Defining what is or isn’t true is very difficult to begin with.
But to take your fundamental question, it’s all very well trying to debunk something you already think is probably fake, or that you have some suspicions about. How do you do that on an ongoing basis when you don’t know whether or not it’s true or whether or not it’s fake? This is where tools like DMINR are going to come into their own. I think it was interesting that of the other projects that received funding in the third round of Google’s Digital News Initiative, an awful lot of them were in this general area. Certainly four big ones in particular: ourselves, Full Fact, Press Association, and there was one in Germany, but I’ve forgotten what it’s called [the RightHere project by DuMont Mediengruppe—Ed.]. They’re not a million miles away from each other in terms of doing this kind of thing.
This is where the electronic collection of data is, I think, really important. Now, we’re not talking about data privacy, or interfering with people’s rights—of course that’s a separate issue—but if you have a machine learning algorithm set up, it can potentially run 24/7, 365 days a year, checking everything and flagging stuff as it goes.
A very small example of my first encounter with this kind of thing: there was a former Prime Minister in Ireland, Charles Haughey, who was quite elderly and was expected to die at some stage. Newspapers tend to have these kinds of stories ready to go. You can’t do it on the day because there’s so much work involved, so you have it already prepackaged. Once a week, someone would ring into the newspaper and say, “oh look, you don’t know who I am, but I’ve just heard Charlie Haughey has died.” What could we do? We had to check out every story every time. That phone call came in once on a Saturday night to me, on the news desk of the Sunday Independent in Ireland—and we had to check it out, because we didn’t know whether it was true or fake.
In the digital era, there are stories like that all the time. There are going to be political parties, non-governmental organizations (NGOs), charities, and trade unions, all of whom will make claims online—well-meaning claims, perhaps, in some cases. There are going to be scenarios like the fire at Grenfell Tower in London, horrendous stories, where emotions run high and stories gather legs. It’s the job of journalism to bring first principles back to storytelling and ensure that these stories are true. There are all kinds of stories around Grenfell that I’ve seen, some of which turned out not to be true, some of which we still don’t yet know whether they’re true or not. I think the Grenfell tragedy is an example of where these types of tools in the future will make investigations like that easier to do, because you’ll have access to quite robust AI and machine learning. As more and more public data is held electronically and made available in machine-readable format, these tools will get much more powerful.
Wallace: If an AI tool flags something as fake, how important is it to understand why, and how can you make this clear to the audience in a consistent way?
Felle: It’s something that we’re building in from the start. We’re starting with the user experience (UX) as a central part of how we develop the app. The UX is really important, because as you say, if something is flagged by an AI tool, how will the end user understand what that means?
Part of the answer is, I think, a user interface that’s incredibly clear and incredibly easy to use. It has a visual element to it, where it can show you what the steps were that it took. It’ll have a visual tool that will demonstrate to the user, “these are the rules we followed, and there is an 80 percent chance this isn’t true based on this data.” It can’t just be an orb that says, “I’ve checked it out, here’s the result.” It has to offer journalism, and journalists, reasons—any tool will have to do that. It has to simplify the information, certainly, but it has to show the reasons why it came up with a certain conclusion. It isn’t just going to be a Google-style search result with ten results on the first page. It has to offer the ability to dip in and out and see what happened.
This is why it’s important to stress that we aren’t trying to replace journalists. The Press Association project involves using a bot to replace journalism, by writing up stories based on data—that isn’t our aim. Our aim isn’t to get rid of journalists and introduce some kind of robo-hack into the newsroom. Our aim is to support journalism. There will be an end user here who will make a decision.
The AI tool—DMINR, or whatever piece of software they’re using—will give its results, and reasons for them, but at the end of the day, it may not be able to find anything useful. It still might require good old-fashioned shoe-leather reporting, where you get out and talk to people on the streets. AI will just help journalists to find those connections, make the investigative process easier, debunk and verify as well as it can—hopefully as near to real-time as possible. But at the end of the day, there will be an end user here who will make a judgement.
We’re not going to replace journalists—rather, I think journalists are going to become much more specialized. I think in the future, you’ll see the developer-journalist becoming a key part of the newsroom. There’ll always be crime correspondents, there’ll always be political correspondents, who’ll stand in front of the TV cameras and write the articles and pick up the phone and talk to the police, or whatever. But one of the things digitally literate journalists will be able to do is independently look at the police data, and say, “hang on, what the Chief Constable said isn’t accurate, and here’s why it isn’t accurate.” These kinds of journalists will be able to contribute greatly to the news ecosystem. There may be less journalism for other reasons, but not because of AI.
Wallace: What impact do you think this type of technology is likely to have on the public conversation about fake news over the next five to ten years? Do you think we will still be talking about fake news in ten years?
Felle: I think it’ll be something we talk about, it’ll still be around, but it’ll be a lot less prevalent than it is now, because we’re so aware of it now. As for all the different kinds of fake news—propaganda, honest mistakes, intentional mistakes, the profit incentives—tools like this and verification processes that use AI will sort out some of that. The next bit is digital literacy. That’s going to be part of the solution as well: digital education in schools and universities and digital media literacy for journalists.
Part of digital literacy is accepting that if somebody presents you with a piece of information, it isn’t necessarily true. It isn’t necessarily false either, but it needs to be verified. If you turned the clock back 50 years, and you went to a burning hotel at Covent Garden in London, and you spoke to an eyewitness, you’d ask them what they saw, and then you’d speak to another one, and another one, and another one. It was only by corroborating the story that you got a full enough picture—and maybe you’d speak to the police as well, to get the official line. But you certainly didn’t speak to one person and then go back to the office and file a story based on one person’s interpretation.
Yet that’s what happened with social media, and it’s what happened with fake news. If information was produced, nobody questioned it. There are so many examples of that in the last four or five years. I think one of the legacies of the Trump election is going to be a much more discerning media and a much more rigorous verification industry.
One of the really interesting stories that I saw in the last week in this area of AI was the ability to take video and manipulate it. Researchers had taken a piece of video of Barack Obama and had digitally remastered his mouth movements to make him say other things, and it looked very real. They didn’t do it to produce fake news, they did it to show that they could, and presumably it could potentially be used for something else, like virtual reality—but it could be used for fake news. The technology is progressing very quickly, and journalism wasn’t quick enough off the mark, but has been jolted now very, very quickly. In the U.S., and in the UK, and in Europe now, there are so many startups and the like in this space. You’re going to see a critical mass in this area over the next couple of years.
Wallace: DMINR is a tool for professional journalists, but most of the fake news I see on social media doesn’t come from real journalists making mistakes, it comes from unreliable sources that publish fake news on purpose, and then friends of mine fall for it and share it. How can AI help us with that kind of fake news—the kind that’s aimed at people who aren’t even looking to professional journalists at all for information?
Felle: I think there are two parts to this. The first part is digital literacy. In the way that I had civics when I was a child going to school, you’re going to need to see digital literacy taught in schools, as part of English, or as part of societal modules. You’re going to need to see that being part of how we educate the next generation. Social capital, online presence, digital literacy—those kinds of skills are going to be as important as critical thinking for the next generation of students in our universities. So that’s the first part.
The second part is AI. There are no silver bullets, but AI can be, and I’m sure will be, used—it already is being used in pilot projects by Google and Facebook—to read search results, or to read postings, and to make sense of them and to provide context, showing that what’s being posted is disputed or isn’t true. You’ll never stop somebody who is determined from posting it, but if people also saw the warning before they shared it, it would get a lot fewer shares.
I don’t think Facebook or Google want to go down the road of pre-censoring, and I don’t think that would be welcome either. But I think they do want to offer their users a good experience, and part of that is being able to trust the content you read. And if AI is going to be used to give context, then I think friends “B,” “C,” and “D” will be much less likely to share or to like or to comment. The individual person who posted it is very difficult to stop if they’re determined, but it doesn’t get to go viral, because you’ve added context to it. We know that, for example, if something goes viral, about 80 percent of it is “likes,” about 15 percent of it is shares, and only about 5 percent are comments. I’m pretty sure if I got a warning saying, “what you’re about to ‘like’ isn’t true,” then those numbers would come right down. That’s probably where you’re going to see most of the movement in this area: contextual warnings for people before they share or “like,” rather than pre-filtering.