How UBC Researchers Use Artificial Intelligence to Help Tackle Fake News
Especially in recent years, fake news stories have exploded around the world, sowing confusion and doubt about politics and global events, and fanning social unrest. Studies have shown that fake news can even alter people’s cognition, behavior, and perception of reality, and affect everything from major elections to local businesses.
Despite efforts by companies like Twitter and Facebook to stem the tide, fake news items continue to proliferate across myriad social media platforms.
What’s more, detecting which stories are fake can be a tricky business, and human fact checkers cannot keep up with the online deluge. So researchers from the UBC Sauder School of Business have developed a novel way to train computers to do that important work, using Arabic-language media reports as their source material.
According to UBC Sauder associate professor Hasan Cavusoglu, artificial intelligence systems are the most accurate when they have plenty of fodder from which to “learn.” For example, the Generative Pre-trained Transformer 3 model (GPT-3), which can mimic human-generated text, was trained on a data set of roughly 499 billion tokens (words and word fragments).
The problem was that no such data set existed for real and fake news. In fact, the only one Cavusoglu and his collaborators could find came from a recent study that looked at 3,072 true and 1,475 fake stories in Arabic, which were annotated by a human being.
“To use machine learning and AI, you have to create a model that will allow you to predict using data that isn’t part of the training process, and tell you whether something is fake news or not, or whether a cancer tumor is benign or malignant,” says Cavusoglu. “The problem was we didn’t have much data in the fake news context, so there was a limited amount of training data — and the less training data you have, the less performance you get out of your model.”
So to build a larger “ground truth” (that is, a stockpile of data in which each news article is objectively labeled as true or fake), the researchers gathered millions of news items from established Arab news media and treated them as true news articles. They then generated manipulated news articles by altering the true ones.
First, they applied part-of-speech tagging techniques to identify the role of each word in a sentence, such as noun, verb or adjective. They then used a word embedding model to automatically convert every word into a vector, which made it possible to find a similar word to replace it with. The name of a popular soccer team, for example, could be replaced by the name of a different soccer team, or a number could be replaced by a different number.
From there, they started to manipulate sentences by replacing some of the “true” words with those similar words, and in the process created a fake news data set that ranged from slightly manipulated to wildly untrue.
“If you change a number for some random number, for instance, it can change the sentence quite a bit, or you can say someone did not resign instead of saying someone did resign,” explains Cavusoglu. “So there’s a range of fakeness. Some are very close to the actual story; others are totally wrong.”
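The manipulation steps described above can be illustrated with a minimal Python sketch. It is not the researchers’ code: the tiny English vocabulary, the hand-written vectors, the part-of-speech lexicon and the function names are all hypothetical stand-ins for a trained Arabic tagger and word embedding model. The sketch replaces each word of a chosen part of speech with its nearest neighbour in vector space, provided the neighbour has the same part of speech.

```python
import math

# Hypothetical, hand-made word vectors (not taken from a trained model).
EMBEDDINGS = {
    "Lions":   (0.90, 0.10, 0.00),
    "Tigers":  (0.85, 0.15, 0.05),
    "Monday":  (0.00, 0.90, 0.10),
    "Tuesday": (0.05, 0.88, 0.12),
    "won":     (0.10, 0.00, 0.90),
    "lost":    (0.15, 0.05, 0.85),
}

# Hypothetical lexicon standing in for a real part-of-speech tagger.
POS_TAGS = {
    "Lions": "PROPN", "Tigers": "PROPN",
    "Monday": "PROPN", "Tuesday": "PROPN",
    "won": "VERB", "lost": "VERB",
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the closest other word with the same tag, or None if there is none."""
    if word not in EMBEDDINGS:
        return None
    candidates = [
        other for other in EMBEDDINGS
        if other != word and POS_TAGS.get(other) == POS_TAGS.get(word)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda other: cosine(EMBEDDINGS[word], EMBEDDINGS[other]))


def manipulate(tokens, target_tags):
    """Swap every token whose tag is in target_tags for its nearest neighbour."""
    result = []
    for token in tokens:
        replacement = most_similar(token) if POS_TAGS.get(token) in target_tags else None
        result.append(replacement if replacement is not None else token)
    return result


sentence = ["The", "Lions", "won", "on", "Monday"]
slightly_fake = manipulate(sentence, {"PROPN"})
more_fake = manipulate(sentence, {"PROPN", "VERB"})
```

Widening the set of targeted tags changes more of the sentence, which is one simple way to picture the “range of fakeness” Cavusoglu describes.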
In the end, they had created a massive “ground truth” data set containing true news articles as well as artificially created fake news articles that can be used to train AI systems to detect fake news.
As a test, they had their AI system analyze the 3,072 true and 1,475 fake stories from the earlier study, and confirmed that adding the manipulated stories to the training data improved its ability to separate real from fake, without all the human effort.
“The results indicate that we improved the accuracy,” says Cavusoglu, who worked with El Moatez Billah Nagoudi, AbdelRahim Elmadany, and Muhammad Abdul-Mageed of UBC’s Natural Language Processing Lab and Tariq Alhindi from Columbia University’s Department of Computer Science.
“So when you create these manipulated stories, and you augment the ground truth with that, you improve your detection capability.”
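As a rough illustration of what augmenting the ground truth means in practice, the Python sketch below combines a small human-labeled set with articles from trusted outlets (labeled true) and manipulated copies of them (labeled fake). The function names and the number-changing manipulation are hypothetical and chosen only for illustration. The sketch covers data preparation only and does not train or evaluate a detector.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, label), where label is "true" or "fake"


def change_numbers(text: str) -> str:
    """Hypothetical stand-in manipulation: add one to every whole-number token."""
    return " ".join(
        str(int(token) + 1) if token.isdecimal() else token
        for token in text.split()
    )


def augment(
    human_labeled: List[Example],
    trusted_articles: List[str],
    manipulate: Callable[[str], str],
) -> List[Example]:
    """Extend a human-labeled set with trusted articles and manipulated copies."""
    augmented = list(human_labeled)
    for article in trusted_articles:
        augmented.append((article, "true"))
        fake = manipulate(article)
        # Keep the manipulated copy only if the manipulation changed something.
        if fake != article:
            augmented.append((fake, "fake"))
    return augmented


human_labeled = [("claim a", "true"), ("claim b", "fake")]
trusted = ["The team scored 3 goals", "The minister resigned"]
training_data = augment(human_labeled, trusted, change_numbers)
```

A detector would then be trained on `training_data` rather than on the human-labeled examples alone.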
The researchers, who outlined their findings in a paper titled “Machine Generation and Detection of Arabic Manipulated and Fake News,” chose to work with sources in Arabic, but Cavusoglu emphasizes the method could be applied to news in any language.
It’s always a cat-and-mouse game, warns Cavusoglu, and fake news creators will get better at automatically generating content that fuels social and political fires. As a result, it’s imperative that scientists continue to fine-tune detection methods by training AI systems with ever-larger ground truth sources.
The technology could be used by social media companies like Twitter and Facebook to help root out fake articles and accounts, says Cavusoglu, and help consumers separate real news from false stories.
“Any text can be predicted in real time, so social media platforms could identify accounts that are disseminating fake news, and those adversaries could be removed from the system,” says Cavusoglu, who adds the technology could also help detect language anomalies in other situations. “It would create a more civil and more accurate platform.”