Scientists enlist computers to hunt down fake news | Science News for Students

Scientists enlist computers to hunt down fake news

And these automatons tend to outperform people at scouting out bogus claims
Sep 27, 2018 — 6:45 am EST
fake news

Researchers are building computer algorithms that can check the veracity of online news.

juststock/iStockphoto

Some lies are easy to spot. Take that report that First Lady Melania Trump wanted an exorcist to cleanse the White House of Obama-era demons. (Bogus!) Then there was that piece on an Ohio school principal who was arrested for defecating in front of a student assembly. (Also untrue.) Sometimes, however, fiction blends a little too well with fact. For instance, did cops really find a meth lab inside an Alabama Walmart? (Again, no.)

We live in a golden age of misinformation. So anyone scrolling through a slew of stories might easily be fooled. In fact, data show, plenty of us are.

On Twitter, falsehoods spread further and faster than the truth, a March 9 study found. Online bots have been accused of spreading these tall tales. But the study in Science found that people shared more bogus tales than did web bots.

Plenty of fake-news stories moved around the web in the weeks leading up to the 2016 U.S. presidential election, U.S. intelligence officials reported this year. And the most popular of these fake news reports got more Facebook shares, reactions and comments than did truthful news, according to one analysis in BuzzFeed News.

Luca de Alfaro is a computer scientist at the University of California, Santa Cruz. In the old days, he says, “you could not have a person sitting in an attic and generating conspiracy theories at a mass scale.” Then along came the internet, and more recently social media. Now, peddling lies is truly easy — especially as the internet can essentially hide who is behind a post. Consider that group of teens in the European nation of Macedonia, two years ago. They raked in huge sums of cash by writing popular fake news on topics that were popular during the U.S. presidential election. Readers thought those posts had been written by true journalists.

Most web users probably don’t post bunk on purpose. They likely “share” posts they simply didn’t take the time to fact-check. And that’s easy to do if you’re suffering from information overload. Or if you are overworked or tired.

Confirmation bias — already suspecting an issue is true, or not — can make the problem worse. Consider reports from sources you don’t recognize. “It’s likely that people will choose something that conforms to their own thinking,” says Fabiana Zollo. That’s true “even if that information is false,” says this computer scientist. She works in Italy at Ca’ Foscari University of Venice. Her studies focus on how information circulates within social networks.

Fake news doesn’t just threaten the integrity of elections. It also can erode the trust people place in real news. Lies posing as truth can even threaten lives. False rumors in India, earlier this year, incited lynchings. More than a dozen people died. And the source of those rumors: posts on WhatsApp, a smartphone messaging system.

To help sort fake news from truth, scientists have begun building new computer programs. They sift through online news stories, looking for telltale signs of fake facts or claims. Such a program might consider how an article reads. It might also look at how readers respond to the story on social media. If a computer spied a possible lie, its job would be to alert human fact-checkers. They would then be asked to make any final judgment call.

Giovanni Luca Ciampaglia works at Indiana University in Bloomington. Today’s automatic lie-detection tools are “still in their infancy,” he says. This means there’s still a lot of work to be done before many fact-checking programs are let loose on the internet.

The good news: Research teams around the world are forging ahead. They realize that the internet is a fire hose of facts or purported facts. So human fact-checkers clearly need help. And computers are beginning to step up to the challenge.


Reader referrals

Visitors to real news websites mainly reach those sites directly or from search engines, such as Google or Bing. Fake news sites attract a much higher share of their incoming web traffic through links on social media.

 
H. Thompson

Substance and style

When inspecting a news story, there are two major ways to tell if it looks like a fraud. The first: What is the author saying? Then look at how the author phrases it.

Ciampaglia and his colleagues have just built a system to automate this task. It checks how closely related a statement’s subject and object are. To do this, the computer sifts through a vast network of nouns. This network has been built from facts found in the “infobox” on the right side of every Wikipedia page. (Related networks have been built from other reservoirs of knowledge, such as research databases.)

The computer judges two nouns — that subject and object — as being “connected” if each appears in the infobox of the other. The computer then measures how closely a statement’s subject and object are linked within this network. The closer they are, the more likely the computer program is to label a statement as true.

Take the false assertion: “Barack Obama is a Muslim.”

There are seven degrees of separation between “Obama” and “Islam” in the noun network. These include very general nouns, such as “Canada,” that link to many other words. Given this, the computer program decided Obama was unlikely to be Muslim.

The Indiana University team unveiled this computer fact-checker, three years ago, in PLOS ONE.


Roundabout route

An automated fact-checker judges the assertion “Barack Obama is a Muslim” by studying degrees of separation between the words “Obama” and “Islam” in a noun network built from Wikipedia info. The very loose connection between these two nouns suggests the statement is false. 

a composite image showing a part of Obama's wikipedia entry and a flowchart showing degrees of seperation to Islam
Source: G.L. Ciampaglia et al/PLOS One 2015; Credit: C. Chang; Wikipedia

But looking to gauge the truth of statements based on how closely the subjects and objects are linked has its limits. For instance, the system found it likely that former President George W. Bush was married to Laura Bush. Great. It also decided he was probably married to Barbara Bush, his mother. Not so great. The Indiana group is now working to improve its program’s skill in judging how nouns in the network link to one another.

Grammar checks

Writing style may also point to untruths.

Benjamin Horne and Sibel Adali are computer scientists at Rensselaer Polytechnic Institute in Troy, N.Y. They analyzed 75 true articles from news outlets deemed very trustworthy. They also probed 75 false stories from websites accused of dispensing untrustworthy info. Compared with truthful news, bogus stories tended to be shorter. They also were more repetitive. They contained more adverbs (such as quite, always, never, incredibly, highly, unbelievably, absolutely, basically). Fake stories also had fewer quotes, fewer technical words and fewer nouns.

The researchers used these observations to make a computerized lie detector. It focused on the number of nouns, quotes, redundancies and words in a story. And it correctly sorted fake news from true 71 percent of the time. (For comparison, a random sorting of fake news from true should be right about 50 percent of the time).


Truth and lies

A study of hundreds of articles revealed differences in style between genuine and made-up news. Real stories contained more language conveying differentiation. False stories expressed more certainty. 

Frequently used words in legitimate newsFrequently used words in fake news
"think"
"know"
"consider"
Words that express insightWords that express certainty"always"
"never"
"proven"
"work"
"class"
"boss"
Work-related wordsSocial words"talk"
"us"
"friend"
"not"
"without"
"don't"
NegationsWords that express positive emotions"happy"
"pretty"
"good"
"but"
"instead"
"against"
Words that express differentiationWords related to cognitive processes"cause"
"know"
"ought"
"percent"
"majority"
"part"
Words that quantifyWords that focus on the future"will"
"gonna"
"soon"
Source: V. Pérez-Rosas et al/arxiv.org 2017

Horne and Adali unveiled their system at the 2017 International Conference on Web and Social Media in Montreal, Canada. The two are now looking for ways to boost the system’s accuracy.

Computer scientist and engineer Vagelis Papalexakis works at the University of California, Riverside. His team also built a fake news detector. In one test, it sorted through roughly 64,000 articles that had been shared on Twitter. Half were fake stories. The scientists then directed the computer to sort these stories into groups based on how similar they were. The researchers didn’t instruct their computer on how to judge what was “similar.” Now the researchers identified 5 percent of the stories for the computer as being true or bogus.

Fed just that little bit of information, the program was able to predict which of the rest of the stories were true or not. And it was right for almost seven in every 10 of the other stories. The researchers described their work April 24 at arXiv.org.

Adult supervision

Getting it right about 70 percent of the time isn’t nearly accurate enough to trust. But it might be good enough to offer a proceed-with-caution alert when a reader opens up a suspicious story on the web.

In a similar kind of first step, social-media sites could use computer watchdogs to prowl news feeds for questionable stories. Those that get flagged this way could then be sent to human fact-checkers for a closer look.

Today, Facebook considers feedback from users as it decides which stories to fact-check. Those comments might point to skeptical claims or clear falsehoods. The media company then sends these stories to be reviewed by professional skeptics (such as FactCheck.org, PolitiFact or Snopes). And Facebook is open to using other signs to rout possible hoaxes, notes its spokesperson Lauren Svensson.

No matter how good computers get at finding fake news, they shouldn’t totally replace people in fact-checking, Horne says. The final judgment may require a better understanding than today’s computers have for the topics being discussed, for the language or for the culture.

Alex Kasprak is a science writer at Snopes. It’s the world’s oldest and largest online fact-checking site. He imagines a future where fact-checking will be like computer-assisted voice transcription. The first, rough draft of a transcription will attempt to identify what was said. A human still has to review that text, however. Her role: looking for missed details, such as spelling errors or words the program just got wrong. Similarly, computers could compile lists of suspect articles for people to check, Kasprak says. Here, people would then be asked to offer the final assessment of what reads true.

Eyes on the audience

Even as computers get better at flagging bogus stories, there’s no guarantee that fake news creators won’t step up their game to stay just ahead of the truth cops. If computer programs are designed to be skeptical of stories that are overly positive or express lots of certainty, then authors could craft their lies to read that way.

“Fake news, like a virus, can evolve and update itself,” says Daqing Li. He’s a network scientist at Beihang University in Beijing. And he has studied fake news on Twitter. Fortunately, online news stories can be judged on more than their content. And some of these — such as the posts people make about these stories on social media — might be much harder to mask or alter.

For instance, tweets about a certain piece of news can sometimes be good clues to whether a story is true, notes Juan Cao. She’s a computer scientist in Beijing at the Institute of Computing Technology. (It’s part of the Chinese Academy of Sciences.)


Sheeples

The majority of Twitter users who discussed false rumors about two disasters posted tweets that just spread these rumors. Only a small fraction had looked for verification that a story was real or expressed doubt about the stories. 

 
H. Thompson

Cao’s team built a system that could round up the tweets posted on Sina Weibo. That’s China’s version of Twitter. The computer focused on tweets mentioning a particular news event. It then sorted those posts into two groups: tweets that expressed support for the story and those that opposed it. The system then considered several factors to judge how truthful the posts were likely to be. If, for example, the story centered on a local event, the user’s input was seen as more credible than a comment from someone far away. The computer also judged as suspicious the frequent tweets on a story by someone who had not tweeted for a long time. It used other factors, too, to sift through tweets.

Cao’s group tested their system on 73 real and 73 fake stories. (They had been labeled as such by organizations such as Xinhua, China’s state-run News Agency.) The computer sifted through about 50,000 tweets about these stories. It correctly labeled 84 percent of the fakes. Cao’s team described its findings in 2016 at an artificial intelligence conference held in Phoenix, Ariz.

The U.C. Santa Cruz team similarly showed that hoaxes can be picked out of real news based on which users “liked” these stories on Facebook. They reported the finding, last year, at a machine-learning conference in Europe.

A computer might even look at how the story is getting passed around on social media. Li at Beihang University and his colleagues studied the shape of the networks that branch out from news stories on social media. The researchers reviewed reposted stories and the networks of readers that they involved for some 1,700 fake and 500 true news stories on Weibo. They did the same for about 30 fake and 30 true news networks on Twitter. On both social-media sites, Li’s team found, most people tend to repost real news straight from a single source. The same was not true for fake news. These stories tended to spread more through people reposting from other reposters.

A typical network of reposted truthful stories “looks much more like a star,” Li says. “Fake news spreads more like a tree.” Li’s team reported its new findings March 9 at arXiv.org. They suggest that computers could analyze how people engage with social media posts as a test for truthfulness. Here, they wouldn’t even have to put individual posts under the microscope.


Branching out

On Twitter, most people reposting real news (red dots) get it from a single, central source (green dot). Fake news, in contrast, spreads more through people reposting from other reposters.

twitter sharing
Z. ZHAO ET AL/ARXIV.ORG 2018

Truth to the people

What’s the best way to deal with lies on social media networks? Right now, no one really knows. Just erasing bogus articles from news feeds is probably not the way to go. Social media sites that exert such control over what visitors see would be viewed with alarm. It’s similar to what people who live under dictators encounter, notes Murphy Choy. He’s a data analyst at SSON Analytics in Singapore.

Internet sites could put warning signs on stories they suspect of being untrue. But labeling such stories may have an unfortunate effect. It may cause people to think that the remaining unlabeled stories are all true. In fact, they may simply have not been reviewed by a fact-checker. This was the conclusion, anyway, of research posted last year on the Social Science Research Network. Gordon Pennycook in Canada at the University of Regina, and David Rand of Yale University in New Haven, Conn., conducted the study.

Facebook has taken a different approach. It shows debunked stories lower in a user’s news feeds. And that can cut a false article’s future views by as much as 80 percent, company spokesperson Svensson says. Facebook also displays articles that debunk false stories whenever users encounter the related stories. That technique, however, may backfire. Zollo in Venice, Italy, works with Walter Quattrociocchi. They conducted a study of Facebook users who like and share conspiracy news. And after these readers interacted with debunking articles, they actually increased their activity on Facebook’s conspiracy pages, they reported this past June.

There’s still a long way to go in teaching computers — and people — to reliably spot fake news. As the old saying goes: A lie can get halfway around the world before the truth has put on its shoes. But keen-eyed computer programs may at least slow down fake stories with some new ankle weights.

Power Words

(for more about Power Words, click here)

analytics     A term largely used in the business world to mean the interpretation of large quantities of data. Similar to statistics, it has a greater focus on real-world applications.

artificial intelligence     A type of knowledge-based decision-making exhibited by machines or computers. The term also refers to the field of study in which scientists try to create machines or computer software capable of intelligent behavior.

arXiv     A website that posts research papers — often before they are formally published — in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance and statistics. Anyone can read a posted paper at no charge.

bias     The tendency to hold a particular perspective or preference that favors some thing, some group or some choice. Scientists often “blind” subjects to the details of a test (don’t tell them what it is) so that their biases will not affect the results.

bot     (short for web robot) A computer program designed to appear that its actions come from some human. The goal is to have it interact with people or perform automated tasks such as finding and sharing online information through social-media accounts.

browser     (in computing) The software program or application that someone uses to find and retrieve information from web pages on the internet.

colleague     Someone who works with another; a co-worker or team member.

computer program     A set of instructions that a computer uses to perform some analysis or computation. The writing of these instructions is known as computer programming.

confirmation bias     Some idea, phrase, text or speech that agrees with, or confirms, what someone already thinks.

credible      (n. credibility) An adjective meaning believable or convincing

culture      (n. in social science) The sum total of typical behaviors and social practices of a related group of people (such as a tribe or nation). Their culture includes their beliefs, values and the symbols that they accept and/or use. Culture is passed on from generation to generation through learning. Scientists once thought culture to be exclusive to humans. Now they recognize some other animals show signs of culture as well, including dolphins and primates.

database     An organized collection of related data.

engineer     A person who uses science to solve problems. As a verb, to engineer means to design a device, material or process that will solve some problem or unmet need.

erode     Gradual removal of soil or stone, caused by the flow of water or the movement of winds. Or the gradual wearing away of anything else (such as firmly held ideas).

evolve     (adj. evolving) To change gradually over generations, or a long period of time. In living organisms, such an evolution usually involves random changes to genes that will then be passed along to an individual’s offspring. These can lead to new traits, such as altered coloration, new susceptibility to disease or protection from it, or different shaped features (such as legs, antennae, toes or internal organs). Nonliving things may also be described as evolving if they change over time. For instance, the miniaturization of computers is sometimes described as these devices evolving to smaller, more complex devices.

factor     Something that plays a role in a particular condition or event; a contributor.

feedback     A response or assessment that follows some a particular act or decision. Or a process or combination of processes that propel or exaggerate a change in some direction. For instance, as the cover of Arctic ice disappears with global warming, less of the sun’s warming energy will be reflected back into space. This will serve to increase the rate of Earth’s warming. That warming might trigger some feedback (like sea-ice melting) that fosters additional warming.

fiction     (adj. fictional) An idea or a story that is made-up, not a depiction of real events.

filter     (in chemistry and environmental science) A device or system that allows some materials to pass through but not others, based on their size or some other feature.

fraud     To cheat; or the resulting effects of something done by cheating. Or to make a mistake and intentionally cover up the error.

gauge     A device to measure the size or volume of something. For instance, tide gauges track the ever-changing height of coastal water levels throughout the day. Or any system or event that can be used to estimate the size or magnitude of something else. (v. to gauge) The act of measuring or estimating the size of something.

information     (as opposed to data) Facts provided or trends learned about something or someone, often as a result of studying data.

internet     An electronic communications network. It allows computers anywhere in the world to link into other networks to find information, download files and share data (including pictures).

link     A connection between two people or things.

media     (in the social sciences) A term for the ways information is delivered and shared within a society. It encompasses not only the traditional media — newspapers, magazines, radio and television — but also Internet- and smartphone-based outlets, such as blogs, Twitter, Facebook and more. The newer, digital media are sometimes referred to as social media. The singular form of this term is medium.

microscope     An instrument used to view objects, like bacteria, or the single cells of plants or animals, that are too small to be visible to the unaided eye.

network     A group of interconnected people or things. (v.) The act of connecting with other people who work in a given area or do similar thing (such as artists, business leaders or medical-support groups), often by going to gatherings where such people would be expected, and then chatting them up. (n. networking)

online     (n.) On the internet. (adj.) A term for what can be found or accessed on the internet.

random     Something that occurs haphazardly or without reason, based on no intention or purpose.

reservoir     A large store of something. Lakes are reservoirs that hold water. People who study infections refer to the environment in which germs can survive safely (such as the bodies of birds or pigs) as living reservoirs.

Singapore     An island nation located just off the tip of Malaysia in southeast Asia. Formerly an English colony, it became an independent nation in 1965. Its roughly 55 islands (the largest is Singapore) comprise some 687 square kilometers (265 square miles) of land, and are home to more than 5.3 million people.

skeptical     Not easily convinced; having doubts or reservations.

smartphone     A cell (or mobile) phone that can perform a host of functions, including search for information on the internet.

social     (adj.) Relating to gatherings of people; a term for animals (or people) that prefer to exist in groups. (noun) A gathering of people, for instance those who belong to a club or other organization, for the purpose of enjoying each other’s company.

social media     Internet-based media, such as Facebook, Twitter and Tumblr, that allow people to connect with each other (often anonymously) and to share information.

social network     Communities of people (or animals) that are interrelated owing to the way they relate to each other. In humans, this can involve sharing details of their life and interests on Twitter or Facebook, or perhaps belonging to the same sports team, religious group or school.

social science     The scientific study of people and their relationships to each other.

tweet     Message consisting of 140 or fewer characters that is available to people with an online Twitter account.

Twitter     An online social network that allows users to post messages containing no more than 280 characters (until November 2017, the limit had been just 140 characters).

virus     Tiny infectious particles consisting of RNA or DNA surrounded by protein. Viruses can reproduce only by injecting their genetic material into the cells of living creatures. Although scientists frequently refer to viruses as live or dead, in fact no virus is truly alive. It doesn’t eat like animals do, or make its own food the way plants do. It must hijack the cellular machinery of a living cell in order to survive.

Web     (in computing) An abbreviation of World Wide Web, it is a slang term for the internet.

Citation

Journal: F. Zollo and W. Quattrociocchi. Misinformation spreading on FacebookComplex Spreading Phenomena in Social Systems. Published online June 22, 2018, p. 177. 10.1007/978-3-319-77332-2_10.

Journal: G. Bastidas et al. Semi-supervised content-based detection of misinformation via tensor embeddings. arXiv:1804.09088. Posted April 24, 2018.

Journal: S. Vosoughi et al. The spread of true and false news online. Science. Vol. 359, March 9, 2018, p. 1146. doi:10.1126/science.aap9559.

Journal: Z. Zhao et al. Fake news propagate differently from real news even at early stages of spreading. arXiv:1803.03443. Posted March 9, 2018.

Journal: E. Tacchini et al. Some like it hoax: Automated fake news detection in social networks. ECML PKDD, Skopje, Macedonia, September 18, 2017.

Journal: G. Pennycook and D.G. Rand. The implied truth effect: Attaching warnings to a subset of fake news stories increases perceived accuracy of stories without warningsSSRN. Posted September 14, 2017. doi: 10.2139/ssrn.3035384.

Journal: K. Shu et al. Fake news detection on social media: A data mining perspective. arXiv.org. August 7, 2017.

Journal: B. Horne and S. Adali. This just in: Fake news packs a lot in title, uses simpler, repetitive Content in text body, more similar to satire than real news. International Conference on Web and Social Media, Montreal, Canada, May 15, 2017.

Journal: Z. Jin et al. News verification by exploiting conflicting social vewpoints in microblogs. Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, Ariz., February 14, 2016.

Journal: G. Ciampaglia et al. Computational fact checking from knowledge networksPLOS One. Published online June 17, 2015. doi: 10.1371/journal.pone.0128193.

Further Reading