Software spots evidence of potential illness in texts of restaurant reviews on Yelp
Apr 20, 2018 — 6:45 am EST
People who get sick at restaurants may post about it online rather than file a complaint with the health department. Now computers can scout for these posts and alert officials.


Reviews of restaurants on social media can be helpful. They may guide you to a popular eatery, or tell you which dishes are diners’ favorites. They might also tell you when and where the food left someone feeling ill. Now scientists are training computers to scout online reviews for such signs of sickening food.

Tainted food and drink sicken some 48 million people in the United States each year. That’s according to the Centers for Disease Control and Prevention. Nearly seven in every 10 of those incidents come after dining in restaurants.

Government agencies collect reports of these incidents. They typically learn of them when someone calls in a complaint to the local health department. Not everyone will do that. There is, however, a good chance someone posted about it online. Those posts can be helpful to epidemiologists (Ep-ih-dee-me-OLL-oh-gizts). These researchers study disease outbreaks and their spread.

Online sources “may give you an early signal of something happening,” says Mauricio Santillana. He works at Harvard Medical School in Boston, Mass. An expert in digital epidemiology, he uses the internet to track disease. He was not involved in the new study.

In 2012, scientists at Columbia University in New York City built a computer program to help public-health officials. They taught it to read local restaurant reviews on the social-media site Yelp. Epidemiologists told the computer program what it should look for. They did this by labeling clues in 500 reviews.

The prototype computer program then looked for specific words that might point to illness. These words included sick, vomit, diarrhea and food poisoning. If any of these words appeared, the computer would flag the review. It also checked to see if a review indicated that multiple people had gotten sick. This might signal an outbreak: a sudden uptick in the number of cases of a particular illness. Epidemiologists with New York City’s health department then read the flagged Yelp reviews. When they spotted a suspected outbreak, they sent out experts to investigate.
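To make the flagging step concrete, here is a minimal sketch in Python of how a keyword screen like the one described above might work. It is not the Columbia team's code. The four illness terms come from the description above; the group cues (such as "we all"), the matching rules and the function name screen_review are assumptions made up for illustration.

```python
import re

# Illness terms named in the article.
ILLNESS_TERMS = ("sick", "vomit", "diarrhea", "food poisoning")

# Hypothetical phrases hinting that more than one diner fell ill (an assumption).
GROUP_CUES = ("we all", "all of us", "both of us", "my whole family", "everyone")

# Match each illness term at the start of a word, allowing endings
# such as "sickened" or "vomiting".
ILLNESS_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, ILLNESS_TERMS)) + r")\w*",
    re.IGNORECASE,
)
GROUP_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, GROUP_CUES)) + r")\b",
    re.IGNORECASE,
)


def screen_review(text):
    """Return (flagged, multiple_people) for one review.

    flagged: the review mentions at least one illness term.
    multiple_people: a flagged review also contains a group cue.
    """
    flagged = ILLNESS_RE.search(text) is not None
    multiple_people = flagged and GROUP_RE.search(text) is not None
    return flagged, multiple_people


if __name__ == "__main__":
    for review in (
        "Great tacos, friendly staff.",
        "I was vomiting all night after the oysters.",
        "We all got food poisoning here.",
    ):
        print(screen_review(review), review)
```

A screen this simple will also flag harmless phrases, such as "sick beats." That is one reason people still read each flagged review before anyone is sent to investigate.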

“There are too many reviews for health-department epidemiologists to read and manually search,” says Tom Effland. He is a computer scientist at Columbia.

The computer program Effland worked on made the impossible possible. It allowed health-department epidemiologists to monitor millions of reviews. In doing so, they detected 10 disease outbreaks. They also turned up 8,523 complaints of food poisoning from local restaurants. These cases all occurred between July 2012 and May 2017.

Effland’s group shared its findings January 10 in the Journal of the American Medical Informatics Association.

The restaurant reviews helped point to disease outbreaks that health officials otherwise might have missed. A pilot test of the computer program ran for just a few months. It showed that most foodborne disease had not been reported to city health officials. Only three in every 100 cases had been phoned in to the official complaint center.

Younger people spend much of their life interacting with social media. They are less likely than their parents to report illness through traditional ways, the study’s authors note. That makes scouting for illness on social-media sites ever more useful, they add.

Effland and his colleagues are now working to improve their system. They also are exploring how to apply this approach to other social-media sites, such as Twitter.

Power Words

Centers for Disease Control and Prevention, or CDC     An agency of the U.S. Department of Health and Human Services, based in Atlanta, Ga. CDC is charged with protecting public health and safety by working to control and prevent disease, injury and disabilities. It does this by investigating disease outbreaks, tracking exposures by Americans to infections and toxic chemicals, and regularly surveying diet and other habits among a representative cross-section of all Americans.

colleague     Someone who works with another; a co-worker or team member.

computer program     A set of instructions that a computer uses to perform some analysis or computation. The writing of these instructions is known as computer programming.

diarrhea     (adj. diarrheal) Loose, watery stool (feces) that can be a symptom of many types of microbial infections affecting the gut.

digital     (in computer science and engineering)  An adjective indicating that something has been developed numerically on a computer or on some other electronic device, based on a binary system (where all numbers are displayed using a series of only zeros and ones).

epidemiologist     Like health detectives, these researchers figure out what causes a particular illness and how to limit its spread.

generation     A group of individuals (in any species) born at about the same time or that are regarded as a single group. Your parents belong to one generation of your family, for example, and your grandparents to another. Similarly, you and everyone within a few years of your age across the planet are referred to as belonging to a particular generation of humans.

hygiene     Behaviors and practices that help to maintain health.

informatics     The study of how humans create, process and understand information. Informatics is useful in many areas such as healthcare, ecology and studies of human behavior.

internet     An electronic communications network. It allows computers anywhere in the world to link into other networks to find information, download files and share data (including pictures).

media     (in the social sciences) A term for the ways information is delivered and shared within a society. It encompasses not only the traditional media — newspapers, magazines, radio and television — but also Internet- and smartphone-based outlets, such as blogs, Twitter, Facebook and more. The newer, digital media are sometimes referred to as social media. The singular form of this term is medium.

monitor     To test, sample or watch something, especially on a regular or ongoing basis.

online     (n.) On the internet. (adj.) A term for what can be found or accessed on the internet.

outbreak     The sudden emergence of disease in a population of people or animals. The term may also be applied to the sudden emergence of devastating natural phenomena, such as earthquakes or tornadoes.

prototype     A first or early model of some device, system or product that still needs to be perfected.

social     (adj.) Relating to gatherings of people; a term for animals (or people) that prefer to exist in groups. (noun) A gathering of people, for instance those who belong to a club or other organization, for the purpose of enjoying each other’s company.

social media     Internet-based media, such as Facebook, Twitter and Tumblr, that allow people to connect with each other (often anonymously) and to share information.

Twitter     An online social network that allows users to post messages containing no more than 280 characters (until November 2017, the limit had been just 140 characters).


