When a study can’t be replicated | Science News for Students

When a study can’t be replicated

Many factors can prevent scientists from repeating research and confirming results
Sep 11, 2015 — 7:00 am EST

Sometimes the findings of research that was done well can’t be replicated — confirmed by other scientists. The reasons may vary or never be fully understood, new studies find. 

ViktorCap / iStockphoto

In the world of science, the gold standard for accepting a finding is seeing it “replicated.” To achieve this, researchers must repeat a study and reach the same conclusion. Doing so helps confirm that the original finding wasn’t a fluke, a result due to chance.

Yet try as they might, many research teams cannot replicate, or match, an original study’s results. Sometimes that occurs because the original scientists faked the study. Indeed, a 2012 study looked at more than 2,000 published papers that had to be retracted — eventually labeled by the publisher as too untrustworthy to believe. Of these, more than 65 percent involved cases of misconduct, including fraud.

But even when research teams act honorably, their studies may still prove hard to replicate, a new study finds. Yet a second new analysis shows how important it is to try to replicate studies. It also shows what researchers can learn from the mistakes of others.

The first study focused on 100 human studies in the field of psychology. That field examines how animals or people respond to certain conditions and why. The second study looked at 38 research papers on the causes of global warming. Those papers offered explanations that run contrary to those of the vast majority of the world’s climate scientists.

Both new studies set out to replicate the earlier ones. Both had great trouble doing so. Yet neither found evidence of fraud. These studies point to how challenging it can be to replicate research. Yet without that replication, the research community may find it hard to trust a study’s data or know how to interpret what those data mean.

Trying to make sense of the numbers

Brian Nosek led the first new study. He is a psychologist at the University of Virginia in Charlottesville. His research team recruited 270 scientists. Their mission: to reproduce the findings of 100 previously published studies. All of the studies had appeared in one of three major psychology journals in 2008. In the end, only 35 of the studies could be replicated by this group. The researchers described their efforts in the August 28 issue of Science.

Two types of findings proved hardest to confirm. The first were those that originally had been described as unexpected. The second were ones that had barely achieved statistical significance. That raises concerns, Nosek told Science News, about the common practice of publishing attention-grabbing results. Many of those types of findings appear to have come from data that had not been statistically strong. Such studies may have included too few individuals. Or they may have turned up only weak signs of an effect. There is a greater likelihood that such findings are the result of random chance.
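A small computer simulation can show why weak results from small studies are hard to confirm. The sketch below is only an illustration, not the method Nosek’s team used. It assumes a made-up experiment with a modest real effect (`true_effect=0.3`), two groups of `n` volunteers each, and a simple z-test with a cutoff of 1.96. All of the function names and numbers are hypothetical choices made for this example. For every simulated “original” study that comes out statistically significant, the code reruns the same study once and checks whether the repeat is significant too.

```python
import random
import statistics
from math import sqrt


def study_is_significant(rng, n, true_effect, z_crit=1.96):
    """Simulate one two-group study and report whether it passes a z-test.

    Assumption for this sketch: scores in both groups have a known
    standard deviation of 1, so a z-test is appropriate.
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    std_err = sqrt(2.0 / n)  # standard error of the difference in means
    return abs(diff / std_err) > z_crit


def replication_rate(n, true_effect, trials=1000, seed=1):
    """Fraction of significant 'original' studies whose rerun is also significant."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    originals = 0
    replications = 0
    for _ in range(trials):
        if study_is_significant(rng, n, true_effect):
            originals += 1
            # Rerun the same study with fresh volunteers.
            if study_is_significant(rng, n, true_effect):
                replications += 1
    return replications / originals if originals else 0.0


small_study_rate = replication_rate(n=20, true_effect=0.3)
large_study_rate = replication_rate(n=200, true_effect=0.3)
print(f"Replication rate with 20 per group:  {small_study_rate:.2f}")
print(f"Replication rate with 200 per group: {large_study_rate:.2f}")
```

Under these assumptions, the small studies should replicate far less often than the large ones, even though the effect is real and nobody cheated. With only 20 people per group, a significant result depends heavily on a lucky draw, and a rerun usually is not as lucky.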

No one can say why the tests by Nosek’s team failed to confirm findings in 65 percent of their tries. It’s possible the initial studies were not done well. But even if they had been done well, conflicting conclusions raise doubts about the original findings. For instance, they may not be applicable to groups other than the ones initially tested.

Rasmus Benestad works at the Norwegian Meteorological Institute in Oslo. He led the second new study. It focused on climate research.

In climate science, some 97 percent of reports and scientists have come to a similar conclusion: that human activities, mostly the burning of fossil fuels, are a major driver of recent global warming. The 97 percent figure came from the United Nations’ Intergovernmental Panel on Climate Change. This is a group of researchers active in climate science. The group reviewed nearly 12,000 abstracts of published research findings. It also received some 1,200 ratings by climate scientists of what the published data and analyses had concluded about climate change. Nearly all came up with the same source: us.

But what about the other 3 percent? Was there something different about those studies? Or could there be something different about the scientists who felt that humans did not play a big role in global warming? That’s what this new study sought to probe. It took a close look at 38 of these “contrarian” papers.

Benestad’s team attempted to replicate the original analyses in these papers. In doing so, the team pored over the details of each study. Along the way, they identified several common problems. Many papers started with false assumptions, the new analysis says. Some used a faulty analysis. Others set up an improper hypothesis for testing. Still others used “incorrect statistics” in their analyses, Benestad's group reports. Several papers also set up a false either/or situation. They argued that if one factor influenced global warming, then another must not have. In fact, Benestad’s group noted, that logic was sometimes faulty. In many cases, both explanations for global warming might work together.

Mistakes or an incomplete understanding of previous work by others could lead to faulty assessments, Benestad’s team concluded. Its new analysis appeared August 20 in Theoretical and Applied Climatology.

What to make of this?

It might seem like it should be easy to copy a study and come up with similar findings. As the two new studies show, it’s not. And there can be a host of reasons why.

Some investigators have concluded that it may be next to impossible to redo a study exactly. This can be especially true when a study works with subjects or materials that vary greatly. Cells, animals and people all vary a great deal. Due to genetic or developmental differences, one cell or individual may respond differently to stimuli than another will. Stimuli might include foods, drugs, infectious germs or some other aspect of the environment.

Similarly, some studies involve conditions that are quite complicated. Examples can include the weather or how crowds of people behave. Consider climate studies. Computers are not yet big enough and fast enough to account for everything that affects climate, scientists note. Many of these factors will vary broadly over time and distance. So climate scientists choose to analyze the conditions that seem the most important. They may concentrate on those for which they have the best or the most data. If the next group of researchers uses a different set of data, their findings may not match the earlier ones.

Eventually, time and more data may show why the findings of an original study and a repeated one differ. One of the studies may be found weak or somewhat flawed. Perhaps both will be.

This points to what can make advancing science so challenging. “Science is never settled, and both the scientific consensus and alternative hypotheses should be subject to ongoing questioning,” Benestad’s group argues.

Researchers should try to prove or disprove even those things that have been considered common knowledge, they add. Resolving differences in an understanding of science and data is essential, they argue. That is true in climate science, psychology and every other field. After all, without a good understanding of science, they say, society won’t be able to make sound decisions on how to create a safer, healthier and more sustainable world.


Power Words

abstract  Something that exists as an idea or thought but not concrete or tangible (touchable) in the real world. Beauty, love and memory are abstractions; cars, trees and water are concrete and tangible. (in publishing) A short summary of a scientific paper, a poster or a scientist’s talk. Abstracts are useful to determine whether delving into the details of the whole scientific paper will yield the information you seek.

Anthropocene Term coined by scientists to describe the age in which humans have been the strongest force of change on the planet. It is generally believed to date from at least the dawn of the Nuclear Age (in the middle 1940s), and possibly even earlier — from the beginning of the Industrial Revolution in the early 1800s.

anthropogenic   An adjective that describes a human influence on something. It was coined by putting together the prefix “anthro,” meaning human, and the suffix “genic,” meaning caused by.

climate  The weather conditions prevailing in an area in general or over a long period.

climate change Long-term, significant change in the climate of Earth. It can happen naturally or in response to human activities, including the burning of fossil fuels and clearing of forests.

climatology    The study of climate over seasons, decades or millennia. Climate varies over time and this field looks at measuring all aspects of climate and using such data to better understand what factors are behind those changes.

consensus   An opinion or conclusion shared by most if not all of a specific group.

data  Facts and statistics collected together for analysis but not necessarily organized in a way that gives them meaning. For digital information (the type stored by computers), those data typically are numbers stored in a binary code, portrayed as strings of zeros and ones.

development  (adj. developmental) In biology: The growth of an organism from conception through adulthood, often undergoing changes in chemistry, size and sometimes even shape.

fossil fuel Any fuel — such as coal, petroleum (crude oil) or natural gas —  that has developed in the Earth over millions of years from the decayed remains of bacteria, plants or animals.

fraud   To cheat; or the resulting effects of something done by cheating. Or to make a mistake and intentionally cover up the error.

genetic Having to do with chromosomes, DNA and the genes contained within DNA. The field of science dealing with these biological instructions is known as genetics. People who work in this field are geneticists.

gold standard    A common term used to mean the premier, or currently most reliable, standard for judging the quality or authenticity of something.

hypothesis   (plural: hypotheses) A proposed explanation for a phenomenon. In science, a hypothesis is an idea that must be rigorously tested before it is accepted or rejected.

Intergovernmental Panel on Climate Change, or IPCC.   This international group keeps tabs on the newest published research on climate and on how ecosystems are responding to it. The United Nations Environment Programme and the World Meteorological Organization jointly created the IPCC in 1988. Their aim was to provide the world with a clear scientific view on the current state of knowledge in climate change and its potential environmental and social impacts.

meteorology  (adj. meteorological) The study of weather as it pertains to future projects or an understanding of long-term trends (climate). People who work in this field are called meteorologists.

psychology  The study of the human mind, especially in relation to actions and behavior. To do this, some perform research using animals. Scientists and mental-health professionals who work in this field are known as psychologists.

replicate  (in experimentation) To copy an earlier test or experiment — often an earlier test performed by someone else — and get the same general result. Replication depends upon repeating every step of a test, one by one. If a repeated experiment generates the same result as in earlier trials, scientists view this as verifying that the initial result is reliable. If results differ, the initial findings may fall into doubt. Generally, a scientific finding is not fully accepted as being real or true without replication.

statistical significance  In research, a result is significant (from a statistical point of view) if the likelihood that an observed difference between two or more conditions would be due to chance is very low. Obtaining a result that is statistically significant means there is a very high likelihood that any difference that is measured was not the result of random accidents.

statistics The practice or science of collecting and analyzing numerical data in large quantities and interpreting their meaning. Much of this work involves reducing errors that might be attributable to random variation. A professional who works in this field is called a statistician.

stimulus    (plural: stimuli) Something that prompts a response in a living organism or in a controlled environment (including a chemical or physical test system).

sustainability  (n: sustainable) To use resources in a way that they will continue to be available in the future.

weather Conditions in the atmosphere at a localized place and a particular time. It is usually described in terms of particular features, such as air pressure, humidity, moisture, any precipitation (rain, snow or ice), temperature and wind speed. Weather constitutes the actual conditions that occur at any time and place. It’s different from climate, which is a description of the conditions that tend to occur in some general region during a particular month or season.

Further Reading

S. Ornes. “Retractions: Righting the wrongs of science.” Science News for Students. Sept. 11, 2015.

B. Bower. “Psychology results evaporate upon further review.” Science News. August 27, 2015.

B. Brookshire. “Oops. Correcting scientific errors.” Eureka! Lab blog. August 25, 2015.

K. Kowalski. “Explainer: Correlation, causation, coincidence and more.” Science News. July 24, 2015.

B. Geiger. “The heat that keeps on giving.” Science News for Students. July 6, 2015.

B. Brookshire. “Cookie Science 15: Results aren’t always sweet.” Eureka! Lab blog. April 21, 2015.

B. Brookshire. “Statistics: Make conclusions cautiously.” Eureka! Lab blog. November 3, 2014.

B. Brookshire. “Cookie Science 8: The meaning of the mean.” Eureka! Lab blog. November 14, 2014.

B. Brookshire. “Cookie Science 9: How data can spread.” Eureka! Lab blog. November 17, 2014.

S. Oosthoek. “World leaders call for action on climate change.” Science News for Students. November 12, 2014.

Original Journal Source:  Open Science Collaboration. Estimating the reproducibility of psychological science. Science. Vol. 349, August 28, 2015, p. aac4716-1. doi:10.1126/science.aac4716.

Original Journal Source:  R.E. Benestad et al. Learning from mistakes in research. Theoretical and Applied Climatology. Published early online August 20, 2015.

Original Journal Source: F.C. Fang et al. Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences of the United States of America. Vol. 109, Oct. 16, 2012, p. 17028.  doi: 10.1073/pnas.1212247109.