The evidence for nonreproducibility in basic and preclinical biomedical research is compelling. Accumulating data from diverse subdisciplines and types of experimentation suggest numerous problems that can create a fertile ground for nonreproducibility.1 For example, raw data and protocols are often unavailable for in-depth scrutiny and use by other scientists. The current incentive system rewards selective reporting of success stories. There is poor use of statistical methods, and study designs are often suboptimal. Simple laboratory flaws—eg, contamination or incorrect identification of widely used cell lines—occur with some frequency.
The scientific community needs to recognize and respond effectively to these problems. Survey data suggest that most scientists acknowledge having been unable to replicate the work of other scientists or even their own work.2 The National Institutes of Health has struggled to improve the situation.3 However, whatever improvements are made to enhance science must not worsen an already daunting bureaucracy. Some scientists suggest that reproducibility is not a problem, confusing the high potential value of this research with immunity to bias.
Empirical reproducibility checks performed by industry investigators on a number of top-cited publications from leading academic institutions have shown reproducibility rates of 11% to 25%.4,5 Critics pointed out that these empirical assessments did not fully adhere to advocated principles of open, reproducible research (eg, full data sharing), so the lack of reproducibility may have occurred because of the inability to exactly reproduce the experiment. However, in the newest reproducibility efforts,6 all raw data become available, full protocols become public before any experiments start, and article reports are preregistered. Moreover, extensive efforts try to ensure the quality of the materials used and the rigor of the experimental designs (eg, randomization). In addition, there is extensive communication with the original authors to capture even minute details of experimentation. Results published recently7 on 5 cancer biology topics are now available for scrutiny, and they show problems. In brief, in 3 of them, the reproducibility efforts could not find any signal of the effect shown in the original study, and the differences go beyond chance; eg, a hazard ratio for tumor-free survival of 9.37 (95% CI, 1.96-44.71) in the original study vs 0.71 (95% CI, 0.25-2.05) in the replication effort. However, in 2 of these 3 topics, the experiments could not be executed in full as planned because of unanticipated findings; eg, tumors growing too rapidly or regressing spontaneously in the controls. In the other 2 reproducibility efforts, the detected signal for some outcomes was in the same direction but apparently smaller in effect size than originally reported.
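The claim that the original and replication hazard ratios differ beyond chance can be checked with a standard two-sided z-test on the log hazard ratios, recovering each standard error from the reported 95% CI. This is an illustrative sketch only (the function names are mine, and it assumes approximate normality of the log hazard ratio estimates), not the replication project's actual analysis:

```python
import math

def log_hr_se(ci_low, ci_high):
    """Recover the standard error of log(HR) from a 95% CI,
    assuming approximate normality on the log scale."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

def compare_hrs(hr1, ci1, hr2, ci2):
    """Two-sided z-test for whether two hazard ratios differ beyond chance."""
    d = math.log(hr1) - math.log(hr2)
    se = math.hypot(log_hr_se(*ci1), log_hr_se(*ci2))
    z = d / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P value
    return z, p

# Original study vs replication effort (tumor-free survival, values from the text)
z, p = compare_hrs(9.37, (1.96, 44.71), 0.71, (0.25, 2.05))
print(f"z = {z:.2f}, p = {p:.3f}")
```

On these figures the test yields z of roughly 2.7 (P < .01), consistent with the statement that the discrepancy is unlikely to reflect chance alone.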
Acknowledging due caution given the small number of topics examined so far, what do these results mean? Reproducibility of inferences may be as contested as reproducibility of results.8 Original authors of nonreproduced studies may point to other evidence that indirectly supports their original claims or may question the competence of the reproducibility efforts. A similar debate unfolded in psychological science, in which findings from 64 of 100 high-impact articles could not be reproduced,8 yet some psychologists still saw nothing concerning in these results and defended the status quo.
When results disagree, it is impossible to be 100% certain whether the original experiment, the subsequent experiment, both, or neither is correct.7 However, the recurrent nonreproducibility and the large diversity in results are concerning. The reproducibility efforts have generally followed high standards, with full transparency and meticulous attention to detail. If those efforts could not reproduce the original findings, the average laboratory investigator (who probably invests less effort in repeating other scientists' experiments so meticulously) is unlikely to be able to do so. Furthermore, the reproducibility efforts demonstrated that unanticipated outcomes (eg, unforeseen spontaneous regression of tumors) further complicate experiments. Outcomes diverge even with minor modifications in the experimental conditions.
Biological processes are very complex and multifactorial. To have a robust understanding of an experiment, scientists need to capture all the major factors that affect the outcome. This may be quite difficult. Apparently, authors do not report in sufficient detail the factors in laboratory experiments that are essential for repeating the process. However, even if these factors were communicated in detail, would a process that is so sensitive to background conditions be useful for translational purposes; eg, for developing a treatment or a prognostic or diagnostic test for widespread clinical use? If a research finding changes abruptly, seemingly randomly, with minute experimental manipulation in animals or cell cultures, is it going to work reliably when involving the even more complex biology of individual humans?
Overall, basic and preclinical research probably face a much larger challenge of nonreproducibility than clinical research. Sample sizes are generally smaller, statistical literacy is often limited, there is limited external oversight or regulation, and investigator pressures to publish significant results ("publish or perish") are probably as potent as investigator and sponsor conflicts of interest in clinical research. Nonreproducibility may be a key reason for the low rate of translation to clinical advances of these seemingly spectacular but spurious biological reports.
Three dimensions may explain much of the complex problem of nonreproducibility in bench research: the misaligned reward/incentives system, use of poor research methods, and lack of transparency. These dimensions may also be areas in which effective interventions may be considered. Rewards and incentives should focus on reproducible results, open science, transparency, rigorous experimental methods, and efficient safeguards. For example, funding, hiring, and promotion decisions could consider whether a scientist has a record of sharing data, protocols, and software, and of high-quality experimental standards. Research methods can be improved similarly to the changes that have occurred in clinical investigation, particularly in the conduct and reporting of randomized clinical trials. The majority of basic and preclinical experiments are not randomized or properly controlled, and the results are read without the investigators being blinded, allowing for experimenter bias. There is also no preregistration, and discarding "negative" results is common. Typically, peer review for what are mostly complex experiments is unavoidably superficial. Peer reviewers often judge manuscripts on their beliefs and aesthetics. Open, transparent practices would make meaningful peer review possible either at the journal submission stage or at other stages; eg, in assessing preprints or postpublication by any scientist who wants to reproduce the published work. Reproducibility efforts would also benefit from transparent registration. The results could then be compared against what the investigators originally set out to do.
In addition, for research that goes beyond curiosity and is intended to provide specific clinical deliverables (such as new drugs or diagnostics), reproducing the published experiments does not suffice. Selection bias is still possible, and results may not be generalizable to other experimental settings or applicable to humans. The most promising and seemingly reproduced deliverables should eventually be validated in rigorous clinical research.9 For example, one of the recently reevaluated results7 concerned whether cimetidine is effective against lung cancer. The reproducibility check in mouse xenografts showed similar (but not formally statistically significant) results. The real test, however, is whether cimetidine can increase survival in patients with lung cancer in a rigorous randomized trial.
What is the future of reproducibility checks? The reproducibility checks performed to date are still relatively few and include few replications each. It would be useful to understand which disciplines have high consistency in their results, which show high heterogeneity, and which have consistently nonreplicated results. Of course, it makes no sense to perform reproducibility checks for everything. Checks could be prioritized for pivotal studies on which many other investigations depend (eg, those that are well cited and used by other scientists) and those that are reaching the point of translation for human use.1 Reproducibility checks may also offer insights about why results do not replicate and what the important parameters are that shape response in experimental systems. In this way, they may offer the essential knowledge that a single original publication, no matter how excellent, misses by default.
The reproducibility checks also offer an opportunity for scientific disciplines to reflect on how they can improve their research practices. Use of standard optimal methods in study design, conduct, and analysis requires proper training and continuing methodological education of scientists and their teams. Registration of protocols and even of full reports is potentially applicable when specific hypotheses, as opposed to exploratory research, are tested. The research community should reassess whether it can afford the luxury of continuing to fund so much research that is nonreproducible. Basic and preclinical science is extremely important. It has contributed to the development of vaccines and new drugs, to the understanding of disease, and to improved human health. Its overall funding is worth defending and increasing. However, as the research enterprise gets larger and enters many new areas of investigation, reproducibility assessments may help funders and the scientific community prioritize investments in specific areas.
Corresponding Author: John P. A. Ioannidis, MD, DSc, Stanford University, Stanford Prevention Research Center, Medical School Office Bldg, Room X306, 1265 Welch Rd, Stanford, CA 94305 (email@example.com).
Published Online: February 13, 2017. doi:10.1001/jama.2017.0549
Conflict of Interest Disclosures: The author has completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Ioannidis reports having been an unpaid member of the scientific advisory board of the Center for Open Science and the Reproducibility Initiative. No other disclosures were reported.
Ioannidis JPA. Acknowledging and Overcoming Nonreproducibility in Basic and Preclinical Research. JAMA. Published online February 13, 2017. doi:10.1001/jama.2017.0549