March 22/29, 2016

Data Sharing: An Ethical and Scientific Imperative

Author Affiliations
  • 1Editor in Chief, JAMA
  • 2Deputy Editor, JAMA
  • 3Executive Editor, JAMA
JAMA. 2016;315(12):1238-1240. doi:10.1001/jama.2016.2420

Sharing data produced from clinical trials has 2 principal purposes: verification of the original analysis and hypothesis generation. It has the potential to advance scientific discovery, improve clinical care, and increase knowledge gained from data collected in these trials. As such, data sharing has become an ethical and scientific imperative. The data sharing process has generated controversy [1-5], for example, about which data should be shared, with whom, and how quickly. However, there is limited information to help guide the discussion.

In this issue of JAMA, Navar et al [6] detail the experience of data sharing that has occurred in 3 clinical trial databases to which 14 pharmaceutical companies have contributed individual patient data from 3255 trials. Through 2015, 154 studies from 177 proposals that had been processed and met initial requirements had been approved. In a previous study in JAMA, Ebrahim and colleagues [7] searched MEDLINE from inception through March 2014 and identified 37 reanalyses that were reported in 36 published articles. The authors concluded that 13 of the reanalyses led to different interpretations compared with the original article, but they did not specify whether the reanalyses fundamentally changed the implications of the trial findings with respect to either ongoing research or clinical care.

These 2 studies shed some light on what is known about data sharing. First, given that close to 500 000 clinical trials have been published in MEDLINE [8], the number of studies that represent reanalysis is very small. Second, the request for access to data to reanalyze as currently offered by some pharmaceutical companies is not common. It is quite possible that with time, reflecting increasing interest in reanalysis for any purpose, the number will increase.

The discussion around the need for data sharing has focused on several key issues. One concern is that the evidence base resulting from primary trial publications may be incomplete, and data are lost to the research and patient communities if they are not shared and made available for reanalysis. In support of this hypothesis is the important contribution that meta-analysis of published data has had in advancing clinical care. However, there is even greater potential for deriving new insights from meta-analysis of individual participant data, which requires sharing of data and collaboration among investigators. Routine sharing of clinical trial data would facilitate individual participant data meta-analyses, with a goal of achieving estimates of the effects of interventions studied in individual clinical trials that are more precise, accurate, and directly applicable to different types of patients, a form of precision medicine [9].
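
The precision argument above can be illustrated with a small numerical sketch. The Python below is an illustration under stated assumptions, not a method taken from this editorial or from any cited trial: it pools made-up summary estimates using fixed-effect inverse-variance weighting, which is simpler than a true individual participant data analysis, and the function name `pool_fixed_effect` and all numbers are hypothetical. It shows only that a pooled estimate has a smaller standard error than any single contributing trial.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Pool trial-level effect estimates with inverse-variance weights.

    Returns (pooled_estimate, pooled_standard_error). This is a plain
    fixed-effect model; it ignores between-trial heterogeneity.
    """
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    # Each trial is weighted by the inverse of its variance.
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log hazard ratios and standard errors from 3 trials.
trial_estimates = [-0.20, -0.10, -0.30]
trial_standard_errors = [0.10, 0.10, 0.10]

pooled_estimate, pooled_se = pool_fixed_effect(
    trial_estimates, trial_standard_errors
)
print(f"pooled estimate: {pooled_estimate:.3f}, standard error: {pooled_se:.3f}")
```

An analysis of shared individual participant data could go further than this summary-level pooling, for example by modeling patient-level covariates to examine effects in different types of patients.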

Another important issue is that some researchers either intentionally or inadvertently do not always report important findings from their investigations. For instance, previous reports involving rofecoxib [10] and oseltamivir [11] illustrate the problem of incomplete or misleading reporting of clinical trial data. The advent and widespread implementation of trial registration and the detailed review of trial registration information, trial protocols, and statistical analysis plans in the evaluation of clinical trial manuscripts by editors, peer reviewers, and readers have likely reduced inaccurate reporting of clinical trials, but cannot eliminate the problem. By having all data available for reexamination and replication of analyses, data sharing may help ensure that the publications have fidelity to the trial plan. It may inhibit withholding important results, or at least allow them to be discoverable.

However, perhaps the most compelling reason in support of data sharing has received less attention: the responsibility to study participants. According to a recent position statement from the International Committee of Medical Journal Editors (ICMJE), “The ICMJE believes that there is an ethical obligation to responsibly share data generated by interventional clinical trials because participants have put themselves at risk.” [12] Such risks may include not only major physical or psychological harms, but also more minor and common harms such as discomfort, inconvenience, and loss of work time. The informed consent process makes clear that participants in trials should not expect benefit to themselves as a result of their participation. The social contract for taking these risks and experiencing these harms imposes an ethical obligation that the results lead to the greatest possible benefit to society.

This contract is violated if the trial fails to provide useful information, for example, if it is underpowered and uninterpretable because inaccurate assumptions were made about the potential effect of the intervention, or if the trial is executed in a way that leads to extremely high dropout or introduces other sources of bias. An extension of this is that the contract is also violated if society is not able to make full use of the data generated by trial participants, or if the analyses are invalid or incomplete. Because it is unlikely that the original researchers can or will conduct all useful analyses, the data must be available to other researchers to continue to gain insights or to replicate the findings. The absence of data sharing therefore constitutes a failure of the obligation of researchers to the study participants, and therefore a failure of the ethical underpinnings of conducting clinical trials.

Sharing of the data that serve as the basis for the results reported in published clinical trials will happen in the coming years. Data sharing is increasingly being mandated by trial sponsors and has been supported by numerous influential groups, including the Institute of Medicine/National Academy of Medicine, European Medicines Agency, and ICMJE [12-14]. Journals, funders, investigators, and industry must find common ground to ensure data sharing occurs. However, it is far easier to call for data sharing than to create a system that protects the privacy of patients and is efficient, effective, and fair to the investigators who have collected the data. Three issues need to be addressed for a system to be successful.

First, as another ethical obligation to the individual study participants, the shared data must be deidentified for their protection. It is increasingly clear that effective data deidentification may be difficult to achieve [15]. Regardless of whether an investigator believes data are deidentified, it is important that the actual success of deidentification be validated. Moreover, during the informed consent process at the time of enrollment into a clinical trial, patients must be made aware that their deidentified data may be shared, and they must have an opportunity to decline to participate in a clinical trial because of this specific mandate. It is possible that 2 levels of consent may be necessary, one for consent to participate in the current trial and to have data used by these researchers, and a second to have the data shared with others. However, it is also possible that some vulnerable groups of patients who have had concerns about participating in research because of past ethical lapses by investigators may be less willing to participate in research if they are asked to share their data. If so, this could lead to underrepresentation of marginalized populations in some studies, with consequent limitations in the ability to generalize the trial results.
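
One way to make the validation of deidentification concrete is to count how many records remain unique on a set of indirect identifiers. The Python below is a minimal sketch under stated assumptions, not a procedure described in this editorial: the column names (`age_band`, `sex`, `site`), the function names, and the records are all invented for illustration. It reports the smallest group size (often called k in k-anonymity) and the fraction of records that are unique on the chosen fields.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count records sharing the same values on every quasi-identifier."""
    return Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )


def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group (0 for an empty data set)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_fraction(records, quasi_identifiers):
    """Return the fraction of records that are unique on the chosen fields."""
    if not records:
        return 0.0
    sizes = group_sizes(records, quasi_identifiers)
    unique_records = sum(1 for size in sizes.values() if size == 1)
    return unique_records / len(records)


# Hypothetical trial records after direct identifiers were removed.
records = [
    {"age_band": "60-69", "sex": "F", "site": "A"},
    {"age_band": "60-69", "sex": "F", "site": "A"},
    {"age_band": "70-79", "sex": "M", "site": "B"},
    {"age_band": "70-79", "sex": "M", "site": "B"},
    {"age_band": "50-59", "sex": "F", "site": "C"},
]
QUASI_IDENTIFIERS = ["age_band", "sex", "site"]

k = smallest_group_size(records, QUASI_IDENTIFIERS)
share_unique = unique_fraction(records, QUASI_IDENTIFIERS)
print(f"k = {k}, unique fraction = {share_unique:.2f}")
```

Here the last record is the only one in its group, so at least one participant is unique on these fields. Passing a check like this would not by itself show that data cannot be reidentified; it is only one screening step that depends on which fields are treated as indirect identifiers.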

Second, a system of sharing data must be efficient. Investigators who have conducted clinical trials are aware of the complexities of contemporary trial design and the underlying statistics and methods. Clinical trials often include detailed procedures and documentation (eg, a “play book”) about how each data element and variable is coded. Even with a detailed analytic plan, data interpretation may be necessary during the analytic phase. If another researcher not involved with the conduct of the initial trial tries to verify an analysis, who is responsible for ensuring that precisely the same procedures are followed, and who will provide the resources that this replication requires? If funders mandate sharing of data, they should consider providing support for the original investigators to help others reanalyze the data. If support for data sharing is unavailable from the trial sponsors, it may be necessary for those who request the data and need assistance to provide support for the effort required by the original investigators.

Third, any system of data sharing must be fair to and respect the investment and contributions of the original trial investigators. Many clinical trials take years to conceive, conduct, and analyze. What is the most appropriate mechanism to ensure credit is given to those individuals who have participated in the original data collection? Should one or all of the authors of the original investigation be offered authorship in the reanalysis, or at least acknowledgment, or would this taint the credibility of the reanalysis? Should the original authors have no role in authorship but be expected to assist the individuals in the reanalysis if questions arise during this process? Should there be a fixed time after study conclusion or primary publication during which period there would be a moratorium on publications by investigators other than the original researchers? Should there be an independent and neutral group that determines the appropriateness of requests for the data?

In considering these issues, it may be important to differentiate between the use of shared data for verification of the original analyses and for novel hypothesis generation and analysis. Christakis and Zimmerman [4] have suggested an approach to reanalyses focused on verification of findings of the original analyses: the methodological approach must be explicitly stated and justified in advance; financial conflicts of interest and intellectual bias should be minimized in the reanalysis; differences in outcomes between analyses should be described in detail; and authors of the initial analysis should be able to review and comment on the reanalysis. Novel analyses should meet the same standards regarding a priori specification of hypotheses and methods, as well as conflicts of interest, but there may be less of a role for review and comment by the original authors.

In the past 2 decades, criteria for authorship, trial registration, and conflict of interest have been more clearly defined, and the reporting of scientific results has improved because of these changes. Data sharing has now emerged as the next critical advance in the conduct of clinical research. Despite the complexity of the task, and the concerns that this may generate among investigators, mandated data sharing of the results of clinical trials should and will occur in the future. However, greater understanding about the details surrounding data sharing is needed, including how it should be operationalized, and who and what systems are available to support it.

Effective implementation of data sharing will prove far more difficult than implementation of the requirement to register clinical trials. Conducting clinical trials is already expensive, so it will be important to create an effective, efficient, and fair system for data sharing that is not so burdensome and expensive that it fails to justify its intellectual and financial cost; that achieves the goals of advancing scientific discovery, improving clinical care, and maximizing knowledge from clinical trial data; and that, above all, fulfills the ethical obligations to participants in clinical trials.

Article Information

Corresponding Author: Howard Bauchner, MD, JAMA (howard.bauchner@jamanetwork.org).

1. Ross JS, Krumholz HM. Ushering in a new era of open science through data sharing: the wall must come down. JAMA. 2013;309(13):1355-1356.
2. Lo B. Sharing clinical trial data: maximizing benefits, minimizing risk. JAMA. 2015;313(8):793-794.
3. Krumholz HM, Peterson ED. Open access to clinical trials data. JAMA. 2014;312(10):1002-1003.
4. Christakis DA, Zimmerman FJ. Rethinking reanalysis. JAMA. 2013;310(23):2499-2500.
5. Longo DL, Drazen JM. Data sharing. N Engl J Med. 2016;374(3):276-277.
6. Navar AM, Pencina MJ, Rymer JA, Louzao DM, Peterson ED. Use of open access platforms for clinical trial data. JAMA. doi:10.1001/jama.2016.2374.
7. Ebrahim S, Sohani ZN, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA. 2014;312(10):1024-1032.
8. Chavalarias D, Wallach JD, Li AHT, Ioannidis JPA. Evolution of reporting P values in the biomedical literature, 1990-2015. JAMA. 2016;315(11):1141-1148.
9. Berlin JA, Golub RM. Meta-analysis as evidence: building a better pyramid. JAMA. 2014;312(6):603-605.
10. Karha J, Topol EJ. The sad story of Vioxx, and what we should learn from it. Cleve Clin J Med. 2004;71(12):933-939.
11. Loder E, Tovey D, Godlee F. The Tamiflu trials. BMJ. 2014;348:g2630. doi:10.1136/bmj.g2630.
12. Taichman DB, Backus J, Baethge C, et al. Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. JAMA. 2016;315(5):467-468.
13. Committee on Strategies for Responsible Sharing of Clinical Trial Data. Sharing clinical trial data: maximizing benefits, minimizing risk. IOM website. http://iom.nationalacademies.org/Reports/2015/Sharing-Clinical-Trial-Data.aspx. Accessed March 1, 2016.
14. Publication and access to clinical data: an inclusive development process. European Medicines Agency website. http://www.ema.europa.eu/ema/index.jsp?curl=pages/special_topics/general/general_content_000556.jsp. Accessed March 1, 2016.
15. de Montjoye Y-A, Radaelli L, Singh VK, Pentland AS. Identity and privacy. Unique in the shopping mall: on the reidentifiability of credit card metadata. Science. 2015;347(6221):536-539.
2 Comments for this article
Impact of Big Data Paradox on Data Sharing
Gary LaFever | Founder & CEO, Anonos Inc.
This article notes the undesired effect of statistically relevant – and potentially discovery rich – data subjects refusing to consent to use of their data so that important, original data may not be included in data sets for analysis (see the paragraph with footnote 15 to “unicity” article).

Jim Waldo, Harvard University CTO, and fellow researchers highlight further errors introduced by traditional approaches to de-identification that purposefully (1) reduce accuracy, (2) restrict access, and/or (3) delete data - every single one of which is inconsistent with the goals of data sharing. A link to Jim Waldo’s video on the Big Data Paradox is available at https://youtu.be/XwH52ryZO4s
CONFLICT OF INTEREST: Founder & CEO, Anonos Inc.
Publishing Replication Studies is the Second Step to Improving Scientific Rigor
Florence Lecraw, M.D. | Georgia State University
I commend editorial and research leaders in their effort to adopt the policy that researchers are required to make their de-identified data and code available when possible as a prerequisite for publication. This is an advancement of the current practice by medical journals that "encourages" open data access. It is a difficult task that requires collaboration by many to change our culture. However, the next step may be even more difficult to adopt: publishing replication studies in medical journals. As found by Dr. Navar and colleagues, only one replication paper in their data set was published. In my limited review, I have found only one individual replication study published in a medical journal. The author of the paper told me that the editor-in-chief at JAMA Int Med requested the researcher submit his replication study to their journal upon hearing about a methodological error in a paper published in a high impact medical journal when both attended a conference (1).

Replication studies are labor intensive. A researcher has no economic or noneconomic incentives to perform the labor-intensive work if the paper is not publishable. What are the obstacles preventing medical journals from publishing replication studies? Some medical editorial leaders have hypothesized that publishing replication papers would negatively affect their impact factor. One colleague of a high impact medical journal said their readership would not be interested in replication papers, so they do not publish them.

My question to editorial leaders, researchers, and our scientific community: How can we encourage medical journals to publish replication papers? I believe if more editors had the same commitment to correcting methodological errors as the editor of JAMA Int Med, it would not only improve the public good but would also help keep methodological errors from recurring.


1. Brophy JM. Bayesian Interpretation of the EXCEL Trial and Other Randomized Clinical Trials of Left Main Coronary Artery Revascularization. JAMA Intern Med. doi:10.1001/jamainternmed.2020.1647