Sharing data produced from clinical trials has 2 principal purposes: verification of the original analysis and hypothesis generation. It has the potential to advance scientific discovery, improve clinical care, and increase knowledge gained from data collected in these trials. As such, data sharing has become an ethical and scientific imperative. The data sharing process has generated controversy,1-5 for example, about which data should be shared, with whom, and how quickly. However, there is limited information to help guide the discussion.
In this issue of JAMA, Navar et al6 detail the experience of data sharing in 3 clinical trial databases to which 14 pharmaceutical companies have contributed individual patient data from 3255 trials. Through 2015, 154 of the 177 proposals that had been processed and met initial requirements had been approved. In a previous study in JAMA, Ebrahim and colleagues7 searched MEDLINE from inception through March 2014 and identified 37 reanalyses reported in 36 published articles. The authors concluded that 13 of the reanalyses led to interpretations that differed from those of the original article, but they did not specify whether the reanalyses fundamentally changed the implications of the trial findings for either ongoing research or clinical care.
These 2 studies shed some light on what is known about data sharing. First, given that close to 500 000 clinical trials have been published in MEDLINE,8 the number of published reanalyses is very small. Second, requests for access to data for reanalysis, as currently offered by some pharmaceutical companies, are not common. It is quite possible that the number will increase with time, reflecting growing interest in reanalysis for any purpose.
The discussion around the need for data sharing has focused on several key issues. One concern is that the evidence base resulting from primary trial publications may be incomplete, and data are lost to the research and patient communities if they are not shared and made available for reanalysis. Supporting this hypothesis is the important contribution that meta-analysis of published data has made to advancing clinical care. However, there is even greater potential for deriving new insights from meta-analysis of individual participant data, which requires sharing of data and collaboration among investigators. Routine sharing of clinical trial data would facilitate individual participant data meta-analyses, with the goal of obtaining estimates of the effects of interventions studied in individual clinical trials that are more precise, more accurate, and more directly applicable to different types of patients, a form of precision medicine.9
Another important issue is that some researchers, either intentionally or inadvertently, do not always report important findings from their investigations. For instance, previous reports involving rofecoxib10 and oseltamivir11 illustrate the problem of incomplete or misleading reporting of clinical trial data. The advent and widespread implementation of trial registration, together with detailed review of trial registration information, trial protocols, and statistical analysis plans by editors, peer reviewers, and readers when evaluating clinical trial manuscripts, have likely reduced inaccurate reporting of clinical trials but cannot eliminate the problem. By making all data available for reexamination and replication of analyses, data sharing may help ensure that publications remain faithful to the trial plan. It may also discourage the withholding of important results, or at least make such results discoverable.
However, perhaps the most compelling reason in support of data sharing has received less attention: the responsibility to study participants. According to a recent position statement from the International Committee of Medical Journal Editors (ICMJE), “The ICMJE believes that there is an ethical obligation to responsibly share data generated by interventional clinical trials because participants have put themselves at risk.”12 Such risks may include not only major physical or psychological harms, but also more minor and common harms such as discomfort, inconvenience, and loss of work time. The informed consent process makes clear that participants in trials should not expect benefit to themselves as a result of their participation. The social contract for taking these risks and experiencing these harms imposes an ethical obligation that the results lead to the greatest possible benefit to society.
This contract is violated if the trial fails to provide useful information, for example, if it is underpowered and uninterpretable because inaccurate assumptions were made about the potential effect of the intervention, or if the trial is executed in a way that leads to extremely high dropout or introduces other sources of bias. By extension, the contract is also violated if society is not able to make full use of the data generated by trial participants, or if the analyses are invalid or incomplete. Because it is unlikely that the original researchers can or will conduct all useful analyses, the data must be available to other researchers so that they can continue to gain insights or replicate the findings. The absence of data sharing therefore constitutes a failure of the obligation of researchers to the study participants, and thus a failure of the ethical underpinnings of conducting clinical trials.
Sharing of the data that serve as the basis for the results reported in published clinical trials will happen in the coming years. Data sharing is increasingly being mandated by trial sponsors and has been supported by numerous influential groups, including the Institute of Medicine/National Academy of Medicine, the European Medicines Agency, and the ICMJE.12-14 Journals, funders, investigators, and industry must find common ground to ensure that data sharing occurs. However, it is far easier to call for data sharing than to create a system that protects the privacy of patients and is efficient, effective, and fair to the investigators who have collected the data. Three issues need to be addressed for such a system to be successful.
First, as another ethical obligation to the individual study participants, the shared data must be deidentified for their protection. It is increasingly clear that effective data deidentification may be difficult to achieve.15 Regardless of whether an investigator believes data are deidentified, the actual success of deidentification should be validated. Moreover, during the informed consent process at the time of enrollment into a clinical trial, patients must be made aware that their deidentified data may be shared, and they must have an opportunity to decline to participate in a clinical trial because of this specific mandate. Two levels of consent may be necessary: one to participate in the current trial and to have the data used by these researchers, and a second to have the data shared with others. However, some vulnerable groups of patients who have had concerns about participating in research because of past ethical lapses by investigators may be less willing to participate if they are asked to share their data. If so, this could lead to underrepresentation of marginalized populations in some studies, with consequent limitations in the ability to generalize the trial results.
Second, a system of sharing data must be efficient. Investigators who have conducted clinical trials are aware of the complexities of contemporary trial design and the underlying statistics and methods. Clinical trials often include detailed procedures and documentation (eg, a “play book”) describing how each data element and variable is coded. Even with a detailed analytic plan, data interpretation may be necessary during the analytic phase. If a researcher who was not involved in the conduct of the original trial tries to verify an analysis, who is responsible for ensuring that precisely the same procedures are followed, and who will provide the resources that this replication requires? If funders mandate sharing of data, they should consider providing support for the original investigators to help others reanalyze the data. If support for data sharing is unavailable from the trial sponsors, those who request the data and need assistance may have to fund the effort required of the original investigators.
Third, any system of data sharing must be fair to the original trial investigators and respect their investment and contributions. Many clinical trials take years to conceive, conduct, and analyze. What is the most appropriate mechanism to ensure that credit is given to the individuals who participated in the original data collection? Should one or all of the authors of the original investigation be offered authorship in the reanalysis, or at least acknowledgment, or would this taint the credibility of the reanalysis? Should the original authors have no role in authorship but be expected to assist those conducting the reanalysis if questions arise during the process? Should there be a fixed period after study conclusion or primary publication during which there would be a moratorium on publications by investigators other than the original researchers? Should there be an independent and neutral group that determines the appropriateness of requests for the data?
In considering these issues, it may be important to differentiate between the use of shared data for verification of the original analyses and its use for novel hypothesis generation and analysis. Christakis and Zimmerman4 have suggested an approach to reanalyses focused on verifying the findings of the original analyses: the methodological approach must be explicitly stated and justified in advance; financial conflicts of interest and intellectual bias should be minimized in the reanalysis; differences in outcomes between analyses should be described in detail; and authors of the initial analysis should be able to review and comment on the reanalysis. Novel analyses should meet the same standards for a priori specification of hypotheses and methods and for conflicts of interest, but there may be less of a role for review and comment by the original authors.
In the past 2 decades, criteria for authorship, trial registration, and conflict of interest have been more clearly defined, and the reporting of scientific results has improved because of these changes. Data sharing has now emerged as the next critical advance in the conduct of clinical research. Despite the complexity of the task and the concerns it may generate among investigators, mandated sharing of clinical trial data should and will occur. However, greater understanding of the details of data sharing is needed, including how it should be operationalized and which individuals and systems are available to support it.
Effective implementation of data sharing will prove far more difficult than implementation of the requirement to register clinical trials. Because conducting clinical trials is already expensive, it will be important to create a system for data sharing that is effective, efficient, and fair; that is not so burdensome or expensive that it is no longer worth the intellectual and financial cost; that achieves the goals of advancing scientific discovery, improving clinical care, and maximizing knowledge from clinical trial data; and, above all, that fulfills the ethical obligations to participants in clinical trials.
Corresponding Author: Howard Bauchner, MD, JAMA (firstname.lastname@example.org).
Bauchner H, Golub RM, Fontanarosa PB. Data Sharing: An Ethical and Scientific Imperative. JAMA. 2016;315(12):1238-1240. doi:10.1001/jama.2016.2420