Evaluation of Data Sharing After Implementation of the International Committee of Medical Journal Editors Data Sharing Statement Requirement

Key Points Question What are the rates of declared and actual sharing of clinical trial data after the medical journals’ implementation of the International Committee of Medical Journal Editors data sharing statement requirement? Findings In this cross-sectional study of 487 clinical trials published in JAMA, Lancet, and New England Journal of Medicine, 334 articles (68.6%) declared data sharing. Only 2 (0.6%) individual-participant data sets were actually deidentified and publicly available on a journal website, and among the 89 articles declaring that individual-participant data would be stored in secure repositories, data from only 17 articles were found in the respective repositories as of April 10, 2020. Meaning These findings suggest that there is a wide gap between declared and actual sharing of clinical trial data.


Introduction
Responsible sharing of individual-participant data (IPD) from clinical studies has gained increasing traction and has been advocated for many years by many scientists and scientific leadership organizations.1 However, promoting data sharing from clinical trials has not been straightforward, and there has been much debate surrounding privacy risks and the optimal incentives for clinical trialists and sponsors.2-6 Recently, the International Committee of Medical Journal Editors (ICMJE) implemented a clinical trial data sharing policy. The policy does not mandate data sharing7,8 but requires a data sharing statement (DSS) from submissions reporting clinical trials, effective July 1, 2018.9-11 Prior work has identified a range of potential risks preventing trialists from sharing IPD.3,12,13 These risks include protection of patient privacy and confidentiality,4,13 inappropriate data reuse and replication,12 and researchers' and sponsors' potential losses of secondary publications and product advantage, respectively, because of the use of the shared data by competitors.3,5,14 Repositories for clinical data from industry-funded15-17 and publicly funded18 trials have provided a safeguarded mechanism for responsible IPD sharing, thereby substantially minimizing patient privacy and confidentiality risks. However, perceived risks of inappropriate reuse and competition have been difficult to mitigate, especially when the current reward system for researchers predominantly incentivizes high-impact publications, often based on exclusive data, at the expense of transparency, reproducibility, and data reuse.19-22 Disincentives for data sharing are known to have a disproportionate impact on clinical studies because the process of conducting those studies is time, cost, and labor intensive.3

Yet the role of prevalent disincentives and incentives (eg, data authorship23,24) for clinical trial data sharing has only recently entered the public realm,3,23,25,26 in part accelerated by discussions surrounding the ICMJE's data sharing policy,11,27 when many points of agreement and disagreement among stakeholders were articulated.3,5,6,27,28 Many data repositories have been established to facilitate secure sharing of IPD from clinical trials.15,17,18,29-32 Some industry sponsors, such as GlaxoSmithKline and Johnson & Johnson, have initiated their own data sharing repositories and partnerships with ClinicalStudyDataRequest.com (CSDR).17 An analysis of clinical trials registered on ClinicalTrials.gov between January 2016 and August 2017 found that NIH-funded trials were more likely to indicate data sharing intentions than industry-funded trials.
The ICMJE policy requires investigators to state whether they will share data (or not) while simultaneously providing an opportunity for them to place multiple restrictions and conditions regarding data access. Specifically, the DSS provides an opportunity for authors and sponsors to specify periods of data exclusivity or embargo. In addition, authors can specify in the DSS how the data will be made available, reasons for data availability or unavailability, and related preferences (for examples of DSSs, see eAppendix 1 in the Supplement). Thus, the DSSs, required by the ICMJE's policy, provide a window into data sharing norms, practices, and perceived risks among trialists and sponsors. We set out to evaluate how the ICMJE's data sharing policy has been implemented in 3 leading medical journals that are also member journals of the ICMJE: JAMA,9 Lancet,10 and New England Journal of Medicine (NEJM).11

Methods
Because this study used publicly available data and did not involve human participants, institutional review board approval and informed consent were not sought, in accordance with 45 CFR §46. Drawing on prior research reporting differences in intention to share clinical trial data between industry-funded and nonindustry (including NIH)-funded clinical trials,37 we classified funding sources as industry, nonindustry NIH, nonindustry non-NIH, and mixed. Industry refers to research funding from companies. Nonindustry NIH refers to research funding from the US NIH. Nonindustry non-NIH refers to research funding from foundations, trusts, associations, national institutes outside the US, and so forth. Mixed refers to any combination of the other research-funding categories.

Statistical Analysis
We conducted a descriptive analysis of variables associated with data sharing by type of funding and publication journal. For the primary outcome variables, declared and actual data sharing, we report 95% CIs determined by bootstrapping (100 000 iterations). The χ2 test was used for comparing differences in prevalence of declared data sharing between types of funding. The 2-tailed Fisher exact test was used for comparing differences in prevalence of data availability in repositories between types of funding. P < .05 was considered significant. To perform data analysis and to generate summary statistics and graphs, we used the Python programming language version 3.8.3 (Python Software Foundation), a Jupyter Notebook,39 and the following libraries: SciPy,40 Pandas,41 NumPy,42 Matplotlib,43 Scikits-Bootstrap,44 and Seaborn.45 For the statistical tests, we used the Python package Statsmodels and R statistical software version 4.0.2 (R Project for Statistical Computing).
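The two core computations above (a percentile bootstrap CI for a proportion and a χ2 comparison across funding types) can be sketched as follows. This is a minimal illustration, not the study's analysis code: the resample count is reduced to 10 000 for speed (the study used 100 000), and the per-funder counts in the contingency table are hypothetical placeholders, not the study's actual data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def bootstrap_ci(successes, n, iterations=10_000, alpha=0.05):
    """Percentile bootstrap CI for a proportion of binary outcomes."""
    data = np.zeros(n, dtype=np.int8)
    data[:successes] = 1
    # Resample n observations with replacement, iterations times,
    # and take the proportion of successes in each resample
    idx = rng.integers(0, n, size=(iterations, n))
    props = data[idx].mean(axis=1)
    return np.percentile(props, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Overall declared sharing: 334 of 487 articles (68.6%, from the article)
lo, hi = bootstrap_ci(334, 487)
print(f"68.6% (95% CI, {lo:.1%}-{hi:.1%})")

# Chi-square test of declared sharing by funding type
# (rows: NIH, nonindustry non-NIH, mixed, industry;
#  columns: shared, not shared -- counts are illustrative only)
table = np.array([[62, 8], [125, 45], [53, 28], [95, 60]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, P = {p:.4f}")
```

With 4 funding categories and a binary outcome, the test has 3 degrees of freedom; the percentile bootstrap here is one of several CI variants the Scikits-Bootstrap library supports.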
The ranking of funders regarding declared data sharing is largely consistent across the 3 journals (NIH, nonindustry non-NIH, mixed, and industry), although the 95% CIs overlap (Figure 1B). No substantial changes in the prevalence of declared data sharing were observed over the span of the first 7 quarters of policy implementation (Figure 1C).
The presence of multiple articles from the same clinical trials would violate the assumption of independence and could also introduce social dependencies,46 such as clustering by authors, funder, or institution. To address this issue, we identified 12 clusters of multiple publications for the same trial. Each cluster contained 2 or (in 1 instance) 3 articles that had the same declared data sharing and funding source and were associated with common authors and institutions. We treated each cluster as a single article observation (474 articles) and recomputed our results about declared data sharing by funding as a way of assessing the impact of clustering on our results. Results were qualitatively similar: 326 of 474 articles declared data sharing (68.8%; 95% CI, 64%-73%), and differences in declared data sharing by funding sources were similar to those in the entire data set: NIH (88.4%; 95% CI, 79%-98%), nonindustry non-NIH (73.5%; 95% CI, 67%-80%), mixed (65.4%; 95% CI, 55%-76%), and industry (61.2%; 95% CI, 54%-68%). Owing to the small effect of clustering of articles from the same trial, the subsequent analysis uses the entire data set of 487 articles.

and Chugai (1 study). All of those industry sponsors are listed as current members of CSDR; industry sponsors that declared in their DSS that they would deposit data in CSDR but were not listed as members of the registry were excluded from this analysis. The rate of declared data sharing for all industry members of CSDR (56.8%; 95% CI, 38%-70%) was similar to the rate we established for all industry-funded trials (61.3%; 95% CI, 54%-68%). We could, therefore, exclude the possibility that the lowest rate of declared data sharing of industry funders is due to differences between companies that are members of data repositories and companies that are not members of data repositories.
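The cluster-collapsing step described above (treating each set of same-trial articles as one observation before recomputing rates) can be sketched with pandas. The trial identifiers, column names, and records below are hypothetical illustrations, not the study's dataset.

```python
import pandas as pd

# Hypothetical article records; the first two articles report the same trial
df = pd.DataFrame({
    "trial_id": ["NCT001", "NCT001", "NCT002", "NCT003"],
    "declared_sharing": [True, True, False, True],
    "funding": ["NIH", "NIH", "industry", "mixed"],
})

# Treat each cluster of same-trial articles as a single observation
# (valid here because articles in a cluster share the same declared
#  sharing status and funding source, as in the study)
collapsed = df.drop_duplicates(subset="trial_id")

# Recompute declared-sharing rates overall and by funding source
overall_rate = collapsed["declared_sharing"].mean()
by_funding = collapsed.groupby("funding")["declared_sharing"].mean()
```

Because each cluster is internally homogeneous on the variables of interest, keeping the first article per trial is equivalent to any other within-cluster choice.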
Data repositories have a central role in improving sharing, security, discoverability, and reuse of research data,47,48 particularly IPD from clinical trials.29,36,49 Among the 89 articles proposing to make IPD available through repositories, many planned to store data in general-purpose repositories, including the CSDR (31 articles), the YODA Project (7 articles), and Vivli (7 articles). Another 30 articles planned to store IPD in NIH-supported, domain-specific data repositories, such as the NCTN/NCORP Data Archive (10 articles), the NHLBI Biologic Specimen and Data Repository Information Coordinating Center (9 articles), and the National Institute of Child Health and Human Development Data and Specimen Hub (5 articles) (Figure 2).
We compared declared with actual data availability in repositories (Table 2). Most trials provided neither information nor data in the respective repositories, mostly because of embargo and pending regulatory approval. Specifically, among the 72 articles that declared their intent but did not store data in a repository, 37 (51%) made data access conditional on embargo or product approval.
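A comparison like this one, between declared and actual repository availability across funding types, is what the 2-tailed Fisher exact test in the Methods addresses. A minimal sketch follows; the 2×2 counts are hypothetical, not the study's Table 2.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: rows are nonindustry- vs industry-funded articles,
# columns are IPD found vs not found in the declared repository
table = [[12, 30],
         [5, 42]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"OR = {odds_ratio:.2f}, P = {p_value:.3f}")
```

The exact test is preferred over χ2 here because the "data found" cell counts are small; SciPy reports the sample odds ratio (ad/bc) alongside the exact P value.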

Discussion
Most trials published in JAMA, Lancet, and NEJM after the endorsement of the ICMJE policy declared their intent to make clinical data available. Non-industry-funded trials communicated greater intent to share data than industry-funded trials, which exhibited low declared data sharing rates even among industry funders that are members of the largest data sharing repository, the CSDR. This result is consistent with prior research on intention to share data on ClinicalTrials.gov37 but departs from a trend among industry sponsors to establish mechanisms and repositories for sharing of clinical trial data.15,17,30,31,50 The commitment to data sharing substantially decreases when we consider indicators of actual vs declared data sharing: of 334 articles declaring that they would share data, only 2 IPD sets (0.6%; 95% CI, 0.0%-1.5%) were actually deidentified and publicly available on the journal website. Among the 89 articles declaring that they would store IPD in repositories, data from only 17 articles were found in the respective repository (Figure 3).
Although it is encouraging that data sharing appears widespread as a research norm51 among trialists and that repositories for secure data sharing are often considered in DSSs, the low rate of actual data availability we identified is concerning. Some data sets that are currently unavailable may simply require additional time to be released, particularly those that are associated with embargo periods, but for many unavailable data sets, actionable information about availability is lacking. This points to the need for detailed requirements that encourage authors to provide tangible and actionable data sharing commitments.

Limitations
Our study has limitations that should be acknowledged. First, only 3 journals were considered. Moreover, we could readily investigate declared vs actual data sharing practices only for repositories. Furthermore, only 2 IPD sets were deidentified and available on journal websites, so we could not meaningfully examine the usability of shared data or the reproducibility8 of the clinical trial studies. As more IPD sets become available, it would be worthwhile to assess whether they are easy to use and how complete the provided information is. More broadly, a repeated evaluation of data sharing intentions and practices could be valuable, particularly in the context of the ongoing clinical trial research response to the coronavirus disease 2019 pandemic, which may have changed some norms and practices of clinical research.56-59 In addition, we determined that repeated articles from the same clinical trial have a very small effect on our estimates but did not examine clustering by authors and institutions that could occur beyond the level of single trials.

Conclusions
To promote transparency and data reuse, journals and funders should work toward incentivizing data sharing via funding mechanisms 21 and data authorship, 23 and simultaneously discourage ambivalent wording in DSSs and possibly mandate data sharing. They can promote the use of unique pointers to data set location in repositories and to data request forms. Standardized choices for embargo periods, access requirements, and conditions for data use as part of the data sharing process could also reduce unnecessary data withholding and turn declarative data sharing into actual transparency in clinical trial data.