July 22, 2020

Pooling Data From Individual Clinical Trials in the COVID-19 Era

Author Affiliations
  • (1) Department of Population Health, New York University Grossman School of Medicine, New York, New York
  • (2) Nathan Kline Institute for Psychiatric Research, Orangeburg, New York
  • (3) Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
JAMA. 2020;324(6):543-545. doi:10.1001/jama.2020.13042

The rapid pace of the coronavirus disease 2019 (COVID-19) pandemic caused many research efforts to be initiated quickly. In some cases, nationally based platform trials have begun to report results.1 More frequently, however, randomized clinical trials (RCTs) were launched in local settings, and several missed the peak of the pandemic in their region. Some individual studies are now at risk of failing to meet recruitment targets because the number of patients with COVID-19 being cared for at some participating sites has declined.2 It may take several more COVID-19 surges to achieve full enrollment. Although the recent increase in COVID-19 cases reported in the US and several other countries offers the potential for enrollment in those regions, it is not certain that a sufficient number of centers will be ready with RCTs to address the pandemic in new hot spots. Launching RCTs in localities with currently increasing numbers of COVID-19 cases should be done; however, doing so is time-consuming and is not a feasible short-term solution.

Because the collective goal of the research community is to reach conclusions about the efficacy of potential treatments as quickly as possible, in weeks or months rather than years, there is an imperative to find a solution to the mismatch between the location of enrolling sites and the regional incidence of new cases of COVID-19.

Combining information from multiple trials that were not originally configured as a network of sites is another potential approach.3 Such a pooling effort must be scientifically justified, prespecified, inferentially rigorous, and convincing to the medical community. It must also honor the participation and risk assumed by the study participants, who deserve the maximal opportunity for their participation to yield useful findings. This Viewpoint proposes a practical approach for real-time pooling of individual patient data from RCTs during a pandemic. Although the model could be extended to any relevant set of trials and group of institutions, the ideas are illustrated with a specific example: estimating the therapeutic effects of convalescent plasma in hospitalized patients with COVID-19 by pooling data from several studies, each of which is at risk of failing because it cannot recruit enough patients, similar to a recently reported RCT conducted in China.2

Investigator perspectives that should be considered in pooling data include rules for publication and dissemination of individual and pooled trial findings, funding and sponsorship, and ownership of data and other intellectual property. The urgent nature of the pandemic and the shifting landscape of surges and hot spots should spur the expeditious creation of governance documents that establish rules for handling these issues. This proposal envisions starting with a data sharing agreement that is discussed and agreed to by all participating study teams and their data and safety monitoring boards.

The biostatistical challenges are significant but not insurmountable. The analytic approach for pooling individual patient data from multiple studies must address variation across trials and provide a valid estimate of treatment efficacy. There will likely be differences in target populations, treatment protocols, control conditions, consent documents, and even outcome definitions. Even so, combining individual patient data from different studies could enable more reliable inferences about the treatment effect than would combining aggregate, trial-level results.4

Principles of the Pooling Proposal

With this proposal (Figure), data sharing agreements would be executed among participating investigators, and a governance document would be established before any data are shared. A minimal data set would have to be agreed on, and individual patient data would be sent to a secure central repository in which studies, recruitment centers, and patients would be assigned unique IDs. Thereafter, participating studies would transfer newly accumulated data to the repository, potentially every 2 weeks. In parallel with preparing the data sharing agreements, all participating study teams would collaboratively finalize a statistical analysis plan (SAP). The plan would include guidelines for continuous monitoring using preestablished stopping rules for safety, efficacy, futility, and harm; independent, unblinded biostatisticians would adhere to this monitoring SAP.
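As a rough illustration of the ID assignment step, the Python sketch below shows one way a central repository could issue stable unique IDs for studies, recruitment centers, and patients, so that a record resent in a later 2-week transfer maps to the same identifiers. The class name, method, and ID format (S0001, C0001, P0001) are hypothetical assumptions for illustration; the proposal does not specify an ID scheme, and a real repository would also need secure storage and access control.

```python
from dataclasses import dataclass, field


@dataclass
class CentralRepository:
    """Hypothetical registry issuing stable IDs (format is an assumption)."""
    studies: dict = field(default_factory=dict)
    centers: dict = field(default_factory=dict)
    patients: dict = field(default_factory=dict)

    def _assign(self, table, key, prefix):
        # Reuse an existing ID so that a patient resent in a later
        # 2-week transfer keeps the same identifier.
        if key not in table:
            table[key] = f"{prefix}{len(table) + 1:04d}"
        return table[key]

    def register(self, study, center, local_patient_id):
        """Return (study_id, center_id, patient_id) for one transferred record."""
        study_id = self._assign(self.studies, study, "S")
        # Centers and patients are keyed by their parent study, so local
        # codes that collide across trials still receive distinct IDs.
        center_id = self._assign(self.centers, (study, center), "C")
        patient_id = self._assign(
            self.patients, (study, center, local_patient_id), "P")
        return study_id, center_id, patient_id
```

For example, registering patient "001" at "Site 1" of two different trials yields two distinct patient IDs, while resending the first trial's record returns its original IDs.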

Figure.  Schema of the Proposal for Data Pooling as Planned for Studies of Convalescent Plasma in Hospitalized Patients With COVID-19

The data sharing agreement would govern publications and other aspects of the pooling effort. A secure central repository for the pooled data would be established and updated with new data at 2-week intervals. Unblinded biostatisticians would conduct the interim analyses and report to a collective data and safety monitoring board (DSMB). When evidence with a high degree of confidence emerges, the DSMB would make a joint recommendation to the leadership of all trials. RCT indicates randomized clinical trial.

A collective data and safety monitoring board (DSMB), representing all participating studies, would meet every 2 weeks to review results and make collective recommendations. When evidence with a high degree of confidence about the efficacy (or lack thereof) and safety of the treatment has accumulated, the DSMB would recommend terminating enrollment in the participating trials. This is critically important in a pandemic so that effective treatments can be quickly identified and ineffective or harmful treatments abandoned. Prespecified, agreed-upon stopping rules would help ensure the integrity of the pooling effort and the acceptance of its inferences by the medical community.

To document the methods of each contributing trial, participating RCT teams would prepare reports describing the specific details of their studies, which could be submitted as appendices to the main article reporting the overall results of the pooling project. Study investigators would review the DSMB recommendation and decide whether to suspend further enrollment in their specific trial. The data repository would be made available to participating investigators for additional analyses approved by a publications committee.

Minimal Data Set

The consortium of participating investigators would identify a minimal data set, representing information that is collected in common across the trials.5 The minimal data set would contain study-level information, such as whether the study was blinded, types of control treatment, and recruitment sites; patient baseline characteristics, such as medical history and concomitant medications; test-agent-related adverse events; and the World Health Organization (WHO) ordinal 11-point scale6 at 2 and 4 weeks postrandomization.
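A minimal data set of this kind could be represented as typed records. The Python sketch below is one hypothetical encoding; the class and field names are assumptions, not the consortium's actual data dictionary. It stores the WHO score at 2 and 4 weeks as optional values, so that not-yet-observed or uncollected values are explicit, and rejects scores outside the 0 to 10 range of the 11-point WHO scale. A baseline WHO score is included as an assumed field because clinical status at baseline is named later in the text as a likely covariate.

```python
from dataclasses import dataclass, field
from typing import Optional

# The WHO clinical progression scale has 11 levels: 0 (uninfected) to 10 (dead).
WHO_SCALE_MIN, WHO_SCALE_MAX = 0, 10


@dataclass
class StudyInfo:
    """Hypothetical study-level fields of the minimal data set."""
    study_id: str
    blinded: bool
    control_type: str            # e.g. "standard of care", "saline"
    recruitment_sites: list


@dataclass
class PatientRecord:
    """Hypothetical patient-level fields of the minimal data set."""
    patient_id: str
    study_id: str
    treated: bool                # True = convalescent plasma arm
    age: int
    sex: str
    who_baseline: int            # assumed field: clinical status at baseline
    medical_history: list = field(default_factory=list)
    concomitant_medications: list = field(default_factory=list)
    adverse_events: list = field(default_factory=list)  # test-agent related
    who_week2: Optional[int] = None  # None = not yet observed or not collected
    who_week4: Optional[int] = None

    def __post_init__(self):
        # Reject any recorded WHO score outside the 0-10 scale.
        for name in ("who_baseline", "who_week2", "who_week4"):
            value = getattr(self, name)
            if value is not None and not WHO_SCALE_MIN <= value <= WHO_SCALE_MAX:
                raise ValueError(f"{name}={value} is outside the WHO 0-10 scale")
```

Keeping missing follow-up values as None, rather than dropping the record, makes it easier to apply whatever missing-data strategy the SAP specifies.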

Statistical Strategy for Pooling the Data

An innovative biostatistical approach will be needed for analysis of the pooled individual patient data and for frequent monitoring of the accumulating data. Continuous monitoring with bayesian stopping rules, which allow real-time decisions without the penalties for multiple data looks and the alpha spending associated with the classic RCT monitoring approach, is an attractive and efficient option.7,8 At each interim analysis, the posterior distribution of the parameter describing the treatment effect could be reported, and the prespecified stopping criteria could guide the recommendations of the collective DSMB. The bayesian monitoring approach enables straightforward, actionable rules for efficacy, futility, harm, and safety, all of which can incorporate information accrued across all studies. The process involves estimating the posterior probability of a favorable or unfavorable outcome, with the treatment effect expressed as an odds ratio or risk ratio; the stopping rules would be based on the posterior probability that the odds ratio exceeds a prespecified threshold.
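To make the stopping-rule logic concrete, the Python sketch below uses a deliberate simplification rather than the hierarchical model the authors describe: it treats the pooled log odds ratio estimate as normally distributed, combines it with a normal prior (a conjugate update), and computes the posterior probability that the odds ratio exceeds a threshold. The orientation (odds ratio above 1 means benefit), the prior, and all decision thresholds (0.95 for efficacy, 0.05 for harm, and a probability of at most 0.10 of exceeding an odds ratio of 1.25 for futility) are hypothetical placeholders; in practice they would be fixed in the SAP.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_log_or(est_log_or, se, prior_mean=0.0, prior_sd=1.0):
    """Conjugate normal update for the log odds ratio (normal approximation)."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * est_log_or)
    return post_mean, math.sqrt(post_var)


def prob_or_exceeds(threshold, post_mean, post_sd):
    """P(OR > threshold) = P(log OR > log threshold) under the normal posterior."""
    return 1.0 - normal_cdf((math.log(threshold) - post_mean) / post_sd)


def dsmb_signal(post_mean, post_sd, efficacy=0.95, harm=0.05,
                futility_or=1.25, futility=0.10):
    """Map the posterior to a hypothetical recommendation (OR > 1 = benefit)."""
    p_benefit = prob_or_exceeds(1.0, post_mean, post_sd)
    if p_benefit >= efficacy:
        return "stop for efficacy", p_benefit
    if p_benefit <= harm:
        return "stop for harm", p_benefit
    # Futility: little chance that the true OR reaches a meaningful benefit.
    if prob_or_exceeds(futility_or, post_mean, post_sd) <= futility:
        return "stop for futility", p_benefit
    return "continue", p_benefit
```

For example, a pooled estimate of log(1.6) with a standard error of 0.15 gives a posterior probability of benefit above 0.99 and an efficacy signal, whereas an estimate of 0 with a standard error of 0.3 gives "continue". The posterior is simply recomputed at each 2-week look, and no alpha-spending adjustment enters the calculation, although the operating characteristics of such rules are usually checked by simulation before the SAP is finalized.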

Because many factors influence outcomes, a statistical model that accounts for these factors is essential. The model should be generalizable to a range of treatments for COVID-19 and a range of possible control groups. The primary outcome could be the WHO scale, with all-cause mortality at 2 and 4 weeks after treatment initiation as a secondary end point. For the WHO ordinal outcome scale, the cumulative logit model9 could be used with bayesian hierarchical modeling techniques10 that specify full prior distributions for the overall treatment effect as well as for study-specific treatment effects. In the case of convalescent plasma trials, multiple controls are possible (standard of care, nonconvalescent plasma, or saline). To allow a global comparison of convalescent plasma vs any control, the model would use convalescent plasma as the reference treatment, impose priors on the different control treatments, and place a hyper-prior over all control priors, thus estimating a common control effect.

Emerging evidence about COVID-19 disease progression and treatment effects suggests that several patient characteristics might be important, including clinical status at baseline, age, sex, and other factors that would be captured in the proposed minimal data set. Those factors would be included in the SAP, along with the bayesian stopping rules. Throughout, care should be taken regarding missing data, whether a value is unintentionally missing or a variable in the minimal data set was not collected in all studies. More details about the proposal, including the minimal data set, the algorithm for converting the WHO scales, and the models, are available on a website prepared by the authors.5
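The cumulative logit (proportional odds) model named in this section can be sketched in Python as follows. The example assumes the parameterization logit P(Y ≤ k) = θ_k + β·treated, where lower WHO scores are better, so that exp(β) is a common odds ratio above 1 for benefit; the function names, the sign convention, and any cutpoint values are illustrative assumptions. It shows only the likelihood for pooled individual patient data with study-specific treatment effects; the hierarchical priors, covariates, and control-arm structure described in the text are not implemented.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(cutpoints, log_or, treated):
    """Probabilities of ordinal categories 0..K under a cumulative logit model.

    Assumed form: logit P(Y <= k) = cutpoints[k] + log_or * treated, so
    exp(log_or) is the odds ratio of being at or below any level
    (lower score = better outcome). An 11-level WHO scale needs 10 cutpoints.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [inv_logit(c + log_or * treated) for c in cutpoints] + [1.0]
    # Category probabilities are successive differences of cumulative ones.
    probs = [cumulative[0]]
    probs += [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]
    return probs


def log_likelihood(records, cutpoints, study_log_or):
    """Pooled IPD log likelihood.

    records: iterable of (study_id, treated, score) tuples.
    study_log_or: study ID -> study-specific log OR (in a hierarchical
    model these would share a common prior, which is not shown here).
    """
    total = 0.0
    for study_id, treated, score in records:
        probs = category_probs(cutpoints, study_log_or[study_id], treated)
        total += math.log(probs[score])
    return total
```

Because the same odds ratio applies at every cutpoint, one parameter per study summarizes a shift across the whole ordinal scale; that proportional odds assumption is itself something an SAP would need to justify or relax.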

As the COVID-19 pandemic continues to exact its toll on the world, collaborative efforts are needed to provide timely data to respond to current clinical issues and public health problems. Models such as this proposal for prospective pooling of individual patient data from ongoing individual clinical trials, and lessons learned from it, are likely to inform the collective response to future health crises.

Article Information

Corresponding Author: Elliott M. Antman, MD, Cardiovascular Division, Brigham and Women’s Hospital, 75 Francis St, Boston, MA 02115 (eantman@rics.bwh.harvard.edu).

Published Online: July 22, 2020. doi:10.1001/jama.2020.13042

Conflict of Interest Disclosures: Dr Petkova reports serving as statistician on the data and safety monitoring board (DSMB) of COVID-19 randomized clinical trials coordinated by NYU Langone Health. Dr Antman reports serving as chair of the DSMB for trials of therapies for COVID-19 that are being coordinated by NYU Langone Health. Dr Troxel reports serving as biostatistician for trials of therapies for COVID-19 that are being coordinated by NYU Langone Health.

Additional Contributions: We acknowledge the substantive contributions of Keith Goldfeld, DrPH, Mengling Liu, PhD, Arthur Caplan, PhD, and Judith Hochman, MD, all from NYU Grossman School of Medicine; and David DeMets, PhD, University of Wisconsin School of Medicine and Public Health. None of those acknowledged received any compensation for their contributions.

References

1. The Randomised Evaluation of COVID-19 Therapy (RECOVERY). Accessed June 25, 2020. https://www.recoverytrial.net
2. Li L, Zhang W, Hu Y, et al. Effect of convalescent plasma therapy on time to clinical improvement in patients with severe and life-threatening COVID-19: a randomized clinical trial. JAMA. Published online June 3, 2020. doi:10.1001/jama.2020.10044
3. Bauchner H, Fontanarosa PB. Randomized clinical trials and COVID-19: managing expectations. JAMA. 2020;323(22):2262-2263. doi:10.1001/jama.2020.8115
4. Tierney JF, Fisher DJ, Burdett S, Stewart LA, Parmar MKB. Comparison of aggregate and individual participant data approaches to meta-analysis of randomised trials: an observational study. PLoS Med. 2020;17(1):e1003019. doi:10.1371/journal.pmed.1003019
5. NYU Langone Medical Center. International convalescent plasma for COVID-19 hospitalized patients pooling project: statistical modeling proposal. Published July 22, 2020. http://nyulmc.org/compile
6. Marshall JC; WHO Working Group on the Clinical Characterisation and Management of COVID-19 Infection. A minimal common outcome measure set for COVID-19 clinical research. Lancet Infect Dis. 2020;S1473-3099(20)30483-7. doi:10.1016/S1473-3099(20)30483-7
7. Lewis RJ, Angus DC. Time for clinicians to embrace their inner bayesian? reanalysis of results of a clinical trial of extracorporeal membrane oxygenation. JAMA. 2018;320(21):2208-2210. doi:10.1001/jama.2018.16916
8. Saville BR, Connor JT, Ayers GD, Alvarez J. The utility of bayesian predictive probabilities for interim monitoring of clinical trials. Clin Trials. 2014;11(4):485-493. doi:10.1177/1740774514531352
9. Agresti A. Categorical Data Analysis. 2nd ed. John Wiley & Sons; 2002. doi:10.1002/0471249688
10. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. 3rd ed. Taylor & Francis; 2013. doi:10.1201/b16018