Key Points
Question
In randomized clinical trials (RCTs) of COVID-19 that report statistically significant results, what is the fragility index, ie, the minimum number of participants who would need to have had a different outcome for the RCT to lose statistical significance?
Findings
In this cross-sectional study of 47 RCTs with a total of 138 235 participants that had statistically significant results, the median fragility index was 4. That is, a median of 4 events was required to change the analysis findings from statistically significant to not significant.
Meaning
In this study, many RCTs for COVID-19 had a low fragility index, challenging confidence in the robustness of the results.
Importance
Interpreting results from randomized clinical trials (RCTs) for COVID-19, which have been published rapidly and in vast numbers, is challenging during a pandemic.
Objective
To evaluate the robustness of statistically significant findings from RCTs for COVID-19 using the fragility index.
Design, Setting, and Participants
This cross-sectional study included COVID-19 trial articles that randomly assigned patients 1:1 into 2 parallel groups and reported at least 1 binary outcome as significant in the abstract. A systematic search was conducted using PubMed to identify RCTs on COVID-19 published until August 7, 2021.
Exposures
Trial characteristics, such as type of intervention (treatment drug, vaccine, or others), number of outcome events, and sample size.
Main Outcomes and Measures
Fragility index.
Results
Of the 47 RCTs for COVID-19 included, 36 (77%) were studies of the effects of treatment drugs, 5 (11%) were studies of vaccines, and 6 (13%) were of other interventions. A total of 138 235 participants were included in these trials. The median (IQR) fragility index of the included trials was 4 (1-11). The medians (IQRs) of the fragility indexes of RCTs of treatment drugs, vaccines, and other interventions were 2.5 (1-6), 119 (61-139), and 4.5 (1-18), respectively. The fragility index among more than half of the studies was less than 1% of each sample size, although the fragility index as a proportion of events needing to change would be much higher.
Conclusions and Relevance
This cross-sectional study found a relatively small number of events (a median of 4) would be required to change the results of COVID-19 RCTs from statistically significant to not significant. These findings suggest that health care professionals and policy makers should not rely heavily on individual results of RCTs for COVID-19.
Since December 2019, the number of people with COVID-19 has surged worldwide.1 Information about this newly discovered infectious disease has been widely reported in both traditional and social media, resulting in global awareness of a previously unknown respiratory infection and increased public perception of risk. This emergency situation has pressured researchers to conduct randomized clinical trials (RCTs) immediately, at various study scales and of varied quality.2 Regardless of the scale and quality of RCTs, the results of each received attention from the general public and health care researchers, via different media, and people alternated between optimism and despair based on the individual findings of these trials.3
In particular, there is a risk that trial results hinge on the number of outcome events, because designing a trial around an expected number of outcome events is unrealistic in an emergency. P values are likely to change if the number of events is small.4 Furthermore, P values can be affected by methodological limitations, such as loss to follow-up or inadequate blinding. However, there is still a strong reliance on P values for quick clinical decisions, despite several statements critiquing the superficial interpretation of P values.5,6
The fragility index is helpful in interpreting the robustness of results obtained from clinical trials.7 It is the minimum number of participants in a positive trial who would need to have had a different outcome for the results of the trial to lose statistical significance. A lower fragility index indicates that the statistical significance of the trial depends on fewer events. For example, a fragility index of 2 means that if 2 participants in the intervention group had different event outcomes, the RCT would not have a statistically significant result when using the conventional P value cutoff of less than .05 (Figure 1). P values from studies with low fragility indexes should therefore be interpreted carefully because they can change easily depending on the number of events. Thus, the fragility index can be an intuitive indicator for the careful interpretation of findings from clinical trials conducted under emergency conditions. The aim of this study was to evaluate the robustness of statistically significant findings from RCTs for COVID-19 using the fragility index.
Study Design and Data Source
For this cross-sectional study, we systematically searched PubMed to identify articles reporting RCTs on COVID-19 until August 7, 2021, using the following search strategy: (COVID-19 OR COVID-19 [Medical Subject Heading (MeSH) Terms] OR COVID-19 Vaccines OR COVID-19 Vaccines [MeSH Terms] OR COVID-19 serotherapy OR COVID-19 serotherapy [Supplementary Concept] OR COVID-19 Nucleic Acid Testing OR covid-19 nucleic acid testing [MeSH Terms] OR COVID-19 Serological Testing OR covid-19 serological testing [MeSH Terms] OR COVID-19 Testing OR covid-19 testing [MeSH Terms] OR SARS-CoV-2 OR sars-cov-2 [MeSH Terms] OR Severe Acute Respiratory Syndrome Coronavirus 2 OR NCOV OR 2019 NCOV OR coronavirus [MeSH Terms] OR coronavirus OR COV) AND (randomized controlled trial [Publication Type] OR (randomized [Title/Abstract] AND controlled [Title/Abstract] AND trial [Title/Abstract])) AND (2019/11/01 [PDAT]: 3000/12/31 [PDAT]).
Per the Common Rule, this study did not require ethical approval because we analyzed only published results and did not include patients. We followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines for cross-sectional studies.
After removing duplicate records from the initial search results, 2 pairs of reviewers (T.I. and K.K.; Y.I. and S.S.) screened the titles and abstracts of all identified articles in accordance with the following prespecified eligibility criteria. The inclusion criteria were RCTs that (1) were superiority trials, (2) randomly assigned patients 1:1 into 2 parallel groups, (3) reported at least 1 dichotomous or time-to-event outcome as statistically significant in the abstract, and (4) tested an intervention for COVID-19. Exclusion criteria were RCTs that were (1) not original articles, (2) preprint articles, (3) phase 1 or 2 trials, (4) noninferiority trials, (5) cluster or crossover RCTs, and (6) non-English articles.
The 4 reviewers independently extracted data from each trial in duplicate using a prespecified data collection form. Discrepancies were discussed in pairs; if not resolved, they were addressed by a third reviewer from the review team. We extracted the following data: type of intervention (treatment drug, vaccine, or others); outcome definitions (primary or secondary, time-to-event or not, composite or not); analytical strategy (adjusted for confounders or not, intention to treat or not); allocation concealment (adequate or no/unclear); the number of participants lost to follow-up; the reported P value; the number of outcome events; the sample size; and funding (nonprofit, profit, both, no funding, or not reported).
The primary outcome of this study was the fragility index. We calculated the fragility index for each RCT based on a previous report.7 Using 2 × 2 contingency tables, the fragility index was calculated by iteratively adding an event to whichever group (experimental or control) had the smaller number of events and concomitantly subtracting a nonevent from that same group, so that the group's total number of participants stayed constant. We repeated this step until statistical significance (defined as P < .05) was lost. P values were recalculated using a 2-sided Fisher exact test. For time-to-event outcomes, following previous studies,7 we calculated the fragility index from the numbers of events and nonevents during the observation period, without considering censoring.
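The iterative procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the study used Stata): the function names `fisher_exact_two_sided` and `fragility_index` are hypothetical, and the 2-sided Fisher exact test here sums the probabilities of all tables no more likely than the observed one, which is one common convention and may differ slightly from other software.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Rows are groups; columns are events and nonevents.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalized hypergeometric weights, kept as exact integers.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    # Sum every table that is no more likely than the observed one.
    tail = sum(w for w in weights.values() if w <= observed)
    return tail / comb(row1 + row2, col1)


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Count nonevent-to-event flips needed before P >= alpha.

    Flips are applied to the group with fewer events, as in the text.
    Returns 0 if the result is not significant to begin with.
    """
    if events_1 <= events_2:
        e_mod, n_mod, e_fix, n_fix = events_1, n_1, events_2, n_2
    else:
        e_mod, n_mod, e_fix, n_fix = events_2, n_2, events_1, n_1
    flips = 0
    while e_mod + flips <= n_mod:
        p = fisher_exact_two_sided(
            e_mod + flips, n_mod - e_mod - flips, e_fix, n_fix - e_fix
        )
        if p >= alpha:
            return flips
        flips += 1
    # Only reachable for unusual, strongly unbalanced tables.
    raise ValueError("significance was never lost")


# Illustrative numbers only, not taken from any included trial:
# 0 of 10 events vs 7 of 10 events.
print(fragility_index(0, 10, 7, 10))
```

In this made-up example, the P value is about .003 at the start, about .02 after 1 flip, and about .07 after 2 flips, so the sketch reports a fragility index of 2.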
To summarize study characteristics, continuous variables are presented as medians with IQRs, and categorical variables are presented as counts with percentages. We plotted the fragility index as a histogram and described the fragility index by subgroups based on trial characteristics. All statistical analyses were performed using Stata version 16.1 (StataCorp).
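As a rough illustration of the median (IQR) summaries by subgroup, the sketch below uses Python's standard library. The helper name `median_iqr` and the example values are hypothetical and are not data from the study; quartile conventions also differ between software packages, so Stata may return slightly different IQR bounds for the same input.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, first quartile, third quartile) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Made-up fragility indexes for illustration only; NOT data from the study.
example = {
    "treatment drug": [1, 2, 3, 4, 5],
    "vaccine": [10, 20, 30, 40, 50],
}
for group, values in example.items():
    med, q1, q3 = median_iqr(values)
    print(f"{group}: median {med} (IQR {q1}-{q3})")
```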
We identified 1187 articles. After excluding duplicate articles and applying the exclusion criteria, 401 articles were deemed eligible for the full-text review. These articles were checked according to the eligibility criteria, and 47 articles, with 138 235 participants, were included in the study.8-54 At the full-text review stage, 73 articles were studies with binary outcomes but were excluded because they did not have statistically significant results. The detailed study selection flow is presented in Figure 2.
Table 1 summarizes the characteristics of the included studies. Of the 47 RCTs, 36 (77%) were studies of the effects of treatment drugs, 5 (11%) were studies of vaccines, and 6 (13%) were studies of other interventions. The median (IQR) sample size was 111 (72-392) participants, with a median (IQR) of 44 (18-112) outcome events. Approximately half the trials were supported by nonprofit funding.
The Fragility Index in COVID-19 Trials
The median (IQR) fragility index for the 47 trials was 4 (1-11): a median of 4 events was required to change the analysis findings from statistically significant to not significant. Figure 3 shows the distribution of the fragility index for the included studies. We describe the fragility index by subgroups of trial characteristics in Table 2. The median (IQR) fragility index was 2.5 (1-6) for RCTs of treatment drugs and 4.5 (1-18) for RCTs of other interventions. In contrast, the median (IQR) fragility index of vaccine trials was 119 (61-139). In addition, among 26 trials (55%), the fragility index was 1% or less of the total sample size.
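The comparison of the fragility index with the sample size is simple arithmetic, sketched below under stated assumptions. The function name `share_at_or_below` and the trial tuples are hypothetical placeholders, not values extracted from the included studies.

```python
def share_at_or_below(trials, threshold_pct=1.0):
    """Count trials whose fragility index is at most threshold_pct of sample size.

    trials: list of (fragility_index, sample_size) tuples.
    Returns (count, percentage of trials).
    """
    flagged = [fi / n * 100 <= threshold_pct for fi, n in trials]
    return sum(flagged), sum(flagged) / len(flagged) * 100


# Made-up (fragility index, sample size) pairs for illustration only.
example_trials = [(4, 111), (1, 400), (119, 30000), (2, 80)]
print(share_at_or_below(example_trials))

# The reported share, 26 of 47 trials, rounds to 55%.
print(round(26 / 47 * 100))
```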
Our study found that the fragility index was 4 or less in 50% of binary outcomes from RCTs on COVID-19 reported in medical journals published until the beginning of August 2021. This result means that for half the COVID-19 trials, reversing the outcome status of 4 or fewer patients in the intervention group would change the result from statistically significant to not significant. In terms of types of interventions, most COVID-19 vaccine trials had a large fragility index, whereas most RCTs studying treatment drugs and other interventions had a very small fragility index. In addition, in more than half of the studies (26 of 47 [55%]), the fragility index was 1% or less of the sample size.
Our findings were consistent with those reported in various clinical fields surveyed before the pandemic, such as spine surgery,55,56 anesthesia and critical care,57-59 sports medicine and arthroscopic surgery,60 and nephrology.61 These previous studies reported a median fragility index of 2 to 5, which is similar to our results. In addition, consistent with that reported in previous studies, the fragility index appeared to be associated with the sample size and P values. In this study, the sample size of clinical trials examining vaccines was very large, and the fragility index was large in many of these studies. These RCTs of vaccines not only had large sample sizes, but also a high number of events. This result was consistent with those of previous studies that focused on clinical trials in 5 high-impact medical journals, such as JAMA and the New England Journal of Medicine,7 and in heart failure.62 These RCTs also had both large sample sizes and large numbers of outcome events.
We need to carefully interpret the results of COVID-19 trials with a small fragility index. A small fragility index means that the results may be less robust in terms of statistical significance; in other words, a change in the outcome occurrence for a small number of participants in an intervention group can easily change the study result. However, a small fragility index does not imply that the study is not trustworthy. Small RCTs with low fragility indexes may still prove useful if the aggregated or the individual patient data they provide can be combined on evidence synthesis platforms, such as the COVID-NMA project.63
Strengths and Limitations
Our study had several strengths. We used a systematic and rigorous approach to identify RCTs related to COVID-19, applying a predefined search strategy to all articles in PubMed, which is the most commonly used medical literature database. In addition, we included all eligible COVID-19 trials, regardless of publication period; this makes our findings relatively comprehensive for COVID-19 research and reflects the overall state of the evidence currently available.
This study also has limitations. First, the concept of the fragility index can only be applied to trials performing 1:1 randomization and reporting statistically significant findings for binary outcomes.7 Although many clinically relevant end points have binary outcomes, many articles in this study were excluded because they had more than 2 parallel arms (n = 41), no positive dichotomous outcome (n = 73), or only continuous variables (n = 55). Second, we included only articles written in English. This restriction may have led to selection bias, but as the leading studies on COVID-19 are often published in English in international journals indexed in PubMed, it is unlikely to have caused major problems. Third, the current study did not assess the study quality and the study protocol of individual RCTs in detail and only focused on the fragility index. We only considered a few major aspects of study quality, such as intention-to-treat analysis and allocation concealment. A large fragility index does not necessarily indicate a good study. A larger sample size is likely to result in a larger fragility index, but ethical considerations require that RCTs recruit the minimum number of participants necessary based on the findings of previous studies. The fragility index is only a metric to ascertain the robustness of clinical trials and should not be used alone to judge the merits of a study. Furthermore, there is no clear cutoff point for the fragility index.64 Despite these limitations, the fragility index is an intuitive aid for interpreting RCT results: it is a simple metric that makes concrete the otherwise hard-to-grasp concerns about smaller trials with fewer events.
In this study, we found that the statistically significant findings of many COVID-19 trials depended on few events. Therefore, health care professionals and policy makers should not rely heavily on individual results of RCTs on COVID-19. The fragility of RCT results should be considered before applying them to clinical settings. Nevertheless, small RCTs with low fragility indexes may still contribute to robust and useful findings when their data are combined through evidence synthesis platforms.
Accepted for Publication: January 29, 2022.
Published: March 18, 2022. doi:10.1001/jamanetworkopen.2022.2973
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2022 Itaya T et al. JAMA Network Open.
Corresponding Author: Takahiro Itaya, RN, MPH, Department of Healthcare Epidemiology, Graduate School of Medicine and Public Health, Kyoto University, Yoshida Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan (itaya-kyt@umin.ac.jp).
Author Contributions: Mr Itaya had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Itaya.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Itaya.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Itaya, Isobe.
Administrative, technical, or material support: Itaya, Isobe, Suzuki, Koike.
Supervision: Nishigaki, Yamamoto.
Conflict of Interest Disclosures: None reported.
References
8. Alessi J, de Oliveira GB, Franco DW, et al. Telehealth strategy to mitigate the negative psychological impact of the COVID-19 pandemic on type 2 diabetes: a randomized controlled trial. Acta Diabetol. 2021;58(7):899-909. doi:10.1007/s00592-021-01690-1
9. Aref ZF, Bazeed SEES, Hassan MH, et al. Clinical, biochemical and molecular evaluations of ivermectin mucoadhesive nanosuspension nasal spray in reducing upper respiratory symptoms of mild COVID-19. Int J Nanomedicine. 2021;16:4063-4072. doi:10.2147/IJN.S313093
11. Barnabas RV, Brown ER, Bershteyn A, et al; Hydroxychloroquine COVID-19 PEP Study Team. Hydroxychloroquine as postexposure prophylaxis to prevent severe acute respiratory syndrome coronavirus 2 infection: a randomized trial. Ann Intern Med. 2021;174(3):344-352. doi:10.7326/M20-6519
13. Cadegiani FA, McCoy J, Gustavo Wambier C, Goren A. Early antiandrogen therapy with dutasteride reduces viral shedding, inflammatory responses, and time-to-remission in males with COVID-19: a randomized, double-blind, placebo-controlled interventional trial (EAT-DUTA AndroCoV Trial - Biochemical). Cureus. 2021;13(2):e13047. doi:10.7759/cureus.13047
15. Cohen MS, Nirula A, Mulligan MJ, et al; BLAZE-2 Investigators. Effect of bamlanivimab vs placebo on incidence of COVID-19 among residents and staff of skilled nursing and assisted living facilities: a randomized clinical trial. JAMA. 2021;326(1):46-55. doi:10.1001/jama.2021.8828
16. Davoudi-Monfared E, Rahmani H, Khalili H, et al. A randomized clinical trial of the efficacy and safety of interferon β-1a in treatment of severe COVID-19. Antimicrob Agents Chemother. 2020;64(9):e01061-e20. doi:10.1128/AAC.01061-20
17. Deftereos SG, Giannopoulos G, Vrachatis DA, et al; GRECCO-19 investigators. Effect of colchicine vs standard care on cardiac and inflammatory biomarkers and clinical outcomes in patients hospitalized with coronavirus disease 2019: the GRECCO-19 randomized clinical trial. JAMA Netw Open. 2020;3(6):e2013136. doi:10.1001/jamanetworkopen.2020.13136
18. Dilogo IH, Aditianingsih D, Sugiarto A, et al. Umbilical cord mesenchymal stromal cells as critical COVID-19 adjuvant therapy: a randomized controlled trial. Stem Cells Transl Med. 2021;10(9):1279-1287. doi:10.1002/sctm.21-0046
19. Edalatifard M, Akhtari M, Salehi M, et al. Intravenous methylprednisolone pulse as a treatment for hospitalised severe COVID-19 patients: results from a randomised controlled clinical trial. Eur Respir J. 2020;56(6):2002808. doi:10.1183/13993003.02808-2020
20. Gonzalez-Ochoa AJ, Raffetto JD, Hernández AG, et al. Sulodexide in the treatment of patients with early stages of COVID-19: a randomized controlled trial. Thromb Haemost. 2021;121(7):944-954. doi:10.1055/a-1414-5216
21. Grieco DL, Menga LS, Cesarano M, et al; COVID-ICU Gemelli Study Group. Effect of helmet noninvasive ventilation vs high-flow nasal oxygen on days free of respiratory support in patients with COVID-19 and moderate to severe hypoxemic respiratory failure: the HENIVOT randomized clinical trial. JAMA. 2021;325(17):1731-1743. doi:10.1001/jama.2021.4682
23. Hu K, Guan WJ, Bi Y, et al. Efficacy and safety of Lianhuaqingwen capsules, a repurposed Chinese herb, in patients with coronavirus disease 2019: a multicenter, prospective, randomized controlled trial. Phytomedicine. 2021;85:153242. doi:10.1016/j.phymed.2020.153242
25. Kratzke IM, Rosenbaum ME, Cox C, Ollila DW, Kapadia MR. Effect of clear vs standard covered masks on communication with patients during surgical clinic encounters: a randomized clinical trial. JAMA Surg. 2021;156(4):372-378. doi:10.1001/jamasurg.2021.0836
28. Li L, Zhang W, Hu Y, et al. Effect of convalescent plasma therapy on time to clinical improvement in patients with severe and life-threatening COVID-19: a randomized clinical trial. JAMA. 2020;324(5):460-470. doi:10.1001/jama.2020.10044
30. Lopes MI, Bonjorno LP, Giannini MC, et al. Beneficial effects of colchicine for moderate to severe COVID-19: a randomised, double-blinded, placebo-controlled clinical trial. RMD Open. 2021;7(1):e001455. doi:10.1136/rmdopen-2020-001455
31. Lopes RD, de Barros E Silva PGM, Furtado RHM, et al; ACTION Coalition COVID-19 Brazil IV Investigators. Therapeutic versus prophylactic anticoagulation for patients admitted to hospital with COVID-19 and elevated D-dimer concentration (ACTION): an open-label, multicentre, randomised, controlled trial. Lancet. 2021;397(10291):2253-2263. doi:10.1016/S0140-6736(21)01203-4
32. Luo Z, Chen W, Xiang M, et al. The preventive effect of Xuebijing injection against cytokine storm for severe patients with COVID-19: a prospective randomized controlled trial. Eur J Integr Med. 2021;42:101305. doi:10.1016/j.eujim.2021.101305
34. McCoy J, Goren A, Cadegiani FA, et al. Proxalutamide reduces the rate of hospitalization for COVID-19 male outpatients: a randomized double-blinded placebo-controlled trial. Front Med (Lausanne). 2021;8:668698. doi:10.3389/fmed.2021.668698
35. Mesri M, Esmaeili Saber SS, Godazi M, et al. The effects of combination of Zingiber officinale and echinacea on alleviation of clinical symptoms and hospitalization rate of suspected COVID-19 outpatients: a randomized controlled trial. J Complement Integr Med. 2021;18(4):775-781. doi:10.1515/jcim-2020-0283
39. Ranjbar K, Moghadami M, Mirahmadizadeh A, et al. Methylprednisolone or dexamethasone, which one is superior corticosteroid in the treatment of hospitalized COVID-19 patients: a triple-blinded randomized controlled trial. BMC Infect Dis. 2021;21(1):337. doi:10.1186/s12879-021-06045-3
40. Ravikirti RR, Roy R, Pattadar C, et al. Evaluation of ivermectin as a potential treatment for mild to moderate COVID-19: a double-blind randomized placebo controlled trial in eastern India. J Pharm Pharm Sci. 2021;24:343-350. doi:10.18433/jpps32105
41. Réa-Neto Á, Bernardelli RS, Câmara BMD, Reese FB, Queiroga MVO, Oliveira MC. An open-label randomized controlled trial evaluating the efficacy of chloroquine/hydroxychloroquine in severe COVID-19 patients. Sci Rep. 2021;11(1):9023. doi:10.1038/s41598-021-88509-9
43. Roostaei Firozabad A, Meybodi ZA, Mousavinasab SR, et al. Efficacy and safety of levamisole treatment in clinical presentations of non-hospitalized patients with COVID-19: a double-blind, randomized, controlled trial. BMC Infect Dis. 2021;21(1):297. doi:10.1186/s12879-021-05983-2
44. Roozbeh F, Saeedi M, Alizadeh-Navaei R, et al. Sofosbuvir and daclatasvir for the treatment of COVID-19 outpatients: a double-blind, randomized controlled trial. J Antimicrob Chemother. 2021;76(3):753-757. doi:10.1093/jac/dkaa501
45. Rosén J, von Oelreich E, Fors D, et al; PROFLO Study Group. Awake prone positioning in patients with hypoxemic respiratory failure due to COVID-19: the PROFLO multicenter randomized clinical trial. Crit Care. 2021;25(1):209. doi:10.1186/s13054-021-03602-9
46. Sadeghipour P, Talasaz AH, Rashidi F, et al; INSPIRATION Investigators. Effect of intermediate-dose vs standard-dose prophylactic anticoagulation on thrombotic events, extracorporeal membrane oxygenation treatment, or mortality among patients with COVID-19 admitted to the intensive care unit: the INSPIRATION randomized clinical trial. JAMA. 2021;325(16):1620-1630. doi:10.1001/jama.2021.4152
50. Supady A, Weber E, Rieder M, et al. Cytokine adsorption in patients with severe COVID-19 pneumonia requiring extracorporeal membrane oxygenation (CYCOV): a single centre, open-label, randomised, controlled trial. Lancet Respir Med. 2021;9(7):755-762. doi:10.1016/S2213-2600(21)00177-6
51. Suppan M, Abbas M, Catho G, et al. Impact of a serious game (Escape COVID-19) on the intention to change COVID-19 control practices among employees of long-term care facilities: web-based randomized controlled trial. J Med Internet Res. 2021;23(3):e27443. doi:10.2196/27443
58. Mazzinari G, Ball L, Serpa Neto A, et al. The fragility of statistically significant findings in randomised controlled anaesthesiology trials: systematic review of the medical literature. Br J Anaesth. 2018;120(5):935-941. doi:10.1016/j.bja.2018.01.012
59. Grolleau F, Collins GS, Smarandache A, et al. The fragility and reliability of conclusions of anesthesia and critical care randomized trials with statistically significant findings: a systematic review. Crit Care Med. 2019;47(3):456-462. doi:10.1097/CCM.0000000000003527
62. Docherty KF, Campbell RT, Jhund PS, Petrie MC, McMurray JJV. How robust are clinical trials in heart failure? Eur Heart J. 2017;38(5):338-345.
63. Boutron I, Chaimani A, Meerpohl JJ, et al; COVID-NMA Consortium. The COVID-NMA project: building an evidence ecosystem for the COVID-19 pandemic. Ann Intern Med. 2020;173(12):1015-1017. doi:10.7326/M20-5261