Key Points
Question
How stable are the conclusions of phase 3 randomized clinical trials of immune checkpoint inhibitors in oncology?
Findings
This cross-sectional study of 45 randomized clinical trials calculated the survival-inferred fragility index and found that many oncologic trials assessing immune checkpoint inhibitors have a low survival-inferred fragility index, often less than a small fraction of the sample size and less than the number of patients censored soon after randomization.
Meaning
These results challenge the robustness of many phase 3 randomized clinical trials of immune checkpoint inhibitors in oncology and address the uncertainty regarding their potential clinical benefit.
Importance
In science and medical research, extreme and dichotomous conclusions may be drawn based on whether the P value falls above or below the threshold. The fragility index (ie, the minimum number of changes from nonevents to events resulting in loss of statistical significance) captures the vulnerability of statistics in trials with binary outcomes. There are a growing number of clinical trials of immune checkpoint inhibitors (ICIs), as well as expanding eligibility for patients to receive them. The robustness of survival outcomes in randomized clinical trials (RCTs) should be evaluated using the fragility index extended to time-to-event data.
Objective
To calculate the fragility of survival data in RCTs evaluating ICIs.
Design, Setting, and Participants
In this cross-sectional study, data on phase 3 prospective RCTs investigating ICIs included in PubMed from inception until January 1, 2020, were extracted. Two- or three-group studies reporting results for overall survival were eligible for the survival-inferred fragility index (SIFI) calculation, which is the minimum number of reassignments of the best survivors from the interventional group to the control group resulting in loss of significance (defined as P < .05 by log-rank test). For nonsignificant results, a negative SIFI was calculated by reversing the direction of reassignment (from the control group to the interventional group).
Main Outcomes and Measures
Survival-inferred fragility index.
Results
A total of 45 phase 3 prospective RCTs (4 of which had 3 groups, for a total of 49 groups) were identified, of which 6 (13%) investigated anti–cytotoxic T-lymphocyte–associated protein 4 (CTLA-4) agents, 25 (56%) investigated anti–programmed cell death 1 (PD-1) agents, 12 (27%) investigated anti–programmed cell death 1 ligand 1 agents, and 3 (7%) investigated the combination of anti–CTLA-4 and anti–PD-1 agents. The median SIFI was 5 (interquartile range, –4 to 12) for the intention-to-treat analysis; for these trials, the SIFI was 1% or less of the total sample size in 17 of 49 populations (35%). In 25 of the 49 intention-to-treat populations (51%), the SIFI was less than the number of censored patients in the intervention group shortly after randomization (defined as <5% of the follow-up time).
Conclusions and Relevance
This study suggests that many phase 3 RCTs evaluating ICI therapies have a low SIFI for overall survival, resulting in uncertainty regarding their potential clinical benefit. Although not a definitive solution for the problems arising from dichotomization, SIFI provides an additional means of assessing and communicating the strength of statistical conclusions.
Immune checkpoint inhibitors (ICIs) targeting cytotoxic T-lymphocyte–associated protein 4 (CTLA-4) or programmed cell death 1 (PD-1) and programmed cell death 1 ligand 1 (PD-L1) have revolutionized cancer treatment and led to their approval as first-line therapies, either alone or in combination with chemotherapy, for many solid tumors and hematologic malignant neoplasms.1 However, the clinical benefit associated with ICIs cannot be generalized into a single category, as the therapeutic effectiveness varies widely across different cancer indications.2-7 The number of active clinical trials of ICIs is growing rapidly, along with an increased pace of accelerated approvals by the US Food and Drug Administration (FDA).8,9 The eligibility criteria for ICI therapy are dynamic, and results of postmarketing studies often lead to label revisions, with more changes expected to follow.10 Despite the popularity of ICIs and the expanding eligibility for expensive and potentially toxic treatments, the percentage of eligible patients who benefit from ICIs is decreasing.10,11 This gap between ICI eligibility and clinical benefit is concerning and is not fully understood.
Since the introduction of the P value almost a century ago, reliance on a fixed cutoff serving as the gatekeeper for establishing significance in clinical trials has caused controversy.12,13 Statistically significant differences in outcomes using an arbitrary threshold (P < .05) may not be clinically relevant, especially when the estimated outcome does not offer substantial clinical benefit.14,15 The fragility of statistical inference can be signified by the ease with which a significant P value (P < .05) crosses over the significance threshold (P > .05).16,17 Johnson et al18 introduced a method to compute the fragility for survival analysis by iteratively adding artificial patients to the experimental group with events at the mean exposure time of all individuals until significance is lost. Using this method, one study has recently shown that the fragility index of time-to-event data can be used to estimate the level of confidence of positive results reported in randomized clinical trials (RCTs) leading to FDA approval of anticancer drugs.19 However, this approach that simulates average “virtual” patients might inflate the fragility estimate as patients at the extreme, who contribute the most to the survival curves, are disregarded. Many possible ways could be formulated to estimate the fragility of survival data. Therefore, we aimed to define a simple and intuitive fragility measure for survival analysis, based on real-life conditions, that captures the vulnerability of the data. Hence, we define the survival-inferred fragility index (SIFI) as the minimum number of reassignments of the best survivors (defined as the patients with the longest follow-up time, regardless of having an event or being censored; the worst survivors were defined as the patients with the earliest events) from the experimental group to the control group resulting in loss of significance (Figure 1). 
The purpose of this study is to evaluate the fragility of phase 3 RCTs comparing ICIs with control or standard treatments in a time-aware context.
This cross-sectional study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.20 We searched PubMed from inception until January 1, 2020, for phase 3 RCTs of ICIs (anti–CTLA-4, anti–PD-1, and anti–PD-L1) compared with standard treatment in solid and hematologic malignant neoplasms. Key words for the literature search included randomised, randomized, phase 3, phase III, ipilimumab, nivolumab, pembrolizumab, cemiplimab, durvalumab, avelumab, and atezolizumab. For the fragility analysis, we included 2- or 3-group studies that reported overall survival as a primary or secondary outcome. We excluded retrospective studies, pooled studies, and post hoc subgroup analyses. When duplicate publications for the same trial were identified, we included the most recent publication. We abstracted information on trial design and the number of enrolled patients in each study. According to institutional review board policy, ethical approval was not required because no human data were included and only publicly available information was used.
Overall survival data from 45 trials were extracted from Kaplan-Meier curves in the main text using DigitizeIt software (DigitizeIt) and the method by Wei and Royston21 using Stata, version 13.0 (StataCorp). This reverse-engineering strategy enabled us to reproduce survival time and censoring status at the individual patient level with minor differences between reconstructed and published data.19 We excluded publications of trials with raster images in which data extraction could not be performed directly. We separated the populations into 2 cohorts—the intention-to-treat (ITT) populations, which also included modified ITT populations, and subgroup populations.
The SIFI was calculated from Kaplan-Meier curves by the iterative redesignation of the best survivors from the experimental group to the control group until positive significance (defined as P < .05 obtained with a 2-sided log-rank test) was lost. Negative SIFI was calculated similarly, but the direction was opposite—redesignation of the best survivors from the control group to the experimental group. In addition to the default SIFI application (flipping the best survivor from the intervention group to the control group), we defined 3 alternative approaches: flipping the worst survivor from the experimental group to the control group, cloning the best survivor in the experimental group into the control group, and cloning the worst survivor in the control group to the experimental group. P values were calculated with the 2-sided unstratified log-rank test. The follow-up time distribution was calculated using the prodlim package in R (R Foundation for Statistical Computing). All other analyses were performed in R, version 3.5.0. The code used to calculate SIFI is available online.22
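The redesignation procedure described above can be sketched in a few lines. This is a minimal Python illustration, not the published code cited in the text; the function names `logrank_p` and `sifi`, the 0/1 group coding, and the toy data are hypothetical, and the sketch assumes that a significant result favors the experimental group. Only the default version (flipping the best survivor) is shown.

```python
import math


def logrank_p(times, events, groups):
    """Two-sided unstratified log-rank P value for two groups coded 0 and 1."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for x, g in zip(times, groups) if x >= t]
        n, n1 = len(at_risk), sum(at_risk)
        died = [g for x, e, g in zip(times, events, groups) if x == t and e]
        d, d1 = len(died), sum(died)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi_square = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi_square / 2))  # upper tail, chi-square with 1 df


def sifi(times, events, groups, alpha=0.05):
    """Flip the longest-followed patient between groups until the verdict changes.

    Group 1 is experimental, group 0 is control (an assumption of this sketch).
    Returns a positive count if significance was lost, a negative count if it
    was gained, or None if the source group is exhausted first.
    """
    groups = list(groups)
    significant = logrank_p(times, events, groups) < alpha
    source = 1 if significant else 0  # significant: move experimental -> control
    flips = 0
    while (logrank_p(times, events, groups) < alpha) == significant:
        candidates = [i for i, g in enumerate(groups) if g == source]
        if len(candidates) <= 1:
            return None
        best = max(candidates, key=lambda i: times[i])  # longest follow-up
        groups[best] = 1 - source
        flips += 1
    return flips if significant else -flips


# Toy data: 20 control patients and 20 experimental patients, all with events.
times = list(range(1, 21)) + list(range(11, 31))
events = [1] * 40
groups = [0] * 20 + [1] * 20
toy_sifi = sifi(times, events, groups)
print(toy_sifi)
```

The best survivor is chosen by follow-up time alone, regardless of event or censoring status, which matches the definition given in the text. Ties are broken by list order here, a choice the source does not specify.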
To provide a reference for the ranges of SIFI for various parameters of survival data, we generated synthetic survival data with the survsim package in R.23 The “simple.surv.sim” function was used with the Weibull distribution for both the time to event and the time to censoring. The cohort size was set to range from 100 to 1200 individuals in intervals of 100 (with a 1:1 allocation). The ancillary parameter for the events was set to 1.5, and the ancillary parameter for the censoring was set to 2, 4, 6, 8, or 10. The covariate for the effect size was set to all values between −1 and 0.2 in increments of 0.05. The β0 parameter for the event distribution was set to 2.0, and the β0 for the censoring distribution was set to 2.01.
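The size of this parameter grid can be checked with a short enumeration. The sketch below is in Python rather than R and does not call survsim; the variable names are hypothetical. It only lists the combinations of cohort size, censoring ancillary parameter, and effect-size covariate stated above.

```python
from itertools import product

cohort_sizes = range(100, 1201, 100)        # 100, 200, ..., 1200
censoring_ancillary = [2, 4, 6, 8, 10]      # ancillary parameter for censoring
effect_sizes = [round(-1 + 0.05 * i, 2) for i in range(25)]  # -1.00 to 0.20

grid = list(product(cohort_sizes, censoring_ancillary, effect_sizes))
print(len(cohort_sizes), len(censoring_ancillary), len(effect_sizes), len(grid))
```

The grid holds 12 × 5 × 25 = 1500 combinations. The article reports 15 000 synthetic data sets, which would correspond to 10 data sets per combination; that reading is an inference from the counts and is not stated in the text.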
For the period until January 1, 2020, we identified 45 phase 3 RCTs (4 of which had 3 groups, for a total of 49 groups)2-7,24-62 evaluating ICI therapies that met the inclusion criteria for survival fragility analysis. All except 2 multiple myeloma trials (4%)2,47 investigated solid tumors. Six trials (13%) investigated an anti–CTLA-4 agent (ipilimumab),6,24-28 25 trials (56%) investigated anti–PD-1 agents (nivolumab and pembrolizumab),2,3,29-51 12 trials (27%) investigated anti–PD-L1 agents (atezolizumab, avelumab, and durvalumab),5,7,52-61 and 3 trials (7%) investigated the combination of anti–CTLA-4 and anti–PD-1 agents (ipilimumab and nivolumab).4,36,62 We could not calculate the SIFI for 2 trials (CA184-002 and CA184-043)63,64 because of an incompatible graphical format of the Kaplan-Meier plots. The median sample size for the eligible trials was 559 (interquartile range [IQR], 418-727). The SIFI was calculated for an additional 36 subgroups (eg, PD-L1, ≥1%) in 15 trials with a median sample size of 362 (IQR, 217-486).4,7,28,31,36,37,41,46,51-53,56,57,59,62
Thirty-four of the 49 reconstructed overall survival curves in the ITT population (69%), which includes the modified ITT population, and 26 of the 36 subgroup populations (72%) were significant (P < .05) (Table 1).2-7,24-62 The median SIFI for ITT populations was 5 (IQR, –4 to 12) (ie, a median of 5 patients [among best survivors] reassigned to the control group was required to shift the results from significant to nonsignificant). The median SIFI for subgroup populations was 3.5 (IQR, 1-6.3) (eTable in the Supplement). In comparison, the fragility estimate for survival data by Johnson et al18 is unable to estimate fragility for nonsignificant results (negative fragility) and depicts higher values, with a median of 29 (IQR, 0-51) for the ITT populations and 29 (IQR, 0-43) for the subgroup populations. The absolute SIFI was less than 1% of the sample size in 17 (35%) of the 49 ITT populations and 10 (28%) of the 36 subgroup populations. Furthermore, in 25 (51%) of the 49 ITT populations and 16 (44%) of the 36 subgroup populations, the SIFI was less than the number of patients censored in the interventional group during only the first ventile (1/20th) of the follow-up time (eFigure 1 in the Supplement).
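The two comparisons in this paragraph (SIFI as a fraction of the sample size, and SIFI against early censoring) can be sketched as follows. The function names and the toy data are hypothetical, and the first ventile is taken here as the first 1/20th of the longest observed follow-up time, which is an assumption; the article derived the follow-up distribution with the prodlim package.

```python
def relative_sifi(sifi_value, sample_size):
    """Absolute SIFI as a fraction of the total sample size."""
    return abs(sifi_value) / sample_size


def early_censored(times, events, groups, arm=1, fraction=1 / 20):
    """Count patients in `arm` censored within the first `fraction` of follow-up."""
    cutoff = max(times) * fraction
    return sum(
        1
        for t, e, g in zip(times, events, groups)
        if g == arm and not e and t <= cutoff
    )


# Illustration only: pairing the reported median SIFI (5) with the reported
# median sample size (559) gives a fraction below the 1% mark.
print(relative_sifi(5, 559) < 0.01)

# Toy data (months): group 1 is the interventional arm, event 0 means censored.
times = [0.5, 0.9, 3.0, 12.0, 18.0, 20.0, 0.8, 9.0]
events = [0, 0, 1, 1, 0, 1, 0, 1]
groups = [1, 1, 1, 1, 1, 0, 0, 0]
print(early_censored(times, events, groups))
```

A trial would count toward the 51% reported above when its SIFI is smaller than the value returned by a function such as `early_censored` for the interventional group.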
A comparison between positive SIFI levels in different tumor types among ITT populations (Figure 2) showed that non–small cell lung carcinoma, renal cell carcinoma, and melanoma had the highest values and that hepatocellular carcinoma, head and neck squamous cell carcinoma, and small cell lung carcinoma had the lowest values. Examining the association between SIFI and P values (in logarithmic scale) revealed a high correlation in ITT populations (R = 0.70; P < 1 × 10^−7) and subgroup populations (R = 0.82; P < 1 × 10^−9). However, the level of SIFI was not explained entirely by the variation in P values. For example, despite having relatively similar P values, hazard ratios, and sample sizes, the SIFI was 2-fold higher in the KEYNOTE-024 trial40 compared with the IMpower133 trial55 and in the ATTRACTION-2 trial33 compared with the monotherapy group of the CheckMate 067 trial36 (Table 1, Figure 3),2-7,24-62 indicating higher robustness. These examples demonstrate that statistical significance depends on the distribution of the longest-surviving patients, with more fragile studies relying on fewer patients to drive the significance, compared with less fragile studies that are associated with a higher “reserve” of patients. Similar associations between SIFI as a proportion of the population and P values are shown in eFigure 2 in the Supplement. To explore the potential association of longer follow-up periods with the SIFI, we identified trials that published overall survival results for earlier follow-up periods. We found that the SIFI is stable and displays only a small variation for trials at different follow-up periods (Table 2),3,4,24,36,37,45,66-70 including studies with median follow-up time more than twice as long as in the original publication. Furthermore, we explored the operating characteristics of the SIFI, including sample size, censoring rate, and effect size (eFigures 3-5 in the Supplement). Performing simulations using combinations of the parameters resulted in 15 000 synthetic time-to-event data sets.
Hazard ratios ranged from 0.13 to 1.95, and the percentage of individuals censored ranged from 17.5% to 50%. The simulated results provide a reference for the ranges of the SIFI for the various parameters of survival data.
The fragility for survival data can be calculated in various ways. Overall, we calculated 4 versions of the SIFI, which reassign patients (flip) or add patients (clone) to the opposite group using the best survivors from the experimental group or the worst survivors from the control group. A comparison of the different SIFI approaches is shown for the ITT populations in eFigure 6 in the Supplement. Compared with the default SIFI (flipping the best survivors to the opposite group), which had a median magnitude of 9 (IQR, 5-18) for ITT populations, the 3 alternative versions were associated with higher values in most studies. The median SIFI magnitudes were 11 (IQR, 8-18) for flipping the worst survivors to the opposite group, 17.5 (IQR, 7-38.3) for cloning the best survivors to the opposite group, and 24 (IQR, 16-35) for cloning the worst survivors to the opposite group. These findings suggest that the SIFI using the version that flips the best survivors to the opposite group is the most sensitive approach for detecting the minimum changes required to overturn the conclusions.
In our study, we found that the statistical significance of a substantial number of phase 3 trials of ICIs could be lost or gained with a change in assignment of very few of the best surviving patients, often less than 1% of the respective trial sample size. Although this is an arbitrary number and does not reflect a random sampling of the patients, it represents a small fraction of the population that can overturn the statistical conclusions. Also, the change in the number of patients required for fragility is often smaller than the number of patients censored in the experimental group shortly after randomization, adding further uncertainties and raising concerns about the statistical outcomes had these and other patients been assessed to their end point. Eligibility for treatment with ICIs is assessed by concluding whether results of a trial are positive or negative. Our findings demonstrate how unstable these conclusions may be, and explain, in part, the widening gap between eligibility and benefit associated with ICIs.
The original fragility index has been applied to RCTs in oncology and other areas of medicine.17,19,71-74 However, the original fragility index is based on binary outcomes and the Fisher exact test, which could be misleading for time-to-event data, in which the primary interest is the timing of events.19 Although descriptions of time-to-event fragility exist,18,19 to our knowledge, no previous peer-reviewed original investigations have estimated time-aware fragility index for clinical trials, including oncology trials. Also, to our knowledge, no study has evaluated negative fragility measures for survival analysis.
In general, the P value serves as a measure of the compatibility of collected data with a defined statistical model. In a testing framework, smaller P values indicate greater evidence against the null hypothesis—a conjecture of no difference between outcomes of the intervention and control groups.75 Undoubtedly, the P value plays a central role in the clinical testing of new drugs, and since the 1960s, the FDA has relied on significance testing to establish their effectiveness in the approval process.76 As such, nowhere is this role more important than in clinical trials, where the smallest change in the P value can decisively influence the drug approval process and result in trial success or failure. Consequently, passing the statistical significance threshold has become the ultimate goal, and unless an analysis is adequately prespecified, most research designs allow enough leeway to manipulate the results to claim importance.77-80 Therefore, reliance on P values falling to either side of the significance threshold can result in extreme conclusions and be misleading, especially for a low threshold such as P < .05. Recently, an influential commentary published in Nature12 has even called for the abandonment of the conventional threshold for statistical significance, regardless of the level (eg, P < .05), owing to this imposed dichotomization. However, statistical inferences are unavoidably dichotomous in many scientific fields. Most decisions in medicine are dichotomous, such as a new drug will either be approved or not, and will either be prescribed or not.77
This study introduces the SIFI as a novel measure that enables us to estimate the vulnerability of the statistical conclusions of clinical trials with time-to-event outcomes. This index transforms the dichotomous conclusion to a discrete variable that provides more perspective regarding the potential benefit associated with ICIs or any other intervention. The SIFI provides context to the P value and statistical significance, which may not necessarily be intuitive and are often poorly understood.77 Therefore, the SIFI translates uncertainty to a specified number that represents actual patients and events and places it on a linear scale that allows for assessment of the robustness of the results. For example, consider 2 comparable studies with similar P values. Although the SIFI is not a measure of effect, a trial with a high SIFI that is acceptable in relation to the sample size and censoring provides more robustness than a trial with a small SIFI that represents a small fraction of the sample size and censoring. The latter relies on fragile evidence with higher uncertainty regarding the incompatibility with the null hypothesis. We did not define criteria for fragile vs nonfragile values, nor do we believe that a measure aimed at addressing the dichotomization of results by a threshold should be replaced by another threshold. Perhaps trials involving the addition of a costly and toxic drug to the standard treatment with a small effect size would require a higher level of robustness than trials comparing 2 drugs with similar overall properties. In contrast, concluding that statistically significant results show no real association when the fragility measure is very low is discouraged; it is equally inaccurate to claim that nonsignificant results with very small negative fragility point to an important signal.
However, the SIFI allows for putting these 2 scenarios in context, expressing uncertainty and suggesting that the interpretation of their importance should be similar or, de facto, the same. In both cases, and especially for negative fragility measures, small values indicate that the true underlying effects either are negligible or lack statistical power. Nevertheless, considerations such as study design, data quality, comprehension of the underlying mechanisms, and other factors may often have more importance than statistical findings12 such as P values or fragility indices.
The default solution for improving the confidence level would be to make the significance threshold more demanding; however, this is a suboptimal option because the chance of false-negative results increases accordingly, and it still fails to address the vulnerability of the statistics. Nevertheless, fragility corresponding to one threshold is not comparable with fragility corresponding to another, and it is reasonable to expect lower fragility measures for lower P value thresholds, as they are interrelated. Hence, the approach encourages using lower significance thresholds. A trial not meeting a low prespecified significance threshold (eg, P < .0001), with a small negative SIFI (eg, −2), may provide higher confidence in the validity of the results compared with a trial that meets a higher threshold (eg, P < .05) but has a low positive SIFI (eg, 2). The SIFI relative to sample size can be useful to estimate the robustness of the results, but it could be misleading for small sample sizes. Although a SIFI of less than 1% of the sample size in many RCTs could suggest extreme fragility, small trials with fewer than 100 patients cannot achieve a SIFI of less than 1%, even when the results are certainly less robust. Therefore, the SIFI relative to sample size, especially for small trials, should not be interpreted alone and must be accompanied by the absolute SIFI.
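The arithmetic behind the small-trial caveat can be written out. A nonzero SIFI involves at least 1 reassigned patient, so for a trial with n patients (n is notation introduced here, not used in the article):

```latex
\frac{|\mathrm{SIFI}|}{n} \;\ge\; \frac{1}{n} \;>\; \frac{1}{100} = 1\%
\qquad \text{whenever } n < 100 .
```

For example, a trial of 80 patients with a SIFI of 1 has a relative SIFI of 1/80 = 1.25%, so it cannot fall below the 1% mark however fragile its result is.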
Several limitations of the study should be recognized. We did not address prespecified P value thresholds, which were allocated and controlled differently in every trial and are often much lower than .05. Instead, we used the standard α level of .05 as a common reference; therefore, some trials did not meet the prespecified threshold but resulted in a positive SIFI. Although not a strict rule by the FDA, the standard 2-trial α level is .05 but is smaller for approval based on a single trial.76 The analysis of overall survival was based on an unstratified log-rank test at a 2-sided significance level as a uniform statistical test for all trials; however, studies have analyzed the data differently (eg, stratified or weighted log-rank test). Therefore, small differences exist between the published P value and the calculated P value. Furthermore, we found a small discrepancy in the numbers of patients at risk published in the original publications and the reproduced curves. For 19 of the 49 populations in the trials (39%), there was no discrepancy between the published and estimated number at risk at any time point. In the time points for which discrepancy existed, we found the difference to be small, with a median of 1 patient (IQR, 1-2).
The SIFI can be calculated in various ways. Our comparison of different implementations of the SIFI demonstrates that reassigning or adding the best survivors to the opposite group provides lower fragility estimates compared with the worst survivors, for most trials. This finding indicates that the longest-surviving patients can tilt the balance between the groups more strongly compared with the shortest-surviving patients. The association of the longest survivors with the survival curves is potentially unlimited, as they are constrained only by the follow-up time, whereas the shortest-surviving patients cannot have an event before time zero. By both removing a long-time survivor from one group and adding them to the other group, the total number of patients required to pass the significance threshold is reduced compared with other techniques. This approach coincides with the essence of fragility—identifying the minimum required changes to overturn the conclusions. Furthermore, we aimed to define a simple and intuitive method that can be recreated using existing routines, is quantifiable in all conditions, and is applicable to real-world practice in which patients are randomly assigned from a pool of eligible patients. Although random variations alone can lead to large disparities in P values, the calculation of the SIFI is not based on random variations in the assignment of patients but on the reassignment of patients at the extreme ends of the scale. However, the random allocation of patients can lead to different proportions of the best (or worst) survivors in the groups, which may impact the outcomes. Therefore, the SIFI serves as a simple and conservative approach to reflect the fragility of the statistics. Alternatively, the mean or median survival time can be exploited in different ways to quantify the fragility18,19; however, this approach can underestimate the fragility if the few patients who cause most of the difference are not captured.
The results of this study suggest that many phase 3 RCTs evaluating ICI therapies are fragile and challenge the confidence in rejecting or concluding superiority for these drugs compared with standard treatments. Low fragility levels express uncertainty when there is no appreciable difference between the interpretative significance of data. In contrast, high fragility levels can provide robustness and aid in binary decision-making, especially for treatments associated with high cost and toxic effects that require strong support. Interpretation of any outcome is far more complicated than just significance testing, and the SIFI as a statistical and communication tool may serve as a better starting point for discerning between science and fiction.
Accepted for Publication: July 13, 2020.
Published: October 23, 2020. doi:10.1001/jamanetworkopen.2020.17675
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 Bomze D et al. JAMA Network Open.
Corresponding Authors: Gal Markel, MD, PhD (gal.markel@sheba.health.gov.il), and Tomer Meirson, BSc (tomermrsn@gmail.com), Ella Lemelbaum Institute for Immuno-Oncology, Sheba Medical Center, Ramat-Gan 526260, Israel.
Author Contributions: Messrs Bomze and Meirson had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Bomze, Hasan Ali, Azoulay, Markel, Meirson.
Acquisition, analysis, or interpretation of data: Bomze, Asher, Flatz, Meirson.
Drafting of the manuscript: Bomze, Meirson.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Bomze, Meirson.
Administrative, technical, or material support: Asher, Meirson.
Supervision: Bomze, Azoulay, Markel, Meirson.
Conflict of Interest Disclosures: Dr Asher reported receiving personal fees from MSD, BMS, Medison, and Novartis outside the submitted work. Dr Flatz reported receiving grants from Swiss National Science Foundation, Swiss Cancer League, Hookipa Pharma, and Novartis Foundation outside the submitted work. Dr Markel reported receiving personal fees from MSD and Roche; grants and personal fees from BMS and Novartis; personal fees and stock options from 4C Biomed; and stock options from Nucleai, Biond Biologics, and Ella Therapeutics outside the submitted work. Mr Meirson reported receiving a grant from the Foulkes Foundation for MD/PhD students. No other disclosures were reported.
Funding/Support: Dr Flatz is supported by a Swiss National Science Foundation professorship (PP00P3_157448). Dr Markel is supported by the Samulei Foundation Grant for Integrative Immuno-Oncology. Mr Meirson is supported by the Foulkes Foundation fellowship for MD/PhD students.
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
2.Usmani
SZ, Schjesvold
F, Oriol
A,
et al; KEYNOTE-185 Investigators. Pembrolizumab plus lenalidomide and dexamethasone for patients with treatment-naive multiple myeloma (KEYNOTE-185): a randomised, open-label, phase 3 trial.
Lancet Haematol. 2019;6(9):e448-e458. doi:
10.1016/S2352-3026(19)30109-7
PubMedGoogle ScholarCrossref 3.Schachter
J, Ribas
A, Long
GV,
et al. Pembrolizumab versus ipilimumab for advanced melanoma: final overall survival results of a multicentre, randomised, open-label phase 3 study (KEYNOTE-006).
Lancet. 2017;390(10105):1853-1862. doi:
10.1016/S0140-6736(17)31601-X
PubMedGoogle ScholarCrossref 4.Motzer
RJ, Rini
BI, McDermott
DF,
et al; CheckMate 214 investigators. Nivolumab plus ipilimumab versus sunitinib in first-line treatment for advanced renal cell carcinoma: extended follow-up of efficacy and safety results from a randomised, controlled, phase 3 trial.
Lancet Oncol. 2019;20(10):1370-1385. doi:
10.1016/S1470-2045(19)30413-9
PubMedGoogle ScholarCrossref 5.Eng
C, Kim
TW, Bendell
J,
et al; IMblaze370 Investigators. Atezolizumab with or without cobimetinib versus regorafenib in previously treated metastatic colorectal cancer (IMblaze370): a multicentre, open-label, phase 3, randomised, controlled trial.
Lancet Oncol. 2019;20(6):849-861. doi:
10.1016/S1470-2045(19)30027-0
PubMedGoogle ScholarCrossref 6.Beer
TM, Kwon
ED, Drake
CG,
et al. Randomized, double-blind, phase III trial of ipilimumab versus placebo in asymptomatic or minimally symptomatic patients with metastatic chemotherapy-naive castration-resistant prostate cancer.
J Clin Oncol. 2017;35(1):40-47. doi:
10.1200/JCO.2016.69.1584
PubMedGoogle ScholarCrossref 7.Schmid
P, Rugo
HS, Adams
S,
et al; IMpassion130 Investigators. Atezolizumab plus nab-paclitaxel as first-line treatment for unresectable, locally advanced or metastatic triple-negative breast cancer (IMpassion130): updated efficacy results from a randomised, double-blind, placebo-controlled, phase 3 trial.
Lancet Oncol. 2020;21(1):44-59. doi:
10.1016/S1470-2045(19)30689-8
PubMedGoogle ScholarCrossref 18.Johnson
KW, Rappaport
E, Shameer
K, Glicksberg
BS, Dudley
JT. fragilityindex: an R package for statistical fragility estimates in biomedicine. Preprint. Posted online February 27, 2019. bioRxiv 562264. doi:
10.1101/562264 20.von Elm
E, Altman
DG, Egger
M, Pocock
SJ, Gøtzsche
PC, Vandenbroucke
JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies.
Int J Surg. 2014;12(12):1495-1499. doi:
10.1016/j.ijsu.2014.07.013
PubMedGoogle ScholarCrossref 24.Maio
M, Grob
JJ, Aamdal
S,
et al. Five-year survival rates for treatment-naive patients with advanced melanoma who received ipilimumab plus dacarbazine in a phase III trial.
J Clin Oncol. 2015;33(10):1191-1196. doi:
10.1200/JCO.2014.56.6018
26. Reck M, Luft A, Szczesna A, et al. Phase III randomized trial of ipilimumab plus etoposide and platinum versus placebo plus etoposide and platinum in extensive-stage small-cell lung cancer. J Clin Oncol. 2016;34(31):3740-3748. doi:10.1200/JCO.2016.67.6601
28. Ascierto PA, Del Vecchio M, Robert C, et al. Ipilimumab 10 mg/kg versus ipilimumab 3 mg/kg in patients with unresectable or metastatic melanoma: a randomised, double-blind, multicentre, phase 3 trial. Lancet Oncol. 2017;18(5):611-622. doi:10.1016/S1470-2045(17)30231-0
31. Tomita Y, Fukasawa S, Shinohara N, et al. Nivolumab versus everolimus in advanced renal cell carcinoma: Japanese subgroup 3-year follow-up analysis from the phase III CheckMate 025 study. Jpn J Clin Oncol. 2019;49(6):506-514. doi:10.1093/jjco/hyz026
33. Kang YK, Boku N, Satoh T, et al. Nivolumab in patients with advanced gastric or gastro-oesophageal junction cancer refractory to, or intolerant of, at least two previous chemotherapy regimens (ONO-4538-12, ATTRACTION-2): a randomised, double-blind, placebo-controlled, phase 3 trial. Lancet. 2017;390(10111):2461-2471. doi:10.1016/S0140-6736(17)31827-5
34. Larkin J, Minor D, D’Angelo S, et al. Overall survival in patients with advanced melanoma who received nivolumab versus investigator’s choice chemotherapy in CheckMate 037: a randomized, controlled, open-label phase III trial. J Clin Oncol. 2018;36(4):383-390. doi:10.1200/JCO.2016.71.8023
35. Ascierto PA, Long GV, Robert C, et al. Survival outcomes in patients with previously untreated BRAF wild-type advanced melanoma treated with nivolumab therapy: three-year follow-up of a randomized phase 3 trial. JAMA Oncol. 2019;5(2):187-194. doi:10.1001/jamaoncol.2018.4514
36. Hodi FS, Chiarion-Sileni V, Gonzalez R, et al. Nivolumab plus ipilimumab or nivolumab alone versus ipilimumab alone in advanced melanoma (CheckMate 067): 4-year outcomes of a multicentre, randomised, phase 3 trial. Lancet Oncol. 2018;19(11):1480-1492. doi:10.1016/S1470-2045(18)30700-9
37. Ferris RL, Blumenschein G Jr, Fayette J, et al. Nivolumab vs investigator’s choice in recurrent or metastatic squamous cell carcinoma of the head and neck: 2-year long-term survival update of CheckMate 141 with analyses by tumor PD-L1 expression. Oral Oncol. 2018;81:45-51. doi:10.1016/j.oraloncology.2018.04.008
38. Kato K, Cho BC, Takahashi M, et al. Nivolumab versus chemotherapy in patients with advanced oesophageal squamous cell carcinoma refractory or intolerant to previous chemotherapy (ATTRACTION-3): a multicentre, randomised, open-label, phase 3 trial. Lancet Oncol. 2019;20(11):1506-1517. doi:10.1016/S1470-2045(19)30626-6
39. Wu YL, Lu S, Cheng Y, et al. Nivolumab versus docetaxel in a predominantly Chinese patient population with previously treated advanced NSCLC: CheckMate 078 randomized phase III clinical trial. J Thorac Oncol. 2019;14(5):867-875. doi:10.1016/j.jtho.2019.01.006
41. Cohen EEW, Soulières D, Le Tourneau C, et al; KEYNOTE-040 investigators. Pembrolizumab versus methotrexate, docetaxel, or cetuximab for recurrent or metastatic head-and-neck squamous cell carcinoma (KEYNOTE-040): a randomised, open-label, phase 3 study. Lancet. 2019;393(10167):156-167. doi:10.1016/S0140-6736(18)31999-8
42. Shitara K, Özgüroğlu M, Bang YJ, et al; KEYNOTE-061 investigators. Pembrolizumab versus paclitaxel for previously treated, advanced gastric or gastro-oesophageal junction cancer (KEYNOTE-061): a randomised, open-label, controlled, phase 3 trial. Lancet. 2018;392(10142):123-133. doi:10.1016/S0140-6736(18)31257-1
45. Fradet Y, Bellmunt J, Vaughn DJ, et al. Randomized phase III KEYNOTE-045 trial of pembrolizumab versus paclitaxel, docetaxel, or vinflunine in recurrent advanced urothelial cancer: results of >2 years of follow-up. Ann Oncol. 2019;30(6):970-976. doi:10.1093/annonc/mdz127
46. Burtness B, Harrington KJ, Greil R, et al; KEYNOTE-048 Investigators. Pembrolizumab alone or with chemotherapy versus cetuximab with chemotherapy for recurrent or metastatic squamous cell carcinoma of the head and neck (KEYNOTE-048): a randomised, open-label, phase 3 study. Lancet. 2019;394(10212):1915-1928. doi:10.1016/S0140-6736(19)32591-7
47. Mateos MV, Blacklock H, Schjesvold F, et al; KEYNOTE-183 Investigators. Pembrolizumab plus pomalidomide and dexamethasone for patients with relapsed or refractory multiple myeloma (KEYNOTE-183): a randomised, open-label, phase 3 trial. Lancet Haematol. 2019;6(9):e459-e469. doi:10.1016/S2352-3026(19)30110-3
48. Finn RS, Ryoo BY, Merle P, et al; KEYNOTE-240 investigators. Pembrolizumab as second-line therapy in patients with advanced hepatocellular carcinoma in KEYNOTE-240: a randomized, double-blind, phase III trial. J Clin Oncol. 2020;38(3):193-202. doi:10.1200/JCO.19.01307
50. Long GV, Dummer R, Hamid O, et al. Epacadostat plus pembrolizumab versus placebo plus pembrolizumab in patients with unresectable or metastatic melanoma (ECHO-301/KEYNOTE-252): a phase 3, randomised, double-blind study. Lancet Oncol. 2019;20(8):1083-1097. doi:10.1016/S1470-2045(19)30274-8
51. Mok TSK, Wu YL, Kudaba I, et al; KEYNOTE-042 Investigators. Pembrolizumab versus chemotherapy for previously untreated, PD-L1–expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): a randomised, open-label, controlled, phase 3 trial. Lancet. 2019;393(10183):1819-1830. doi:10.1016/S0140-6736(18)32409-7
52. Powles T, Durán I, van der Heijden MS, et al. Atezolizumab versus chemotherapy in patients with platinum-treated locally advanced or metastatic urothelial carcinoma (IMvigor211): a multicentre, open-label, phase 3 randomised controlled trial. Lancet. 2018;391(10122):748-757. doi:10.1016/S0140-6736(17)33297-X
53. Fehrenbacher L, von Pawel J, Park K, et al. Updated efficacy analysis including secondary population results for OAK: a randomized phase III study of atezolizumab versus docetaxel in patients with previously treated advanced non–small cell lung cancer. J Thorac Oncol. 2018;13(8):1156-1170. doi:10.1016/j.jtho.2018.04.039
56. West H, McCleod M, Hussein M, et al. Atezolizumab in combination with carboplatin plus nab-paclitaxel chemotherapy compared with chemotherapy alone as first-line treatment for metastatic non-squamous non-small-cell lung cancer (IMpower130): a multicentre, randomised, open-label, phase 3 trial. Lancet Oncol. 2019;20(7):924-937. doi:10.1016/S1470-2045(19)30167-6
57. Rini BI, Powles T, Atkins MB, et al; IMmotion151 Study Group. Atezolizumab plus bevacizumab versus sunitinib in patients with previously untreated metastatic renal cell carcinoma (IMmotion151): a multicentre, open-label, phase 3, randomised controlled trial. Lancet. 2019;393(10189):2404-2415. doi:10.1016/S0140-6736(19)30723-8
58. Bang YJ, Ruiz EY, Van Cutsem E, et al. Phase III, randomised trial of avelumab versus physician’s choice of chemotherapy as third-line treatment of patients with advanced gastric or gastro-oesophageal junction cancer: primary analysis of JAVELIN Gastric 300. Ann Oncol. 2018;29(10):2052-2060. doi:10.1093/annonc/mdy264
59. Barlesi F, Vansteenkiste J, Spigel D, et al. Avelumab versus docetaxel in patients with platinum-treated advanced non-small-cell lung cancer (JAVELIN Lung 200): an open-label, randomised, phase 3 study. Lancet Oncol. 2018;19(11):1468-1479. doi:10.1016/S1470-2045(18)30673-9
61. Paz-Ares L, Dvorkin M, Chen Y, et al; CASPIAN investigators. Durvalumab plus platinum-etoposide versus platinum-etoposide in first-line treatment of extensive-stage small-cell lung cancer (CASPIAN): a randomised, controlled, open-label, phase 3 trial. Lancet. 2019;394(10212):1929-1939. doi:10.1016/S0140-6736(19)32222-6
64. Kwon ED, Drake CG, Scher HI, et al; CA184-043 Investigators. Ipilimumab versus placebo after radiotherapy in patients with metastatic castration-resistant prostate cancer that had progressed after docetaxel chemotherapy (CA184-043): a multicentre, randomised, double-blind, phase 3 trial. Lancet Oncol. 2014;15(7):700-712. doi:10.1016/S1470-2045(14)70189-5
72. Gaudino M, Hameed I, Biondi-Zoccai G, et al. Systematic evaluation of the robustness of the evidence supporting current guidelines on myocardial revascularization using the fragility index. Circ Cardiovasc Qual Outcomes. 2019;12(12):e006017. doi:10.1161/CIRCOUTCOMES.119.006017
76. Kennedy-Shaffer L. When the alpha is the omega: P-values, “substantial evidence,” and the 0.05 standard at FDA. Food Drug Law J. 2017;72(4):595-635.
80. Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291(20):2457-2465. doi:10.1001/jama.291.20.2457