Key Points
Question
Do bias-adjusted hazard ratios differ from unadjusted hazard ratios when oncology clinical trials are stopped for efficacy at the interim analysis?
Findings
In this systematic review of 19 clinical trials, 2 bias-adjusted hazard ratios, calculated using the conditional mean-adjusted estimator and the weighted conditional mean-adjusted estimator, were distinct from the unadjusted hazard ratio in small trials. Larger differences between the unadjusted and bias-adjusted values were observed when the estimated hazard ratio was greater than 0.5.
Meaning
These findings suggest presenting the bias-adjusted hazard ratios, along with the unadjusted hazard ratio, in the data monitoring committee meeting because bias-adjusted estimators may play an important role in the committee’s decision.
Importance
Group sequential designs allow potential early trial termination at the interim analysis, before study completion. The traditional maximum likelihood estimate is commonly used to quantify the treatment effect in group sequential design trials; however, a bias-adjusted estimator has rarely been reported in published clinical trials.
Objective
To emphasize the need for considering overestimation of treatment effect by applying 2 bias-adjusted estimators to previously published, early-terminated oncology clinical trials.
Evidence Review
Trials published from 2013 to 2017 were identified by searching MEDLINE and Embase on February 23, 2018. This review was restricted to oncology clinical trials using group sequential designs with a single preplanned interim analysis as well as 2-arm randomized clinical trials that were subsequently stopped for efficacy reasons. Each article was independently reviewed by 3 biostatisticians during text screening, and differences in opinion were resolved by discussion. This report presents the unadjusted hazard ratio (HR) of an experimental arm to a reference arm and 2 bias-adjusted HRs calculated by using the conditional mean-adjusted estimator (CMAE) and weighted CMAE (WCMAE).
Findings
In total, 198 abstracts were screened for eligibility, of which 19 eligible clinical trials were identified for application of the bias-adjusted estimators. Unadjusted HRs ranged from 0.203 (95% CI, 0.150-0.276) to 0.71 (95% CI, 0.60-0.84), the number of events at the interim analysis ranged from 58 to 540, and the information time ranged from 48% to 82%. In each study, the HRs adjusted by CMAE and WCMAE were higher than the unadjusted HR. Bias-adjusted estimates in large trials (243 and 414 events at the interim analysis) were similar to the unadjusted HR. However, in small trials (eg, with 58 events at the interim analysis), bias-adjusted estimates were highly disparate from the unadjusted HR. In trials with large treatment effects (eg, HRs of 0.20 and 0.22), the difference between unadjusted and bias-adjusted HRs was small even though the number of events at the interim analysis was small; larger differences were observed when the unadjusted HR was greater than 0.5.
Conclusions and Relevance
In this systematic review of oncology clinical trials that were stopped for efficacy at the interim analysis, relatively large differences were noted between the unadjusted and adjusted HRs when the number of events at the interim analysis was small or when the unadjusted HR was close to the boundaries. These findings suggest presenting the 2 bias-adjusted HRs along with the unadjusted HR in the data monitoring committee meeting.
In clinical trials with long follow-up times (eg, oncology), group sequential designs (GSDs) allow early trial termination based on the interim analysis. A GSD is adaptive and has boundaries to stop a trial when there is sufficient evidence of efficacy. In a clinical trial, the boundaries of the GSD control the overall type I error rate (the probability of erroneously concluding a beneficial effect when the experimental treatment has no effect) by suppressing the significance level at each stage. Group sequential designs are widely used to assess whether a trial should be terminated early for efficacy or lack thereof, especially in oncology clinical trials.
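To illustrate how a boundary suppresses the significance level at an interim look, the following minimal sketch evaluates 2 widely used Lan-DeMets alpha spending functions (O'Brien-Fleming type and Pocock type) at a given information fraction. The function names, the 2-sided α of .05, and the 50% information fraction are assumptions chosen for illustration and are not taken from any trial in this review.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def obf_type_spending(t, alpha=0.05):
    """Cumulative 2-sided alpha spent at information fraction t
    under the Lan-DeMets O'Brien-Fleming-type spending function."""
    if not 0 < t <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / math.sqrt(t)))


def pocock_type_spending(t, alpha=0.05):
    """Cumulative 2-sided alpha spent at information fraction t
    under the Lan-DeMets Pocock-type spending function."""
    if not 0 < t <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    return alpha * math.log(1 + (math.e - 1) * t)


def first_look_z_boundary(alpha_spent):
    """2-sided Z boundary at the FIRST interim look only.
    No earlier look exists, so no recursive integration is needed."""
    return _STD_NORMAL.inv_cdf(1 - alpha_spent / 2)


if __name__ == "__main__":
    t = 0.5  # hypothetical information fraction at the interim analysis
    for name, spend in [("O'Brien-Fleming type", obf_type_spending),
                        ("Pocock type", pocock_type_spending)]:
        a = spend(t)
        print(f"{name}: alpha spent = {a:.4f}, Z boundary = {first_look_z_boundary(a):.3f}")
```

With a single interim analysis, the O'Brien-Fleming-type function spends very little α early, so the interim boundary is far stricter than the nominal level; the Pocock-type function spends α more evenly.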
The maximum likelihood method is commonly used to quantify treatment effects, such as hazard ratio (HR) and risk ratio, in GSD trials. It is an intuitive and standard statistical method used in both non-GSDs and GSDs. Controlling the overall type I error rate is based on a maximum likelihood estimate (MLE), derived from the maximum likelihood method, by adjusting efficacy stopping boundaries. However, controlling type I errors does not address the bias of the MLE, and the MLE may overestimate treatment effects, especially when a trial is terminated early for efficacy reasons. The magnitude of the overestimation of treatment effects reported in medical journals may not be negligible.1 This matters because an estimate of efficacy (eg, HR), together with safety information, allows a data monitoring committee (DMC) to assess the risks and benefits of treatment. The US Food and Drug Administration has issued draft guidance on adaptive designs, including GSD,2 stating that a bias-adjusted estimate informing the stopping rule of a GSD should be prospectively planned and used when reporting study results. Several researchers have proposed bias-adjusted estimators, such as the conditional mean-adjusted estimator (CMAE) and weighted CMAE (WCMAE), to address the issue.1,3,4 The WCMAE is a modified CMAE, which is calculated by weighting the MLE obtained at the interim analysis and the effect size prespecified when calculating the sample size. The statistical details of the CMAE and WCMAE are shown in eAppendix 1 in the Supplement.
However, in published clinical trials, a bias-adjusted estimate of the HR is rarely reported. This suggests that an unadjusted estimate obtained using the maximum likelihood method may be presented to the DMC at the interim analysis without consideration of its bias.5 It might be critical to display bias-adjusted estimators in addition to the MLE when the DMC, sponsor, and investigators interpret the results of the trial. Therefore, we applied the CMAE and the WCMAE to early-terminated oncology clinical trials published in major journals to emphasize the need for adjusting HR overestimation.
We conducted this systematic review according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline.6 The procedures for systematic review are provided in eAppendix 2 in the Supplement. This review was restricted to oncology clinical trials using GSD with a single preplanned interim analysis. We included 2-arm randomized clinical trials that were stopped on the basis of efficacy. We identified trials published in each year from 2013 to 2017 by searching MEDLINE and Embase on February 23, 2018. The search was restricted to 11 scientific journals: Annals of Internal Medicine, CA: A Cancer Journal for Clinicians, Cancer Discovery, JAMA, JAMA Oncology, Journal of Clinical Oncology, Lancet Oncology, Nature Reviews Clinical Oncology, The BMJ, The Lancet, and The New England Journal of Medicine. We used a free-text search with relevant keywords, including hazard ratio and at least 1 of the following keywords: interim analys(i/e)s, group sequential, two stage, stop, stopping, terminate, termination, halt, close, continue, continuation, prematurely, independent data monitoring, data and safety monitoring board, DSMB, Brien-Fleming, Pocock, Lan-DeMets, Fisher information, boundar(y/ies), stop, terminate, or independent data monitoring.
Eligibility Criteria and Quality Control
We included articles that met all the following inclusion criteria: oncology clinical trials, 2-arm randomized trials, trials that were stopped at an interim analysis on the basis of efficacy, and clinical trials with a preplanned interim analysis. We excluded articles that met any of the following exclusion criteria: protocol articles, articles on statistical methodology, retrospective studies, meta-analyses, articles on integrated database analyses, follow-up analyses, subgroup analyses, noninferiority clinical trials, and clinical trials that were stopped because of safety concerns. We defined articles as eligible if they met all the inclusion criteria and did not meet any exclusion criteria. The definition of eligibility was the same in abstract screening and full-text screening.
We used abstract screening to reduce the number of articles examined with full-text screening. We excluded articles that obviously did not meet the eligibility criteria, including articles that were not based on oncology, not clinical trials, not randomized, or were negative studies. Then, we retrieved the full texts of the remaining articles and determined their eligibility by full-text screening.
Each article was independently reviewed by 3 biostatisticians (M.S., S.N., and M.W.) during abstract and full-text screening, and differences in opinion were resolved by discussion.
Eligibility Screening and Data Extraction
A total of 198 abstracts were screened for eligibility (Figure 1). Ninety-seven articles that obviously were not eligible were excluded, including duplicate articles that were eliminated by comparing registration numbers from 3 registration databases: ClinicalTrials.gov, UMIN, and ISRCTN Registry. The full texts of the remaining 101 articles were assessed, and 19 eligible clinical trials were identified for application of the bias-adjusted estimators.7-25
Of the 19 eligible trials, 2 (Lonial et al7 and Dimopoulos et al15) included 2 efficacy end points and a single planned interim analysis. However, these 2 trials met the termination criteria for only 1 end point. No trials used any bias-adjusted estimators. If the information needed to calculate the bias-adjusted estimates (alpha spending function, number of events, and planned HR) was missing from an article, it was gathered from the protocols, statistical analysis plans, and related articles. All data extracted are shown in the eTable in the Supplement.
Comparison of Unadjusted HRs and Bias-Adjusted HRs
Figure 2 shows the unadjusted HRs based on the standard maximum likelihood method, HRs adjusted by the CMAE and WCMAE, study number, end points, sample sizes, events at the interim analysis, and planned events at the final analysis for each trial. The experimental treatment was more efficacious than the reference treatment if the HR was less than 1. The most common type of end point for the interim analysis was progression-free survival time (79% of the trials), followed by overall survival time (16%) and disease-free survival time (5%). The unadjusted HRs ranged from 0.203 (95% CI, 0.150-0.276)21 to 0.71 (95% CI, 0.60-0.84),22 the number of events at the interim analysis ranged from 58 to 540, and the information time ranged from 48% to 82%. The information time is the ratio of the number of events at the interim analysis to that at the final analysis. The planned information time (designated as IT[P] in Figure 2) was extracted from the main text of the 19 articles. For 2 of the trials, it could not be shown as a percentage because the interim analysis was not based on the number of events. The actual information time (designated as IT[A] in Figure 2) was calculated by dividing the actual number of events at the interim analysis by the planned number of events at the final analysis. Progression-free survival was the end point in all trials with an unadjusted HR of 0.6 or less. In each study, the HRs adjusted by the CMAE and WCMAE were higher than the associated unadjusted HR.
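The information time calculation described above amounts to a single division. The sketch below shows it with hypothetical event counts; the function name and the counts (120 interim events, 250 planned final events) are illustrative assumptions and do not correspond to any trial in Figure 2.

```python
def information_time(events_at_interim, planned_events_at_final):
    """Actual information time, IT[A]: events observed at the interim
    analysis divided by the events planned for the final analysis."""
    if planned_events_at_final <= 0:
        raise ValueError("planned number of events must be positive")
    return events_at_interim / planned_events_at_final


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    it_actual = information_time(120, 250)
    print(f"IT[A] = {it_actual:.0%}")
```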
The differences between the unadjusted HRs and adjusted HRs were associated with the number of events at the interim analysis and the unadjusted HR itself (Figure 2). The bias-adjusted estimates in the large trials, such as the study by Hortobagyi et al11 (243 events at the interim analysis) and the one by Dimopoulos et al16 (414 events at the interim analysis), were similar to the unadjusted HR. However, the bias-adjusted estimates in the small trials, such as the study by Oza et al14 (58 events at the interim analysis), were highly disparate from the unadjusted HRs. In trials with large treatment effects (eg, Dimopoulos et al19 and Byrd et al20), the difference between unadjusted and bias-adjusted HRs was small even when the number of events at the interim analysis was small. For treatments with extremely positive effects, such as HRs of 0.20 (Chanan-Khan et al21) and 0.22 (Byrd et al20), the risk of overestimation by the unadjusted HR would be minimal. Conversely, larger differences between the unadjusted and bias-adjusted HRs were generally observed when the estimated HR was larger than 0.5.
If the estimated treatment effect is not large or the number of events at the interim analysis is small, the unadjusted HR should be interpreted with skepticism. The effect of other factors on bias adjustment, such as information time, was explained by these 2 factors. Even if the total number of events is small, a large information time reduces the difference between the unadjusted HR and the bias-adjusted HRs. It is difficult to control the bias of the unadjusted HR before starting the trial, but researchers can reduce the magnitude of overestimation by planning a larger number of events at the interim analysis in advance. The performance of the CMAE and WCMAE depends on the unmeasured true treatment effect. Thus, we recommend presenting the 2 bias-adjusted estimators with the MLE, especially when the trial is terminated early, although most standard statistical software (eg, SAS; SAS Institute Inc) cannot directly produce bias-adjusted estimators.
The importance of the number of events is easy to interpret; data about treatment effects from small trials are generally limited, and the amount of data at the interim analysis is smaller than that at the final analysis. Thus, a positive result at the interim analysis tends to exaggerate the treatment effect when trials are halted for efficacy reasons. The DMC members and stakeholders of clinical trial sponsors should carefully interpret and report the results from such trials when the number of events at the interim analysis is small.
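The tendency described above can be illustrated with a small simulation. The sketch below is not the analysis performed in this review; it assumes a 1-sided efficacy boundary on the Z scale, 1:1 allocation, and the common large-sample approximation that the log HR estimate is normally distributed with variance 4 divided by the number of events. The function name, the true HR of 0.7, the boundary of 2.5, and the event counts are all hypothetical.

```python
import math
import random


def mean_hr_given_early_stop(true_hr, events, z_boundary, n_sim=20000, seed=1):
    """Simulate interim log HR estimates and return the geometric-mean HR
    among simulated trials that crossed the efficacy boundary.

    Assumptions (illustrative): log HR estimate ~ Normal(log true_hr, 4/events),
    and the trial stops for efficacy when Z = -estimate/SE >= z_boundary.
    """
    rng = random.Random(seed)
    theta = math.log(true_hr)
    se = 2 / math.sqrt(events)  # approximate SE of log HR with 1:1 allocation
    stopped = []
    for _ in range(n_sim):
        estimate = rng.gauss(theta, se)
        if -estimate / se >= z_boundary:
            stopped.append(estimate)
    if not stopped:
        raise ValueError("no simulated trial crossed the boundary")
    return math.exp(sum(stopped) / len(stopped))


if __name__ == "__main__":
    for d in (58, 400):  # hypothetical small and large interim event counts
        hr = mean_hr_given_early_stop(true_hr=0.7, events=d, z_boundary=2.5)
        print(f"events = {d}: average HR among early-stopped trials = {hr:.2f}")
```

Under these assumptions, the average HR among early-stopped trials lies further below the true HR when the interim analysis has few events than when it has many.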
Because the true HR is unknown, the operating characteristics of the CMAE and WCMAE should be evaluated statistically under the hypothetical situation where the true HR is known. Shimura et al4 compared the unadjusted HR, CMAE, and WCMAE analytically and by computer simulation and found that the CMAE and WCMAE reduced the bias, defined as the difference between the estimated HR and the true value. Although the WCMAE underestimated the difference in treatment effect when the true HR was relatively low (a true HR of 0.5), the magnitude of underestimation was not serious.
Study end points might also be critical when interpreting study results. In Figure 2, the treatment effect of progression-free survival tended to be larger than that of overall survival. This caused differences in the estimated treatment effect between the unadjusted and bias-adjusted estimators because, as discussed in the Results section, the adjustment becomes larger when the unadjusted HR is close to 1.
In GSDs, the conditional bias-adjusted estimators (the CMAE and WCMAE) do not affect the boundaries and significance levels of early stopping of a trial at the interim analysis. Therefore, researchers can use the same boundaries and significance levels used in standard GSDs. Another advantage of the CMAE and WCMAE is that they do not require patient-level data. To calculate the bias-adjusted estimators, we require the unadjusted HR, boundary value, number of events at the interim analysis, and planned HR (only for WCMAE). The information is generally provided in the article or in the protocol. Thus, researchers can calculate the HR for the CMAE and WCMAE using the existing literature.
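The sketch below shows how a conditional mean adjustment can be computed from summary inputs alone. It is a simplified illustration and not the authors' implementation, whose exact definitions are given in eAppendix 1 in the Supplement and may differ. It assumes a single interim analysis, 1:1 allocation, a log HR estimate that is approximately normal with variance 4 divided by the number of events, and an adjusted value defined as the log HR whose conditional expectation, given early stopping, equals the observed log HR. The WCMAE is not shown because its weighting is not specified in the main text. All function names and input values are hypothetical.

```python
import math


def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def _cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))


def conditional_mean_log_hr(theta, se, bound):
    """E[X | X <= bound] for X ~ Normal(theta, se^2) (truncated normal mean)."""
    a = (bound - theta) / se
    return theta - se * _pdf(a) / _cdf(a)


def conditional_mean_adjusted_hr(hr_observed, events, z_boundary, tol=1e-10):
    """Find, by bisection, the HR whose conditional mean given early
    stopping matches the observed HR (all on the log scale)."""
    x = math.log(hr_observed)
    se = 2 / math.sqrt(events)   # assumed SE of log HR, 1:1 allocation
    bound = -z_boundary * se     # efficacy boundary on the log HR scale
    if x > bound:
        raise ValueError("observed HR does not cross the efficacy boundary")
    lo, hi = x, bound + 30 * se  # search range kept numerically safe
    if conditional_mean_log_hr(hi, se, bound) < x:
        raise ValueError("observed HR is too close to the boundary for this sketch")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if conditional_mean_log_hr(mid, se, bound) < x:
            lo = mid
        else:
            hi = mid
    return math.exp((lo + hi) / 2)


if __name__ == "__main__":
    # Hypothetical inputs: observed HR 0.45, Z boundary 2.5
    for d in (58, 400):
        adj = conditional_mean_adjusted_hr(0.45, events=d, z_boundary=2.5)
        print(f"events = {d}: unadjusted HR = 0.45, adjusted HR = {adj:.2f}")
```

Under these assumptions, the adjusted HR is always at least as large as the unadjusted HR, and the adjustment shrinks as the number of interim events grows or as the observed HR moves further beyond the boundary, which is the qualitative pattern reported in Figure 2.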
The bias-adjusted estimators may play an important role in the decision of the DMC. In many cases, the estimates from the CMAE and WCMAE are more conservative than the unadjusted HR. Thus, presenting them in the DMC meeting might reduce the probability of early termination. Data monitoring committees are generally wary of interim results with few events, since 1 more or 1 fewer event in either group could change the conclusion of the interim analysis. Therefore, we recommend presenting the bias-adjusted HRs together with the unadjusted HR.
One of the major limitations of our study is that the bias-adjusted estimators we considered could only be applied to superiority trials with a single interim analysis. As a result, the literature search was restricted to trials in which only 1 interim analysis was planned. Therefore, we have no information on the bias in HR estimation for long-term trials with more than 1 interim analysis.
In actual clinical trials, this systematic review found relatively large differences between the unadjusted and adjusted HRs when the number of events at the interim analysis was small or when the unadjusted HR was close to the boundaries. Therefore, we recommend presenting the 2 bias-adjusted estimators with the unadjusted HR in the DMC meeting to assess whether early termination for efficacy reasons is recommended at the interim analysis.
Accepted for Publication: April 16, 2020.
Published: June 23, 2020. doi:10.1001/jamanetworkopen.2020.8633
Open Access: This is an open access article distributed under the terms of the CC-BY-NC-ND License. © 2020 Shimura M et al. JAMA Network Open.
Corresponding Author: Masashi Shimura, PhD, Data Science Department, Taiho Pharmaceutical Co, Ltd, 1-2-4, Uchikanda, Chiyoda-ku, Tokyo, Japan (m-shimura@taiho.co.jp).
Author Contributions: Dr Shimura had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Shimura, Wakabayashi, Maruo, Gosho.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Shimura, Nomura, Wakabayashi.
Critical revision of the manuscript for important intellectual content: Shimura, Nomura, Maruo, Gosho.
Statistical analysis: Shimura, Wakabayashi.
Administrative, technical, or material support: Shimura, Nomura.
Supervision: Shimura, Maruo, Gosho.
Conflict of Interest Disclosures: Dr Nomura reported receiving personal fees from AstraZeneca, Chugai Pharmaceutical Co, Ltd, Taiho Pharmaceutical, and Pfizer outside the submitted work. Dr Wakabayashi reported receiving personal fees from Chugai Pharmaceutical Co, Ltd and Johnson & K. K. Medical Company outside the submitted work. No other disclosures were reported.
1. Shimura M, Gosho M, Hirakawa A. Comparison of conditional bias-adjusted estimators for interim analysis in clinical trials with survival data. Stat Med. 2017;36(13):2067-2080. doi:10.1002/sim.7258
2. US Department of Health and Human Services; Food and Drug Administration; Center for Drug Evaluation and Research (CDER); Center for Biologics Evaluation and Research (CBER). Adaptive designs for clinical trials of drugs and biologics: guidance for industry. Published November 2019. Accessed 24 May, 2019. https://www.fda.gov/media/78495/download
4. Shimura M, Maruo K, Gosho M. Conditional estimation using prior information in 2-stage group sequential designs assuming asymptotic normality when the trial terminated early. Pharm Stat. 2018;17(5):400-413. doi:10.1002/pst.1859
12. Sehn LH, Chua N, Mayer J, et al. Obinutuzumab plus bendamustine versus bendamustine monotherapy in patients with rituximab-refractory indolent non-Hodgkin lymphoma (GADOLIN): a randomised, controlled, open-label, multicentre, phase 3 trial. Lancet Oncol. 2016;17(8):1081-1093. doi:10.1016/S1470-2045(16)30097-3
15. Dimopoulos MA, Moreau P, Palumbo A, et al; ENDEAVOR Investigators. Carfilzomib and dexamethasone versus bortezomib and dexamethasone for patients with relapsed or refractory multiple myeloma (ENDEAVOR): a randomised, phase 3, open-label, multicentre study. Lancet Oncol. 2016;17(1):27-38. doi:10.1016/S1470-2045(15)00464-7
16. van Oers MH, Kuliczkowski K, Smolej L, et al; PROLONG study investigators. Ofatumumab maintenance versus observation in relapsed chronic lymphocytic leukaemia (PROLONG): an open-label, multicentre, randomised phase 3 study. Lancet Oncol. 2015;16(13):1370-1379. doi:10.1016/S1470-2045(15)00143-6
21. Chanan-Khan A, Cramer P, Demirkan F, et al; HELIOS investigators. Ibrutinib combined with bendamustine and rituximab compared with placebo, bendamustine, and rituximab for previously treated chronic lymphocytic leukaemia or small lymphocytic lymphoma (HELIOS): a randomised, double-blind, phase 3 study. Lancet Oncol. 2016;17(2):200-211. doi:10.1016/S1470-2045(15)00465-9