Do value frameworks developed by the American Society of Clinical Oncology and European Society for Medical Oncology measure absolute or relative clinical benefit?
In this evidence review and analysis of 107 randomized clinical trials, the survival efficacy component of the American Society of Clinical Oncology’s Value Framework correlated better with relative measures of survival benefit than with absolute measures. The European Society for Medical Oncology Magnitude of Clinical Benefit Scale maintained low- to moderate-strength correlations with relative and absolute measures.
Neither framework appeared to possess the measurement characteristics of an absolute measure of survival benefit, and the current versions may not be ideal for comparing clinical benefit across different drugs or combining clinical benefit with cost to establish value.
The American Society of Clinical Oncology (ASCO) and the European Society for Medical Oncology (ESMO) have independently published value frameworks. To date, whether the clinical benefit scoring algorithms from these frameworks were intended to measure absolute or relative survival benefit remains unclear.
To empirically examine the measurement characteristics of these frameworks by comparing their survival efficacy components (ASCO clinical benefit score [CBS] and ESMO preliminary magnitude of clinical benefit grade [PMCBG]) with established measures of absolute (median survival difference and restricted mean survival time [RMST] difference) and relative (hazard ratios [HRs]) survival benefit.
The US Food and Drug Administration (FDA)’s Hematology and Oncology Approvals and Safety Notifications database was retrospectively reviewed to identify phase 3 randomized controlled trials (RCTs) cited for clinical efficacy evidence in oncology drug approvals from January 1, 2006, through December 31, 2017.
Two reviewers searched the database for initial trials cited for approval. Phase 3 trials with overall survival, progression-free survival, and/or time to progression as their primary or coprimary end points were included. Notifications for noncancer indications or presenting label changes and trials that did not report HRs for the required end points and/or did not publish survival curves with number-at-risk data were excluded. Of 269 notifications initially identified, 107 met the selection criteria.
Data Extraction and Synthesis
Sensitivity analyses were conducted by calculating the scores using (1) the framework-defined end point, including tail-of-curve bonus points (ASCO) or long-term plateau adjustments (ESMO) (framework-defined end point plus tail-of-curve bonus), (2) overall survival data only, and (3) progression-free survival data only. For primary and sensitivity analyses, Spearman correlation coefficients were calculated to examine the relationships between (1) ASCO-CBS or ESMO-PMCBG and RMST difference, (2) ASCO-CBS or ESMO-PMCBG and median survival difference, and (3) ASCO-CBS or ESMO-PMCBG and HR. Data were analyzed from January 7 through April 30, 2018.
Main Outcomes and Measures
In the primary analysis, ASCO-CBSs and ESMO-PMCBGs were calculated for the included trials using the framework-defined end point.
Compared with measures of absolute survival benefit, ESMO-PMCBGs showed a low correlation with RMST difference (ρ = 0.44) and a moderate correlation with median survival difference (ρ = 0.64). ASCO-CBSs showed low correlations with both measures of absolute benefit (ρ = 0.43 for RMST difference; ρ = 0.43 for median survival difference). Compared with a relative measure of survival (HRs), ESMO-PMCBGs showed a low correlation (ρ = 0.47) and ASCO-CBSs showed a high correlation (ρ = 0.76).
Conclusions and Relevance
Neither framework consistently performed as an absolute measure of survival benefit. The incorporation of a direct measure of absolute clinical benefit, such as RMST difference, into the survival efficacy components of their algorithms should be considered.
The American Society of Clinical Oncology (ASCO) and the European Society for Medical Oncology (ESMO) have independently published value frameworks that allow for the systematic assessment of clinical benefit of anticancer drugs with the aim of establishing the value of these therapies.1-3 Both frameworks consist of a preliminary survival benefit score that is further adjusted by incorporating other value dimensions such as toxic effects and quality of life.1,2
The ASCO Value Framework (ASCO-VF) incorporates only hazard ratios (HRs) to construct clinical benefit scores (CBSs), whereas the ESMO Magnitude of Clinical Benefit Scale (ESMO-MCBS) considers HRs and absolute gains in median survival to generate preliminary magnitude of clinical benefit grades (PMCBGs).1,2 The inclusion of HRs in the calculation of CBSs in both frameworks suggests that they likely measure relative rather than absolute survival benefit (ASCO states this explicitly for its framework).1 Although neither framework was designed solely to measure absolute survival benefit, calculating the CBS as a relative measure does not seem to correspond with the frameworks’ intended uses. The ASCO-VF aims to generate a net health benefit score that can be juxtaposed against the cost of treatment (an absolute and not a relative monetary measure) to establish value.1 The ESMO-MCBS was developed for use in a variety of settings, including public policy applications in which ESMO scores (ranging from 1 to 5 in the noncurative setting) can provide a “backbone for value evaluations for cancer medicines.”4(p1559) The ESMO has also established threshold scores (ie, ESMO-MCBS score of 4 or 5 in the noncurative setting) to classify treatments as having substantial clinical benefit.2 The presence of these cutoffs implies that ESMO-MCBS can be used as an absolute measure of benefit, allowing decision makers to compare different treatments and endorse those with sufficient additional clinical benefits. Indeed, both frameworks have been applied to compare drug costs and thereby establish the value of anticancer therapies.5-8 Furthermore, both frameworks incorporate bonus points and adjustments that add or subtract fixed values from the preliminary CBSs, acting as an absolute measure of those dimensions. Therefore, both frameworks are internally inconsistent in that they combine relative and absolute measures into a single score.
To satisfy the intentions of these frameworks and to compare CBSs with incremental treatment cost, ASCO-VF and ESMO-MCBS should ideally measure absolute rather than relative survival benefit. Relative measures of benefit cannot be compared across treatments in a consistent manner for decision-making purposes or for price comparisons from the payer’s perspective.
Although the intentions of both frameworks necessitate that they provide absolute measures while their calculations incorporate relative measures, we sought to understand whether their empirical performance still allows them to function as absolute measures. The objective of our study was to compare the survival efficacy components of both frameworks (defined as ASCO-CBS and ESMO-PMCBG) with established metrics of absolute (restricted mean survival time [RMST] difference and median survival difference) and relative survival benefit (HRs) to empirically examine their measurement characteristics.
Selection of Randomized Controlled Trials
The US Food and Drug Administration’s Hematology and Oncology Approvals and Safety Notifications pages were reviewed to identify randomized controlled trials (RCTs) cited for clinical efficacy evidence in oncology drug approvals from January 1, 2006, through December 31, 2017 (Figure 1). Only the initial trials cited for Food and Drug Administration approval were included because they likely reflect the best evidence available to payers or decision makers at the time of their deliberations. Phase 3 RCTs with overall survival (OS), progression-free survival (PFS), and/or time to progression as their primary or coprimary end points were included. The Food and Drug Administration notifications for noncancer indications or those presenting label changes were excluded. Any RCTs that did not report HRs for the required end points (OS, PFS, and/or time to progression) were excluded. Trials that did not publish survival curves and/or report the number-at-risk data with their survival curves were excluded. The institutional review board of Sunnybrook Research Institute exempted this study from review because no human data were included and only publicly available information was used.
Data Extraction and Framework Scoring
For each included RCT, 2 independent reviewers (R.S. and L.E.) calculated ASCO-CBS, ESMO-PMCBG, and RMST difference values. Interrater reliability was assessed using intraclass correlation coefficients.9 Trials with time to progression as an end point were evaluated using the PFS scoring algorithms. From each trial, HRs for OS and/or PFS and the median OS and/or PFS difference between the arms (when reported) were extracted. Published OS and/or PFS Kaplan-Meier survival curves from each trial were digitized using DigitizeIt software, version 2.0.4,10 and individual patient data were reconstructed using an established algorithm.11 Individual patient data were used to calculate RMST in the experimental and control groups at time point t, specified as the minimum of the longest observed event times in each group.12,13 For each trial, 2 reviewers (R.S. and L.E.) calculated the mean RMST difference values (experimental arm RMST minus control arm RMST) for the final analysis. Substantial differences in RMST values (>0.5 months) between the reviewers were resolved through recalculation by a third reviewer (S.C.).
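The RMST difference described above can be illustrated with a minimal Python sketch. This is not the authors' code (their analyses were run in R), and the function names `kaplan_meier`, `rmst`, and `rmst_difference` are hypothetical. It assumes patient-level data are already available as follow-up times with event indicators (1 = event, 0 = censored), and it interprets the truncation time as the smaller of the two arms' longest observed event times.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (event_time, survival_probability) steps of the Kaplan-Meier estimator."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects (events and censorings) sharing the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            steps.append((t, survival))
        at_risk -= n_removed
    return steps


def rmst(times: Sequence[float], events: Sequence[int], tau: float) -> float:
    """Area under the Kaplan-Meier step function from time 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Final rectangle from the last step before tau up to tau.
    area += prev_s * (tau - prev_t)
    return area


def rmst_difference(exp_times, exp_events, ctl_times, ctl_events) -> Tuple[float, float]:
    """Return (experimental RMST minus control RMST, tau)."""
    # Assumption: tau is the minimum of the longest observed event times per arm.
    tau = min(
        max(t for t, e in zip(exp_times, exp_events) if e == 1),
        max(t for t, e in zip(ctl_times, ctl_events) if e == 1),
    )
    diff = rmst(exp_times, exp_events, tau) - rmst(ctl_times, ctl_events, tau)
    return diff, tau


if __name__ == "__main__":
    # Toy data in months, all events observed (no censoring).
    diff, tau = rmst_difference([4, 8, 12, 16], [1, 1, 1, 1], [2, 4, 6, 8], [1, 1, 1, 1])
    print(f"tau = {tau} months, RMST difference = {diff:.2f} months")
```

In practice the inputs would be the pseudo-individual patient data reconstructed from the digitized curves; that reconstruction step is not shown here.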
The advanced disease framework and noncurative intent forms of ASCO-VF version 2 and ESMO-MCBS version 1.1 were used to calculate the ASCO-CBS and ESMO-PMCBG, respectively. Because this analysis is focused solely on survival benefits, only the survival components of the ASCO and ESMO scores (ASCO-CBS and ESMO-PMCBG), which do not incorporate any toxic effects, quality of life measures, or bonus points, were analyzed.1,2
Data were analyzed from January 7 through April 30, 2018. In our primary analysis, we attempted to calculate ASCO-CBSs and ESMO-PMCBGs for the included trials using the value framework–defined end point as per the criteria in their publications.1,2 Three separate sensitivity analyses were also conducted by calculating the ASCO-CBS and ESMO-PMCBG for each trial using (1) the framework-defined end point, including the tail-of-curve bonus points (ASCO-VF) or long-term plateau adjustments (ESMO-MCBS), (2) OS data only, and (3) PFS data only. Sensitivity analyses were conducted to analyze the different ways each framework captured survival efficacy. As part of the sensitivity analyses, OS and PFS data were examined separately because both frameworks applied different weighting and thresholds for these end points, recognizing that PFS is less clinically meaningful and not always an appropriate surrogate for improved survival.1,2 The tail-of-curve bonus was included in the sensitivity analyses because this is another efficacy component of the frameworks, capturing long-term survival or disease control that is not captured by the HR component of the efficacy scores. Although HRs are statistical measures of the magnitude of difference between 2 Kaplan-Meier curves, the tail-of-curve adjustments are captured differently and are relatively arbitrarily determined.
Comparison of ASCO-CBS and ESMO-PMCBG With Absolute and Relative Clinical Benefit Measures
Spearman correlation coefficients were calculated to examine the correlation of ASCO-CBS and ESMO-PMCBG with established measures of absolute and relative survival benefit. In comparing scores with absolute measures, RMST difference was considered the primary measure for correlation. We correlated the framework scores with median survival difference (for RCTs in which both arms reached median survival) as a secondary comparison because this is a more commonly reported measure of absolute survival, and readers may be more comfortable with this traditionally used metric. For the relative measure comparison, the scores were correlated with published HRs, the most commonly reported relative effect measure.
When calculating the correlations, we ensured that all metrics measured the same end point. For instance, OS ASCO-CBS and ESMO-PMCBG were correlated with OS RMST difference, median OS difference, and OS HRs. Spearman correlation values of 0.30 to 0.50 were considered low; 0.50 to 0.70, moderate; and 0.70 to 0.90, high.14 All statistical analyses were performed using R, version 3.2.0 (R Foundation for Statistical Computing).
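As a rough illustration of the correlation step, the sketch below computes a Spearman coefficient (Pearson correlation of average ranks) and labels its strength with the cutoffs stated above. It is a minimal sketch rather than the authors' R analysis. The function names are hypothetical, the treatment of values that fall exactly on a boundary (0.50, 0.70) is an assumption, and the labels for values outside the 0.30 to 0.90 range are placeholders not taken from the text. The absolute value is used because higher benefit scores correspond to lower HRs.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho: float) -> str:
    """Label correlation strength using the cutoffs stated in the text."""
    r = abs(rho)
    if r >= 0.90:
        return "very high"   # placeholder label, outside the stated range
    if r >= 0.70:
        return "high"
    if r >= 0.50:
        return "moderate"
    if r >= 0.30:
        return "low"
    return "negligible"      # placeholder label, outside the stated range


if __name__ == "__main__":
    # Hypothetical scores and RMST differences for five trials.
    scores = [10, 25, 30, 45, 60]
    rmst_diffs = [0.5, 2.1, 1.0, 3.4, 2.8]
    rho = spearman_rho(scores, rmst_diffs)
    print(f"rho = {rho:.2f} ({strength(rho)})")
```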
Characteristics of Included Trials
In this analysis, 107 unique phase 3 clinical trials (84 solid tumor and 23 hematology trials) were included. The characteristics of the included trials are presented in Table 1 and eTables 1 and 2 in the Supplement. In total, 106 ASCO-CBSs and 84 ESMO-PMCBGs were computed in our primary analysis. The distributions of the ASCO-CBSs and ESMO-PMCBGs for the primary and sensitivity analyses are presented in eFigures 1 and 2 in the Supplement. In our primary analysis, the mean (SD) ASCO-CBS was 30.85 (16.26) (range, −25 to 73), and the mean (SD) ESMO-PMCBG was 2.65 (0.83) (range, 1-4) (Table 2). For 106 RCTs eligible for ASCO scoring, the mean (SD) OS RMST difference was 1.83 (1.30) months (range, −0.81 to 7.54 months), and the mean (SD) PFS RMST difference was 3.19 (2.19) months (range, −0.38 to 11.14 months) (Table 2). For the 79 studies scored with ESMO, the mean (SD) OS RMST difference was 1.86 (1.20) months (range, −0.81 to 7.54 months), and the mean (SD) PFS RMST difference was 2.74 (1.59) months (range, 0.42 to 8.21 months). The mean RMST difference for an ESMO-PMCBG score of 1 was 1.64 months; ESMO-PMCBG 2, 2.20 months; ESMO-PMCBG 3, 3.13 months; and ESMO-PMCBG 4, 3.43 months.
Summary statistics for our sensitivity analyses using OS, PFS, and framework-defined end point plus tail-of-curve bonus as the end points and median OS and PFS differences are presented in Table 2. Mean RMST differences for ESMO-PMCBGs calculated using the other end points are presented in eTable 3 in the Supplement. Overall, interrater reliability was good to excellent for the framework scores and RMST differences (intraclass correlation coefficient range, 0.72 [95% CI, 0.60-0.82] to 0.98 [95% CI, 0.97-0.99]) (eTable 4 in the Supplement).
Primary Analysis: Framework-Defined End Point
Correlations with the RMST difference were low for the ASCO-CBS (ρ = 0.43) and the ESMO-PMCBG (ρ = 0.44) (Figure 2). In our secondary analysis with median survival difference, the ASCO-CBS revealed a low correlation (ρ = 0.43), whereas the ESMO-PMCBG showed a moderate correlation (ρ = 0.64) (eFigure 3 in the Supplement).
In comparison, the ASCO-CBS showed a high correlation with HRs (ρ = 0.76), stronger than that observed with both absolute measures of survival. In contrast, the ESMO-PMCBG maintained a low correlation with HRs (ρ = 0.47) (Figure 3).
Overall, the ASCO-CBS appeared to correlate better with a relative measure of survival benefit (HRs) than with absolute measures (RMST difference and median survival difference), and the ESMO-PMCBG maintained similar low to moderate correlations with relative and absolute measures of survival benefit. Similar results were observed when the primary analysis was conducted with OS and PFS data separately (eTable 5 in the Supplement).
Sensitivity Analyses: Framework Plus Tail of Curve, OS, and PFS End Points
Results of sensitivity analyses showed low to moderate correlations between the ASCO-CBS and the RMST difference (framework-defined end point plus tail-of-curve bonus, ρ = 0.40; OS, ρ = 0.46; PFS, ρ = 0.55) and median survival difference (framework-defined end point plus tail-of-curve bonus, ρ = 0.33; OS, ρ = 0.51; PFS, ρ = 0.54) for all other end points evaluated (Figure 2 and eFigure 3 in the Supplement). For the ESMO-PMCBG, the sensitivity analysis showed improved moderate correlations with RMST difference when the tail-of-curve bonus was added to the framework score (ρ = 0.67) and when OS and PFS data were analyzed separately (ρ = 0.63 and ρ = 0.62, respectively). Similarly, stronger correlations were observed for all other end points when compared with median survival difference (framework-defined end point plus tail-of-curve bonus, ρ = 0.77; OS, ρ = 0.88; PFS, ρ = 0.70). Whereas correlations of similar magnitude were observed between the ASCO-CBS and RMST difference or median survival difference at all end points, correlations between the ESMO-PMCBG and RMST difference were weaker than correlations between the ESMO-PMCBG and median survival difference.
The ASCO-CBS maintained strong correlations with HRs when the tail-of-curve bonus was included (ρ = 0.74) or when OS and PFS data were analyzed separately (ρ = 1.00 for OS and PFS) (Figure 3). For the ESMO-PMCBG, the correlations were predominantly moderate but higher than those observed in the primary analysis (framework-defined end point plus tail-of-curve bonus, ρ = 0.62; OS, ρ = 0.61; PFS, ρ = 0.75) (Figure 3).
Our results showed that the ASCO-CBS and ESMO-PMCBG had low to moderate correlations with absolute measures of clinical benefit such as RMST difference and median survival difference. Comparing the frameworks, ESMO-PMCBG showed stronger correlations with RMST difference and median OS or PFS difference than the ASCO-CBS, indicating that the efficacy component of this framework had a stronger correlation with absolute measures than the ASCO-CBS. These results are not surprising, because the ESMO-PMCBG incorporates HRs and absolute gains in median survival, whereas ASCO-CBS is calculated preferentially using HRs.15 The ESMO-PMCBG also showed improved correlations with RMST difference, median survival difference, and HRs when the tail-of-curve bonus was included in the framework score. Overall, our empirical findings suggest that neither framework produces absolute measures of survival benefit.
Unlike RMST, which considers the entire survival distribution up to a specified time (ie, the end of follow-up), median survival reflects survival probability only at a particular point and does not adequately capture long-term survival or durable PFS (often termed the tail of the curve).13,16 Seruga et al17 have also shown that absolute benefits measured using snapshot methods (ie, median survival) tend to be larger, more variable, and more dependent on curve shape than area methods such as RMST difference. Therefore, RMST difference is likely a better representation of true absolute benefit than median survival. However, median survival is still the more widely reported absolute survival metric and is thus easier to extract from clinical trials than RMST difference.
The low to moderate correlations observed between the ESMO-PMCBG and RMST difference suggest that the ESMO-PMCBG does not capture survival similarly to RMST difference and is not optimal for measuring absolute clinical benefit despite incorporation of median survival difference. In the case of ASCO-CBS, similar low to moderate correlations were observed with median survival difference and RMST difference. This finding was expected because the ASCO-CBS does not incorporate any measures of absolute clinical benefit in survival efficacy calculations. This difference in incorporating absolute survival benefit has also been shown to be the primary factor associated with divergent scoring between the frameworks.15
In oncology, the value of an intervention is generally defined as clinical benefits achieved per dollar spent.18 When comparing clinical benefits with cost to establish value, benefits should be absolute measures. For instance, in the realm of cost-effectiveness analyses for resource allocation decisions, incremental quality-adjusted life-years are commonly used as a single measure of absolute benefit.19 As a result, incremental quality-adjusted life-years when combined with incremental cost allow for the value comparison of multiple mutually exclusive interventions.19 Based on the intentions outlined in both value frameworks, the CBS should ideally function in a similar manner. However, our empirical results do not show a strong correlation with absolute measures of clinical benefit (Figure 2). Because these frameworks behave more as relative measures, comparing therapies from different trials using these frameworks is not possible, and their applicability is limited to interventions that have been directly compared. Although relative measures have the advantage of being potentially stable across populations with differing prognoses and risks (a limitation of absolute measures and the reason ASCO elected to incorporate only HRs), they tend to overestimate the benefits of an intervention and cannot differentiate between small and large treatment effects.20 Therefore, if the frameworks behave more as relative measures, they may overestimate CBSs when the baseline survival is poor, which is a common scenario in most oncology palliative RCT settings. 
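The point that an identical relative effect can correspond to very different absolute benefits can be made concrete with a small worked example. The sketch below is illustrative only: it assumes exponential survival in both arms (so the HR is constant), and the function name `exponential_benefit`, the HR of 0.7, the control medians of 3 and 24 months, and the 36-month truncation time are all hypothetical values chosen for the illustration, not data from the included trials.

```python
import math
from typing import Tuple


def exponential_benefit(control_median: float, hazard_ratio: float, tau: float) -> Tuple[float, float]:
    """Return (median survival gain, RMST difference up to tau), in the units of control_median.

    Assumes exponential survival S(t) = exp(-rate * t) in both arms.
    """
    rate_control = math.log(2) / control_median
    rate_experimental = hazard_ratio * rate_control

    # Median of an exponential distribution is ln(2) / rate.
    median_gain = math.log(2) / rate_experimental - control_median

    # RMST up to tau is the integral of S(t) from 0 to tau: (1 - exp(-rate * tau)) / rate.
    rmst_control = (1 - math.exp(-rate_control * tau)) / rate_control
    rmst_experimental = (1 - math.exp(-rate_experimental * tau)) / rate_experimental
    return median_gain, rmst_experimental - rmst_control


if __name__ == "__main__":
    for control_median in (3.0, 24.0):
        gain, rmst_diff = exponential_benefit(control_median, hazard_ratio=0.7, tau=36.0)
        print(
            f"control median {control_median:>4.0f} mo, HR 0.70: "
            f"median gain {gain:.2f} mo, RMST difference {rmst_diff:.2f} mo"
        )
```

Under these assumptions, a score driven only by the HR would be identical in both scenarios, although the gain in months is several times smaller when control-arm survival is short.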
The ASCO, recognizing this limitation of relative measures, urges caution in interpreting HRs and encourages health care professionals to consider the absolute survival difference at the point of care.1,15 The ASCO believes that its framework should allow patients in the clinical setting to individually decide the magnitude of absolute clinical benefit they would choose for a particular therapy with specific toxic effects.15 However, the ASCO is not explicit as to why absolute survival measures cannot be used instead of relative survival measures in their framework to quantify within-trial differences in benefit between novel therapies and the current standards.
The RMST difference provides a clinically meaningful and readily interpretable summary of evidence that allows cross-trial comparisons, and it can be compared with cost to establish the value of anticancer drugs.13 The RMST-based measures do not rely on model assumptions, and RMST difference has been recommended as a standard measure when the proportional hazards assumption is not valid.21,22 Trinquart et al13 also found that RMST-based measures provide more conservative estimates of treatment effect and are more efficient than HRs, particularly when the number of events is small. Despite these advantages, RMST-based measures are not routinely reported in RCTs. Nevertheless, methods of reconstructing pseudo-individual patient data from survival curves have been validated and may facilitate more frequent use of RMST-based measures.11
Although RMST difference is a reasonable measure of absolute clinical benefit, its use is limited by whether value framework users can independently calculate RMST differences from clinical trial data. The RMST calculations can be time consuming because pseudo-individual patient data often need to be reconstructed first. Limitations associated with the individual patient data–reconstructing algorithms, such as potential difficulty in accurately digitizing survival curves,11 can also affect the accuracy of the final RMST values calculated. To minimize these limitations, we recommend that RCTs routinely report RMST-based measures or that value framework developers take on the task of calculating the RMSTs of practice-changing RCTs to improve framework usability. Because RCT investigators have direct access to individual patient data and do not need time-consuming reconstruction methods, RMST should be relatively straightforward for them to calculate. In addition, the ESMO has recently established a database with scores for various therapies. If such efforts to routinely publish scores continue, RMST differences as a potential component of these frameworks could realistically be calculated and reported by the framework developers.
The ASCO-VF and ESMO-MCBS do not appear to possess the measurement characteristics of an absolute measure of survival benefit, limiting their ability to achieve their intentions. Therefore, the current versions of these frameworks may not be ideal for comparing clinical benefit across different drugs or combining clinical benefit with cost to establish value. In fact, the authors of the ASCO framework have recognized that an ideal scale should take into account relative and absolute benefit gains.15 With this acknowledgment, the importance of including absolute measures is evident, and further rationale by the framework developers on how relative measures contribute to the intent of the frameworks would be helpful.
Given that RMST difference is an obvious direct measure of absolute clinical benefit, the developers of both frameworks may consider incorporating RMST difference into the survival efficacy components of their algorithms to strengthen their measurement characteristics within the frameworks’ intentions. We recommend that RCTs routinely report RMST-based measures of treatment effects in addition to the other commonly reported measures when evaluating time-to-event outcomes to allow readers to fully appreciate the true absolute magnitude of the observed survival benefits.
Accepted for Publication: February 26, 2019.
Published Online: May 16, 2019. doi:10.1001/jamaoncol.2019.0818
Correction: This article was corrected on July 11, 2019 to remove the incorrect label of “meta-analysis” from the subtitle and Key Points.
Corresponding Author: Kelvin K. W. Chan, MD, MSc, PhD, Odette Cancer Centre, Sunnybrook Health Sciences Centre, 2075 Bayview Ave, Toronto, ON M4N 3M5, Canada (firstname.lastname@example.org).
Author Contributions: Mr Saluja and Dr Chan had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Cheung, Chan.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Saluja, Cheung, Chan.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Saluja, Everest, Cheng.
Administrative, technical, or material support: Saluja, Cheung.
Supervision: Cheung, Chan.
Conflict of Interest Disclosures: None reported.
Additional Information: The Canadian Centre for Applied Research in Cancer Control is funded by grant 2015-703549 from the Canadian Cancer Society Research Institute.
et al. Do the American Society of Clinical Oncology Value Framework and the European Society of Medical Oncology Magnitude of Clinical Benefit Scale measure the same construct of clinical benefit? J Clin Oncol. 2017;35(24):2764-2771. doi:10.1200/JCO.2016.71.6894
et al. A standardised, generic, validated approach to stratify the magnitude of clinical benefit that can be anticipated from anti-cancer therapies: the European Society for Medical Oncology Magnitude of Clinical Benefit Scale (ESMO-MCBS). Ann Oncol. 2015;26(8):1547-1573. doi:10.1093/annonc/mdv249
R. Clinical benefit, price and approval characteristics of FDA-approved new drugs for treating advanced solid cancer, 2000-2015. Ann Oncol. 2017;28(5):1111-1116. doi:10.1093/annonc/mdx053
R. Comparison of treatment effects measured by the hazard ratio and by the ratio of restricted mean survival times in oncology randomized controlled trials. J Clin Oncol. 2016;34(15):1813-1819. doi:10.1200/JCO.2015.64.2488
MM. Statistics corner: a guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24(3):69-71.
NI, de Vries et al. Comparative assessment of clinical benefit using the ESMO-Magnitude of Clinical Benefit Scale version 1.1 and the ASCO Value Framework Net Health Benefit Score. J Clin Oncol. 2019;37(4):336-349. doi:10.1200/JCO.18.00729
et al. Better guidelines for better care: accounting for multimorbidity in clinical guidelines—structured examination of exemplar guidelines and health economic modelling. Health Services and Delivery Research. 2017;5(16):51-71. doi:10.3310/HSDR05160
MKB. The use of restricted mean survival time to estimate the treatment effect in randomized clinical trials when the proportional hazards assumption is in doubt. Stat Med. 2011;30(19):2409-2421. doi:10.1002/sim.4274