eTable. Setting and tumor type addressed by treatment level surrogate meta-analyses
Prasad V, Kim C, Burotto M, Vandross A. The Strength of Association Between Surrogate End Points and Survival in Oncology: A Systematic Review of Trial-Level Meta-analyses. JAMA Intern Med. 2015;175(8):1389-1398. doi:10.1001/jamainternmed.2015.2829
Copyright 2015 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
The strength of association between surrogate end points and survival in oncology is important to understand because surrogate end points are frequently used in oncology clinical trials, supporting US Food and Drug Administration approvals and National Comprehensive Cancer Network guideline recommendations.
To identify and evaluate trial-level meta-analyses of randomized clinical trials quantifying the association between a surrogate end point and overall survival in medical oncology. Trial-level correlations test whether treatments that improve the surrogate end point also improve the final end point and are widely considered the strongest evidence to validate a surrogate end point.
Our literature search was built on earlier reported data sets and updated with Google Scholar and MEDLINE searches conducted on December 26, 2014. For MEDLINE, search terms included (“regression” or “correlation”) and “surrogate” and “end point [or endpoint]” and (“oncology” or “cancer”). For Google scholar, search terms included (“regression” or “correlation”) and “surrogate end point [or endpoint]” and “overall survival” and “trial level.” A total of 108 abstracts were retrieved, and 62 articles were read in full in addition to articles identified through prior reviews.
We found 36 articles in which 65 specific correlations between a surrogate end point and survival were identified. Surrogate end points were studied in the neoadjuvant, adjuvant, locally advanced, and metastatic settings. The most common sources for trials included in the 36 articles were systematic reviews of the published literature (10 of 36; 28%), and published literature and meeting abstracts (14 of 36; 39%). Four meta-analyses (11%) used a convenience sample, and only 5 studies (14%) attempted to include unpublished trials by surveying clinical trial registries. Among these 5 studies, only 352 of 684 eligible trials (51.5%) were included in the analyses. More than half of reported correlations (34 of 65; 52%) were of low strength (r ≤ 0.7). Approximately a quarter (16 of 65; 25%) were of medium strength (r > 0.7 to r < 0.85), and 15 of 65 (23%) were highly correlated (r ≥ 0.85) with survival.
Conclusions and Relevance
Most trial-level validation studies of surrogate end points in oncology find low correlations with survival. All validation studies use only a subset of available trials. The evidence supporting the use of surrogate end points in oncology is limited.
Although there is growing recognition that adopting new medical practices based on improvements in surrogate outcomes can lead to misleading conclusions,1-6 surrogate end points continue to play a prominent role in oncology.7 In the United States, most new cancer drugs are approved based on surrogate measures,8 such as response rate (RR) and progression-free survival (PFS), through the US Food and Drug Administration’s (FDA) Accelerated Approval pathway. In addition, clinical practice guidelines often expand treatment recommendations for approved drugs based on studies assessing surrogates. For example, the drug carfilzomib (Kyprolis; Onyx Pharmaceuticals) received FDA approval in 2012 based on RR for relapsed and/or refractory multiple myeloma. In 2013, the National Comprehensive Cancer Network (NCCN) expanded the drug’s indication to include untreated myeloma—based on RR in 2 small, uncontrolled phase 1/2 studies9,10 (category 2A).11 Notably, NCCN endorsement obliges many commercial insurers and Medicare to reimburse the therapy.12-14 Thus, surrogate end points are used both to introduce new cancer drugs and to promote new uses for the drugs we already have.
The use of surrogate end points in oncology has led to use of toxic drugs that do not improve survival. For example, bevacizumab (Avastin; Roche/Genentech) gained FDA accelerated approval for metastatic breast cancer in 2008 based on evidence that it could improve PFS.15 In 2011, that approval was withdrawn when multiple randomized trials confirmed that bevacizumab did not improve overall survival and that gains in PFS were more modest than those seen in the earlier trial.16 Nevertheless, Medicare and other insurers are still obliged to pay for bevacizumab because it remains endorsed by the NCCN for this indication.14
For surrogates such as PFS or tumor RR to be reliable, it is important that they are validated. Several authors have described statistical methods to validate surrogates,17- 21 but the framework that uses a hierarchy (Figure 1) is the most useful.22 In this model, level 3 is surrogacy based on biological plausibility alone. Level 2 occurs when a strong correlation exists between the surrogate and the final end point across cohorts or at the level of the individual patient. The highest level is trial-level surrogacy, meaning that across many trials treatments that improve the surrogate end point also improve the final outcome. Trial-level surrogacy directly asks the question that faces regulators and guideline writers: “If a drug improves a surrogate, will it also improve survival?” Others have noted that individual patient correlations do not directly validate surrogate measures.23- 25
Establishing trial-level (level 1) surrogacy requires a meta-analysis of all randomized trials on a question of interest, with each trial serving as a unique data point. The x coordinate is typically the improvement in the surrogate when a drug is used, and the y coordinate is the improvement in the final outcome. Regression analysis is typically performed to assess what correlation exists between the change in the surrogate end point and the change in overall survival, across the randomized studies. The correlation coefficient (r) for the analysis, ranging from 0 to 1, provides a measure of the strength of the surrogate-survival correlation.
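The trial-level approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any specific meta-analysis we reviewed: the trial values in `HYPOTHETICAL_TRIALS` are invented for demonstration, the function names are our own, and the fit is unweighted (published analyses may weight trials by size or use other effect measures). Note that Pearson's r can be negative when the two effects move in opposite directions.

```python
from math import sqrt

# Hypothetical data, for illustration only: each tuple is one randomized trial,
# (improvement in surrogate, improvement in overall survival), e.g. in months.
HYPOTHETICAL_TRIALS = [
    (1.0, 0.5),
    (2.5, 0.8),
    (3.0, 2.1),
    (4.2, 1.0),
    (5.1, 3.0),
]


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy, mean_x, mean_y) for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mean_x, mean_y


def pearson_r(xs, ys):
    """Unweighted Pearson correlation between surrogate and survival effects."""
    sxx, syy, sxy, _, _ = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def ols_line(xs, ys):
    """Least-squares (slope, intercept) for survival effect vs surrogate effect."""
    sxx, _, sxy, mean_x, mean_y = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when the surrogate effect is constant")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


surrogate_gain = [trial[0] for trial in HYPOTHETICAL_TRIALS]
survival_gain = [trial[1] for trial in HYPOTHETICAL_TRIALS]
r = pearson_r(surrogate_gain, survival_gain)
slope, intercept = ols_line(surrogate_gain, survival_gain)
print(f"r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Each trial contributes exactly one point, which is what distinguishes this trial-level calculation from a correlation computed across individual patients.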
Two general medical examples can illustrate this approach. Consider the question of hypertension (the surrogate), blood pressure control (the intervention), and adverse cardiovascular end points (the final outcome). A level 2 analysis would show that groups of patients with higher average blood pressure have worse cardiovascular outcomes or that this correlation is observed at an individual patient level. A level 1 or trial-level analysis would show that across many trials, drugs that lowered blood pressure also decreased cardiovascular events. If a drug lowered blood pressure by 10%, and another lowered it by 20%, we would “validate” hypertension as a surrogate if there were fewer cardiovascular events for the drug with greater blood pressure reduction. Looking at many blood pressure trials, one could perform regression analysis and draw a general conclusion about the relationship between blood pressure reduction and adverse outcomes. In fact, this analysis has been performed.17
The case of premature ventricular contractions (PVCs) postmyocardial infarction illustrates the importance of level 1 analysis. Although studies showed a correlation between frequent PVCs and death26 (level 2), and certain anti-arrhythmic drugs reduce PVCs, it did not follow that anti-arrhythmic drugs (level 1) that lowered PVCs reduced mortality. In fact, a randomized clinical trial showed that these drugs actually increased mortality.27 Thus, only a level 1 analysis asks the right question: “Can we trust a reduction in PVCs as a reliable surrogate for improved survival in pharmacologic trials?” Because oncology typically has dozens of trials examining both surrogate end points and survival in many clinical situations, performing level 1 analyses is more straightforward than in many other medical fields. For these reasons, we only examined level 1 studies.
We sought to examine all level 1 studies aiming to validate a surrogate in oncology. Specifically, we asked what percentage of these meta-analyses used data from published and unpublished studies. We also ascertained the median number of studies used to establish surrogacy in these studies and the reported correlation coefficients (r).
We identified meta-analyses of randomized clinical trials quantifying the association between a surrogate end point—RR, pathologic complete response, locoregional control, disease-free survival, event-free survival, time to progression, and/or PFS—and overall survival (OS) in medical oncology. As such, we excluded studies of radiation therapies, surgery, procedures, or supportive measures.
We sought only studies that examined trial-level correlations in meta-analyses of randomized clinical trials—ie, level 1 surrogate studies. We placed no restriction on the setting of medical therapy; thus, we include analyses in the neoadjuvant, adjuvant, locally advanced, and metastatic settings.
We excluded all analyses that did not examine trial-level (level 1) surrogacy. Specifically, we excluded analyses that only studied individual patient level correlations or those that did not examine the difference between investigational and control arms, considering each trial arm as a unique data point (level 2). We excluded conference abstracts, letters to the editor, and descriptive reviews. Finally, we excluded meta-analyses that used each treating center or nation (rather than each trial) as a unique data point. We were not convinced that such analyses accurately capture the question of our study because the individual data points in such analyses are no longer independent (comparisons are both within and across trial) and may spuriously inflate trial-level correlation. Nevertheless, including these studies20,28- 30 would not have changed our results.
We constructed our data set by updating prior systematic reviews. An analysis by Sherril et al,31 conducted in October 2010, systematically identified all studies prior to that date that studied surrogate end points in oncology. Over 1000 articles were screened in that investigation, and all articles assessing the relationship between a surrogate end point and overall survival were listed. We then evaluated this list to ensure that the articles performed trial-level correlation—all other analyses were discarded. We also drew on a prior report commissioned for the National Institutes of Clinical Excellence, which identified meta-analyses of surrogate end points in malignant solid tumors.32,33 Again, only trial-level analyses were selected from this set.
We built on this data set, updating it for the last 4 years. Specifically, Google Scholar and MEDLINE were searched with the following search terms on December 26, 2014: For MEDLINE, search terms included (“regression” or “correlation”) and “surrogate” and “end point [or endpoint]” and (“oncology” or “cancer”). For Google scholar, search terms included (“regression” or “correlation”) and “surrogate end point [or endpoint]” and “overall survival” and “trial level.” A total of 108 abstracts were retrieved, and 62 articles were read in full.
Descriptive statistics were ascertained for the following end points: the percentage of meta-analyses that used data from published and unpublished studies, the number of individual randomized trials included in each surrogate meta-analysis, and the reported correlation coefficients. We considered convenience sample to mean a set of trials that the authors were able to obtain readily. We only credited authors with performing a search of the published literature, meeting abstracts, and/or unpublished trials if they performed a systematic search. We scored strength of trial-level correlation according to a modification to surrogate criteria proposed by the Institute of Quality and Efficiency in Health Care34: low correlation (r ≤ 0.7), medium strength correlation (r > 0.7 to r < 0.85), and high correlation (r ≥ 0.85). The specific cut points were adapted to function even when confidence intervals were not presented. Finally, for meta-analyses using published trials, abstracts, and registered studies, we investigated what percentage of eligible studies were ultimately included in analyses.
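The cut points above can be written as a simple grading rule. The following Python sketch is illustrative only: the function name `grade_correlation`, the example coefficients, and the choice to accept any r between -1 and 1 (so that a negative coefficient is graded low under the r ≤ 0.7 rule) are our assumptions and not part of the published criteria.

```python
from collections import Counter


def grade_correlation(r):
    """Grade a trial-level correlation coefficient using the cut points in the text.

    low:    r <= 0.7
    medium: 0.7 < r < 0.85
    high:   r >= 0.85
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r >= 0.85:
        return "high"
    if r > 0.7:
        return "medium"
    return "low"


# Hypothetical coefficients, for illustration only (not values from Table 1).
example_coefficients = [0.56, 0.70, 0.71, 0.84, 0.85, 0.93]
tally = Counter(grade_correlation(r) for r in example_coefficients)
for grade in ("low", "medium", "high"):
    print(grade, tally[grade])
```

Because the rule depends only on the point estimate, it can be applied even when a meta-analysis does not report confidence intervals.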
We identified 36 articles20,23,25,30,35- 68 that met our inclusion criteria and were considered trial-level or level 1 analyses. Claims of surrogacy were examined for 19 distinct clinical questions in the neoadjuvant, adjuvant, metastatic, and locally advanced settings (eTable in the Supplement).
The sources of randomized clinical trials included in the 36 articles most commonly were systematic reviews of the published literature (10 of 36; 28%) and published literature and meeting abstracts (14 of 36; 39%). Four meta-analyses (11%) used a convenience sample, and only 5 studies (14%) attempted to include unpublished trials by surveying clinical trial registries. Figure 2 summarizes search strategies.
Across the 36 studies, we noted 65 unique trial-level analyses. The median number of individual randomized clinical trials used in these analyses was 21, with as few as 9 and as many as 191 trials. For each analysis, we documented the study characteristics and the correlation coefficient (r) for all trial-level analyses (Table 1). There were 65 trial-level correlations scored according to criteria proposed by the Institute of Quality and Efficiency in Health Care (Figure 3).34 More than half of trial-level correlations (34 of 65; 52%) were of low strength (r ≤ 0.7). Approximately a quarter (16 of 65; 25%) were of medium strength (r > 0.7 to r < 0.85), and 15 of 65 (23%) were highly correlated (r ≥ 0.85) with survival.
We further examined the 5 studies39,41,48,66,68 that analyzed both published and unpublished trials. Although these articles searched the broadest possible collection of trials, they were unable to obtain data from all relevant studies. Table 2 lists the numbers of eligible and included trials for each of these meta-analyses. Notably, only 352 of 684 eligible studies (51.5%) were ultimately included. Reasons for omission included missing or lost data,41,74 missing information on progression,68 or missing median PFS or median OS reports.48,66
We also examined the 15 correlations of high strength. Six occurred in the adjuvant setting,23,25,38- 41 6 in the metastatic setting,41,47,52,55,62,67 and 3 in the locally advanced setting.40,41 The median number of individual randomized trials included in these analyses was 14. Notably, 3 high correlations occurred in the same setting—adjuvant colorectal cancer—and 4 others occurred in settings where multiple other studies had found lesser correlations, specifically: metastatic breast47 and colon cancer52,55,58 (Table 1). In both these settings, the largest analyses conducted to date found low correlation.48,50
Most trial-level meta-analyses in oncology found low correlation between a surrogate end point and overall survival, and all were based on only a subset of clinical trial evidence. Our findings call into question the widespread use of surrogate end points in oncology as a basis for treatment decisions.
Trial-level or level 1 meta-analyses are the highest form of evidence that surrogate end points can predict efficacy regarding final outcomes, such as overall survival, for new therapies or new combinations (Figure 1).22,23,75 The majority of these claims resulted in correlations of low (52%) or medium strength (25%) (Figure 3), which are generally considered insufficient evidence on which to base clinical or regulatory decisions.34
We were unable to identify any analyses that looked at all randomized trials on a topic. Most analyses relied on published articles (28%) or articles and meeting abstracts (39%), and only 5 authors (14%) sought unpublished trials. Even when authors were able to identify published and unpublished studies, they were unable to obtain data from all eligible trials. Only 51.5% of eligible studies were ultimately included in even the most rigorous meta-analyses (Table 2). The major barriers to including more studies that we noted were that primary investigators did not provide data, or primary study publications did not include information that the meta-analysts needed, such as the rates of surrogate improvement or improvement in overall survival.
Our findings suggest that most correlations of surrogacy in oncology are based on only a subset of potentially informative trials. Unpublished trials (and those that do not report data that meta-analysts require) may have poorer correlations than those that are published, and such discrepancies may contribute to reluctance among sponsors and authors to submit those findings to journals and conferences and to a poorer likelihood of acceptance.
The use of surrogate end points in oncology, particularly PFS, has grown in recent years, and PFS is frequently the primary end point of cancer clinical trials and the basis for regulatory approval of novel agents.8,76-78 Surrogate end points are also used as the basis of many NCCN recommendations, which obliges many insurers and Medicare to cover those drugs for those indications.12-14 Our analysis confirms that the use of this end point occurs beyond the specific settings in which it has been validated.76 Simply because a surrogate is measurable does not make it predictive or meaningful,76 and our analysis builds on a prior study that concludes that the association between surrogates and survival in oncology is generally poor.33
High-profile medical reversals, such as the case of bevacizumab in breast cancer, emphasize the dangers of excessive reliance on unvalidated surrogates.79,80 Although bevacizumab was approved for breast cancer on the basis of an improvement in PFS,15,16 this surrogate-survival correlation is not well supported. We identified 8 meta-analyses examining whether gains in PFS predict overall survival in metastatic breast cancer. Six reported low correlation; 1 reported medium correlation; and only 1 reported a strong correlation (Table 1). Despite this evidence, in 2012 another drug, everolimus, was approved for metastatic breast cancer based on improvement in PFS.81 In 2014, updated follow-up from the everolimus study did not find an overall survival benefit.82 In 2015, palbociclib received accelerated approval for metastatic breast cancer on the basis of improvement in PFS. Again, there is no demonstration of an overall survival advantage.83 Both palbociclib and everolimus remain on the market.
In 2007, liposomal doxorubicin received FDA approval for treatment of multiple myeloma based on a delay in time to progression when given in combination with bortezomib compared with bortezomib alone.84 In our investigation, we were unable to identify any analysis validating this surrogate and survival in multiple myeloma, and an updated analysis of liposomal doxorubicin confirms that the drug confers no survival benefit.85 Liposomal doxorubicin remains FDA approved.
The FDA’s use of pathologic complete response as grounds for accelerated approval in the neoadjuvant setting (the basis for the approval of pertuzumab [Perjeta; Roche/Genentech]) also appears to run counter to both meta-analyses we identified. Both meta-analyses on this topic found that improvements in pathologic complete response were poorly correlated with subsequent event-free survival and OS35,36 (Table 1).
There are several limitations to our study. First, the specific thresholds we used to grade the strength of correlation were adapted from a guidance document but have not been externally validated and are subject to disagreement. Although we believe that few would dispute that a correlation coefficient (r) below 0.7 is weak support for a surrogate end point, some may be critical of the thresholds for medium and high strength. For this reason, we provide all coefficients in Table 1 to permit alternative analyses.
Second, we cannot say whether correlations will improve or weaken based on a more comprehensive assessment of clinical trials. However, prior research has found that unpublished trials are different than published studies.86- 91 Just as meta-analyses on specific clinical topics may change with consideration of published and unpublished evidence,92,93 the strength of surrogacy may also change with a more comprehensive collection of data. However, this is an assumption and should be treated as such. Unfortunately, we were unable to identify even 1 study that considered the totality of the evidence.
Third, the use of crossover in cancer clinical trials is thought to complicate the role of surrogacy in oncology. Unidirectional crossover from the control or placebo arm of cancer trials to the experimental drug (but not vice versa) is thought to explain why some cancer drugs that improve PFS fail to improve OS. However, crossover is unlikely to affect our analysis because most of the individual randomized trials included in the surrogate validation studies we examined occurred prior to the widespread use of crossover.
Moreover, there are several reasons to question the common narrative regarding crossover. First, effective cancer drugs can show overall survival benefits despite crossover.94-96 Second, favorable PFS but negative OS results seen with everolimus,97 palbociclib,83 and bevacizumab98 in metastatic breast cancer occurred in the absence of crossover. And finally, there are several alternative interpretations for the effect of crossover, including that crossover can hide the harms of a drug or mask a lack of clinical benefit.99,100 For these reasons, we do not believe that crossover alters our findings.
Finally, a recent study suggests that approving new drugs based on surrogates (even inaccurate ones) may provide a greater benefit for society than waiting for OS data.101 The authors found favorable societal benefits for approving non–small-cell lung cancer (NSCLC) drugs with PFS benefits of 3 months or more. This occurred despite the fact that the authors noted a poor correlation (as we also did in Table 1) between net mean PFS and OS in NSCLC (r = 0.56). The major limitation of this analysis, like other modeling studies, is that it rests on a number of assumptions. For instance, the authors likely overestimate the delay between PFS and OS results by using date of publication rather than the data cutoff date. Also, the authors draw broad conclusions from a very small set of trials, ie, the few NSCLC trials with greater than 3-month PFS gains from an overall set of 27 trials. Finally, the model does not account for the wider unintended consequences of approval based on weak surrogates, which may encourage the pharmaceutical industry to pursue even more cancer drugs with marginal to no benefits.102
Most trial-level validation studies of surrogate end points in oncology find low or medium strength correlations with overall survival. All validation studies use only a subset of available trials. The evidence that surrogate end points predict overall survival in oncology is limited.
Accepted for Publication: April 11, 2015.
Corresponding Author: Vinay Prasad, MD, MPH, National Cancer Institute, National Institutes of Health, 10 Center Dr, 10/12N226, Bethesda, MD 20892 (firstname.lastname@example.org).
Published Online: June 22, 2015. doi:10.1001/jamainternmed.2015.2829.
Author Contributions: Dr Prasad had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Prasad.
Acquisition, analysis, or interpretation of data: Prasad, Kim, Burotto, Vandross.
Drafting of the manuscript: Prasad, Kim.
Critical revision of the manuscript for important intellectual content: Prasad, Kim, Burotto, Vandross.
Statistical analysis: Prasad, Kim.
Administrative, technical, or material support: Prasad, Burotto, Vandross.
Study supervision: Prasad.
Conflict of Interest Disclosures: None reported.
Disclaimer: The views and opinions of Drs Prasad, Kim, and Burotto do not reflect those of the National Cancer Institute.