Evaluation of Continuous Tumor-Size–Based End Points as Surrogates for Overall Survival in Randomized Clinical Trials in Metastatic Colorectal Cancer

Key Points
Question: Can end points based on the kinetics of tumor size after treatment be used as surrogates for overall survival in metastatic colorectal cancer?
Findings: In this pooled analysis of data from 20 randomized clinical trials, time to nadir and depth of nadir were modeled and assessed as potential surrogates for overall survival at the patient and trial levels. The associations found were weak or moderate; there were notable differences in tumor-size kinetics between antiangiogenic agents and anti–epidermal growth factor receptor agents.
Meaning: The implications of these results for early drug development and clinical practice are unclear and warrant further studies; the findings of this study reinforce the need to develop more reliable end points that reflect tumor biology and patient benefit.


Introduction
The availability of active treatments for use in subsequent lines has called into question the use of overall survival (OS) as a primary end point in phase 3 trials of first-line therapy for metastatic colorectal cancer (mCRC). 1 As a result, there has been a long-standing interest in developing and validating surrogate end points for OS in this setting. 2,3 Such validation requires demonstration of a strong association between the surrogate and the final end point at the patient level (ie, patients with improvements in the surrogate end point also tend to have improvements in the final end point) and a strong association between the treatment effects on the surrogate end point and the final end point (the trial-level association). 4 Tumor-size-based end points have generated interest in the search for early treatment end points in mCRC. 5-9 These end points may be categorical or continuous and, among the latter type, the end point receiving the most attention has been the depth of response, defined as the maximum percent tumor shrinkage during treatment. In work published in abstract form, the depth of response was found to be associated with OS at the patient level in first-line cetuximab-based therapy. 10 That study was based on 2 randomized trials and did not assess trial-level surrogacy. To obtain a more in-depth view of this question, we assessed the individual- and trial-level surrogacy for OS of 2 continuous tumor-size-based end points in first-line treatment of mCRC.
Tumor measurements consisted of the longest diameters of target lesions, used in the original trials according to the Response Evaluation Criteria in Solid Tumors (RECIST) guideline, version 1.1. 32 Eight trials involved only chemotherapy; of the 12 trials that had at least 1 biological agent, 6 evaluated antiangiogenic (anti-ANG) agents as the only biological, 4 investigated an anti-epidermal growth factor receptor (anti-EGFR) agent as the only biological, and 2 trials had both an anti-ANG and anti-EGFR agent. The analysis was based on comparisons between 2 arms (henceforth termed contrasts) nested within trials, with control and experimental arms defined according to historical evolution. An exception to this rule was made for HORIZON III, 24 for which the cediranib arm was considered as control to have bevacizumab as the uniform experimental intervention for anti-ANG agents (Table 1). For 8 trials with more than 2 arms, each experimental arm was compared with a control arm created by randomly splitting the set of patients originally randomized to the control arm. This procedure was applied to avoid including each patient twice in the analysis, which would artificially induce a correlation that would confound the associations under investigation.
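The control-arm splitting described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the text does not state the split proportions or the randomization mechanism, so the equal-sized round-robin split, the function name split_control_arm, and the seed argument are all assumptions.

```python
import random


def split_control_arm(control_ids, n_experimental_arms, seed=0):
    """Randomly partition control-arm patients into disjoint subsets,
    one per experimental arm, so that no patient enters two contrasts.

    Assumption (not stated in the source): subsets are of near-equal size.
    """
    if n_experimental_arms < 1:
        raise ValueError("need at least one experimental arm")
    rng = random.Random(seed)          # seeded for reproducibility
    shuffled = list(control_ids)
    rng.shuffle(shuffled)
    # Deal shuffled patients round-robin; sizes differ by at most one.
    return [shuffled[i::n_experimental_arms] for i in range(n_experimental_arms)]


# Hypothetical trial with 11 control patients and 3 experimental arms.
subsets = split_control_arm(range(11), n_experimental_arms=3, seed=1)
```

Each returned subset would serve as the control group for exactly one contrast, which keeps the contrasts within a trial free of shared patients.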

Statistical Analysis
Target lesions measured up to 24 months after randomization were used, as 98% of the available postbaseline measurements were made within 24 months. Individual trials had tumor-assessment schedules that varied between 6 and 12 weeks, but this variation does not influence the models used here. Overall survival was defined as the time from randomization to death from any cause, with censoring of data from patients who were alive at the last contact date. Separate analyses were conducted for chemotherapy-only contrasts, anti-ANG-agent contrasts, and anti-EGFR-agent contrasts. Because KRAS (OMIM *190070) is a predictive biomarker for anti-EGFR treatment, only patients with wild-type KRAS were considered in contrasts evaluating the effects of such treatments.
For trials of different treatment sequences, only contrasts in which the 2 arms tested different regimens at the beginning of the treatment sequence were analyzed. Treatment arms with celecoxib in the Bolus, Infusional, or Capecitabine With Camptosar-Celecoxib trial 12 were not analyzed. Tumor-size measurements (the sum of all target lesions) were modeled using the relative tumor-size change (RTSC) vs baseline, defined (for time t) as follows: RTSC(t) = (tumor size at time t - tumor size at baseline) / (tumor size at baseline).
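As a worked illustration of the RTSC definition, the sketch below computes the relative change from per-lesion longest diameters. The function names and the example diameters are hypothetical and are not taken from the trial data.

```python
def relative_tumor_size_change(size_t, size_baseline):
    """RTSC(t) = (size at time t - size at baseline) / (size at baseline)."""
    if size_baseline <= 0:
        raise ValueError("baseline tumor size must be positive")
    return (size_t - size_baseline) / size_baseline


def rtsc_from_lesions(baseline_diameters, followup_diameters):
    """Tumor size is taken as the sum of the longest diameters of all
    target lesions, as in the definition above."""
    return relative_tumor_size_change(sum(followup_diameters),
                                      sum(baseline_diameters))


# Hypothetical patient: two target lesions of 30 and 20 mm at baseline
# that measure 21 and 14 mm at a later assessment (sum 50 mm -> 35 mm).
change = rtsc_from_lesions([30.0, 20.0], [21.0, 14.0])  # about -0.30
```

A negative RTSC indicates shrinkage relative to baseline, and a value of 0 indicates no change.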
Repeated values of RTSC and the time to death were analyzed in joint models. 33,34 In particular, RTSC measures were analyzed by linear mixed-effects models with contrast-specific fixed and random linear and square-root time effects. Overall survival was analyzed by proportional hazards models that included the random effects from the RTSC model to account for the association between RTSC and survival time. Based on the joint models, treatment effects on RTSC and OS were estimated. For OS, the effects were estimated using the natural logarithm of the hazard ratio (HR) obtained from the proportional hazards model (logHR). For RTSC, the outcomes were defined based on the mean treatment-specific time profiles estimated using the linear mixed-effects model. In particular, for each profile, the nadir (ie, the local minimum RTSC value) was obtained, together with the time at which the nadir took place. Treatment effects were then defined in terms of differences in time to nadir and differences in depth of nadir; the latter variable is analogous to depth of response but is estimated from the model rather than coming directly from patient data. Figure 1 illustrates  To assess the validity of time to nadir and depth of nadir as surrogates for OS, we applied the correlation approach. 33 Specifically, a linear regression was fitted to the estimated pairs of treatment effects on time to nadir or depth of nadir and OS. The regression was weighted by the contrast-specific sample size. The coefficient of determination (R²) was used to quantify the strength of association at the trial level between the treatment effects on time to nadir or depth of nadir and OS.
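To make the nadir-based outcomes concrete, the sketch below assumes a mean profile of the form RTSC(t) = b0 + b1*t + b2*sqrt(t), which is one reading of the "linear and square-root time effects" described above; the exact parameterization of the fitted model is not given here, and the coefficient values are invented for illustration. Setting the derivative b1 + b2/(2*sqrt(t)) to zero gives sqrt(t*) = -b2/(2*b1), which is a local minimum when b1 > 0 and b2 < 0.

```python
def nadir(b1, b2, b0=0.0):
    """Time and depth of the nadir of the assumed mean profile
    RTSC(t) = b0 + b1*t + b2*sqrt(t).

    Returns (time_to_nadir, depth_of_nadir), or None when the profile
    has no interior minimum (requires b1 > 0 and b2 < 0).
    """
    if not (b1 > 0 and b2 < 0):
        return None
    sqrt_t = -b2 / (2.0 * b1)                 # root of the first derivative
    t_star = sqrt_t ** 2
    depth = b0 + b1 * t_star + b2 * sqrt_t    # equals b0 - b2**2 / (4*b1)
    return t_star, depth


def nadir_effects(control, experimental):
    """Treatment effects as experimental minus control differences in
    time to nadir and depth of nadir; inputs are (b1, b2) tuples."""
    t_c, d_c = nadir(*control)
    t_e, d_e = nadir(*experimental)
    return t_e - t_c, d_e - d_c


# Hypothetical coefficients with time in months (not study estimates).
effect_time, effect_depth = nadir_effects(control=(0.04, -0.24),
                                          experimental=(0.03, -0.24))
```

With these invented coefficients the experimental nadir occurs later (positive effect on time to nadir) and is deeper (negative effect on depth of nadir), matching the sign convention used in the Results.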

JAMA Network Open | Oncology
An R² value greater than 0.75 was considered an indicator of good surrogacy. 35,36 We also quantified the strength of association at the individual level between RTSC and OS. With this aim, we measured the correlation between the individual random effects included in the linear mixed-effects model for RTSC and the proportional hazards model for OS using a correlation coefficient, denoted by R(t). 33 This correlation coefficient is a time-dependent measure, since the association between RTSC and the death process can be defined relative to any time over the course of tumor-size measurements.
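The trial-level criterion can be illustrated with a sample-size-weighted least-squares fit. The sketch below is a plain-Python approximation of that step under stated assumptions: the function name, the threshold constant, and all numbers are hypothetical, and the analyses themselves were run in SAS and Stata rather than with this code.

```python
GOOD_SURROGACY_R2 = 0.75  # threshold for good surrogacy quoted in the text


def weighted_r2(effect_surrogate, effect_log_hr, sample_sizes):
    """Weighted linear regression of logHR on the surrogate treatment
    effect, one point per contrast, weighted by contrast sample size.

    Returns (slope, intercept, r_squared).
    """
    w_total = float(sum(sample_sizes))
    triples = list(zip(effect_surrogate, effect_log_hr, sample_sizes))
    mean_x = sum(w * x for x, _, w in triples) / w_total
    mean_y = sum(w * y for _, y, w in triples) / w_total
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in triples)
    syy = sum(w * (y - mean_y) ** 2 for _, y, w in triples)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in triples)
    if sxx == 0 or syy == 0:
        raise ValueError("treatment effects must vary across contrasts")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # weighted coefficient of determination
    return slope, intercept, r_squared


# Invented effects for four contrasts (not study data).
slope, intercept, r2 = weighted_r2(
    effect_surrogate=[0.5, 1.0, 2.0, 3.5],
    effect_log_hr=[-0.02, -0.10, -0.05, -0.30],
    sample_sizes=[200, 350, 500, 420],
)
meets_threshold = r2 > GOOD_SURROGACY_R2
```

Weighting by sample size gives larger contrasts more influence on the fit but, as noted in the limitations, does not account for the standard errors or correlation of the estimated treatment effects.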
In the analysis, 2-sided 95% CIs were used. Analyses were conducted with SAS, version 9.4 (SAS Institute Inc) and Stata, version 13.1 (StataCorp LLC).

Chemotherapy Alone
There were 6224 patients in the ARCAD database enrolled in 9 trials eligible for this analysis (8 trials involving only chemotherapy and 1 trial that included bevacizumab but provided chemotherapy-alone contrasts). After excluding patients without any tumor-size information or with tumor-size measurements available only more than 24 months after randomization, 4289 patients (68.9%) could be analyzed (Table 1). Such patients were grouped in 14 contrasts, with the median follow-up per trial ranging from 14 to 128 months. eFigure 1A in the Supplement presents the Kaplan-Meier OS curves for these 14 contrasts, with the corresponding HRs presented in Table 2.

Anti-ANG Agents
For anti-ANG agent contrasts, data on 5390 patients enrolled in 6 trials were available for analysis.
After excluding patients with no tumor-size information or with tumor-size measurements available only more than 24 months after randomization, 4854 (90.1%) of the patients could be analyzed (Table 1). Eleven contrasts could be formed, with median follow-up in each trial ranging from 14 to 31 months. eFigure 1B in the Supplement shows the OS curves for each of these contrasts, and the corresponding HRs are presented in Table 2. eFigure 2B in the Supplement presents the longitudinal RTSC profiles for these contrasts, and the corresponding estimates of treatment effects on time to nadir and on depth of nadir are presented in Table 2. All effects on time to nadir were positive, suggesting that the nadir for the experimental treatments took place later than for the control treatments. At the same time, all but 2 (for HORIZON III A and N016966 C) effects on depth of nadir were negative, suggesting that the experimental treatments led to a larger relative reduction in tumor size than the control treatments. This finding reflects that the RTSC profiles for the control arms exhibited a higher curvature than the profiles for the experimental arms (eFigure 2B in the Supplement).

Anti-EGFR Agents
Of 3081 eligible patients enrolled in 6 trials involving anti-EGFR agents, 2684 patients (87.1%) could be analyzed after excluding those without any tumor-size information or with tumor-size measurements available only more than 24 months after randomization (  Table 2).
The associations between treatment effects are depicted in Figure 2E  suggesting that RTSC provided little information on a patient's OS.

Discussion
Given the continuum of care in mCRC, it has become increasingly difficult to demonstrate gains in OS in first-line treatment trials. This difficulty has heightened interest in alternative strategies, such as adaptive designs 37 and the use of surrogate end points, including those based on tumor measurements. The latter approach is not supported by the key finding of the present study, namely that neither time to nadir nor depth of nadir can be considered a valid surrogate for OS using contemporary regimens for first-line therapy of mCRC. At best, time to nadir appears to display a moderate association with OS at the trial level with chemotherapy alone or combined with an anti-ANG agent, while depth of nadir appears to display a weak association with OS in all treatment classes. Another finding from this study is the apparent difference between the response kinetics of regimens that include an anti-ANG agent and those that involve an anti-EGFR agent.
The difference in tumor-growth kinetics between anti-ANG and anti-EGFR agents may warrant further exploration. Data presented in Table 2 and eFigure 2 in the Supplement suggest that the addition of an anti-ANG agent to chemotherapy is associated with a later, although not often deeper, nadir. Conversely, the addition of an anti-EGFR agent often produces a deeper nadir, with less conclusive results about the timing of its occurrence. These exploratory observations are based on a relatively small number of contrasts, but they may support the clinical impression that the addition of an anti-EGFR agent produces a larger influence on the depth of responses than the addition of an anti-ANG agent. Albeit subject to bias owing to the above-mentioned reasons, the often-divergent slopes after nadir between control and experimental arms as shown in eFigure 2 in the Supplement suggest that the tumor-growth kinetics with both classes of agents are not marked by a rebound effect after progression. The differences in tumor-growth kinetics among different classes of agents are also reflected in the individual-level associations between the RTSC and OS processes. For chemotherapy, it seems that RTSC may provide a strong prediction of a patient's survival. For anti-ANG agents, a strong correlation might be inferred after the initial half-year of treatment.

However, for anti-EGFR agents, the correlation appeared to be weak. These individual-level estimates depend largely on the form of the models applied and should be interpreted with caution.

Strengths and Limitations
Strengths of this study are the large sample size and representativeness in terms of contemporary This study has limitations. The chief limitation of this study is the absence of tumor measurements for all patients, which is a potential source of bias through exclusion of individuals with features that may differ systematically from those of included patients. Likewise, extended RAS testing was not available at the time that these trials were conducted, leading to a predictably small percentage of patients being falsely considered as having wild-type tumors. Moreover, no data were available on tumor sidedness or other potential prognostic or predictive molecular markers, such as the status of microsatellite instability, BRAF, or HER2. Limitations also apply to the model building, which is affected by the absence of postprogression measurements. Moreover, if progression is due to new lesions before the sum of target lesions has reached the nadir, there is increased uncertainty in the estimation of time to nadir and depth of nadir. Also, new lesions could not be included in the definition of RTSC, because the size of such lesions was not reported. In addition, the strength of the association between treatment effects on time to nadir or depth of nadir and on OS was assessed by using a linear regression model weighted by the sample size to account for the uncertainty in the estimated treatment effects. A methodologically more appropriate approach would be to take into account estimates of the SEs and correlation of the estimated treatment effects. 39 However, obtaining such estimates for the joint model used in our analysis was not possible, because the model was fitted by using the expectation-maximization algorithm.

Conclusions
Neither time to nadir nor depth of nadir appears to be an acceptable surrogate for OS. These findings are not surprising, given the weak trial-level association between conventional response rates and OS in mCRC, despite their association with OS at the patient level, both in mCRC and advanced breast cancer. 40,41 This distinction indicates that achieving response may convey prognostic information for patients in clinical practice, but at the same time suggests that response-based end points cannot replace OS in clinical trials. In none of the treatment classes analyzed was the association between treatment effects strong enough to warrant reasonable precision of the prediction of the treatment effect on OS from the effect on time to nadir or depth of nadir. Such a reasonable precision of the prediction is currently considered the key requirement for a surrogate end point. 38 Nevertheless, at