Development and Assessment of a Model for Predicting Individualized Outcomes in Patients With Oropharyngeal Cancer

Key Points Question Can a model be developed for individualized survival, locoregional recurrence, and distant metastasis prognostication for patients newly diagnosed with oropharyngeal cancer, incorporating clinical, oncologic, and imaging data? Findings In this prognostic study, model predictions for 5-year overall survival demonstrated excellent discrimination in cohort study training data for models with and without imaging variables. This model appeared to possess good calibration and well stratified patients in terms of likely outcomes among many competing events. Meaning This prognostic model and web-based application may serve as a useful tool to provide clinicians, researchers, and patients with oropharyngeal cancer robust, individualized prognostic estimations.


eAppendix 1. Multistate Model Specification
In this section, we provide additional details related to the multistate model specification in Figure 1 (reproduced below): Defining Initial States Let G represent a patient's initial (post-treatment) state. We define initial state G to take the following values: G=1: patient is not cured and also not persistent G=2: patient is cured of their primary cancer G=6: patient has persistent disease after treatment The value G takes is not usually known at the time of treatment, but we will suppose that it takes some fixed, unknown value. For some patients, the value of G can be determined based on what happens during follow-up. In our University of Michigan data, persistence status (whether G=6 or not) is not usually known at the time of treatment. However, disease persistence becomes apparent within the first 12 weeks after treatment, either through scans by 12 weeks after treatment to evaluate treatment response or through other indicators during routine follow-up in the first 12 weeks after treatment. For a small number of patients with less than three months clinical follow-up, persistence was defined based on whether or not they died within 6 months, where patients dying within 6 months of diagnosis were assumed to have persistent disease. Therefore, persistence status (G = 6 or G ≠ 6) is known for all patients in our dataset by the end of follow-up. Among the patients determined to be non-persistent (G ≠ 6), the possible values of the initial state are 1 (not cured and not persistent) or 2 (cured). Non-persistent patients with observed recurrence during follow-up are known to have G=1 as their initial state. Additionally, we assume patients followed for at least 72 months without observed recurrence were in the "cured" initial state (G=2). For the remaining 50.2% of patients in our University of Michigan dataset who did not have persistent disease, did not have an observed recurrence, and were followed for recurrence for less than 72 months, initial state G is unknown and can be viewed as missing data. We will call this missingness in initial state G Missingness Type I in eAppendix 3.

Defining Outcome Events
In our model, we consider three types of outcome events (measured in months): locoregional recurrence (without prior metastasis), distant metastasis (without prior locoregional recurrence), and death. Let be the underlying recurrence time (first recurrence of either type), and let be the underlying death time. In practice, these times may not always be observed, and we define to be infinite for cured and persistent patients. In our model structure, therefore, only non-cured and non-persistent patients (G=1) have the potential to have a recurrence event.
We observe a censored version of our target outcome variables. Let be the observed death time or the censoring time for death (whichever is first), and let indicate whether death was observed. Let be the underlying recurrence time (first recurrence of either type) or the censoring time for recurrence (whichever is first). Let indicate whether recurrence was censored ( = 0) or whether a locoregional recurrence ( = 1) or distant metastasis ( = 2) was observed. For some patients (n=12), we know that a recurrence occurred and when it occurred, but we do not know whether = 1 or = 2. We call this missingness in type of recurrence Missingness Type II in eAppendix 3.
For now, we will assume that recurrence and death have the same censoring mechanism (i.e. we follow for recurrence as long as we follow for death for all patients, with = 0 implying = ). In reality, we have some patients (n=248) for whom recurrence is censored at some time 0 < , and for these patients, recurrence status and/or recurrence time is unknown during the time interval [ 0 , ]. We call this missingness due to unequal censoring times for recurrence and death Missingness Type III in eAppendix 3. We describe an imputation strategy we use to handle unequal censoring times for the University of Michigan data later on (in eAppendix 3).

Model Structure
Let represent the set of covariates (e.g. p16 status, T classification, etc.) that will be included in the multistate model. X may have missing values, called Missingness Type IV in eAppendix 3. We will use a different set of covariates for each component model of the multistate model described below, but we will describe the various component models below using the same set of covariates in each model for notational convenience. The multistate model as a whole consists of 8 component models: two logistic regression models related to the initial states and 6 Cox proportional hazards regression models related to the rate of different types of events after treatment (given initial state). First, we model the initial state G using two logistic regression models as follows: Component 1, probability of have persistent disease (G=6 vs. G=1 or 2): ( ( = 6| )) = 0 + Component 2, probability of being non-cured given non-persistent (G=1 vs. G=2 given G≠6) ( ( = 1| , ≠ 6)) = 0 + To describe events after baseline, we construct the multistate model in terms of relationships between different underlying states. Locoregional recurrence, distant metastasis, and death are denoted as states 3, 4, and 5 respectively. Let ( ) represent the hazard of transitioning between states j and k at time t. We model these transition hazards using the following proportional hazards regression model structures: Component 3, locoregional recurrence among non-cured, non-persistent patients (G=1): 1 → 3: 13 ( ) = 13 0 ( ) 13 Component 4, distant metastasis among non-cured, non-persistent patients (G=1): 1 → 4: 14 ( ) = 14 0 ( ) 14 Component 5, death without prior recurrence for non-persistent patients (G=1 or G=2): 1 → 5 2 → 5: 15 ( ) = 15 0 ( ) 15 = 25 ( ) Component 6, death for persistent patients (G=6): 6 → 5: 65 ( ) = 65 0 ( ) where 0 ( ) represents the baseline hazard for the transition between states j and k. Here, we do not include covariate effects in the transition to death among persistent patients (Component 6, states 6→5). This is because this event tends to happen quickly, and we do not have many persistent patients in our dataset to support estimation of covariate effects for that transition. In order to improve our ability to estimate model parameters, we assume that transitions to death without prior recurrence for cured and non-cured non-persistent patients (2→5 and 1→5) are the same (so 15 ( ) = 25 ( ) as in Component 5). This same assumption is common for multistate models incorporating a cured initial state, as discussed in Conlon  When modeling death rates after either locoregional recurrence or distant metastasis, we use a clock-reset approach, where time is reset back to zero upon entering the recurrence state. We assume this model structure because cancer recurrence often triggers additional treatments, effectively starting off a new phase of patient care. These two transitions are modeled as follows: Component 7, death after locoregional recurrence (without or without subsequent metastasis): using Weibull hazards as follows: For Components 3 and 4, standard Weibull baseline hazards fit the data poorly. In response to additional exploration of event rates in the data, we specified the following piecewise Weibull baseline hazards for these transitions:

Inclusion of Covariates in Multistate Model
In applying this model to the head and neck cancer data, we use different sets of covariates in the different component models. Due to the large number of possible predictors, we did not include all predictors in all models and instead chose among the available predictors. For each transition, we chose covariates that we believed to be "important" or relevant to the corresponding model. For example, age, smoking status, and comorbidities would be important for modeling death without prior recurrence. Inclusion of covariates in each component model is summarized in eTable 1 below. In modeling whether or not a patient is persistent (Component 1), we included a small number of covariates, since there are not many persistent patients available to estimate parameters for this model. Additionally, cT classifications 1 through 3 were merged for this transition, since the majority of persistent patients had very high cT. Similarly, moderate and severe comorbidities were merged into a single parameter for the transitions to death after intermediate locoregional recurrence or distant metastasis due to the small number of patients in these categories with observed recurrences. In the model for the rate of metastasis (Component 4, states 1→4), we allow the effect of age to be different before or after age 65. This choice was made based on exploratory analyses that indicated different age associations among patients above and below roughly age 65.
We also consider two different model fits. In the second, we add biomarker information in the form of rECE and MTV. Due to small numbers of persistent and recurrent patients with observed biomarker values, biomarkers rECE and MTV were not included in the models for the probability of having persistent disease and for the models for death with or without prior recurrence.
A summary of the outcome events observed for the University of Michigan and Erasmus Medical Center cohorts can be found in eTable 2.

Prior Distributions
As part of the multistate model structure, we specify prior distributions for each of the model parameters. In eTables 3 and 4, we provide information regarding the prior distributions assumed for this estimation.
The prior distributions for the Weibull baseline hazard parameters were chosen to allow a large range for the scale parameter (N(0,16) priors for log-scales) and were somewhat informative for the shape parameters (Gamma(2.4,0.4) priors). The Gamma hyperparameters were chosen to give a mean near 1, where a value of 1 for the shape parameter implies an exponential distribution for the corresponding event time.
For all covariate associations for Components 1-4, we used assumed wide N(0,4) priors (possibly with truncation to enforce order restrictions) for the log-hazard ratios and log-odds ratios. These fairly uninformative priors allow very strong covariate associations. More informative priors were chosen for Components 7-8. The rationale for this was two-fold: (1) less data was available for estimating these parameters, and more informative priors built in greater model stability and (2) we expect effects of baseline parameters to have weak associations with survival time, conditional on having experienced a recurrence.
Prior distributions for the intercepts of the two logistic regression models were normal distributions with prior means obtained based on preliminary descriptive statistics in the observed data, including the marginal proportion of patients experiencing persistence and the estimated proportion of patients who never experienced a recurrence obtained from Kaplan-Meier plotting of time to recurrence.

Estimation
Model parameters were estimated using a Markov Chain Monte Carlo (MCMC) algorithm, where at each iteration, each parameter was drawn one-by-one from its posterior distribution (a combination of its prior distribution and the likelihood from the multistate model of interest). Since the posterior distribution for each model parameter was complicated, we use the Metropolis-Hastings method to draw each model parameter using a normal random walk proposal distribution. For parameters with order restrictions, this proposal distribution is truncated to satisfy the order restrictions. Variance parameters for these proposal distributions were tuning parameters with values chosen to give somewhere between 20 and 80% acceptance rates and ensure good operating characteristics of the final MCMC parameter draws.
At the end of each MCMC iteration, missing data was filled in through imputation. Details about the missing data imputation strategies can be found in eAppendix 3. The MCMC algorithm was run for a total of 25,000 iterations, with the first 10,000 iterations removed as burn-in. The remaining 15,000 parameter draws were used to obtain posterior means for the parameters of interest. We use these posterior means to obtain individualized predictions as in eAppendix 4. 95% quantiles are obtained from these 15,000 iterations and used to construct 95% credible intervals for model parameters as presented in the main paper. The MCMC algorithm was written as custom software in R. Numerical integration performed as part of the missing data handling was performed using R package cubature (version 2.0.4), and the strategy for handling different censoring times for recurrence and death described in eAppendix 3 involved imputing recurrence times for some patients using the methods in Beesley et al. (2018) and R function uniroot in the stats package (version 3.6.3).

Posterior Means
In the main paper, we provide posterior means for all covariate associations in Components 1-8 of the multistate model. Here, we provide posterior means for baseline hazard parameters along with logistic regression intercepts in eTable 5.

eAppendix 3: Imputation of Missing Data
We have missing data of four types as described in eAppendix 1. We handle each of these four types of missing data within the MCMC estimation algorithm through imputation. This involves drawing values of the missing data. Below, we provide details regarding how this imputation was performed for each type of missing data. In performing this imputation, we define the following variables:  (1) we know ≠ 6 and (2) 0 = 0 and 0 < 72.
We impute initial state G for patients missing G from a Bernoulli distribution with probability:

Missingness Type II: Imputing Cause of Known Recurrence
For some patients, we know that a recurrence occurred and when it occurred, but we don't know which type (locoregional recurrence or distant metastasis). We impute the probability that it is a locoregional recurrence as In handling imputation for Missingness Type III, we will define partially observed outcomes and to be the recurrence outcomes we would have observed if we had followed every patient for recurrence as long as we followed them for death. Relating these theoretical outcomes to observed data, we have

Missingness Type IV: Imputing Missing Covariate Values
For the covariate imputation, we use a chained equations-type approach called substantive model compatible fully conditional specification (SMC-FCS) described in Bartlett et al. (2014). 3 The goal of this imputation strategy is to allow for flexible imputation distributions for each covariate while ensuring that imputation distributions are compatible with the analysis model (i.e., the multistate model). The covariate imputation procedure involves imputing each covariate from its assumed distribution given all the other variables in the dataset. This distribution can be rewritten as a function of the analysis model (the multistate cure model) and a model for that particular covariate given the other covariates (e.g., model for HPV status given age, comorbidities, etc.). Below, we provide additional details regarding how the covariate models were specified. Imputation of the covariates in the outcome model is improved by incorporating other auxiliary covariates, which are also imputed as part of the algorithm. eTable 6 provides descriptives for other covariates incorporated into the imputation algorithm.

eAppendix 4: Obtaining Individualized Predictions
In this section, we describe how we can use the multistate model structure to obtain state occupancy probabilities. These probabilities represent the proportion of patients with a given set of covariate values that would be in different outcome states at time t. We can estimate these probabilities across t to get a sense of the likelihood of different outcome events over time. Below, we provide formulas for calculating the probabilities, which can be viewed as cumulative incidence probabilities. Estimates of these various probabilities given a fixed covariate set are provided in the online web tool.
First, we define the following probabilities: We note that probabilities 3 ( ) and 4 ( ) could have been separated into contributions for cured (G=2) and non-cured (G=1) patients separately. However, it is well-known in cure model literature that the intercept related to the probability of being non-cured (here, the intercept from Component 2 in the multistate model) is the hardest to estimate. Therefore, we have greater confidence in our predictions of the merged survival outcomes without prior recurrence for cured and non-cured initial states than in the probabilities stratified by cure status. This is not a problem when separating out these probabilities for the persistent initial state (G=2 and probabilities 5 ( ) and 6 ( )) because persistence status is observed for all patients, and the corresponding intercept in Component 1 is well-identified.
Overall and Event-Free Survival Using these quantities, we can calculate the overall survival probability at time t as ( ) = 2 ( ) + 2 ( ) + 3 ( ) + 5 ( ) and the event-free probability (with persistence counted as an event) as

eAppendix 5: Model Evaluation in University of Michigan Data
We calculate predictions of 5-year overall survival and 5-year event-free survival using our multistate models for all patients in our University of Michigan dataset. These predictions are based only on covariate information. For patients with missing covariate information, we calculate the prediction for each of 10 imputations of the covariates and average the resulting 5year predictions.
We first compare the predictions for 5-year overall survival obtained from the models with and without imaging variables (MTV and rECE) in eFigure 1. For this figure, we only compare predictions for patients with observed imaging variables. We find that these predictions are generally close together, but there are some patients for whom the predictions differ substantially. In particular, the predicted survival probabilities tend to be lower for the model incorporating imaging data for patients with extracapsular expansion observed via imaging (rECE = 'yes'). Now, we seek to compare our predicted 5-year overall survival and event-free survival probabilities with observed data. One complication with evaluating observed event-free time is that some patients are censored for recurrence prior to censoring for death (Missingness Type III), and artificially censoring our event-free survival outcome accordingly could create bias. As a result, we evaluate our predictions using a single imputation of the recurrence information as discussed in eAppendix 3.
We now focus our attention on the model without the imaging variables. In eFigure 2, we evaluate how well our model-based predictions for 5-year overall survival and 5-year eventfree survival can predict outcomes seen in our data in terms of calibration and discrimination. Unsurprisingly, these plots indicate good model calibration to the data used to fit the models. We see strong discrimination for both outcomes.
Previously, we published an article evaluating existing calculators of 5-year overall survival in the University of Michigan data. This previous analysis used a cohort of N=856, and the current paper considers data from a subset of N=840 of these patients. In developing this new calculator, we did extensive chart review to obtain additional recurrence information and to carefully evaluate persistence status. During this evaluation, several patients were determined not to meet our inclusion criteria, which requires patients to be newly diagnosed with oropharyngeal cancer and excludes patients with a second concurrent cancer, e.g. lung cancer. Here, we compare how our new model-based predictions compare to predictions from those existing calculators in the current cohort of N=840 patients. eFigure 3 and eTable 7 show the results.
We find that our proposed models tend to perform as well or better than existing calculators on our data (based solely on estimated AUC and C-Indices from the current cohort of N=840). The Denmark model also performs well in this cohort, producing a comparable AUC to the model without imaging biomarkers. Of course, we are evaluating our proposed models in the same data used to fit them, so these AUC and C-Index estimates will be overly optimistic.
Generally, our predictions tend to be highly correlated with other well-performing calculators (e.g. RTOG, Erasmus, Denmark) evaluated in Beesley et al. (2019). 4 We also present discrimination of age, 8 th edition cT classification, and 8 th edition overall cancer stage in eTable 7. We find that our model-based predictions provide much better discrimination than the single summary variables.
We use Cox-Snell diagnostic plots to evaluate how well the model predictions S(t) fit our observed data over time t. We calculate the predicted probability S(t) at the event/censoring time for each patient. If the model fits well, 1-S(t) should have an approximately uniform distribution. The last row of eFigure 4 shows the results, which indicate excellent predictions for both outcome events.
In addition to calibration and discrimination, we are also interested in how well our model predictions can stratify patients in terms of observed outcomes, where higher predicted survival probabilities should correspond to better overall survival. In eFigure 5, we calculate the Kaplan-Meier overall survival curve within strata defined by our model-based 5-year overall survival predictions for the models with and without imaging variables. For comparison, we also include a similar plot stratified by cT classification. We find that our 5-year overall survival predictions do a good job stratifying patients in terms of survival risk.

eAppendix 6: Model Evaluation in Erasmus Validation Cohort
In this section, we evaluate how our model-based individualized predictions (model without imaging variables) perform in an independent validation dataset. We evaluate our model using data from 447 patients with oropharyngeal cancer treated at Erasmus MC in The Netherlands. Table 1 provides descriptives of these patients. Investigators at Erasmus MC were not involved in the development of the statistical model. Code for generating model predictions based on baseline covariates was provided to the Erasmus group. Missing data were handled using MICE multiple imputation, and predictions were averaged across 15 imputed datasets.
Results are shown in eFigures 6, 7, and 8. We found generally good calibration properties for 5-year overall survival and event-free survival and reasonable discrimination. Calibration of overall survival estimates before 5 years was poor (eFigure 7-8), with modelpredicted rates of death without prior recurrence being much lower than observed rates in the Erasmus MC data. This may be due to the very different demographics of the Erasmus cohort relative to the UM cohort. 78.3% of the Erasmus cohort reported being a current smoker, compared to only 31.9% of the UM cohort. The overall proportion of patients with positive p16 status was also strongly different between the two cohorts, with at least 50.6% of UM patients being p16 positive, compared to only 18.8% in the Erasmus cohort. These large demographic differences may translate into differing rates of other cause mortality that are not adequately captured by smoking status, age, and ACE27 comorbidities in our model for death without prior recurrence. Additionally, the Erasmus MC cohort is generally older (treatment 2000-2006) compared to the UM cohort (treatment 2003-2016), which may also contribute to differences in death rates without prior recurrence.

eAppendix 7: Characterizing each variable's impact on predictions
Model-based predictions from eAppendix 4 are complicated functions of the model parameters. Figure 2 in the main paper provides point estimates and 95% credible intervals relating to each covariate's association in each component model of the larger multistate model. However, it can be difficult to understand the aggregate impact of each variable on predictions. In this section, we apply regression tree techniques to directly model the relationship between input covariates (age, T classification, smoking status, N classification, anemia status, p16 status, ACE267, rECE, MTV, and gender) on model-based predictions for overall survival and event-free survival. For different time horizons (3, 6, 10, 20 30, 40, 50, and 60 months post-diagnosis), we obtain our predictions for overall and event-free survival for each of the 840 patients. For each time horizon, we estimate the association between covariates and each of the predicted outcomes using random forest modeling using the gbm package in R. The goal of this modeling is to characterize the complicated non-linear relationship between each of the covariates and the proposed multistate model predictions. Restated, this analysis is an attempt to approximate and describe the mapping between each patient characteristic and our multistate model predictions.
Using these statistical methods, we can obtain a measure of the relative influence of each covariate on multistate model predictions, where influence is measured as the negative impact of randomly shuffling the given covariate on our ability to recover the model-based predictions. Higher relative influence values indicate a stronger association between that variable and the multistate model predictions. This association can vary for different time horizons, where cancerrelated covariates may be very strongly related to mortality sooner after diagnosis and factors related to other cause mortality may be more strongly related to mortality long after diagnosis. eFigure 9 shows the results. We generally find that the relative influence of age increases for later time horizons. Additionally, baseline T classification initially dominates event-free and overall survival rates soon after baseline. We should be careful not to over-interpret changes in the magnitude of the relative influence across different time horizons. Instead, we can focus on the ordering of the relative importance measure across time horizons, which can give us a sense of which covariate or covariates are dominating the predicted values at each time horizon. Expanded modeling using a larger range of input covariates may do a better job at characterizing the complicated relationship between inputs and outputs, and this exploration can be viewed as a rough summary of these variable relationships in our UM cohort.  (1) no event, (2) alive or died with prior metastasis as first event, (3) alive or died with prior locoregional recurrence as first event or with persistent disease, and (4) death without prior recurrence. By definition, these predicted probabilities sum to 1 at each time horizon. Plotted points indicate the corresponding cumulative incidence estimate for outcomes (1), (1) or (2), and (1) or (2) or (3) respectively, based on observed outcome data.