Dots represent the deciles of patients’ observed 1-year probability of death plotted against their predicted 1-year probability of death.
eTable 1. Baseline Characteristics of Development and Validation Cohorts at Each Year
eTable 2. Fully Adjusted Main Effects Model Associations for Development Cohort Following Backward Elimination for All Years
eFigure 1. Calibration Accuracy for 1-Year Survival
eFigure 2. Kaplan Meier Survival Curve When Patients Grouped by Quintiles of 1-Year Predicted Risk of Death
Seow H, Tanuseputro P, Barbera L, et al. Development and Validation of a Prognostic Survival Model With Patient-Reported Outcomes for Patients With Cancer. JAMA Netw Open. 2020;3(4):e201768. doi:10.1001/jamanetworkopen.2020.1768
Is it possible to develop a risk prediction model of survival for patients with cancer that incorporates patient-reported outcomes over time?
In this prognostic study of data from 255 494 patients with cancer, the mean time to death from diagnosis was 567 days. The model found that the following factors were associated with increased risk of death by more than 10%: being hospitalized; having congestive heart failure, chronic obstructive pulmonary disease, or dementia; having moderate to high pain; having worse well-being; having functional status in the transitional or end-of-life phase; having any problems with appetite; receiving end-of-life home care; and living in a nursing home.
Patients and families may be more informed when making decisions, such as about treatments or palliative care, if they can easily calculate prognostic information.
Existing prognostic cancer tools include biological and laboratory variables. However, patients often do not know this information, preventing them from using the tools and understanding their prognosis.
To develop and validate a prognostic survival model for all cancer types that incorporates information on symptoms and performance status over time.
Design, Setting, and Participants
This is a retrospective, population-based, prognostic study of data from patients diagnosed with cancer from January 1, 2008, to December 31, 2015, in Ontario, Canada. Patients were randomly selected for model derivation (60%) and validation (40%). The derivation cohort was used to develop a multivariable Cox proportional hazards regression model with baseline characteristics under a backward stepwise variable selection process to predict the risk of mortality as a function of time. Covariates included demographic characteristics, clinical information, symptoms and performance status, and health care use. Model performance was assessed on the validation cohort by C statistics and calibration plots. Data analysis was performed from February 6, 2018, to November 6, 2019.
Main Outcomes and Measures
Time to death from diagnosis (year 0) recalculated at each of 4 annual survivor marks after diagnosis (up to year 4).
A total of 255 494 patients diagnosed with cancer were identified (135 699 [53.1%] female; median age, 65 years [interquartile range, 55-73 years]). The cohort decreased to 217 055, 184 822, 143 649, and 109 569 patients for each of the 4 years after diagnosis. In the derivation cohort year 0, the most common cancers were breast (30 855 [20.1%]), lung (19 111 [12.5%]), and prostate (18 404 [12.0%]). A total of 47 614 (31.1%) had stage III or IV disease. The mean (SD) time to death in year 0 was 567 (715) days. After backward stepwise selection in year 0, the following factors were associated with increased risk of death by more than 10%: being hospitalized; having congestive heart failure, chronic obstructive pulmonary disease, or dementia; having moderate to high pain; having worse well-being; having functional status in the transitional or end-of-life phase; having any problems with appetite; receiving end-of-life home care; and living in a nursing home. Model discrimination was high for all models (C statistic: 0.902 [year 0], 0.912 [year 1], 0.912 [year 2], 0.909 [year 3], and 0.908 [year 4]).
Conclusions and Relevance
The model accurately predicted changing cancer survival risk over time using clinical, symptom, and performance status data and appears to have the potential to be a useful prognostic tool that can be completed by patients. This knowledge may support earlier integration of palliative care.
Several randomized clinical trials demonstrated the benefits of early integration of palliative care (ie, at diagnosis) with active cancer treatment, such as improved quality of life and symptom control.1-3 This evidence led the American Society of Clinical Oncology to endorse the provision of early palliative care concurrently with standard oncologic care.4 Although evidence indicates the positive effect of palliative care integration at the time of diagnosis, patients generally receive palliative care close to death or not at all. Data from the US5 indicate that palliative care was accessed in 45% of all deaths for a median of 17 days before death. An increasing body of research has focused on developing prognostic tools, particularly online tools, to help practitioners predict death in patients with cancer. A systematic review identified 22 online prognostic tools that addressed 89 different cancers.6 However, prognostic tools largely fail to integrate palliative care earlier in the disease trajectory for several reasons.
First, most tools were designed for use by oncologists. However, even with prognostic tools, oncologists often struggle with when to discuss prognosis and palliative care because they do not want to take away patient hope, and cancer advancements have increased treatment options and clinical trials.7,8 When practitioners discuss prognosis, research shows their predictions are overly optimistic.9,10 Second, existing tools have limitations in their usefulness as the disease progresses. Many of the tools predict death from diagnosis but do not account for changes over time, such as in treatment plan or health services use. Most tools also do not incorporate patient-reported outcomes, such as performance status or symptom burden, which have clinical and statistical prognostic value across multiple cancers.11,12 Third, patients face barriers to use the tools directly. Systematic reviews13,14 have found that many prognostic tools require biological and laboratory variables, such as cancer antigen levels, elevated C-reactive protein level, and leukocytosis, which are not typically known by patients. This requirement prevents patients from obtaining prognostic information that could help them initiate discussions about palliative care.
In this study, we aimed to develop and validate a prognostic model to predict survival in patients with cancer. To address prior limitations, the model uses easily known clinical information and incorporates patient-reported outcomes (ie, common symptoms and performance status) over time using unique databases available in Ontario, Canada.15,16 Thus, the model can be completed by patients and families. We named the model PROVIEW to reflect the goal of helping patients preview the future so that they can be proactive. PROVIEW aims to provide changing survival predictions as the disease progresses over time, which could support discussions about integrating palliative care even alongside disease-modifying therapies.
We performed a population-based, retrospective prognostic study of data from adults diagnosed with cancer, as confirmed by the provincial cancer registry in Ontario, Canada, from January 1, 2008, to December 31, 2015. Data analysis was performed from February 6, 2018, to November 6, 2019. The study was reviewed by Hamilton Integrated Research Ethics Board and deemed exempt because it was a deidentified, secondary data analysis. This study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline.17
We used the following linked administrative databases (and corresponding covariates): (1) Ontario Cancer Registry (cancer type, diagnosis date, and stage); (2) Vital Statistics (age, sex, and date of death); (3) Statistics Canada (rurality, income quintile, and region); (4) Activity Level Reporting (chemotherapy and radiation treatment); (5) Discharge Abstract Database (hospitalization dates, diagnoses, surgery for cancer, and comorbidity); (6) National Acute Care Registry System (emergency department visits and reasons); (7) physician billing (physician visits and billing codes); (8) Home Care database (nursing and personal support); (9) Symptom Management database (symptoms and performance status); and (10) interRAI database (performance status and symptoms).
The 2 databases that contain population-based performance status and symptom data are the Symptom Management and interRAI databases. The Symptom Management database began in 2007 when Cancer Care Ontario mandated the systematic screening of outpatients with cancer for symptoms using the Edmonton Symptom Assessment System (ESAS) and for performance status using the Palliative Performance Scale (PPS).15 Every patient being treated at a cancer center is eligible to complete the ESAS and PPS, both of which are validated tools in populations with cancer.18,19 The monthly provincial screening completion rate is 56%.20 The ESAS asks patients to self-report the severity of 9 symptoms (ie, pain, depression, well-being, shortness of breath, anxiety, nausea, tiredness, drowsiness, and appetite) on a scale of 0 (symptom absent) to 10 (most severe), whereas the PPS describes a patient’s performance status based on a patient’s level of ambulation, level of activity, and ability to perform self-care. The PPS is scored from 0 to 100 (in 10-point increments), with 80 to 100 indicating stable, 40 to 70 indicating transitional, 10 to 30 indicating end of life, and 0 indicating dead. The PPS is completed by the practitioner during the patient’s visit. In 2013, Ontario also began collecting functional scores using a patient-completed Eastern Cooperative Oncology Group score, which is comparable and highly correlated with the physician-reported PPS.21,22
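The PPS groupings described above can be written as a small lookup. The following is a minimal sketch, not the study's code; the function name `pps_phase` is hypothetical, and only the cut points (80 to 100, 40 to 70, 10 to 30, and 0) come from the text.

```python
def pps_phase(score: int) -> str:
    """Map a Palliative Performance Scale score to the phase named in the text.

    Hypothetical helper for illustration; the thresholds follow the
    groupings stated in the article.
    """
    # The PPS is scored from 0 to 100 in 10-point increments.
    if score % 10 != 0 or not 0 <= score <= 100:
        raise ValueError("PPS must be 0 to 100 in 10-point increments")
    if score >= 80:
        return "stable"
    if score >= 40:
        return "transitional"
    if score >= 10:
        return "end of life"
    return "dead"
```

For example, a PPS of 60 falls in the transitional phase under this grouping.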
The interRAI database began in 2002, when Ontario mandated the use of the Resident Assessment Instrument for Home Care, a standardized tool for patients receiving publicly funded home care services for an expected 60 days or more. The assessment is akin to the Minimum Data Set used internationally and is valid and reliable.23-25 Seventy percent of patients with cancer use home care in the last year of life.26 The assessment collects quality-of-life data for approximately 300 unique items that measure domains, such as the presence of moderate to severe pain or depression, presence of caregiver living in patient’s home, and performance status via the health instability CHESS scale (change in decision-making, change in activities of daily living status, and end-stage disease).27 The assessment is completed by the case manager at intake and reassessed at least every 6 months.
The primary outcome was time to death (days) per date of death in the Vital Statistics database. The initial index date for each patient was the date of diagnosis. Because covariates and treatments may change over time, we also aimed to predict conditional survival probabilities; thus, prediction models were redeveloped by moving the index date to the 1-, 2-, 3-, and 4-year survival marks. Only patients who were alive at those marks contributed to each corresponding conditional analysis. All covariates were recalculated at each new index date to avoid incorporating time-varying covariates into the regression model because predictions are meant to be based on information known only at the current time.
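Moving the index date to a yearly survival mark can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' implementation: the `Patient` record, the `landmark_cohort` function, and the 365-day year are hypothetical, and patients whose follow-up ended before the mark are dropped along with those who died before it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    patient_id: str
    days_to_death: Optional[int]  # days from diagnosis to death; None if no death observed
    followup_days: int            # days from diagnosis to end of follow-up


def landmark_cohort(
    patients: List[Patient], landmark_year: int, days_per_year: int = 365
) -> List[Tuple[str, int, bool]]:
    """Return (patient_id, remaining_days, died) for patients alive at the mark.

    Time is re-expressed from the new index date, so remaining_days is the
    observed time minus the landmark offset.
    """
    shift = landmark_year * days_per_year
    rows = []
    for p in patients:
        died = p.days_to_death is not None
        observed = p.days_to_death if died else p.followup_days
        if observed <= shift:
            continue  # died or left follow-up on or before the landmark
        rows.append((p.patient_id, observed - shift, died))
    return rows
```

Covariates would then be recalculated relative to the new index date before refitting the model on the reduced cohort.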
Each model included the following baseline covariates: demographic characteristics (age at diagnosis, sex, caregiver living with the patient [yes or no], and lives within 50 km of a cancer center [yes or no]); clinical data (diagnosis date, cancer type, cancer stage, presence of 1 of 13 other chronic diseases as determined by validated algorithms,28,29 type of chemotherapy [publicly funded oral drugs, immunotherapy, and systemic agents], receipt of radiation treatment [yes or no], and/or cancer surgery [yes or no] in the past [from diagnosis up to 3 months previously] and recently [within the past 3 months]); patient-reported outcomes (performance status and 9 symptom scores within 3 months of index date); and health care use within 3 months of the index date (prior hospitalization, hospitalizations for palliative care [including palliative care consultation], living in long-term care, receipt of end-of-life home care services, having a regular family physician, and received physician home visit).
For the primary outcome of time to death, prediction methods for time-to-event data were implemented starting from diagnosis and then reimplemented at each of the 4 yearly survival marks after diagnosis. Each derived prediction model followed the below steps.
We randomly selected 60% of eligible patients for model derivation and used the other 40% for validation. To ensure random sampling, we assessed and compared the distribution of baseline characteristics between the derivation and validation cohorts. Using the derivation cohort, we used a multivariable Cox proportional hazards regression model with baseline (time-fixed) characteristics to predict the hazard of mortality as a function of time. A priori, we created a multivariable model that consisted of all potential variables mentioned above. We then used the backward stepwise selection procedure for variable selection with a liberal 2-sided P < .15 as the retention criteria.30 The proportional hazards assumption was assessed by including interactions with time and each covariate into the model. We centered continuous covariates, such as age, and explored linear and quadratic terms. Missing data from patient-reported categorical variables were handled by creating an additional missing category for that variable. Most of the missing data were attributable to patients not completing an ESAS, although occasionally patients skip certain questions on the ESAS. Because there was no obvious missing pattern, we elected to create a missing category rather than to impute or remove these patients from the analysis. Interactions between cancer type and stage were also incorporated with a goal of achieving maximal discriminative ability within the derivation cohort,31,32 as determined by the concordance index.33
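The random 60%/40% split can be sketched as below. The function name, the fixed seed, and the use of Python's standard `random` module are assumptions for illustration; the article does not describe how the random selection was implemented.

```python
import random
from typing import Iterable, List, Tuple


def split_derivation_validation(
    patient_ids: Iterable, derivation_fraction: float = 0.6, seed: int = 0
) -> Tuple[List, List]:
    """Randomly assign patients to derivation and validation cohorts."""
    ids = list(patient_ids)
    # A seeded generator makes the split reproducible (an assumption, not from the article).
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * derivation_fraction)
    return ids[:cut], ids[cut:]
```

After such a split, baseline characteristics would be compared between the two cohorts, as the text describes.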
After the final regression model was established, the 1-year predicted probability of death was calculated for each patient in the validation cohort based on their specific covariate values, the estimates of the regression variables from step 1, and the estimate of the baseline survival function from step 1.34 Calibration (how close the model-estimated risk is to the observed risk) was examined by grouping patients into deciles of model-estimated 1-year risk of death. We then reviewed the plot of the observed against the predicted 1-year probabilities of death for patients in each decile.35
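The two steps above can be sketched as follows. Under a Cox model, the predicted probability of death by time t is 1 - S0(t)^exp(linear predictor), where S0(t) is the estimated baseline survival at t. The function names are hypothetical, and the observed risk per decile is computed here as a simple proportion, which assumes every patient has complete 1-year follow-up; an analysis with censoring would typically use a Kaplan-Meier estimate within each group instead.

```python
import math
from typing import List, Sequence, Tuple


def predicted_death_probability(baseline_survival_t: float, linear_predictor: float) -> float:
    """Cox-model probability of death by time t: 1 - S0(t) ** exp(x'beta)."""
    return 1.0 - baseline_survival_t ** math.exp(linear_predictor)


def calibration_by_decile(
    predicted: Sequence[float], died: Sequence[bool], n_groups: int = 10
) -> List[Tuple[float, float]]:
    """Return (mean predicted risk, observed proportion dead) per risk group.

    Assumes complete follow-up to the prediction horizon for every patient.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    points = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # fewer patients than groups
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = sum(1 for i in idx if died[i]) / len(idx)
        points.append((mean_pred, observed))
    return points
```

Plotting the second element of each pair against the first gives a calibration plot; points near the diagonal indicate good agreement between predicted and observed risk.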
We measured the model’s discriminative ability (ability to distinguish between patients who died from those who did not die) via a concordance index (C index).36,37 Concordance for survival data was calculated as the proportion of pairs in which the patient who died had a higher predicted probability than the patient who did not die. All analyses were conducted using the statistical software R, version 2.15 (R Project for Statistical Computing) and SAS, version 9.3 (SAS Institute Inc).
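A pairwise concordance calculation of this kind can be sketched as below. This is an illustrative version in the style of Harrell's C index, not the study's code: a pair is treated as comparable when one patient died strictly before the other's last observed time, and tied predictions count as half. Those conventions, and the function name, are assumptions.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[bool], risks: Sequence[float]
) -> float:
    """Proportion of comparable pairs in which the earlier death had higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # only an observed death can anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking. The double loop is quadratic in the number of patients, so a cohort of the size reported here would need a more efficient implementation.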
We identified 255 494 patients (135 699 [53.1%] female; median age, 65 years [interquartile range, 55-73 years]) diagnosed with cancer during 2008 to 2015. Because we repeated the derivation and validation process each year up to 4 years after diagnosis conditional on survival, the total cohort decreased to 217 055 in year 1, 184 822 in year 2, 143 649 in year 3, and 109 569 in year 4 (Figure 1). We randomly split each total cohort into derivation (60%) and validation (40%) cohorts. Characteristics of the derivation and validation cohorts were nearly identical in the diagnosis year (year 0) (Table 1). In the derivation cohort year 0, the most common cancers were breast (30 855 [20.1%]), lung (19 111 [12.5%]), prostate (18 404 [12.0%]), and colorectal (16 776 [10.9%]). A total of 47 614 (31.1%) of the cohort had stage III or IV disease, 66 958 (43.7%) had stage I or II disease, and 38 724 (25.2%) had unknown stage in the registry. Within the first 3 months of diagnosis, 71 479 (46.6%) had cancer-related surgery, 41 486 (27.1%) received chemotherapy, and 37 581 (24.5%) received radiation therapy. Although half of the patients did not have a performance status recorded within 3 months of diagnosis, 13 320 (8.7%) were in the transitional stage and 1709 (1.1%) were at end of life. A total of 23 818 (15.5%) of the cohort had moderate to high pain, 43 879 (28.6%) had no pain, and 62 049 (40.5%) had missing values. Within 3 months of diagnosis, 10 172 (6.6%) were hospitalized for palliative care intent, and 9038 (5.9%) received end-of-life home care services. The main difference between the year 4 and year 0 derivation and validation cohorts was that fewer patients were still receiving treatment in year 4 (eTable 1, eTable 2, eFigure 1, and eFigure 2 in the Supplement include all variables across all years and additional analyses).
After backward stepwise selection, each yearly survival model had a different set of variables included in the final prediction model (Table 2). In the year 0 model, the following factors were associated with increased instantaneous risk of death by more than 10%: having lung cancer; having worse than stage I disease; being hospitalized for any reason, especially if the main reason was for palliative care; having congestive heart failure, chronic obstructive pulmonary disease, or dementia; having moderate or high pain; having worse well-being; having a performance status in the transitional or end-of-life phase; having any problems with appetite; receiving end-of-life home care; and living in a nursing home.
Figure 2 gives calibration plots for year 0 and year 4 in the validation cohorts. Model discrimination in the validation cohorts was high. The C index for the 5 yearly models was 0.902 (year 0), 0.912 (year 1), 0.912 (year 2), 0.909 (year 3), and 0.908 (year 4).
To exemplify how the model could be used, we consider the following hypothetical scenario. A 70-year-old man was diagnosed with stage III lung cancer 2 years ago (ie, the calculator would use the year 2 model). His baseline characteristics at year 2 were that he received chemotherapy and radiation therapy in the past (ie, between diagnosis until 3 months ago) and received chemotherapy recently (ie, within the past 3 months) but stopped receiving radiation therapy recently. He had no other chronic conditions, no symptoms except a score of 10 (severe) for worst appetite, and a performance status score of 60 (transitional). For someone with these baseline characteristics in our model, the probability of surviving another 365 days would be 82.4% (95% CI, 80.2%-84.6%) and another 1825 days (5 more years) would be 23.4% (95% CI, 19.2%-28.6%). If the man was hospitalized shortly thereafter, with the use of the same baseline characteristics in the model except indicating a “yes” for a recent hospitalization, the probability of surviving another 365 days would be 71.1% (95% CI, 67.7%-74.5%). If the man experienced adverse effects of chemotherapy and wondered how stopping chemotherapy would affect his long-term survival, with the use of the same baseline characteristics except indicating a “no” for recent chemotherapy, the probability of surviving another 365 days would be 90.1% (95% CI, 88.8%-91.4%). If everything stayed the same and the man lived to 3 years after diagnosis, the probability of surviving another 365 days to year 4 would be 83.8% (95% CI, 81.0%-86.7%) and another 1825 days (to year 8) would be 24.0% (95% CI, 18.3%-31.6%). A first iteration of the PROVIEW calculator is available online.38
In this study, we developed and validated a predictive survival model that can be used for all cancer types and incorporates patient-reported outcomes of performance status and symptom severity. By using a large population-based cohort, we achieved high calibration and discrimination. To our knowledge, PROVIEW is the only cancer prognostic model that uses these patient-reported outcomes and updates the risk yearly after diagnosis. Because the covariates are self-reportable by patients and predict risk in days, the model has potential to be a patient-completed online tool, allowing patients to examine survival predictions during various periods as their condition changes.
Compared with other online prognostic tools, such as the UK’s PREDICT for breast cancer39 or prostate cancer nomograms,40 PROVIEW has features that may make the model easier for patients to use. PROVIEW uses only variables that are easily reported by the patient, whereas other tools require clinical knowledge (eg, biomarkers), which may not be known by patients. For instance, PREDICT uses ERBB2 (formerly HER2 or HER2/neu) status, estrogen receptor status, and tumor size; the prostate cancer nomograms require knowledge of prostate-specific antigen levels, biopsy cores, or pathology reports. In addition, some tools report mean life expectancy or survival at predetermined periods (eg, 5- and 10-year survival from diagnosis), whereas PROVIEW models survival in days, allowing the user to choose long or short periods in the future. For example, patients at the end of life may be more interested in 30-day survival than 5-year survival. Moreover, because the model was recalculated at each 1-year anniversary and includes symptoms and performance status, it can be used at any time within the first 5 years after diagnosis and accounts for changes in a patient’s condition over time. Some tools have different versions for various posttreatment phases (eg, after radical prostatectomy),41 but they do not differentiate among individuals who had the same treatments but have drastically different performance status.
The hypothetical case example described earlier gives potential scenarios in which the model may be useful to inform decision-making and initiate palliative care discussions earlier. In the scenario in which the patient was hospitalized, the predicted 1-year survival probability would decrease. This change in predicted survival may trigger patients and families to review the general outlook of disease trajectory with practitioners, which may lead to discussions about palliative care even though death is not imminent. In the scenario in which the patient considers stopping chemotherapy, the predicted 1-year survival probability would increase. This increase may reflect confounding: patients who stop chemotherapy might have responded well to treatment, achieved remission, and thus live longer. These nuances need to be discussed with practitioners, along with clinical factors that are not available in the model and preferences and goals of care. Patients can use the model’s survival predictions, which uniquely incorporate changes in symptoms, performance status, treatment, and hospital use along the disease trajectory, to inform discussions and improve decision-making with practitioners.
This study has limitations. Data were not available on genetic biomarkers and specific targeted therapies, which would increase the accuracy of our predictions, particularly for cancer-specific models. Symptom and performance status data at various time points were missing because some patients chose not to voluntarily report them at cancer centers or they did not receive home care assessments and services. Nonetheless, the largest, longitudinal, population-based databases with this information were used. In this version, we did not consider worsening symptoms and performance status as outcomes, nor did we examine how they could be modified by other variables. This analysis is planned as a subsequent step, which would further support the model’s usefulness for early palliative care integration. Although the model was validated and the initial online calculator is available, an important next step is to test, validate, and refine the online tool with patient and family users.42
The PROVIEW model appeared to accurately predict changing cancer survival risk over time using administrative clinical data and patient-reported outcomes of symptoms and performance status. Because the model covariates can be completed by patients, PROVIEW may be a useful patient-facing online tool, allowing them to prepare questions around goals of care and treatment preferences before an oncologist visit. In this way, PROVIEW could help patients and families initiate conversations with practitioners about the changing disease trajectory and explore the benefits of palliative care supports earlier.
Accepted for Publication: January 30, 2020.
Published: April 1, 2020. doi:10.1001/jamanetworkopen.2020.1768
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 Seow H et al. JAMA Network Open.
Corresponding Author: Hsien Seow, PhD, Department of Oncology, McMaster University, 699 Concession St, Room 4-229, Hamilton, ON L8V 5C2, Canada (firstname.lastname@example.org)
Author Contributions: Drs Seow and Sutradhar had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Seow, Barbera, Earle, Guthrie, Isenberg, Myers, Brouwers, Sutradhar.
Acquisition, analysis, or interpretation of data: Seow, Tanuseputro, Barbera, Earle, Guthrie, Isenberg, Juergens, Brouwers, Sutradhar.
Drafting of the manuscript: Seow, Myers, Sutradhar.
Critical revision of the manuscript for important intellectual content: Seow, Tanuseputro, Barbera, Earle, Guthrie, Isenberg, Juergens, Brouwers, Sutradhar.
Statistical analysis: Seow, Guthrie, Myers, Sutradhar.
Obtained funding: Seow, Barbera, Sutradhar.
Administrative, technical, or material support: Seow, Tanuseputro, Isenberg.
Supervision: Tanuseputro, Sutradhar.
Conflict of Interest Disclosures: Dr Barbera reported receiving personal fees from Genentech outside the submitted work. Dr Juergens reported receiving personal fees from AbbVie, Amgen, EMD Serono, Fusion Pharmaceuticals, Novartis, Pfizer, Roche, and Takeda outside the submitted work and receiving grants and personal fees from AstraZeneca, Bristol-Myers Squibb, and Merck Sharp and Dohme. No other disclosures were reported.
Funding/Support: This study was funded by grants 379009 and 383402 from the Canadian Institutes for Health Research. The study used databases maintained by ICES (formerly known as the Institute for Clinical Evaluative Sciences), Cancer Care Ontario, Canadian Institutes of Health Information, and the Ontario Association of Community Care Access Centers, which receive funding from the Ontario Ministry of Health and Long-Term Care.
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The opinions, results, and conclusions reported in this article are those of the authors solely.
Additional Contributions: The following people provided analytic and coordination support as paid staff: Julia Ma, MPH, McMaster University; Erin O’Leary, MSc, McMaster University; Semra Tibebu, MPH; and Amina Benmessaoud, MEngD. Lesley Moody, PhD, provided helpful comments during the data analysis and was not compensated for this work.