Predicting the Survival of Patients With Cancer From Their Initial Oncology Consultation Document Using Natural Language Processing

Key Points
Question: Can natural language processing be used to predict the survival of patients with cancer from their initial oncologist consultation document?
Findings: In this prognostic study of 47 625 patients with cancer, 6-, 36-, and 60-month survival were predicted using both traditional and neural language models, with performance similar to or better than that found in prior work.
Meaning: These findings suggest it is feasible to predict the survival of patients with cancer using a common cancer document, without additional data and without training separate models for specific types of cancer.


Introduction
Cancer is a leading cause of death globally, with survival depending on variables including cancer site, tumor type, age, sex, and comorbidities.1 Accurately predicting an individual patient's survival could be used to improve cancer care. For example, it might suggest earlier referral to palliative care resources or consideration of more aggressive therapies upfront. Traditionally, survival rates are calculated retrospectively and categorized by only a few factors, primarily cancer site and histology.1 Despite familiarity with these odds, oncologists can be inaccurate when prospectively predicting an individual patient's survival2; they have difficulty accounting for individual characteristics such as age.
Predictive models trained by machine learning may allow more personalized predictions by using many features of a patient's particular characteristics and disease, and have been shown to outperform the predictions of treating oncologists.3 However, such models have generally required specific structured data.4-8 This may limit their widespread use, as data availability varies among cancer treatment centers and between patients. It also limits what data can be used for a model, as not all clinical data are easily coded or categorized for extraction and analysis.9 The use of unstructured data, such as text within medical documents, may address these drawbacks. Almost all patients receiving treatment for cancer have an initial consultation document from their oncologist. Such documents generally contain many details relevant to survival, for example, tobacco use or marital status, even if the clinic does not routinely store such data in structured data sets.
Natural language processing (NLP) enables models to make use of unstructured text data.10-12 Many of its applications, both in medicine generally and in cancer specifically, have utilized smaller, specific documents such as radiology or pathology reports.13,14 Some studies, such as that by Liu et al,15 have used patient encounter documents to predict the onset of 3 noncancer illnesses. They found that models using the unstructured text data in these documents outperformed those using only structured data, and that adding structured data such as demographics and laboratory values increased performance by only a marginal amount. Within cancer, recent work predicted survival in patients with lung cancer by extracting structured data from unstructured document data16 and used nonneural NLP on progress notes to update an individual's prognosis.17 Neural networks are a type of machine learning modeled after the interconnectedness of neurons. When utilized in NLP, neural models can develop a more complex understanding of language, such as the presence of words with respect to each other, even when not adjacent.4 We were unable to find prior work using neural NLP methods to predict the survival of patients with general cancer using oncologist consultation documents, nor were such works identified in recent reviews of neural network applications in cancer,13 in general medical applications,11 or in a recent review of machine learning techniques used in cancer survival prediction.18
Our work sought to develop and evaluate neural NLP models that predict the survival of patients with general cancer using their initial oncologist consultation, without the use of structured data. By using this common document, we hoped to build models that would not be constrained by requiring the collection and processing of specific data. Similarly, we applied our models to the patient population with general cancer seen across a provincial cancer control system, as opposed to focusing on predicting survival for 1 cancer location treated at a single center. This allowed us to investigate the performance of more generalizable models. We also did not utilize any feature extraction techniques for our neural methods, instead allowing the neural networks to use the text directly after only low-level text preparation. We further sought to contribute to the field by comparing 2 common neural networks used in NLP with a recently developed neural network using transformers.19 We hypothesized that our models would achieve predictive performance at least in line with prior work, predicting with an accuracy, balanced accuracy (BAC), and area under the curve (AUC) above 0.800.

Data Selection and Preparation
We excluded patients who had more than 1 cancer diagnosis. We included all patients who had at least 1 valid document recorded as being a medical or radiation oncologist consultation generated within 180 days of diagnosis, irrespective of disease, and selected the document closest to the date of diagnosis. At BC Cancer, some patients may not see an oncologist for a few months after diagnosis, as they may first receive surgery elsewhere and come to the organization for chemotherapy, radiation therapy, or other specialized treatments.
We applied some preprocessing to documents before they were used by our models, as outlined in the eMethods in Supplement 1. When used with bag-of-words (BoW) models, words had their endings removed to become tokens. We calculated survival as the number of months between the selected document being generated and either the patient's recorded death date or April 6, 2022, when mortality data were last extracted from administrative data. We then produced binary labels for whether or not a patient survived 6, 36, or 60 months.
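As a concrete illustration, survival labels of this kind can be derived as in the sketch below. The function and field names are our own invention, not the study's actual pipeline; only the censoring date and horizons come from the text.

```python
from datetime import date

CUTOFF = date(2022, 4, 6)  # date mortality data were last extracted

def survival_months(document_date, death_date=None):
    """Whole months from the consultation document to death or censoring."""
    end = death_date if death_date is not None else CUTOFF
    return (end.year - document_date.year) * 12 + (end.month - document_date.month)

def survival_labels(months, horizons=(6, 36, 60)):
    """Binary labels: did the patient survive each horizon?"""
    return {h: months >= h for h in horizons}

months = survival_months(date(2014, 3, 10), death_date=date(2017, 5, 2))
print(months)                   # 38
print(survival_labels(months))  # {6: True, 36: True, 60: False}
```

A patient with no recorded death date is censored at the extraction date, so their label is a lower bound on true survival.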

Natural Language Models
Language models are used in NLP to assign probabilities to sequences of words,20 such as determining whether one sequence of words is more likely than another. We can extend this understanding to other tasks, such as predicting the binary survival outcomes in this work. Herein, we compare 4 language models: the traditional nonneural method, BoW,21,22 and 3 models using neural networks, consisting of convolutional neural networks (CNN),23-25 long short-term memory (LSTM),26 and a more recent transformer model, bidirectional encoder representations from transformers (BERT).19 We show simplified diagrams of these models in the Figure. Neural networks allow models to have more complicated understandings of language, such as how nearby or even distant words can change each other's meaning. We provide further description of these models, including hyperparameters, in the eMethods in Supplement 1. We implemented our models in the Python 3 programming language. We describe further details, including libraries used27-30 and techniques for dealing with imbalance between survivors and nonsurvivors, in the eMethods in Supplement 1. Our code and trained models are publicly available through a GitHub repository.
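As a minimal sketch of the BoW approach, token counts can feed an L2-regularized logistic regression, the classifier whose coefficients we later inspect. The toy stemmed documents and labels below are invented for illustration and are not from the study's data or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stemmed documents; 1 = survived the horizon, 0 = did not.
docs = [
    "curat intent resect clear margin",
    "palliat care referr metastat diseas",
    "adjuv chemotherapi good perform status",
    "palliat radiat symptom control metastat",
]
labels = [1, 0, 1, 0]

model = make_pipeline(
    CountVectorizer(),                       # bag-of-words token counts
    LogisticRegression(penalty="l2", C=1.0), # L2-regularized classifier
)
model.fit(docs, labels)

# Predicted probability of surviving the horizon for a new document.
prob = model.predict_proba(["palliat referr for symptom control"])[0, 1]
print(round(prob, 3))
```

Because BoW discards word order, only the presence and counts of tokens drive the prediction, which is exactly the contrast with the neural models described above.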

JAMA Network Open | Health Informatics
Predicting Survival of Patients With Cancer Using Natural Language Processing

Statistical Analysis
The primary outcome was model performance when predicting patients' survival of 6, 36, or 60 months. To avoid overfitting, in which a model performs well on its training data but not on new data,31 we randomly separated our data into training (70%), development (10%), and testing (20%) sets. We trained our neural models on the training data for up to 100 passes (epochs), stopping when performance evaluated on the development set did not increase for 5 epochs. We used the training and development sets to tune our model hyperparameters, then evaluated the model on the test set so that no tuning or development could overfit the test set. We assessed model performance by reporting the accuracy, BAC, AUC, F1 score, sensitivity, specificity, and other metrics of this test run, which we define in eTable 1 in Supplement 1.
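For reference, the reported threshold metrics and AUC can be computed with standard library calls; the labels and scores below are invented for illustration.

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, roc_auc_score)

# Toy test-set labels and model scores (1 = survived the horizon).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7, 0.6, 0.85]
y_pred = [int(p >= 0.5) for p in y_prob]  # threshold scores at 0.5

acc = accuracy_score(y_true, y_pred)
bac = balanced_accuracy_score(y_true, y_pred)  # mean of sensitivity and specificity
f1  = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)            # AUC uses the scores, not thresholded labels
print(f"acc={acc:.3f} bac={bac:.3f} f1={f1:.3f} auc={auc:.3f}")
```

BAC is informative here because the survival classes are imbalanced (eg, 87.0% of patients survived 6 months), so raw accuracy alone can flatter a model that favors the majority class.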
To better understand our BoW models, we measured a word's importance by the absolute value of its coefficient weight in the L2-regularized logistic regression model. For our neural models, we used the Captum interpretability library for PyTorch32 to implement integrated gradients.33

Patient and Document Selection
Of the 59 800 patients from BC Cancer, we excluded 2784 who were recorded as starting cancer care multiple times. Of the remaining 57 016 patients, 9391 did not have a consultation from a medical or radiation oncologist within 180 days following their cancer diagnosis. This left 47 625 patients fulfilling our inclusion and exclusion criteria; all had updated mortality data. This cohort consisted of 25 428 female patients (53.4%) and 22 197 male patients (46.6%) with a mean (SD) age of 64.9 (13.7) years (Table 1). We observed patients surviving a mean (SD) of 61.7 (40.3) months after diagnosis and 59.9 (39.9) months after their initial oncologist consultation; these numbers are limited by the observation period, which was a minimum of 5 years. For our prediction targets, 41 447 patients (87.0%) survived 6 months, 31 143 (65.4%) survived 36 months, and 27 880 (58.5%) survived 60 months, calculated from the initial oncologist consultation.

Model Performance
In Table 2, we show how well our different NLP models predict whether patients will survive 5 years after their initial oncologist consultation when evaluated on a holdout test set. We see numerically similar performance between BoW, CNN, and LSTM, with BAC above 0.800 and AUC above 0.900. BERT had lower performance across most metrics.

Performance by Survival Length
Table 3 compares the performance of our BoW model with that of one of our best neural models, CNN, when predicting the different survival lengths. On a holdout test set, BoW performed best for predicting 6-month survival, achieving a BAC of 0.856 (AUC, 0.928); CNN had the best performance for predicting 36-month survival, with a BAC of 0.842 (AUC, 0.918), and for 60-month survival, with a BAC of 0.837 (AUC, 0.918). We see similar performance between both models. Additional metrics, and the performance of all models when predicting all durations, are described in eTable 2 in Supplement 1.

Interpretation
When observing the importance of tokens (words with endings removed), we see that some similar tokens have top 10 importance for the BoW models used to predict both 6- and 60-month survival (Table 4), but also some differences. For example, the token palliat, which can stem from the words palliative and palliation, is the most important feature in both. Different cancer locations or types are important for the 2 durations; breast and prostate are positive predictors for 6-month survival, while liver, glioblastoma, and lung are negative predictors for 60-month survival. The token N0 is a top 10 positive predictor for 60-month survival, although we see no other TNM classification tokens in the top 10 features for either survival duration. eFigures 1 and 2 in Supplement 1 show the importance of words in a patient's document for our CNN models, which correctly predict that this patient survives 6 but not 60 months. We again see similarities and differences between the models developed to predict the 2 survival lengths. For example, the patient's age had negative importance when predicting whether they would survive 60 months, but not 6 months. Positive aspects of the surgery and pathology, such as clear margins and no lymphatic or vascular involvement, predicted positively in both models, but more so for the 6-month survival prediction. Medical comorbidities such as congestive heart failure and Barrett esophagus were negative predictors for 60-month survival, but less so for 6-month survival.

b Indicates the number of months patients survived during the study's observation period, which was at least 60 months.
c Indicates the number of months survived since the initial oncologist consultation document used in this study was generated.

Discussion
In this study, we trained and evaluated NLP models using both traditional and neural methods to predict whether a patient with cancer will survive 6, 36, and 60 months using only their initial oncologist consultation document and no other data. The performance of our models, when evaluated on a never-before-seen internal holdout set, achieved accuracy, BAC, and AUC over 0.800, with our best models achieving AUC over 0.900. This performance was similar to or better than prior work seeking to predict survival of patients with cancer,5,8,16,34-36 despite using data that were more generalizable and readily available. Prior studies have predicted cancer survival for specific tumor sites, such as breast or lung, or with the use of structured data, such as processed clinical and genetic characteristics.
Our results suggest it is possible to predict the survival of patients with cancer without having to construct structured data sets, or limiting the predictions to specific types or locations of cancer.
The availability of structured data may vary. Given the widespread availability of initial oncologist consultation documents, this opens up the possibility of more easily training and using such models across cancer types at different cancer centers. To help our models generalize, we did minimal text processing. We did not extract features from the text for our models, instead providing the text directly. The documents in our data set were generated by oncologists practicing at 6 centers located in geographically varied regions, without set templates or formatting requirements. Future work will be needed to externally validate our models on documents generated in other jurisdictions.
The performance of our CNN, LSTM, and BoW models was numerically similar. BERT's inferior performance, despite its more deeply connected network, may be due to its limitation of only being able to use the first 512 tokens. The lack of clear performance gain with neural models compared with the traditional BoW model is surprising, but in line with some prior work.37,38 This may suggest survival prediction is largely dependent on the presence of words, as opposed to an understanding of how words relate to each other in a document.
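The truncation BERT imposes can be illustrated directly: its 512-position limit must also accommodate the special [CLS] and [SEP] tokens the model adds. The sketch below uses a plain token list rather than a real tokenizer.

```python
MAX_POSITIONS = 512  # BERT's positional-embedding limit

def truncate_for_bert(tokens):
    """Keep only what fits alongside the [CLS] and [SEP] special tokens."""
    return tokens[: MAX_POSITIONS - 2]

# A long consultation note easily exceeds the limit.
note_tokens = ["tok"] * 1500
kept = truncate_for_bert(note_tokens)
print(len(kept), len(note_tokens) - len(kept))  # 510 tokens kept, 990 discarded
```

For a 1500-token consultation document, roughly two-thirds of the text is simply never seen by the model, whereas BoW, CNN, and LSTM can consume the whole document.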
We interpreted our neural models using integrated gradients and our BoW models using coefficient values. Both methods have limitations: feature correlation affects L2-regularized coefficients, while integrated gradients show word importance only within a word's specific context.
However, our results generally supported that our models used generalizable, interpretable portions of text to make their predictions, consistent with known mortality risk and protective factors.39,40 For example, palliative care is often accessed in the last months of a patient's life, so it is unsurprising that an oncologist mentioning palliative care would suggest that a patient may not survive 6 months.
The 60-month BoW model included N0 (no known spread to lymph nodes) among its top features, but the 6-month model did not; nodal status could be expected to be less relevant over the shorter time frame.
We found that the neural model weighed clinically relevant data such as age, positive outcomes from a surgery, and comorbidities differently when predicting 6- or 60-month survival.
There are multiple avenues for further development to build on our results. While some related work found that adding structured data led to only a small, likely not clinically significant, improvement in performance,15 it may still be worthwhile to investigate the addition of structured data. Future work could improve on our results by using different models or configurations, including those both more and less complex. Hardware advancement will allow more complex versions of our models, and there are newer transformer models, such as Longformer41 and BigBird,42 that can use more text. Neural models can especially benefit from training on very large amounts of data. Future work could use our methodology to train models using widely available oncologist consultation documents from multiple jurisdictions or fine-tune our models by further training them with a relatively small number of a different jurisdiction's documents. Future research could also adapt our models to update survival predictions after the initial consultation, such as by using progress notes. Further performance improvements and external validation may allow clinicians to use this methodology to improve care, such as better facilitating palliative resources with short-term survival predictions or considering more intense initial treatment based on long-term predictions.

Limitations
This study has some limitations. We took steps to ensure the validity of our models, including the use of a holdout test set and eliminating location-specific text that was automatically added to documents, but we did not externally validate our models. Our work calculated survival from the first oncologist consultation document, as opposed to from diagnosis, as is common. This was done because the time from diagnosis to consultation can vary from a few weeks to a few months at BC Cancer if patients first received treatment such as surgery from outside clinicians, or initial workup from family physicians. We also note our data set consisted of patients first seen between 2011 and 2016, so that we could establish 5-year survival. Given the rapid advancement of cancer treatment, accuracy may be limited for patients starting treatment presently. Finally, we did not compare our models' performance with that of oncologists, as has been done in previous work.

Conclusions
In this prognostic study of 47 625 patients with cancer, we trained and evaluated traditional and neural models to predict whether patients will survive 6, 36, or 60 months based on their initial oncologist consultation. Evaluated on an internal holdout set, our best models achieved accuracy and BAC above 0.800 and AUC above 0.900. This performance was comparable with or superior to that of prior work, which has predicted survival only for specific types of cancer or used data that are more difficult to obtain.
Our findings suggest that a clinically useful survival prediction model for patients with general cancer may be possible without needing specific models for different cancer types, using data that are readily accessible without complex data processing or data mining. These results still require external validation and leave multiple avenues for further improvement; regardless, they suggest that this methodology may one day assist in care by providing patients with cancer and their treatment teams an individualized expectation of survival.
This method allows us to visualize which words in a document contribute positively or negatively to a prediction. While interpretable, the visualization is specific to the words in an individual document; a word's or phrase's shown importance may differ in the context of a different document. To protect privacy, we anonymized the shown text by changing dates, names, and other identifying components, ensuring this did not change the interpretation.

Figure. Simplified Diagrams of the Language Models Used in This Work

Table 1. Characteristics of Patients in the Final Data Set
a Unless otherwise indicated, data are expressed as No. (%) of patients.

Table 2. Model Performance for Predicting 60-Month Survival After Patients' Initial Oncologist Consultation Document Was Generated
Abbreviations: AUC, area under the receiver operating characteristic curve; BAC, balanced accuracy; BERT, bidirectional encoder representations from transformers; BoW, bag-of-words; CNN, convolutional neural network; LSTM, long short-term memory.

Table 3. Model Performance for Predicting the Given Numbers of Months After Patients' Initial Oncologist Consultation Document Was Generated
a Comparison of the traditional model, BoW, with our best-performing neural model, CNN.