Karadaghy OA, Shew M, New J, Bur AM. Development and Assessment of a Machine Learning Model to Help Predict Survival Among Patients With Oral Squamous Cell Carcinoma. JAMA Otolaryngol Head Neck Surg. 2019;145(12):1115–1120. doi:10.1001/jamaoto.2019.0981
How can machine learning be used to further our ability to create prediction models for survival of oral cancer?
In this cohort study of more than 30 000 patients, a prediction model using a variety of patient, tumor, treatment facility, and treatment type variables predicted 5-year overall survival with an accuracy of 71%, precision of 71%, and recall of 68%.
Novel forms of analysis, such as machine learning, may help in the creation of prediction models using large data registries, and the inclusion of several aspects of the health status of a patient with cancer can more accurately predict survival.
Predicting survival of oral squamous cell carcinoma through the use of prediction modeling has been underused, and the development of prediction models would augment clinicians’ ability to provide absolute risk estimates for individual patients.
To develop a prediction model using machine learning for 5-year overall survival among patients with oral squamous cell carcinoma and compare this model with a prediction model created from the TNM (Tumor, Node, Metastasis) clinical and pathologic stage.
Design, Setting, and Participants
A retrospective cohort study was conducted of 33 065 patients with oral squamous cell carcinoma from the National Cancer Data Base between January 1, 2004, and December 31, 2011. Patients were excluded if the treatment was considered palliative, staging demonstrated T0 or Tis, or survival or staging data were missing. Patient, tumor, treatment, and outcome information were obtained from the National Cancer Data Base. The data were split into a distribution of 80% for training and 20% for testing. The model was created using 2-class decision forest architecture. Permutation feature importance scores were used to determine the variables that were used in the model’s prediction and their order of significance. Statistical analysis was conducted from August 1, 2018, to January 10, 2019.
Main Outcomes and Measures
Ability to predict 5-year overall survival assessed through area under the curve, accuracy, precision, and recall.
Among the 33 065 patients in the study, the mean (SD) age was 64.6 (14.0) years, 19 791 were men (59.9%), 13 274 were women (40.1%), and 29 783 (90.1%) were white. At 60 months, there were 16 745 deaths (50.6%). The median time of follow-up was 56.8 months (range, 0-155.6 months). Age, pathologic T stage, positive margins at the time of surgery, lymph node size, and institutional identification were identified among the most significant variables. The calculated area under the curve for this machine learning model was 0.80 (95% CI, 0.79-0.81), accuracy was 71%, precision was 71%, and recall was 68%. In comparison, the calculated area under the curve of the TNM staging system was 0.68 (95% CI, 0.67-0.70), accuracy was 65%, precision was 69%, and recall was 52%.
Conclusions and Relevance
Using machine learning algorithms, a prediction model was created based on patient social, demographic, clinical, and pathologic features. The developed prediction model proved to be better than a prediction model that exclusively used TNM pathologic and clinical stage according to all performance metrics. This study highlights the role that machine learning may play in individual patient risk estimation in the era of big data.
Oral squamous cell carcinoma (OSCC) is a significant health problem globally.1,2 Prognostic indicators for OSCC have been investigated, and nodal metastasis has frequently been found to be the most significant prognostic indicator.3,4 Other investigations have suggested that female sex, metastasis, advanced age, extracapsular spread, perineural invasion, and tobacco use are additional significant clinicopathologic variables associated with survival.3,5-9 Despite the identification of prognostically significant variables, absolute risk estimates for individual patients remain understudied and are not commonly used in counseling patients.10,11
Development of clinical prediction models is crucial to enhance a clinician’s ability to provide absolute risk estimates.10,11 Clinicians rely on the Tumor, Node, Metastasis (TNM) classification to convey the gravity of cancer diagnosis and prognosis.12 With the increasing availability of large national databases and computing power, the amount of data input has increased, allowing for novel approaches to data analysis.13 One such method includes machine learning, a subdiscipline of artificial intelligence that helps researchers analyze large amounts of data to find patterns to better solve problems by making predictions.14 The use of machine learning in health care has been rapidly growing during the past decade.15,16 In clinical research, machine learning is being used to enhance current prediction modeling by providing more accurate and precise predictions for outcomes of interest.13,17-20 As the availability of patient, pathologic, and genetic information increases, machine learning may prove to be a novel tool for predicting survival.21,22
The goal of this study was to use machine learning to develop a model that predicts 5-year overall survival among patients with OSCC. With machine learning, we anticipate the ability to incorporate multiple clinical and pathologic variables to create an improved and personalized prediction model to better counsel patients.
The National Cancer Data Base (NCDB) is a clinical oncology database sourced from hospital registry data and jointly sponsored by the American College of Surgeons and the American Cancer Society. Data are collected from more than 1500 Commission on Cancer–accredited facilities, capturing more than 70% of newly diagnosed cancers nationwide.23,24 The study was granted exemption by the Kansas University Medical Center Institutional Review Board because the database is publicly available to participating sites and all patient information is deidentified.
Patients with OSCC diagnosed from January 1, 2004, to December 31, 2011, were identified in the NCDB to allow a sufficient follow-up time of 5 years. The included anatomical subsites of the oral cavity were oral lip, floor of mouth, gum of mouth, anterior two-thirds of tongue, hard palate, and buccal mucosa. Patients were excluded if the tumor was staged T0 or Tis, if treatment was considered palliative, or if critical variable information, such as survival and staging data, was missing.
Variables studied were divided into 4 categories: patient, facility, tumor, and treatment characteristics. Patient characteristics included age, sex, race/ethnicity, and comorbid disease as calculated by the Charlson-Deyo score.25 Insurance status, educational level, median household income, rural or urban residence, and distance from the hospital were also collected and included in the analysis. Facility characteristics explored included whether the treating hospital was a community program, academic program, integrated network cancer program, or other. The NCDB divides facility locations into New England, Middle Atlantic, South Atlantic, East North Central, East South Central, West North Central, West South Central, Mountain, and Pacific. Tumor characteristics collected included the T, N, and M scores and the stage group, determined both clinically and pathologically; the TNM edition number was taken into consideration. Additional tumor variables analyzed included tumor grade, extracapsular spread, and perineural invasion. Because information regarding metastatic disease was dispersed across multiple variables, a single new variable encompassing the overlapping variables was created. Treatment characteristics were limited to primary course information for surgical or radiotherapy treatment modalities. Although several additional aspects of treatment information are captured by the NCDB, limiting the treatment characteristics included allows greater clinical utility of the prediction model. The primary study end point was 5-year overall survival, calculated using the vital status variable and date of last contact.
Missing data for covariates of interest were explored and categorized as missing completely at random, missing at random, or missing not at random.26,27 Variables determined to be missing at random were handled using single-value imputation of median values. No imputation was used for variables determined to be missing not at random or for variables missing more than 40% of values.
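The single-value median imputation described above can be sketched in Python with scikit-learn; note that the study itself prepared data in SPSS and Azure Machine Learning Studio, so this is only an illustrative analogue, and the small feature matrix below is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrix: rows are patients, columns are covariates
# (e.g., age, comorbidity score); np.nan marks missing values that were
# judged to be missing at random.
X = np.array([
    [64.0, 2.0],
    [np.nan, 1.0],
    [71.0, np.nan],
    [58.0, 3.0],
])

# Single-value imputation: each missing entry is replaced with the
# median of the observed values in its column.
imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(X)
```

Variables missing not at random, or missing in more than 40% of cases, would simply be dropped rather than passed through this step.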
The construction of a supervised machine learning classification model was achieved using Azure Machine Learning Studio (Microsoft Corp). We randomly split the data into 80% as a training set and the remaining 20% as a test set. We considered multiple 2-class decision models, including decision forest, decision jungle, logistic regression, and neural network. To optimize the models, a parameter range was established that allowed the machine learning model to calculate the best parameters through multiple combinations. Bootstrap aggregation was built into model generation for each of the fitted models. For model construction, we created 2 separate experiments: the first included all variables for model creation, and the second included only the pathologic TNM stage (or clinical stage if pathologic stage was unavailable). The outcome of interest, 5-year overall survival, was identified as the label.
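As a rough open-source analogue of this workflow, a bootstrap-aggregated decision forest with a hyperparameter sweep and an 80/20 split can be sketched with scikit-learn. The synthetic data and parameter ranges below are assumptions for illustration, not the study's actual NCDB inputs or Azure settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cohort: the binary label plays the role of
# 5-year overall survival (1 = alive at 60 months).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 80% training / 20% test split, as in the study design.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# A random forest is scikit-learn's analogue of a 2-class decision
# forest: bootstrap-aggregated (bagged) decision trees. The grid mirrors
# the kind of parameter sweep described, with hypothetical ranges.
grid = GridSearchCV(
    RandomForestClassifier(bootstrap=True, random_state=0),
    param_grid={"n_estimators": [32, 64],
                "max_depth": [16, 64],
                "min_samples_leaf": [1, 4]},
    scoring="roc_auc", cv=3)
grid.fit(X_train, y_train)
model = grid.best_estimator_  # best parameter combination by CV AUC
```

The held-out 20% (`X_test`, `y_test`) is touched only after model selection, matching the train/test discipline described above.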
After the models were built, they were scored and evaluated using the test set data. The performance of the model was measured using area under the curve (AUC), accuracy, precision, and recall, in accordance with previous recommendations for reporting results of clinical prediction models.10 The permutation feature importance scores obtained from the model using the test data provided insight into the most significant variables used in the model's prediction. A permutation feature importance score is the difference in model performance, as measured by the AUC, before and after permutation of a given predictor variable. This process is repeated for each variable included in the model; thus, a large absolute permutation feature importance score identifies a feature with a large effect on model performance. Finally, a prediction model using only pathologic and clinical TNM stage was created to compare the performance of the 2 models.
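The evaluation and permutation-importance steps can likewise be sketched with scikit-learn; the model and data here are synthetic placeholders, and the AUC-based shuffling in `permutation_importance` is the open-source counterpart of the Azure feature-importance module described.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Score the held-out test set with the four reported metrics.
pred = model.predict(X_test)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
acc = accuracy_score(y_test, pred)
prec = precision_score(y_test, pred)
rec = recall_score(y_test, pred)

# Permutation feature importance on the test data: shuffle one feature
# at a time and measure the resulting drop in AUC; larger drops mark
# more influential features.
imp = permutation_importance(model, X_test, y_test,
                             scoring="roc_auc", n_repeats=10,
                             random_state=0)
ranked = np.argsort(imp.importances_mean)[::-1]  # most important first
```

Because the importances are computed on the test set, they reflect what the fitted model actually relies on for out-of-sample prediction rather than patterns memorized from training data.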
Statistical analysis was conducted from August 1, 2018, to January 10, 2019. Preparation of the data was achieved using SPSS, version 25 (IBM Corp), and the analysis was conducted using both SPSS and Azure Machine Learning Studio. Basic descriptive statistics were used.
A total of 38 477 patients were eligible participants for this study. A total of 33 065 patients were included for analysis after excluding 4917 patients because of missing survival data and 495 patients because of missing staging information. The mean (SD) age of participants was 64.6 (14.0) years. Men comprised 59.9% of the cohort (n = 19 791). A total of 90.1% of the patients were white (n = 29 783). Full patient demographics are summarized in Table 1. Median follow-up time was 56.8 months (range, 0-155.6 months). At 60 months, 16 745 patients (50.6%) died, resulting in a 49.4% overall survival rate at 5 years.
For model development, 80.0% (n = 26 452) of the data were randomly selected and used. The classification models explored included 2-class decision forest, 2-class decision jungle, 2-class logistic regression, and 2-class neural network. For the development of the first clinical prediction model, all variables available in the NCDB were made available for use to the machine learning model. On completion, the prediction model was applied to the test data set where performance metrics were measured. The decision forest classification was the most robust, with an AUC of 0.80 (95% CI, 0.79-0.81), accuracy of 71%, precision of 71%, and recall of 68%. In comparison, the 2-class decision forest machine learning model using only pathologic and clinical TNM staging was less accurate, with an AUC of 0.68 (95% CI, 0.67-0.70), an accuracy of 65%, precision of 69%, and recall of 52%.
Permutation feature importance allows insight into how the machine learning model weights different factors in creating its algorithm. The results are displayed in Table 2, where the most important features are listed in ascending order along with their corresponding importance scores. In the creation of the clinical prediction model, the most important variable was patient age, followed closely by several clinical and pathologic variables, including pathologic T stage, insurance status, lymph node size, institutional identification, and positive margins at the time of surgery. The ideal parameters determined by the model were a minimum of 1 sample per leaf node, 1024 random splits per node, a maximum decision tree depth of 64, and a limit of 32 decision trees.
In this study, a model predicting 5-year overall survival among patients diagnosed with OSCC was constructed using machine learning. To our knowledge, this is one of the largest studies using machine learning to examine survival among patients with head and neck cancer. We demonstrate that machine learning offers a novel solution to improve and personalize patient care through the development of absolute risk estimates for individual patients. By incorporating multiple factors that go beyond TNM staging, we also found improved accuracy, precision, and recall ability in predicting overall survival among patients with OSCC.
The use of TNM characteristics has been a cornerstone of clinical practice given their simplicity and relative accuracy; however, the use of TNM characteristics can be improved by incorporating both clinical and pathologic variables. In an era in which vast amounts of electronic patient health data are available, machine learning could be incorporated into electronic health records to provide clinicians with valuable evidence-based prognostic information. In this study, the top variables used by the machine learning model to predict survival were largely a combination of demographic, pathologic, and treatment variables. This finding signifies the importance of incorporating variables that can more holistically describe patients in future efforts to improve prediction or prognostic modeling. This finding is consistent with several studies that have evaluated the clinical importance of demographic, socioeconomic, and tumor-specific factors for patients with head and neck cancer.5,6,28-31
The rapid growth of large registries that capture several dimensions of health data has been beneficial in the attempts to further improve prediction modeling. However, strict adherence to more traditional statistical methods, such as Cox proportional hazards regression, logistic regression, and Kaplan-Meier estimates, may slow the progress of prediction models. This possibility is demonstrated by the inability of the aforementioned methods to handle medical data with high variability, nonlinear interactions, and heterogeneous distributions.15,32 Machine learning techniques may be a more suitable form of analysis in this context, as they have been demonstrated to handle large data sets with complex, nonlinear, heterogeneous distributions.15,18,32 The unique characteristic of machine learning that allows for this advantage is its ability to apply Boolean logic, absolute conditionality, conditional probabilities, and other unconventional strategies to model data while still drawing heavily from statistics and probability.15
Machine learning is not without its drawbacks. Data that are incorrectly or poorly classified will affect the quality of the model.21 Improvements are constantly being made in our ability to capture patient health information more comprehensively and efficiently and to assess its effect. One such example is the method by which cancer registries capture comorbidity. The NCDB uses the Deyo adaptation of the Charlson Comorbidity Index to quantify the extent of comorbidity burden.25 Newer tools that better capture comorbidity have been developed, such as the Adult Comorbidity Evaluation-27,33 and the use of better clinical assessment tools leads to more accurate predictions.34 As with traditional statistics, modeling data with too few events relative to the number of predictor variables is also a limitation that must be appreciated.21 Another limitation of machine learning is the lack of transparency in the analysis: machine learning involves multiple layers of analysis to make a meaningful prediction,13 and these layers often cannot be meaningfully interpreted.35
In this study, there were several limitations to be considered. Primarily, a common issue in our data was inconsistency or inaccuracy of reporting. Specifically, there were several instances in which 2 or more variables measuring related aspects of each patient contained seemingly contradictory information, such as descriptions of tumor or lymph node size, presence of metastatic disease, and treatment type. Furthermore, several variables in the data set, such as lymphovascular invasion and human papillomavirus status, had missing data in more than 50% of cases, which limited our ability to assess the potential effect of these variables on survival prediction. Future efforts to ensure capture of such variables are needed to further our understanding of their potential effect.
In addition, use of machine learning to predict overall survival among patients with cancer raises several ethical questions that must be answered prior to widespread implementation.14 First, will there be unintended consequences of accurate prognostication about individuals' survival? Are clinicians and, in turn, patients less likely to pursue curative treatment in the face of a poor predicted prognosis? Who is responsible when predictive models are incorrect: the clinician who uses the model or the developer of the predictive model? Would patients even want to know their likelihood of being alive or dead in 5 years? Many important ethical questions remain unanswered about the increasing role of automated algorithms in health care.
There are numerous ways in which machine learning algorithms could be used in a real-world clinical setting. First, an interface to the machine learning model could be published on a website, a feature supported by Azure Machine Learning Studio, allowing clinicians to input individual patient data into a web-based form; the decision forest created in this analysis would then predict 5-year overall survival. Probably the most effective approach, and the most difficult to achieve, would be direct integration of machine learning into the electronic medical record (EMR).36
A model developed using machine learning presents interesting and novel challenges to its incorporation in clinical practice. Key obstacles to the widespread adoption of this type of algorithm include convenience, regulation, and financial considerations. Multivariable models can be burdensome to use and are unlikely to gain popularity if they require manual entry of patient data. Multivariable models developed using machine learning differ in that artificial intelligence may allow automatic extrapolation of variables from the EMR. If included in the EMR, a new feature allowing artificial intelligence to interface with patient records may be the most convenient way to incorporate machine learning algorithms into clinical practice. This scenario would present significant regulatory challenges to ensure the compliance of machine learning with the Health Insurance Portability and Accountability Act. Furthermore, the incorporation of machine learning into different EMR products would require a clear financial benefit to EMR developers, who may currently be disincentivized from adopting these technologies.36
Using a machine learning algorithm, a survival prediction model was created implementing a variety of patient, clinical, tumor, facility, and treatment variables as collected by the NCDB. The created prediction model was compared with a prediction model that used only clinical and pathologic TNM stage and was found to better predict overall survival. This study highlights the importance of a more holistic approach of describing patients and how, in the era of large databases, machine learning and artificial intelligence may stand to play a significant role in improving health care by furthering our ability to quantify individual patient risk estimates.
Accepted for Publication: April 1, 2019.
Corresponding Author: Omar A. Karadaghy, MD, Department of Otolaryngology–Head and Neck Surgery, University of Kansas Medical Center, 3901 Rainbow Blvd, Mail Stop 3010, Kansas City, KS 66160 (email@example.com).
Published Online: May 2, 2019. doi:10.1001/jamaoto.2019.0981
Author Contributions: Drs Karadaghy and Bur had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: All authors.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Karadaghy, Shew, Bur.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Karadaghy, Shew, New.
Administrative, technical, or material support: Karadaghy, New.
Supervision: Shew, Bur.
Conflict of Interest Disclosures: None reported.
Meeting Presentation: This paper was presented at the Annual Meeting of the American Head & Neck Society; May 2, 2019; Austin, Texas.