Development and Validation of an Automated Image-Based Deep Learning Platform for Sarcopenia Assessment in Head and Neck Cancer

Key Points
Question Can a deep learning model that uses standard-of-care head and neck computed tomography (CT) data assess sarcopenia in patients with head and neck squamous cell carcinoma (HNSCC) and predict overall survival and disease outcome?
Findings In this prognostic study of CT data from 899 patients with HNSCC, a deep learning pipeline accurately segmented the C3 skeletal muscle area to derive a skeletal muscle index for sarcopenia. The sarcopenia status was associated with a significantly reduced overall survival and increased risk of feeding tube dependency.
Meaning These findings represent the first externally and clinician-validated automated sarcopenia assessment deep learning pipeline for HNSCC, which may inform treatment decisions and triage of patients.



Introduction
Head and neck squamous cell carcinoma (HNSCC) is the sixth most common cancer worldwide. 1 Primary treatment approaches include surgery, radiation therapy, and chemotherapy, with a multimodality approach generally needed for advanced disease. 2 Although HNSCC can be cured, treatment often results in substantial acute and long-term toxic outcomes. 3 Sarcopenia, a skeletal muscle disorder characterized by age-associated decreased muscle function and reduced skeletal muscle mass, results from several factors, including aging, malnutrition, inactivity, neurologic disorders, and cancer. 4,5 Progressive sarcopenia is a component of cancer cachexia, a multifactorial syndrome that leads to functional decline that is difficult to reverse and leads to early mortality. 6 Both sarcopenia and cachexia are negative prognostic indicators in many forms of cancer. 7 Patients with HNSCC are particularly susceptible to sarcopenia due to disease- and treatment-related malnutrition and dysphagia. 8,9
Computed tomography (CT) is a well-established method for quantifying body composition and has been used extensively in clinical research. 10 Imaging-assessed sarcopenia has typically been determined by measuring skeletal muscle at the L3 vertebra. 4,11,12 However, routine CT imaging for HNSCC does not extend to the abdomen, substantially limiting the feasibility of monitoring sarcopenia in these patients. Currently, calculation of the skeletal muscle index (SMI) through C3 muscle segmentation relies primarily on manual or semiautomated techniques, 9,[13][14][15][16][17] which can be time consuming 18 and are prone to error and intra- and interobserver variability. 19,20 Accurate manual segmentation requires input from clinical experts with specialized knowledge of the head and neck, as well as specific training for reproducible C3 muscle segmentation and access to segmentation software. A fully automated, accurate SMI assessment pipeline is thus necessary for clinical integration and utility in the management and monitoring of HNSCC. [22][23][24][25][26] Although recent studies have applied deep learning techniques to determine skeletal muscle through abdominal CT scans at the L3 vertebral level, [27][28][29] few have been performed in head and neck cancer, a disease that has been increasing in prevalence and is known for its challenges in terms of patient vulnerability, treatment decisions, and long-term adverse effects. Recently, Naser et al 30 introduced a multistage deep learning approach for segmenting the C3 vertebral region using head and neck CT scans, which showed good model performance and a potential for predicting patient survival. However, the study was confined to a single institution and lacked external validation and clinical evaluation, thus limiting clinical translatability. 30
In this study, we hypothesized that a fully automated, multistage deep learning pipeline could be developed and externally validated for skeletal muscle segmentation to calculate SMI and assess sarcopenia. We also evaluated the clinical relevance of these measurements by assessing the prognostic value of baseline quantification of sarcopenia and its association with survival and toxicity in patients undergoing treatment. By automating the process of imaging-assessed sarcopenia, we sought to generate fast, consistent, and precise measurements to facilitate clinical translation and guide clinical decision making for patients with HNSCC.

Study Cohort
This prognostic study was approved by the Mass General Brigham institutional review board with a waiver of informed consent given the retrospective nature of the research and that many included patients were since deceased or no longer followed at the study institution. The study was conducted in accordance with the Declaration of Helsinki and follows the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline. 31
The model development data set (n = 479) was curated via The Cancer Imaging Archive 32 from publicly available, deidentified data from patients treated at the MD Anderson Cancer Center (MDACC) between January 1, 2003, and December 13, 2013. Ground truth segmentations for the development data set (n = 390) were obtained, in part, from a publicly available data set (n = 301). 33 The remainder of the images in the development data set (n = 89) were manually segmented by an experienced radiation oncologist (A.S.). For external validation, 1316 consecutive patients who underwent primary radiation therapy for HNSCC between January 1, 1996, and December 31, 2013, were retrospectively collected from Brigham and Women's Hospital (BWH). A total of 988 patients who had complete head and neck CT scans and linked clinical data were included. Subsequently, additional patients were excluded due to missing clinical database information (n = 328), surgery before radiation therapy (n = 190), and missing height information (n = 378), leading to a final set of 420 patients used in the external set (Figure 1). All images were standard diagnostic CT scans (eTables 1 and 2 in Supplement 1). A series of curation and preprocessing procedures were performed to ensure that all data met the quality and standard requirements of the segmentation and predictive models (Figure 1).

Model Development, Training, and Validation
To build an efficient, fully automated pipeline for accurate C3 segmentation, we adopted a 2-stage deep learning approach: (1) slice selection corresponding to the mid-C3 vertebral body in the axial plane and (2) segmentation of the skeletal muscle (eFigure 1 in Supplement 1). The slice selection step used a DenseNet, 34 and the segmentation step used a U-Net. 35
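The 2-stage flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `run_pipeline` and its arguments `select_slice` and `segment_muscle` are hypothetical stand-ins for the trained slice selection and segmentation networks, and the conversion from mask to cross-sectional area simply assumes that area equals the number of muscle pixels times the in-plane pixel area.

```python
import numpy as np


def run_pipeline(volume, pixel_spacing_mm, select_slice, segment_muscle):
    """Two-stage sketch: pick one axial slice, then segment muscle on it.

    volume: 3D array shaped (slices, rows, cols).
    pixel_spacing_mm: (row_mm, col_mm) in-plane pixel spacing.
    select_slice, segment_muscle: hypothetical callables standing in for
    the trained slice selection and segmentation models.
    """
    # Stage 1: the regression output is rounded and clamped to a valid index.
    idx = int(round(float(select_slice(volume))))
    idx = max(0, min(volume.shape[0] - 1, idx))

    # Stage 2: binary muscle mask on the selected slice only.
    mask = np.asarray(segment_muscle(volume[idx])) > 0

    # Cross-sectional area: pixel count times pixel area, mm^2 -> cm^2.
    pixel_area_cm2 = pixel_spacing_mm[0] * pixel_spacing_mm[1] / 100.0
    csa_cm2 = float(mask.sum()) * pixel_area_cm2
    return idx, mask, csa_cm2
```

Keeping the two stages as separate callables mirrors the design in the text: the slice selector can be evaluated by its distance from the ground truth slice, and the segmenter by overlap metrics, independently of each other.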

External Validation
To determine whether the model could be generalized to patients from outside institutions, we used the BWH data set (n = 420) for the external test. Two experienced radiation oncologists (F.H. and B.H.K.) individually reviewed and evaluated the quality of model-generated C3 muscle segmentations by using a Likert scale from 0 to 3 to generate a clinical acceptability score, defined as follows: 0, the model selected an incorrect axial slice for segmentation that does not correspond to the C3 vertebral body; 1, the segmentation is unacceptable (defined as an estimated >5% muscle volume discrepancy compared with expert segmentation); 2, the segmentation is clinically acceptable, though when compared with expert segmentation would result in a small volume discrepancy of less than or equal to 5%; and 3, segmentation is acceptable with no difference from expert segmentation.
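Because two reviewers scored each segmentation independently, their agreement can be summarized with a chance-corrected coefficient; the statistical analysis in this article names Gwet's agreement coefficient 1 (AC1) for that purpose. The sketch below shows the standard unweighted two-rater AC1 formula. The function name `gwet_ac1` is hypothetical, and treating the 0 to 3 scores as nominal categories is an assumption; the article does not state whether a weighted variant was used.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories):
    """Unweighted Gwet's AC1 for two raters and nominal categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    q = len(categories)

    # Observed agreement: share of cases where both raters gave the same score.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Average marginal proportion of each category across the two raters.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    pi = [(counts_a[c] + counts_b[c]) / (2 * n) for c in categories]

    # Chance agreement under Gwet's model.
    p_chance = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_obs - p_chance) / (1 - p_chance)
```

For example, `gwet_ac1([3, 3, 2, 1], [3, 3, 2, 2], categories=[0, 1, 2, 3])` compares two hypothetical reviewers who disagree on one of four cases.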

Definition of Sarcopenia and Association With Outcomes
L3 skeletal muscle area and SMI were calculated based on the C3 cross-sectional area (CSA), age, sex, weight, and height as proposed by Swartz et al 13 and van Rijn-Dekker et al 9 (eMethods in Supplement 1). Race and ethnicity data were not available in the institutional databases used for this analysis. The SMI thresholds of 52.4 cm²/m² for males and 38.5 cm²/m² for females were adopted to stratify patients into sarcopenia and no sarcopenia groups, as established by Prado et al. 37 Body mass index (BMI), calculated as weight in kilograms divided by height in meters squared, is a commonly used metric to assess the health and prognosis of patients with HNSCC during therapy, though it does not provide an assessment of muscle mass directly. To evaluate whether SMI-based sarcopenia was more predictive than BMI for overall survival (OS) and percutaneous endoscopic gastrostomy (PEG) tube duration, we substituted BMI for sarcopenia in the univariable and multivariable analyses. For primary comparison, we stratified patients into underweight (BMI <18.5) and not underweight (BMI ≥18.5) groups based on World Health Organization classification. 38
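The stratification rules in this section can be written out as a short Python sketch. Several points are assumptions rather than statements from the article: SMI is taken as the estimated L3 muscle area divided by height squared (the usual definition, consistent with the cm²/m² units), sarcopenia is taken as SMI strictly below the sex-specific threshold, and all function names are hypothetical. The conversion from C3 CSA to L3 area is described in the eMethods and is not reproduced here.

```python
# Sex-specific SMI thresholds (cm^2/m^2) quoted in the text from Prado et al.
SMI_THRESHOLDS_CM2_PER_M2 = {"male": 52.4, "female": 38.5}


def skeletal_muscle_index(l3_area_cm2, height_m):
    """SMI as estimated L3 muscle area divided by height squared (assumed)."""
    return l3_area_cm2 / height_m ** 2


def is_sarcopenic(smi, sex):
    """Assumes sarcopenia means SMI strictly below the threshold for that sex."""
    return smi < SMI_THRESHOLDS_CM2_PER_M2[sex]


def body_mass_index(weight_kg, height_m):
    """BMI as weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def is_underweight(bmi):
    """Underweight group as defined in the text (BMI below 18.5)."""
    return bmi < 18.5
```

The same estimated muscle area can therefore lead to different classifications by sex: an SMI of about 49 cm²/m² falls below the male threshold but above the female threshold.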

Statistical Analysis
The data analysis was performed between May

Patient Characteristics
The total patient cohort comprised 899 patients with HNSCC (

Model Performance
Evaluation of model slice selection revealed that the difference between the estimated mid-C3 slice and the ground truth slice was minimal, as shown by histogram analysis (Figure 2A). The mean (SD) difference between the locations of the predicted C3 slice and the ground truth slice was 0.11 (1.13) mm and 0.07 (1.08) mm for the validation and internal test sets, respectively (Figure 2A). The DSC values obtained for predicted segmentations vs ground truth in the validation and internal test sets were 0.90 (95% CI, 0.90-0.91) and 0.90 (95% CI, 0.89-0.91), respectively (eFigure 5A and eTable 4 in Supplement 1). Additionally, the precision, recall, and intraclass correlation coefficient scores, as summarized in Figure 2B, all showed excellent model performance in predicting C3 segmentations.
The C3 CSAs derived from predicted segmentations showed near-perfect correlations with the ground truth-calculated CSAs (validation set: r = 0.99 [P < .001]; test set: r = 0.96 [P < .001]) (Figure 2B). Representative examples of C3 section slices on sagittal CT images and ground truth segmentations on axial images with performance metrics are shown in Figure 3.
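For readers less familiar with the overlap metrics reported above, the following Python sketch computes the Dice similarity coefficient, precision, and recall for a pair of binary masks. It is an illustration of the standard definitions, not the authors' evaluation code; the function name `segmentation_metrics` and the handling of empty masks are assumptions.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """DSC, precision, and recall for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")

    tp = int(np.logical_and(pred, truth).sum())   # muscle in both masks
    fp = int(np.logical_and(pred, ~truth).sum())  # predicted only
    fn = int(np.logical_and(~pred, truth).sum())  # ground truth only

    # Assumed convention: two empty masks count as perfect overlap for DSC,
    # while precision and recall fall back to 0.0 when undefined.
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"dsc": dsc, "precision": precision, "recall": recall}
```

A prediction that covers all of the true muscle but extends beyond it has a recall of 1 and a precision below 1, which is why the three metrics are reported together.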

External Test
The reviewers conducted an initial quality assessment for the external test set (n = 420) and identified 43 scans (10%) they judged to be problematic, resulting in a final set of 377 patients, which was then carefully reviewed and assigned acceptability scores. Representative cases with C3 slice predictions and ground truth segmentations for acceptability scores 0, 1, 2, and 3 are shown in Figure 3C.

Skeletal Muscle Index Measurement Comparisons
We calculated and compared the SMI values between model predictions and ground truth for both the validation set and internal test set (eFigure 6A and B in Supplement 1). Accurate model skeletal muscle segmentations led to predicted SMI values that were highly correlated with the ground truth values (Pearson r ≥ 0.99; P < .001) for both female and male patients in all data sets (eFigure 6A and B in Supplement 1).
a Kruskal-Wallis rank sum test.
b Fisher exact test for differences among the data sets.
c Patients with nonoropharyngeal carcinoma who did not undergo HPV p16 testing were coded as negative, given the very low incidence of HPV p16-positive tumors in these disease sites.

Predictive Analyses for Sarcopenia
A total of 342 patients with complete survival and toxicity information from the external test set were further included for sarcopenia predictive analysis (eTable 5 in Supplement 1). Sarcopenia was not associated with insertion of a PEG tube at diagnosis but was associated with a higher risk of having a PEG tube at last follow-up (odds ratio, 2.25; 95% CI, 1.02-4.99; P = .046) (eTable 6 in Supplement 1). Sarcopenia was not associated with a higher risk of hospitalization less than 3 months after radiation therapy, a higher risk of osteoradionecrosis, post-
B) Scatterplots depict the C3 skeletal muscle cross-sectional area (CSA), with the ground truth manual segmentation on the x-axis and the calculated CSA using estimated segmentations on the y-axis for validation (r = 0.99; P < .001) and internal test (r = 0.96; P < .001) sets.

Discussion
In this study, we successfully developed and validated an end-to-end deep learning pipeline that uses head and neck CT images for efficient and accurate segmentation of cervical vertebral skeletal muscle, calculation of SMI, and diagnosis of imaging-assessed sarcopenia in patients with HNSCC. We applied our tool to a large external validation cohort, where we found that imaging-based sarcopenia was associated with poorer OS and longer PEG tube duration. Furthermore, sarcopenia was more predictive of these outcomes than BMI. This externally validated deep learning pipeline could translate clinically as a fast and fully automated prognostic tool for patients with HNSCC in routine clinical practice. This end-to-end deep learning pipeline is the first, to our knowledge, for determining sarcopenia that uses head and neck CT images and that has been externally validated with a substantial patient population.
We followed a 2-step process, similar to a recent study conducted by Naser et al, 30
Sarcopenia is an important prognostic factor of decreased OS in various types of cancers. 4,5,37 For HNSCC, these findings appear to be irrespective of geographic area, head and neck tumor sites, and treatment approaches. 40,41 We found sarcopenia to be associated with worse OS, similar to prior studies. 9,14,[15][16][17] Although chemotherapy is not a primary treatment for HNSCC, it is often given with radiation therapy in either an adjuvant or a definitive setting. Chemotherapy may also be administered prior to other treatments as a neoadjuvant approach. Recent retrospective studies in patients with locally advanced HNSCC concluded that pretreatment sarcopenia was an important factor associated with chemotherapy dose-limiting toxicity in patients treated with chemoradiation therapy using platinum-based chemotherapy. 44,45 In our study, we tested the correlations between sarcopenia and a series of chemotherapy and radiation therapy toxicity end points. We found that sarcopenia was associated with longer PEG tube duration and a higher risk of having a PEG tube at last follow-up. This finding is consistent with a study by Karsten et al 43 that showed that sarcopenia contributes to the risk of prolonged feeding tube dependency in patients with HNSCC treated with primary chemoradiation therapy. We did not find that sarcopenia was associated with a higher risk of hospitalization less than 3 months after radiation therapy, nor did we see an association between sarcopenia and the risk of osteoradionecrosis or post-radiation therapy stricture. In HNSCC surgical populations, sarcopenia has been shown to be a negative prognostic indicator for both overall complications and wound complications, including pharyngocutaneous fistulas in patients undergoing total laryngectomy for HNSCC. 12 In our study, however, sarcopenia was not associated with chemoradiation-associated treatment complications requiring surgery.

Limitations
Our study had several limitations. First, the analysis is limited by the inherent constraints of a retrospective study. Due to various exclusion criteria, a number of patients were excluded from the final analysis, which may bias the distribution of patient characteristics. Our study consisted of patients whose cancers were managed nonoperatively; thus, validity of C3-based sarcopenia in surgically managed cancers requires further study. Our cohort was highly enriched for oropharynx carcinoma, and while we have no reason to believe that mucosal subsite would affect the accuracy of C3 skeletal muscle segmentation, it may modify the effect of sarcopenia as a prognostic factor.
Second, our median DSC scores were lower than those reported by Naser et al 30 (0.90 vs 0.95). We believe that our lower DSC scores are due to the preprocessing step we implemented to account for significant differences in CT imaging parameters between our development cohort (MDACC) and external test cohort (BWH). We were able to achieve a median DSC of 0.94 for validation and internal test sets in the MDACC cohort without this preprocessing step. However, the robustness of our model was compromised when applied to the external test data set. Moving forward, we plan to further optimize our preprocessing steps to improve the model's performance while maintaining its generalizability. Third, the overall proportion of patients without sarcopenia, as well as death events, were lower in our external cohort than other studies, and we were unable to find an association of PEG tube duration with other known risk factors of long-term dysphagia, including radiation dose-volume parameters of the pharyngeal constrictor muscles. 46 Fourth, HPV status information was not available for a substantial proportion of the patients in our external test data set. In a relatively smaller patient cohort with available HPV status, we did not observe significant associations of OS and PEG tube duration with sarcopenia as well as with most other clinical variables (eTables 13 and 14 in Supplement 1). Given that HPV status is a crucial clinical risk factor for patients with HNSCC, it is

Figure 1. Flow Diagrams for Training, Validation, and Internal Test and External Test Data Sets

Figure 2. Performance of the Convolutional Neural Network Slice Selection Model and U-Net Segmentation Model for Segmentation of the C3 Vertebral Section

Figure 3. Representative Cases With Ground Truth Slices of C3 Vertebral Sections on Sagittal Computed Tomography (CT) Images and Ground Truth Segmentations on Axial Images
to segment the C3 skeletal muscle. However, our methods differ substantially. Naser et al used a 3-dimensional (3D) ResUNet model to segment the C3 vertebral section first and then automatically selected the middle slice and applied a 2D ResUNet model to segment the skeletal muscle. In contrast, we used a 2D DenseNet-based regression model to automatically select the C3 skeletal muscle slice, and then used a 2D U-Net model to segment the selected slice. We achieved excellent model performance for both the slice selection and segmentation models in the validation and internal test sets. In a large external test set, 96.2% of skeletal muscle segmentations were also deemed acceptable by expert consensus review. Compared with 3D convolutional neural network models, 2D convolutional neural network models are generally much easier to train and implement, making our pipeline fast and

[Figure panel labels: Model development; Model external validation; Model clinical application; Segmentation model evaluation]
The data analysis was performed between May 1, 2022, and March 31, 2023. The Dice similarity coefficient (DSC), precision, recall, and intraclass correlation coefficient were calculated to assess segmentation model performance. The Kruskal-Wallis rank sum test and Fisher exact test were performed to test for differences among the training, validation, internal test, and external test data sets. The interrater reliability test using the agreement coefficient 1 introduced by Gwet 39 was used to measure the agreement between the ratings by 2 clinicians (F.H. and B.H.K.) on the acceptability scores. The predictive association of sarcopenia with toxicity end points was evaluated using univariable logistic regression analyses. The sarcopenia associations with OS and PEG tube duration were assessed using Cox proportional hazards regression. 38 We compared model fit using BMI in place of sarcopenia with absolute change in Akaike information criterion (AIC) and bayesian
JAMA Network Open. 2023;6(8):e2328280. doi:10.1001/jamanetworkopen.2023.28280
the SMI calculation and designation of sarcopenia for the external test set. Detailed failure analyses are found in the eMethods, eTable 3, and eFigure 4 in Supplement 1.

Table 1. Patient Characteristics (N = 899)
or treatment complication requiring surgery. On subgroup analysis of only patients with known HPV status (n = 225), sarcopenia was associated with survival and PEG tube duration on univariable analysis but not on multivariable analysis (eTables 13 and 14 in Supplement 1).

Table 2. Univariable and Multivariable Analyses for Overall Survival and Percutaneous Endoscopic Gastrostomy (PEG) Tube Duration
PEG tube duration was defined as the time from insertion to removal of PEG tube (ie, HR <1.00 represents longer time to removal or greater PEG tube duration).

efficient for the C3 skeletal muscle segmentation for sarcopenia analysis. In this study, it took our experienced radiation oncologist 5 to 10 minutes to identify and segment C3 skeletal muscle for 1 patient. In contrast, our end-to-end deep learning pipeline required only 0.15 seconds for the same task, which is considerably quicker than a human expert.

imperative to include it in the sarcopenia-based outcome prediction model. Therefore, our next step will be to focus on curating additional patient data with HPV status to further validate and refine our model.

Conclusions
In this prognostic study, we developed and externally validated a fully automated deep learning platform for fast and accurate sarcopenia assessment that can be used on routine head and neck CT imaging. Our model has shown excellent C3 skeletal muscle segmentation capability on data sets from different institutions, with high agreement with an expert clinician's segmentation and high acceptability rates from expert clinicians' reviews. Furthermore, our findings show that the model's estimated SMI strongly correlates with the ground truth SMI and that SMI-defined sarcopenia was associated with worse OS and longer PEG tube duration in a large HNSCC cohort. If further validated, our end-to-end deep learning pipeline could be incorporated into standard clinical practice for directing future treatment approaches and clinical decision making, as well as for individualized supportive measures, including nutrition guidance and physical therapy.

Supplement 1
eTable 1. CT Scanner Manufacturers and Models Used for Head and Neck Scans
eTable 2. Head and Neck CT Scan Characteristic Deviation Table
eTable 3. Summary on Failing Cases on External Test Set
eTable 4. U-Net Segmentation Model Performance
eTable 5. Patient Characteristics for Nonsarcopenic and Sarcopenic Groups
eTable 6. Univariable Analysis for the Association of Sarcopenia With Various Toxicity End Points
eTable 7. Baseline Univariable and Multivariable Analyses for Overall Survival With Underweight
eTable 8. Baseline Univariable and Multivariable Analyses for Overall Survival With Overweight
eTable 9. AIC and BIC for Overall Survival Model
eTable 10. Multivariable Analyses for PEG Tube Duration With Overweight
eTable 11. Univariable and Multivariable Analyses for PEG Tube Duration With Overweight
eTable 12. AIC and BIC for PEG Tube Duration Model
eTable 13. Univariable and Multivariable Analyses for Overall Survival With Available HPV Status
eTable 14. Univariable and Multivariable Analyses for PEG Tube Duration With Available HPV Status
eTable 15. Fairness Assessment of Deep Learning Pipeline
eFigure 1. Workflow of the Fully Automated Deep Learning Pipeline for Accurate C3 Segmentation
eFigure 2. Model Architecture for 2D DenseNet-Based Slice Selection Model, Adapted DenseNet CNN Architecture
eFigure 3. Model Architecture for 2D U-Net-Based Segmentation Model
eFigure 4. Representative Failing Cases With Different Failing Causes
eFigure 5. C3 Segmentation Showed Excellent Performance in Both DSC Evaluation and Clinical Acceptability Evaluation
eFigure 6. Scatter Plots of the Skeletal Muscle Index (SMI) Values and Kaplan-Meier Survival Estimates