Development of a Deep Learning Model to Identify Lymph Node Metastasis on Magnetic Resonance Imaging in Patients With Cervical Cancer

Key Points
Question: Can deep learning be used for preoperative, noninvasive diagnosis of lymph node metastasis in cervical cancer?
Findings: This diagnostic study including a total of 479 patients developed a deep learning model to preoperatively and noninvasively identify lymph node metastasis on magnetic resonance imaging, achieving an area under the receiver operating characteristic curve of 0.933 in the independent validation cohort. The predicted lymph node metastasis probability was significantly associated with prognosis of cervical cancer.
Meaning: Findings from this study suggest that deep learning can be used as a preoperative noninvasive tool for diagnosing lymph node metastasis in cervical cancer.


Introduction
Cervical cancer is one of the most common cancers among women. 1 The treatment and management of cervical cancer are often guided by the International Federation of Gynaecology and Obstetrics (FIGO) staging system, which is based on clinical assessment and imaging rather than invasive investigations, such as surgery. 2 In the 2018 FIGO staging system, once lymph node (LN) metastasis (LNM) is identified either by imaging or pathologic testing, the cancer is considered stage IIIC irrespective of other findings. 3 Moreover, LNM has been reported to be associated with prognosis and treatment planning in cervical cancer. 4,5 Specifically, patients who show evidence of LNM may undergo chemoradiotherapy rather than surgery as their first choice, 6 avoiding surgery followed by adjuvant chemoradiotherapy and the serious complications that may result. 7,8 Therefore, accurate identification of LN status preoperatively in patients with cervical cancer might avoid unnecessary surgical intervention and benefit treatment planning.
Magnetic resonance imaging (MRI), a commonly used imaging modality in cervical cancer, 9 provides a preoperative method for assessing LN status in cervical cancer. 4][15][16] In previous research, radiomic features improved the sensitivity of MR images for discriminating metastatic from nonmetastatic LNs. 13 However, radiomic features require time-consuming tumor delineation, and they might not be adaptive to specific clinical issues.
Deep learning (DL), an artificial intelligence method, has recently shown promising performance in many medical image analysis tasks, [17][18][19] such as diagnosing Alzheimer disease, 20 screening for breast cancer, 21 and detecting thoracic diseases. 22 Moreover, DL has also exhibited predictive performance in cervical cancer, such as screening and predicting toxic rectal reactions to radiotherapy. 23,24 Compared with traditional methods, DL has an advantage in automatically learning and hierarchically organizing task-adaptive image features. 25 Even though these features cannot be identified visually, they tend to reflect the high-dimensional association between images and clinical issues. 26 Furthermore, DL does not require precise tumor delineation, making it an easy-to-use method in clinical practice. 8][29] In this research, we aimed to develop a DL model to provide a preoperative noninvasive tool for diagnosing LNM in cervical cancer.

Methods
Two outcomes were studied. The primary diagnostic outcome was LNM status, with the pathologic characteristics diagnosed by lymphadenectomy. We first developed a DL model that used MR images to diagnose LNM. Then we proposed a hybrid model that integrated tumor image information and MRI-reported LN (MRI-LN) status. Herein, MRI-LN status was defined as positive if the short-axis diameter of the largest LN shown on MRI was equal to or larger than 1 cm. 10 We assessed the models' performance by receiver operating characteristic analysis. The second outcome, the primary clinical outcome, was disease-free survival (DFS). We assessed the prognostic ability of the hybrid model with regard to DFS by the Kaplan-Meier method.
The institutional review boards of Sun Yat-sen University Cancer Center, Henan Provincial People's Hospital, and Yunnan Cancer Hospital approved this retrospective study with deidentified data, and the need for informed consent from patients was waived. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guideline for diagnostic studies.
A total of 479 patients with cervical cancer who underwent radical hysterectomy and pelvic lymphadenectomy were enrolled in this research. A total of 338 patients from Sun Yat-sen University Cancer Center (n = 218, from January 2011 to December 2017) and Henan Provincial People's Hospital (n = 120, from December 2016 to June 2018) composed the primary cohort, and 141 patients from Yunnan Cancer Hospital between January 2011 and December 2017 composed the independent validation cohort. All of these patients met the following inclusion criteria: (1) pathologically confirmed cervical cancer; (2) pelvic MRI performed within 2 weeks before the operation; (3) complete clinicopathologic data available, such as age, FIGO stage, histologic characteristics, differentiation, lymphovascular space invasion, LNM, and MRI-LN status; (4) no concurrent cancers; and (5) no preoperative treatment. We excluded patients if the tumor lesions were not visible on MRI or if the image quality was poor as assessed by 2 radiologists (Q.W. and J.F.) with more than 9 years' experience and blinded to all clinical information. The recruitment pathway is shown in eFigure 1 in the Supplement.
After surgery, patients from Sun Yat-sen University Cancer Center and Yunnan Cancer Hospital were followed up with MRI or positron emission tomographic and computed tomographic imaging every 3 to 4 months for the first 2 years, every 6 months from the third to fifth years, and then annually. The end point of this study was DFS, which was defined as the period from the date of the operation to the date of the first local-regional recurrence, distant metastasis, all-cause mortality, or the latest follow-up used for censoring. Local-regional recurrences and distant metastasis were confirmed by gynecologic examination; imaging modalities, such as computed tomographic imaging, MRI, and positron emission tomographic and computed tomographic imaging; or biopsy findings.

Image Acquisition and Preprocessing
All patients underwent pelvic MRI scans, including sagittal contrast-enhanced T1-weighted imaging (CET1WI), axial T2-weighted imaging (T2WI), and axial diffusion-weighted imaging (DWI). Magnetic resonance imaging scanning parameters are described in eMethods 1 in the Supplement. We generated apparent diffusion coefficient (ADC) maps to analyze the DWI sequence (b values, 0 and 800 s/mm²).
To extract tumor information for analysis, the same 2 radiologists (Q.W. and J.F.) used rectangular bounding boxes for the region of interest (ROI) to tightly encapsulate tumors on MRI. This tight ROI was defined as ROI tumor. Because peritumoral regions were reported to have diagnostic value in predicting LN status, 13 we also expanded ROI tumor by 5 pixels to add peritumoral information, defined as ROI tumor + peritumoral. Examples of ROI tumor and ROI tumor + peritumoral are shown in Figure 1.
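The 5-pixel expansion of ROI tumor can be illustrated with a minimal sketch. The text does not say how the expansion treats image borders or whether it applies to every side, so the `Box` type, the `expand_box` helper, the per-side margin, and the clipping to the image are all assumptions made here for illustration, not the authors' implementation.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x0: int  # left edge
    y0: int  # top edge
    x1: int  # right edge
    y1: int  # bottom edge


def expand_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow a box by `margin` pixels on every side, clipped to the image.

    Assumption: the margin is applied to all four sides and the result
    is clipped so that it never leaves the image.
    """
    return Box(
        max(box.x0 - margin, 0),
        max(box.y0 - margin, 0),
        min(box.x1 + margin, width),
        min(box.y1 + margin, height),
    )


# Hypothetical tight tumor box on a 256 x 256 image section.
roi_tumor = Box(100, 120, 160, 170)
roi_tumor_peritumoral = expand_box(roi_tumor, margin=5, width=256, height=256)
print(roi_tumor_peritumoral)
```

Clipping matters only for tumors close to the image border; for a box well inside the image, each edge simply moves outward by 5 pixels.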

Model Development and Visualization
We developed an end-to-end DL model for LNM prediction (subnetworks 1 and 2 in Figure 1). The network was a stack of multiple convolution, zero-padding, and batch normalization layers. Layers are the basic computational units in DL models, 30 and the links between layers are similar to connections between neurons in the brain; details of the layers are presented in eMethods 2 in the Supplement. Subnetwork 1 was similar to ResNet18, a widely used deep learning model, [31][32][33] and the detailed network architecture is described in eMethods 3 and eFigure 2 in the Supplement. To enhance model training, subnetwork 1 was pretrained on 14 million natural images from the ImageNet data set 34,35 and was fine-tuned using images from the primary cohort that comprised 5280 CET1WI, 1633 T2WI, and 1474 ADC map image sections. When an MR image of the tumor was fed into the DL model, subnetwork 2 predicted the LNM probability for the tumor. We defined the DL model-predicted LNM probability as the DL score. Owing to the inconsistency of previous research about the performance of MRI sequences, [13][14][15][16] we compared the DL model among 3 MRI sequences to find the optimal model for LNM prediction.
As some preoperative clinical characteristics of cervical cancer have been reported to be associated with LNM, 36 we evaluated 3 preoperative clinical factors (age, FIGO stage, and MRI-LN status) and selected the significant factors (P < .05) in the primary cohort to build clinical models.
Because the DL model can mine high-dimensional information from MRI and clinical features can reflect tumor information from clinicopathologic aspects, we developed a hybrid model to combine information from these sources to explore whether they can be complementary (subnetworks 1 and 3 in Figure 1).We defined the hybrid model-predicted LNM probability as the H score. Detailed training processes of the DL and hybrid models are described in eMethods 4 in the Supplement.
To gain further intuition and explore the underlying basis of the end-to-end DL model, we applied visualization algorithms to display how the network learned the LNM-related information (eMethods 5 in the Supplement). 37 We evaluated the DL model using the following methods: (1) visually assessing the area in the tumor that drew the attention of the DL model (defined as attention map), (2) visualizing convolutional features learned by the network (defined as DL feature), and (3) exploring the association between the DL feature and LN status. A discriminative DL feature should have different responses between patients with node-negative and node-positive findings.

Statistical Analysis
All statistical analyses were performed with R software, version 3.5.1 (R Project for Statistical Computing). The statistical difference of clinical variables was assessed with an unpaired, 2-tailed χ² test for categorical variables or t test for continuous variables. The Mann-Whitney test was applied to assess the difference of the DL score between patients with node-negative and node-positive findings. The DeLong test was applied to assess the difference of the receiver operating characteristic curves between different models. 38 The Kaplan-Meier method and 2-sided log-rank tests were applied to estimate DFS. P < .05 indicated a statistically significant difference.
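The Mann-Whitney test and the area under the receiver operating characteristic curve (AUC) are closely related: the AUC equals the Mann-Whitney U statistic divided by the number of node-positive and node-negative pairs. The sketch below illustrates that relationship in Python rather than R; the function name `mann_whitney_auc` and the scores are hypothetical, and this is not the analysis code used in the study.

```python
def mann_whitney_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Each pair where the node-positive score is higher counts 1,
    each tie counts 0.5. The sum is the Mann-Whitney U statistic,
    and U / (n_pos * n_neg) is the AUC.
    """
    u_statistic = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                u_statistic += 1.0
            elif p == n:
                u_statistic += 0.5
    return u_statistic / (len(pos_scores) * len(neg_scores))


# Hypothetical DL scores, for illustration only.
node_positive = [0.9, 0.8, 0.4]
node_negative = [0.7, 0.3, 0.2, 0.1]
print(mann_whitney_auc(node_positive, node_negative))  # 11 of 12 pairs ranked correctly
```

The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients; a rank-based computation would be the usual choice for larger data sets.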

Results
We reviewed 894 patients with stage IB to IIB cervical cancer who underwent radical hysterectomy and pelvic lymphadenectomy; 479 patients fulfilled the eligibility criteria and were enrolled in the primary (n = 338) and validation (n = 141) cohorts. The mean (SD) age of the patients was 49.1 (9.7) years. A total of 71 patients (21.0%) in the primary cohort and 32 patients (22.7%) in the validation cohort had LNM confirmed by lymphadenectomy (Table 1). As of December 2017, 188 patients from Sun Yat-sen University Cancer Center (30 lost to follow-up) and 128 patients from Yunnan Cancer Hospital (13 lost to follow-up) had completed the DFS follow-up.

Diagnostic Performance of the Models
The MRI-LN status exhibited specificity of 94.38% in the primary cohort and 94.50% in the validation cohort, and sensitivity of 36.62% in the primary cohort and 21.88% in the validation cohort. The clinical model, which incorporated FIGO stage and MRI-LN status, yielded area under the curve (AUC) values of 0.704 (95% CI, 0.633-0.776) in the primary cohort and 0.622 (95% CI, 0.519-0.725) in the validation cohort (Table 2).
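As a worked check of the sensitivity and specificity reported for MRI-LN status, the sketch below recomputes them from confusion-matrix counts. The counts are not given in the text; they are back-calculated here from the reported percentages and the number of patients with and without LNM in each cohort (71 and 267 in the primary cohort, 32 and 109 in the validation cohort), so they should be read as an assumption that happens to reproduce the reported values.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of node-positive patients flagged as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of node-negative patients flagged as negative."""
    return true_neg / (true_neg + false_pos)


# Back-calculated counts (assumption, not reported in the text).
cohorts = {
    "primary": {"tp": 26, "fn": 45, "tn": 252, "fp": 15},    # 71 LNM, 267 no LNM
    "validation": {"tp": 7, "fn": 25, "tn": 103, "fp": 6},   # 32 LNM, 109 no LNM
}

for name, c in cohorts.items():
    sens = 100 * sensitivity(c["tp"], c["fn"])
    spec = 100 * specificity(c["tn"], c["fp"])
    print(f"{name}: sensitivity {sens:.2f}%, specificity {spec:.2f}%")
```

Under these assumed counts, MRI-LN status would miss 45 of 71 node-positive patients in the primary cohort, which is the low sensitivity that motivates combining it with the DL score.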
Among all the DL models (Figure 2A
Assisted by the DL visualization algorithms, we discovered a high-response area for each tumor (eFigure 4 in the Supplement). These high-response areas were more important than other parts of tumors because they drew more attention from the DL model and consequently contained more LNM-related information. These high-response areas included both intratumoral and peritumoral areas, indicating that both intratumoral and peritumoral regions were necessary for the DL model to make decisions.
To better understand the DL features learned by the network, we visualized representative DL features from convolution layers (eFigure 5A in the Supplement). In the shallow convolution layers, the DL model extracted simple tumor edge features (the second and sixth layers), while in deeper convolution layers, it extracted complex tumor texture information (the tenth layer).
In the last convolution layer, the DL model extracted high-level abstract features (the fourteenth layer). Although these high-level features were so intricate that they were hard to interpret by general gross observation, they were associated with LN status. As shown in eFigure 5B in the Supplement, the patients with node-negative findings had weaker DL-feature responses and vice versa, indicating that the network learned discriminative DL features for LNM prediction.
In eFigure 6A in the Supplement, we visualized 2 DL features of the last convolution layer to explore the association between DL features and LNM. The positive DL feature had strong responses to patients with node-positive findings and weak responses to those with node-negative findings.
Similarly, the negative DL feature had strong responses to patients free of LNM and was nearly shut down in patients with LNM. The response value of negative and positive DL features also showed a statistically significant difference between patients with node-positive and node-negative findings in the primary (DL feature response among positive DL feature status: node-positive vs node-

Prognostic Value of the Hybrid Model
Because the LN status of cervical cancer has been reported to be a crucial prognostic factor, 40,41 we performed survival analyses to assess the prognostic ability of the hybrid model with regard to DFS.
We used the median H score to stratify patients into low- and high-risk groups.
The median survival time for DFS was 31 (IQR, 16-56) months in the primary cohort and 23 (IQR, 14-33) months in the validation cohort. Figure 2C, D shows a significant difference between low- and high-risk patients from the hybrid model in the primary cohort (hazard ratio, 3.24; 95% CI, 1.64-6.44; P < .001) and validation cohort (hazard ratio, 4.59; 95% CI, 2.04-10.31; P < .001). Patients with higher H scores had shorter DFS.
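The stratification and survival estimate described above can be sketched as follows. This is a minimal illustration, not the study's R code: the helper names `split_by_median` and `kaplan_meier`, the rule that scores strictly above the median are high risk, and all patient values are assumptions, and the log-rank test and hazard ratio are not computed here.

```python
from statistics import median


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score.

    Assumption: scores strictly above the median are treated as high risk.
    """
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    times: follow-up in months; events: 1 = recurrence, metastasis, or
    death observed, 0 = censored at latest follow-up.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical H scores, DFS times (months), and event indicators.
h_scores = [0.10, 0.20, 0.35, 0.60, 0.80, 0.90]
months = [40, 36, 30, 12, 20, 8]
events = [0, 1, 0, 1, 1, 1]

groups = split_by_median(h_scores)
for label in ("low", "high"):
    t = [m for m, g in zip(months, groups) if g == label]
    e = [ev for ev, g in zip(events, groups) if g == label]
    print(label, kaplan_meier(t, e))
```

Censored patients lower the number at risk without producing a step in the curve, which is how the latest follow-up dates enter the DFS estimate.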

Discussion
In previous studies, peritumoral regions in cervical cancer have been shown to be valuable in diagnosing LNM and estimating neoadjuvant chemotherapy response. 13,42 Therefore, we compared the 2 DL models using ROI tumor + peritumoral and ROI tumor. Compared with the CET1WI tumor + peritumoral model, the AUC of the CET1WI tumor model decreased from 0.844 to 0.742, suggesting that peritumoral regions played a role in predicting LNM in cervical cancer. Adding peritumoral regions led to increased AUC, which can probably be explained by the fact that higher lymphatic vessel density in peritumoral regions might lead to higher regional LNM. 43 As reported in previous studies, an increase in lymphatic vessel density can change the tumor microenvironment and metastatic propensity, 44 which is reflected in many cancers, including cervical, prostate, and breast cancer. 43,45,46 Findings shown in eFigure 4 in the Supplement suggest that the DL model also used both intratumoral and peritumoral regions to make its final decision.
Owing to the high sensitivity of the CET1WI tumor + peritumoral model and the high specificity of MRI-LN status, we developed a hybrid model to integrate image-level and clinicopathologic-level information, resulting in an increase in the AUC from 0.844 to 0.933, sensitivity from 87.5% to 90.62%, and specificity from 70.64% to 87.16%. These improvements suggest that the DL model mined complementary information to the MRI-LN status. Therefore, with the apparent high sensitivity and specificity of our hybrid model, this model might be used preoperatively to help gynecologists make decisions.
In clinical practice, the following 2 scenarios may result in an inappropriate treatment plan: lymphadenopathy not detected on MRI but positive results shown in surgery (patient 2 in Figure 3) and lymphadenopathy detected on MRI but proved to be negative (patient 4 in Figure 3). Therefore, we applied stratified analysis to explore the added value of the DL model within MRI-LN subgroups.
As shown in eFigure 3B in the Supplement, the DL score from the CET1WI tumor + peritumoral model exhibited a significant difference between patients with node-positive and node-negative findings within MRI-LN subgroups in the primary and validation cohorts (all P < .001). Therefore, the DL model may benefit patients with false-negative and false-positive LN status on routine MRI.
In contrast with previous studies, which detected LNM by applying clinical factors and radiomic analysis, our study develops an end-to-end DL model. 9][50] The sensitivity of clinical characteristics (eg, FIGO stage and MRI-LN status) is not sufficient to help inform decision-making by clinicians. Radiomic analysis requires time-consuming tumor delineation, which affects the reproducibility of radiomic features. 51 Although radiomic features can reflect some generalized image features, those characteristics might not be adaptive to LNM prediction. Consequently, we developed a DL model to try to overcome these problems by automatically learning LNM-related features, providing a helpful adjunct to assess LNM.

Limitations
Despite the favorable diagnostic performance of the DL model, our research has limitations. First, a more extensive and prospective data set is needed to generalize the performance of the DL model.
Second, although CET1WI showed better performance than T2WI and ADC maps, the value of combining these sequences remains unclear.

Conclusions
The findings of this study suggest that DL may serve as a preoperative noninvasive tool to diagnose LNM in women with cervical cancer. The H score from the hybrid model was significantly associated with the prognosis of cervical cancer.

Figure 1. Illustration of the DL Model and the Hybrid Model

SUPPLEMENT.
eMethods 1. Magnetic Resonance Image Acquisition Parameters Used in the Present Study
eMethods 2. Mathematical Description of the Deep Learning Network
eMethods 3. Development of the DL Model and the Hybrid Model
eMethods 4. Training Process of the DL Model and the Hybrid Model
eMethods 5. Details of the DL Model Visualization
eFigure 1. Patient Flowchart
eFigure 2. Architecture of the DL and Hybrid Model
eFigure 3. Performance of the DL Score
eFigure 4. Response Area of Representative Patients
eFigure 5. The DL-Feature Visualization
eFigure 6. The DL-Feature Analysis
eReferences.

Table 2. Diagnostic Performance of Various Models
Abbreviations: ADC, apparent diffusion coefficient; AUC, area under the receiver operating characteristic curve; CET1WI, contrast-enhanced T1-weighted imaging; FIGO, International Federation of Gynaecology and Obstetrics; MRI-LN, magnetic resonance imaging-reported lymph node; T2WI, T2-weighted imaging.
a Best performance.

JAMA Network Open | Health Informatics. JAMA Network Open. 2020;3(7):e2011625. doi:10.1001/jamanetworkopen.2020.11625. July 24, 2020.

In this multicenter study, we developed an end-to-end DL model to diagnose LNM for patients with cervical cancer preoperatively. We compared the DL model among different MRI sequences (CET1WI, T2WI, and DWI) and explored the diagnostic value of intratumoral and peritumoral regions. Among all DL models, the CET1WI tumor + peritumoral model achieved the best performance, indicating that the CET1WI sequence probably contained more LNM-related information than the other 2 sequences (T2WI and DWI). To mine diagnostic information from both MR images and clinical characteristics, a hybrid model combining the CET1WI tumor + peritumoral model with MRI-LN status was established. This hybrid model appears to be able to identify more than 90% of metastatic LN cases with a specificity of more than 87%. Moreover, we found that the H score was significantly associated with DFS of cervical cancer, indicating that the hybrid model was a good prognostic indicator.