Use of Virus Genotypes in Machine Learning Diagnostic Prediction Models for Cervical Cancer in Women With High-Risk Human Papillomavirus Infection

Key Points
Question: Can human papillomavirus (HPV) screening results and commonly available clinical data be used to develop a high-performance prediction model for cervical cancer among women infected with high-risk HPV?
Findings: In this diagnostic study of 21 720 women with high-risk HPV infection, the developed prediction model had good performance for predicting cervical intraepithelial neoplasia grade 3 or worse and grade 2 or worse, especially when HPV genotype was included in the model.
Meaning: These findings suggest that this prediction model may be an important tool in screening and monitoring cervical cancer, particularly in low-resource settings where high-quality and extensive cytological and colposcopic examinations are unavailable.


Introduction
Cervical cancer is the fourth most common cancer in women worldwide according to Global Cancer Statistics 2020. 1 With an estimated 604 000 new cases and 342 000 deaths globally in 2020, cervical cancer poses serious threats to women's lives and heavy economic burdens on society, especially in developing countries, such as China. 1 In 2020, China accounted for approximately 18.2% and 17.3% of global new cases of and deaths from cervical cancer, respectively, 2 highlighting the importance of screening, diagnosis, and management of cervical cancer.
Well-established screening programs for early detection can lead to a clear decrease in the mortality and incidence of cervical cancer. However, the implementation of screening programs in developing countries remains a critical issue. 3,4 China implemented a national cervical cancer screening program in 2009, but the current coverage rate in China is still less than 30%, with lower rates in rural regions than in urban regions (22.6% vs 30.0%). 5 In addition, many developing countries, including China, are facing insufficient medical resources and a lack of skilled health care personnel. 6,7 Therefore, it remains challenging to launch high-quality cervical cancer screening programs covering all women in developing countries. This lack of high-quality screening suggests that the ability to accurately predict cervical cancer based on commonly available clinical information would be of great value in low-resource settings.
Previous studies 8 have constructed prediction models for cervical cancer based on common clinical information, but the participants were from only 1 or 2 hospitals, which may not be representative of the general population, and the sample sizes were insufficient. In addition, high-risk human papillomavirus (hrHPV) is recognized as an etiologic agent for cervical cancer, 9 and different HPV genotypes are associated with different risks of cervical cancer. 10 It is reasonable to assume that the inclusion of HPV genotypes in the model may improve prediction ability. However, only 2 previous studies have considered HPV genotypes in prediction models. One study considered 3 HPV genotypes (HPV-16, HPV-52, and HPV-35) and observed a model accuracy of 74.4%, 11 and the other study used HPV genotype classifications of uninfected, low risk, and high risk and observed an area under the receiver operating characteristic curve (AUROC) of 0.73. 12 A more granular classification of HPV genotypes may further improve the prediction performance.
According to the American Society for Colposcopy and Cervical Pathology (ASCCP) guidelines 13 and the Chinese Society for Colposcopy and Cervical Pathology guidelines, 14 women who test negative for hrHPV infection are considered free of the disease, whereas women who test positive for hrHPV infection require subsequent testing during cervical cancer screening. However, approximately 10% to 30% of women who test positive for hrHPV infection do not adhere to the screening procedure for the subsequent cytological or colposcopic examination, and their final cervical lesion status is unknown. 3,15,16 It is important to predict cervical cancer among this high-risk subpopulation of women infected with hrHPV. However, only 1 previous study developed a prediction model for this high-risk subgroup, and the model was not thoroughly evaluated or validated (eg, sensitivity and specificity were not reported). 11
Most previous studies that developed models to predict cervical cancer used regression models. 8,12,17,18 Unlike traditional regression models, machine learning approaches do not make assumptions about the data distribution and tend to have better predictive performance. The stacking model, also known as a stacked generalization or a super learner, is an ensemble machine learning approach that permits researchers to incorporate several different prediction algorithms, including regression models, and therefore typically performs better than its submodels. 19 The stacking model has been widely used for the prediction of various diseases and has shown good performance, 20-22 and it may be an appropriate approach to predict cervical cancer.
The aim of the present study was to use a multicenter large-scale data set to develop and validate a stacking machine learning model for predicting cervical cancer among women who tested positive for hrHPV infection by incorporating HPV genotypes and commonly available clinical information. In particular, we examined whether the inclusion of HPV genotypes further improved the prediction ability of the stacking model.

Data Collection
Individual data included participants' demographic characteristics (age, body mass index, educational level, and insurance type), medical history (history of other cancers and cervical cancer screening history), menstrual status (whether in menopause and age at menopause), sexual behavior factors (gravidity, parity, contraceptive methods, postcoital bleeding, and abnormal leukorrhea), and family history of cancer. The pelvic examination involved visual inspection of the vulva, internal speculum examination of the vagina and cervix, and bimanual palpation of the adnexa and uterus.
We also considered infection of several microorganisms in the vaginal microenvironment.
Cervical intraepithelial neoplasia 3 or worse (CIN3+) was considered the primary outcome because there is widespread agreement that detecting and treating CIN3+, an important premalignant condition of the cervix, can prevent the progression to invasive cervical cancer. 25 Cervical intraepithelial neoplasia 2 or worse (CIN2+) was the secondary outcome because it is a treatment threshold and a commonly used outcome in published studies. 12,17,18

Training and External Validation Set Splitting by Geography
Geographic validation as a form of external validation was used to assess the prediction performance of the model, with 100 primary care centers in China (6 districts: Laohekou, Nanzhang, Xiangcheng, Xiangzhou, Yicheng, and Zaoyang) used for model development and the remaining 36 primary care centers (3 districts: Baokang, Gucheng, and Fancheng) used for model validation.
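A geographic split of this kind can be sketched in a few lines. The district names below are taken from the text; the record layout (a dictionary with a `district` key) and the function name `split_by_district` are hypothetical and serve only as an illustration, not as the study's actual data pipeline.

```python
# District names come from the text; the record structure is a hypothetical
# illustration and not the study's actual data format.
TRAINING_DISTRICTS = {
    "Laohekou", "Nanzhang", "Xiangcheng", "Xiangzhou", "Yicheng", "Zaoyang",
}
VALIDATION_DISTRICTS = {"Baokang", "Gucheng", "Fancheng"}


def split_by_district(records):
    """Assign each record to the training or external validation set by district."""
    training, validation = [], []
    for record in records:
        district = record["district"]
        if district in TRAINING_DISTRICTS:
            training.append(record)
        elif district in VALIDATION_DISTRICTS:
            validation.append(record)
        else:
            raise ValueError(f"Unknown district: {district}")
    return training, validation


# Made-up example records.
example_records = [
    {"id": 1, "district": "Laohekou"},
    {"id": 2, "district": "Baokang"},
    {"id": 3, "district": "Zaoyang"},
]
training_set, validation_set = split_by_district(example_records)
```

Because the split is made by district rather than by individual, no primary care center contributes women to both sets, which is what makes the validation geographic rather than a random internal holdout.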

Statistical Analysis
Women without cervical cancer screening outcomes were excluded from the analysis. We also excluded potential predictors that had more than 50% missing values. In both the training and validation data sets, all of the remaining predictors had less than 5% missing values, and the missing data were not imputed.
The continuous predictors (age, gravidity, and parity) were normalized using the equation $X' = (X - X_a)/(X_b - X_a)$, where $X_a$ is the minimum value and $X_b$ is the maximum value, to speed the computation of machine learning models during the learning phase. The least absolute shrinkage and selection operator (LASSO) method was used to select predictors that had a potential association with the outcomes.
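The reference to the minimum and maximum values suggests min-max scaling, which maps each continuous predictor onto the interval from 0 to 1. The following is a minimal sketch under that assumption; the function name `min_max_normalize` and the example ages are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lowest = min(values)
    highest = max(values)
    if highest == lowest:
        # A constant predictor carries no information; map it to 0.0
        # rather than dividing by zero.
        return [0.0 for _ in values]
    return [(x - lowest) / (highest - lowest) for x in values]


# Made-up ages in years, not study data.
ages = [30, 45, 60]
normalized_ages = min_max_normalize(ages)
```

In practice the minimum and maximum would usually be taken from the training set and reused on the validation set so that both sets share the same scale; the text does not say which convention was used.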
The stacking machine learning model was constructed to predict cervical cancer among women who tested positive for hrHPV infection, with age and other factors selected by LASSO included as predictors. The stacking model was composed of 2 layers. A traditional statistical method (a logistic regression model) and 4 extensively used machine learning algorithms (random forest, gradient boosting machine, naive Bayes, and neural network) were selected for the first layer, with a logistic regression model used as the second-layer model (eFigure 1 in Supplement 1). To examine whether the inclusion of HPV genotypes could improve predictive ability, models with and without HPV genotypes were fitted separately. Each prediction algorithm was built with 3 groups of predictors (only epidemiological factors and pelvic examination results; only HPV genotypes; and all predictors).
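The 2-layer structure described above can be illustrated with scikit-learn's `StackingClassifier`. This is a sketch only: the text does not state which software or hyperparameters were used, so the library choice, all settings, and the synthetic data below are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the predictors and a binary outcome; NOT study data.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
)

# First layer: logistic regression plus 4 machine learning algorithms.
# Hyperparameters are arbitrary illustrative choices.
first_layer = [
    ("logistic", LogisticRegression(max_iter=1000)),
    ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("naive_bayes", GaussianNB()),
    ("neural_net", MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                 random_state=0)),
]

# Second layer: a logistic regression fitted on the cross-validated
# predicted probabilities of the first-layer models.
stack = StackingClassifier(
    estimators=first_layer,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X, y)
predicted_risk = stack.predict_proba(X)[:, 1]
```

Fitting the second layer on cross-validated first-layer predictions, rather than on in-sample predictions, keeps the meta-model from simply rewarding whichever submodel overfits the most.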
We assessed the performance of the stacking model and its submodels in terms of discrimination, calibration, and clinical utility. The AUROC was used to reflect the discrimination of prediction models. Sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio were calculated using the maximum Youden index criterion. The calibration plot was then used to examine the agreement between model predictions and observed outcomes. Finally, we evaluated the clinical utility of the prediction models using a decision curve analysis. 26 Based on the assumption that referral for colposcopy brought benefits to women with CIN3+ (or CIN2+) and brought harm to women without CIN3+ (or CIN2+), the decision curve analysis quantified the standardized net benefit (SNB) among hrHPV-positive women based on the prediction model as compared with the default strategy in which no one receives the intervention (ie, referral to colposcopy). Using the true-positive rate (TPR), false-positive rate (FPR), clinical decision threshold probability (R), and disease prevalence (P), the SNB can be calculated as $\mathrm{SNB} = \mathrm{TPR} - \frac{1 - P}{P} \times \frac{R}{1 - R} \times \mathrm{FPR}$. Individuals would accept the intervention when their predictive probabilities were greater than R, which is affected by clinicians' and patients' preferences.
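The cutoff selection and the net benefit calculation can be sketched as follows. The Youden index is sensitivity plus specificity minus 1, which equals TPR minus FPR. The SNB function assumes the commonly used definition SNB = TPR - [(1 - P)/P] x [R/(1 - R)] x FPR; the function names, the convention that a risk at or above the cutoff counts as positive, and the example data are all hypothetical.

```python
def rates(outcomes, risks, cutoff):
    """Return (TPR, FPR), calling a risk at or above the cutoff positive."""
    true_pos = sum(1 for y, r in zip(outcomes, risks) if y == 1 and r >= cutoff)
    false_pos = sum(1 for y, r in zip(outcomes, risks) if y == 0 and r >= cutoff)
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    return true_pos / n_pos, false_pos / n_neg


def youden_cutoff(outcomes, risks):
    """Return the observed risk value that maximizes TPR - FPR."""
    def youden(cutoff):
        tpr, fpr = rates(outcomes, risks, cutoff)
        return tpr - fpr
    return max(sorted(set(risks)), key=youden)


def standardized_net_benefit(tpr, fpr, threshold, prevalence):
    """SNB under the assumed definition; 'colposcopy for none' scores 0."""
    odds = threshold / (1 - threshold)
    return tpr - (1 - prevalence) / prevalence * odds * fpr


# Made-up outcomes (1 = disease) and predicted risks, not study data.
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
risks = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]
best_cutoff = youden_cutoff(outcomes, risks)
snb_example = standardized_net_benefit(tpr=0.8, fpr=0.2, threshold=0.2,
                                       prevalence=0.5)
```

Under this definition the strategy of colposcopy for none has TPR = FPR = 0 and an SNB of 0, while colposcopy for all has TPR = FPR = 1, so its SNB falls as the threshold R rises. A decision curve plots the SNB of each strategy across a range of R values.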
In the cervical cancer screening program, women who were positive for other hrHPV genotypes but had normal cytological examination results were considered free of cervical cancer. 14,27 A sensitivity analysis was conducted to assess whether potential misclassification moderated the results. Based on the mean incidence rates of CIN3+ and CIN2+ reported by a meta-analysis, 28 we randomly sampled 0.3% and 0.8%, respectively, of women positive for other hrHPV genotypes but who had normal cytological examination results in the validation set. The outcomes of these subsets were categorized as CIN3+ (or CIN2+). Then the prediction performance of the model was reevaluated.
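The relabeling step in this sensitivity analysis can be sketched as below. The function name `relabel_random_subset`, the list-based data layout, the rounding rule, and the seed are hypothetical; only the sampling fractions (0.3% and 0.8%) come from the text.

```python
import random


def relabel_random_subset(outcomes, eligible, fraction, seed=0):
    """Recode a random fraction of eligible negatives as positive (1).

    outcomes: list of 0/1 outcome labels.
    eligible: list of booleans marking women positive for other hrHPV
              genotypes with normal cytological results.
    """
    rng = random.Random(seed)
    eligible_idx = [i for i, flag in enumerate(eligible) if flag]
    n_relabel = round(fraction * len(eligible_idx))
    chosen = set(rng.sample(eligible_idx, n_relabel))
    return [1 if i in chosen else y for i, y in enumerate(outcomes)]


# Made-up validation set: 1000 eligible women and 1000 others, all negative.
outcomes = [0] * 2000
eligible = [True] * 1000 + [False] * 1000
cin3_outcomes = relabel_random_subset(outcomes, eligible, fraction=0.003)
cin2_outcomes = relabel_random_subset(outcomes, eligible, fraction=0.008)
```

Model performance would then be recomputed against the relabeled outcomes, with the predicted risks left unchanged.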
Statistical analyses were performed from January 1, 2022, to July 14, 2022. A 2-sided P < .05 or a 95% CI of the AUROC excluding 0.5 was considered statistically significant.

Predictors Selected by LASSO
The Table presents

Calibration of Prediction Models
The naive Bayes model overestimated the risk of CIN3+, whereas the other 5 models were well calibrated. When predicting CIN2+, all models slightly overpredicted the risk of CIN2+ (eFigure 3 in Supplement 1).  were lower than 23% for CIN3+ and lower than 17% for CIN2+.

Sensitivity Analysis
In the sensitivity analysis, a proportion of women who tested positive for other hrHPV genotypes but had normal cytological examination results were categorized as belonging to the CIN3+ (or CIN2+) group (the validation set). We found that the specificity of the 6 models did not change (range, 74.2%-86.2%), while the AUROC (range, 0.82-0.84) and sensitivity (range, 64.3%-80.4%) slightly decreased compared with the main analysis (eTable 4 in Supplement 1).

Discussion
To our knowledge, this diagnostic study is the first to build a cervical cancer diagnostic prediction  this study can be easily obtained at low cost in clinical settings. Therefore, the model can be used in low-resource settings when data on HPV genotypes are available. In addition to epidemiological factors, some biomarkers, such as vascular endothelial growth factor 31 and HPV-E6/E7 mRNA, 32,33 have been considered in some cervical cancer prediction models. Although adding biomarkers as predictors may improve the prediction performance of models, 34 it is difficult to popularize biomarker acquisition technology in primary care settings because of shortages of technicians and equipment and the relatively high costs. This complex and expensive technology has limited practical applications in underdeveloped regions or for large-scale population screening programs.
The present study highlighted the importance of HPV genotype in the development of cervical cancer. The higher specificity observed in those studies may be partly because sequential tests improve the specificity of the diagnosis. The aforementioned strategies calculated the specificity among all participants, including many healthy women, whereas the prediction model in the present study focused on hrHPV-positive women and therefore had difficulty identifying true-negative participants. The prediction model in the present study did not need information on cytology, but its sensitivity was much higher than that of the first strategy and was similar to that of the second strategy.

JAMA Network Open | Obstetrics and Gynecology
The stacking model achieved superior net benefit when the clinical decision threshold was below 23% and 17% for CIN3+ and CIN2+, respectively. The ASCCP recommends a 4% threshold, that is, women with a predicted risk of CIN3+ higher than 4% are recommended for colposcopy. 39 Using the decision curves generated in the present study, when the threshold was 4% or lower, the SNB was approximately 50% for all models. Our model had significantly higher SNB than the strategy of colposcopy for all when the threshold was above 4%. Physicians may have different threshold preferences because of different perspectives on the relative harms of missing cervical cancer vs avoiding unnecessary colposcopy. However, within reasonable clinical thresholds, the SNB for the stacking model was superior to the strategies of colposcopy for all or colposcopy for none and may be used in clinical practice.

Limitations
This study has limitations. First, some of the predictors were obtained through a self-reported questionnaire, which may lead to reporting bias and recall bias. Second, some factors, such as smoking and oral contraceptive use, which have been documented to potentially influence cervical cancer, 40 were not collected in the present study. Third, the screening program included only women 30 years of age or older, and the risk factors for cervical cancer may differ in women younger than 30 years. Therefore, the performance of this prediction model in younger women needs to be validated in future studies.

Conclusions
This diagnostic study developed and validated a diagnostic prediction model for cervical cancer among women who test positive for hrHPV infection. Including HPV genotypes in the model markedly improved the prediction ability, suggesting that this prediction model may be an important auxiliary tool in screening for and early diagnosis of cervical cancer in low-resource settings when cytological and colposcopic examination results are unavailable.