Preprocessing (A), development of the Pentacam InceptionResNetV2 Screening System (PIRSS) model (B), and classifications derived with use of PIRSS (C). KC indicates keratoconus.
eFigure 1. The Workflow Diagram With Data Size
eFigure 2. Accuracy (%) by Humans on the Test Datasets
eFigure 3. Confusion Matrix of Senior RS Group and PIRSS on the Test Datasets
eFigure 4. Confusion Matrix of TKC and PIRSS on the Test Datasets
eFigure 5. Confusion Matrix of BAD and PIRSS on the Test Datasets
eFigure 6. An Efficient Future Web Service
Xie Y, Zhao L, Yang X, et al. Screening Candidates for Refractive Surgery With Corneal Tomographic–Based Deep Learning. JAMA Ophthalmol. 2020;138(5):519–526. doi:10.1001/jamaophthalmol.2020.0507
Can deep learning be used in corneal tomographic screening of candidates for refractive surgery?
In this diagnostic study including 1385 patients, a deep learning model achieved an overall detection accuracy of 94.7% on the validation data set. On the independent test data set, the model achieved a discrimination rate (95.0%) comparable to that of senior ophthalmologists who perform refractive surgery (92.8%).
Corneal tomographic scanning with a deep learning algorithm may offer standardized results to reduce both the workload of surgeons and the risk of misclassification.
Evaluating corneal morphologic characteristics with corneal tomographic scans before refractive surgery is necessary to exclude patients with at-risk corneas and keratoconus. In previous studies, researchers performed screening with machine learning methods based on specific corneal parameters. To date, a deep learning algorithm has not been used in combination with corneal tomographic scans.
To examine the use of a deep learning model in the screening of candidates for refractive surgery.
Design, Setting, and Participants
A diagnostic, cross-sectional study was conducted at the Zhongshan Ophthalmic Center, Guangzhou, China, with examination dates extending from July 18, 2016, to March 29, 2019. The investigation was performed from July 2, 2018, to June 28, 2019. Participants included 1385 patients; 6465 corneal tomographic images were used to generate the artificial intelligence (AI) model. The Pentacam HR system was used for data collection.
The deidentified images were analyzed by ophthalmologists and the AI model.
Main Outcomes and Measures
The performance of the AI classification system.
A classification system centered on the AI model Pentacam InceptionResNetV2 Screening System (PIRSS) was developed for screening potential candidates for refractive surgery. The model achieved an overall detection accuracy of 94.7% (95% CI, 93.3%-95.8%) on the validation data set. Moreover, on the independent test data set, the PIRSS model achieved an overall detection accuracy of 95% (95% CI, 88.8%-97.8%), which was comparable with that of senior ophthalmologists who are refractive surgeons (92.8%; 95% CI, 91.2%-94.4%) (P = .72). In distinguishing corneas with contraindications for refractive surgery, the PIRSS model performed better than the classifiers (95% vs 81%; P < .001) in the Pentacam HR system on an Asian patient database.
Conclusions and Relevance
PIRSS appears to be useful in classifying images to provide corneal information and preliminarily identify at-risk corneas. PIRSS may provide guidance to refractive surgeons in screening candidates for refractive surgery as well as for generalized clinical application for Asian patients, but its use needs to be confirmed in other populations.
The prevalence of myopia in young adults in East and Southeast Asia is approximately 80% to 90% owing to intensive education and limited time outdoors.1 Because myopia is irreversible and requires eyeglasses, which are considered by some to be inconvenient, many adults undergo refractive surgery, including corneal refractive surgery and intraocular refractive surgery. Corneal refractive surgery is essentially laser vision correction with a femtosecond laser or excimer laser. However, iatrogenic ectasia due to biomechanical decompensation may occur if a patient with an at-risk cornea or a subclinical keratoconus (KC) has undergone ill-advised laser vision correction. This type of corneal ectasia is characterized by a thinning and forward protrusion of the center of the cornea, accompanied by irregular astigmatism, and can affect visual quality or even cause vision loss. Evaluating corneal morphologic characteristics with corneal topographic and tomographic testing before laser vision correction is necessary to exclude at-risk corneas.
Patients with at-risk corneas should be followed up over time for signs indicating further development of KC. This disease has hereditary, biomechanical, and biochemical causes involving chronic inflammatory events, such as frequent rubbing of the eyes or long-term use of contact lenses.2 However, the exact source of KC is still uncertain.
Various devices have been used to help identify at-risk corneas and KC, including Placido disc-based topographic,3 scanning-slit tomographic4 and Scheimpflug-based tomographic tools.5 There have been several artificial intelligence (AI) systems with different kinds of mathematical models for corneal topographic- or tomographic-based diagnosis, including statistical models,6-9 linear discriminant analysis,10,11 neural network models,12-17 decision tree models,18-20 support vector machine models,21-24 and random forest models.25 Each of these AI systems has advantages as a useful complementary diagnostic tool, suggesting clinical diagnoses of at-risk corneas or KC.
The Pentacam HR system (OCULUS) has been suggested to be one of the most sensitive screening instruments for at-risk corneas and KC.26 The system primarily includes 2 classifiers; one of these is topographic KC (TKC) classification, which is an adaptation of the Amsler-Krumeich classification.27 The use of TKC includes hierarchical prompts based on the front shape of the cornea: normal; possible or suspect KC; KC1, KC1-2, KC2, KC2-3, KC3, KC3-4, and KC4, with the numbers representing mild (KC1) to moderate and severe (KC2-4) stages; corneal surgery; and abnormal. However, this system performs poorly in classifying suspect KC because it lacks corneal posterior surface information. Early-stage KC is the mild stage in the adapted Amsler-Krumeich classification, most likely with no slitlamp changes. KC corresponds to the moderate or severe stages in the Amsler-Krumeich classification. The second classifier, the Belin-Ambrósio enhanced ectasia display (BAD), is a screening tool that uses regression analysis based on a large, normative database of corneal anterior and posterior surfaces, as well as pachymetry progression. A total deviation value is calculated in the BAD system, using 5 specifically defined parameters, and the results are color coded as white (normal), yellow (suspect), or red (KC).28 A deviation threshold greater than 2.11 has a sensitivity of 99.59% and specificity of 100% for diagnosing KC; a deviation threshold greater than 1.22 provides 93.62% sensitivity and 94.56% specificity for detecting mild and subclinical disease.29 However, these criteria were based mainly on a database of patients who were non-Hispanic white. To our knowledge, no diagnostic AI system has been developed for Asian patients, who have smaller corneal diameters.
Therefore, after obtaining corneal topographic or tomographic data, a refractive surgeon needs to evaluate the corneal morphologic characteristics based on a comprehensive analysis of the shape and color combined with the predetermined index of the system. Hence, detecting irregular corneas or subclinical forms of KC is still a challenge for eye practitioners.
Deep learning is well suited to learning from images and has achieved human-level performance in image classification.30 Consequently, we decided to develop an AI model based on an image learning technique trained with a convolutional neural network, which differs from the earlier neural networks. With use of a deep learning algorithm with corneal tomographic imaging, this model may aid in identifying at-risk corneas and determining which patients are unsuited for corneal refractive surgery, thereby assisting in surgery decision-making.
The initial corneal tomographic data were collected with use of a Pentacam HR, version 1.21r41, at the Zhongshan Ophthalmic Center, Guangzhou, China, with examination dates extending from July 18, 2016, to March 29, 2019. The investigation was performed from July 2, 2018, to June 28, 2019. Four-map composite refractive images, comprising the axial curvature, front elevation, back elevation, and corneal thickness, were used to obtain the overall profile of the cornea. The sample population comprised patients throughout China who wanted to undergo refractive surgery, had a primary diagnosis of KC, or had stable postoperative refractive states. All curvature, pachymetry, and elevation color bars adopted a 61-color setting (contrast, 2.0; brightness, −7; and gamma, 5.0). The elevation reference shape diameter was set to 8 mm.
The Zhongshan Ophthalmic Center Ethics Review Committee approved this retrospective observational study, and the study protocol was conducted following the tenets of the Declaration of Helsinki.31 Because deidentified data were used, the review committee indicated that patient consent did not need to be collected in this research. Patients’ names appeared on the file in pinyin format (romanized system of Chinese characters), which is used as a deidentified system. This study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline for diagnostic studies.
In total, 6465 corneal tomographic images from 1385 patients were collected to develop the AI model, and an expert team grouped the images according to a comprehensive analysis of all the morphologic and characteristic indices according to an Asian database.32 The expert team included 3 senior ophthalmologists (Y.X., X.Y., Q.L.) with at least 5 years of practical experience in the refractive surgery center in our clinic. Each image was independently labeled by the 3 experts and each did not know the labels selected by the others. When the labels differed, the one chosen by 2 of the 3 experts was selected as the standard. To better meet our clinical needs, 5 categories were proposed: normal cornea, suspected irregular cornea, early-stage KC, KC, and myopic postoperative cornea. A normal cornea has a natural shape with all the indices within a normal range. Normal corneas included with-the-rule (ie, within accepted parameters) astigmatism or normally thin corneas. A suspected irregular cornea describes an at-risk cornea. Such a cornea may have inferior-superior values outside the reference range or aberrant C-shaped or round posterior surface elevations. Alternatively, a suspected irregular cornea may have an unusual pachymetric progression.
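The labeling rule described above (3 independent experts, with the label chosen by 2 of the 3 taken as the standard) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `consensus_label` is hypothetical, and returning `None` when all 3 experts disagree is an assumption, because the text does not say how such cases were handled.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by at least 2 of the 3 experts.

    Returns None when all 3 labels differ (an assumption; the article
    does not describe how a 3-way disagreement was resolved).
    """
    if len(labels) != 3:
        raise ValueError("expected exactly 3 expert labels")
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else None


# Two experts agree, so their label becomes the reference standard.
agreed = consensus_label(["KC", "KC", "early-stage KC"])
# All three differ, so no majority label exists.
unresolved = consensus_label(["normal cornea", "KC", "early-stage KC"])
```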
The sample sizes were as follows: 1887 images of normal corneas from 1368 eyes, 799 images of suspected irregular corneas from 369 eyes, 731 images of early-stage KC from 202 eyes, 1978 images of KC from 389 eyes, and 1070 images of myopic postoperative corneas from 474 eyes. Later, images from 94 separate patients were collected and labeled, also from our center, with examination dates extending from January 7 to March 29, 2019, and 100 images, 20 from each category, were selected as a test data set (eFigure 1 in the Supplement).
High-quality images with normalized sizes and resolutions were collected, including the separated discernible pictorial parts and the embedded patient information (words and numbers). The pictorial parts were cropped out and merged to form images for the AI model in the preprocessing step. Next, each patient’s pictures were divided on the basis of pinyin identifiers into 2 independent data sets for training (5130 images from 1108 of 1385 patients [80%]) and validation (1335 images from 277 of 1385 patients [20%]) of the AI model.
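Splitting by patient rather than by image keeps every image from one patient in the same data set, so the validation set stays independent of training. A minimal sketch of such a patient-level 80/20 split is shown below; the function name, the toy file names, and the fixed random seed are assumptions for illustration and are not taken from the study.

```python
import random
from collections import defaultdict


def split_by_patient(image_to_patient, val_fraction=0.2, seed=0):
    """Split images into training and validation lists at the patient level."""
    by_patient = defaultdict(list)
    for image, patient in image_to_patient.items():
        by_patient[patient].append(image)

    # Shuffle patients (not images) so that no patient spans both sets.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_val = round(len(patients) * val_fraction)

    val = [img for p in patients[:n_val] for img in by_patient[p]]
    train = [img for p in patients[n_val:] for img in by_patient[p]]
    return train, val


# Toy data: 10 hypothetical patients with 3 images each.
toy = {f"img_{i}.png": f"patient_{i // 3}" for i in range(30)}
train_images, val_images = split_by_patient(toy)
```

With 10 toy patients, 2 patients (6 images) go to validation and 8 patients (24 images) go to training.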
We then used InceptionResNetV2 architecture in a convolutional neural network on the TensorFlow platform to create the AI model with transfer learning technique. InceptionResNetV2 is a variation of InceptionV3, which borrows some ideas from ResNet.33 This architecture has been tested and achieved higher accuracy than other main convolutional neural network models, such as Inception and ResNet, on the validation data set of the ImageNet classification challenge.34 Transfer learning was adopted because it has been associated with improved diagnostic accuracy of biomedical images.35 During the training process, only the weights for fully connected layers were updated in our training data set, and the other weights were pretrained on ImageNet and frozen. At each epoch, the accuracy and loss were calculated on the training and validation data sets to monitor the performance. After 100 epochs, the training was stopped owing to the absence of further improvement (model convergence) in both accuracy and cross-entropy loss. The model with the highest accuracy on the validation data set was saved as the best model (Figure).
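The checkpoint rule described above (monitor accuracy at each epoch and keep the model with the highest validation accuracy) can be illustrated with a small framework-independent sketch. The helper name `best_epoch` and the accuracy values are made up for illustration; they are not results from the study, and the actual training was run in TensorFlow.

```python
def best_epoch(val_accuracy):
    """Return (epoch_number, accuracy) for the highest validation accuracy.

    Epochs are numbered from 1; ties go to the earliest epoch.
    """
    if not val_accuracy:
        raise ValueError("no epochs recorded")
    idx = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return idx + 1, val_accuracy[idx]


# Hypothetical per-epoch validation accuracies (illustrative only).
history = [0.81, 0.90, 0.93, 0.92, 0.93]
epoch, accuracy = best_epoch(history)
```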
The independent test data set of 100 images was used to compare the accuracy of our model with that of human specialists. The accuracy of 5 senior ophthalmologists who perform refractive surgery, 5 medical students of refractive surgery, 5 senior ophthalmologists who are not refractive surgeons, and 5 medical students who are not studying refractive surgery was judged by the results of corneal tomographic scanning in the test data set. In the senior refractive surgery group, 3 individuals were from the expert team that did the labeling. The performance of each human was evaluated against the ground truth.
We also compared our model with TKC and BAD to check the performance, taking advantage of the abovementioned test data set. To compare our model with TKC, we mapped the TKC outputs to our categories as follows: normal as normal cornea; possible as suspected irregular cornea; the KC1, KC1-2, and KC2 grades of KC as early KC; the KC2-3, KC3, KC3-4, and KC4 grades of KC as KC; and corneal surgery as myopic postoperative cornea. One hundred images were divided into 5 categories, and each category contained 20 images. To compare our model with BAD, we also collected the enhanced ectasia display with the total deviation value and checked the white, yellow, or red classification for each participant. We equated a white display with normal, a yellow display with suspect, and a red display with early KC plus KC. Eighty images were divided into 3 categories, with 20 normal images, 20 suspect images, and 40 early KC plus KC images.
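The category mapping used for these comparisons can be written as two lookup tables. This is a sketch under assumptions: the exact output strings produced by the Pentacam software are not given in the text, so the dictionary keys below are illustrative, and the "abnormal" TKC output is left unmapped because the article does not assign it to a category.

```python
# TKC output -> PIRSS category (keys are assumed spellings of the TKC prompts).
TKC_TO_PIRSS = {
    "normal": "normal cornea",
    "possible": "suspected irregular cornea",
    "KC1": "early-stage KC",
    "KC1-2": "early-stage KC",
    "KC2": "early-stage KC",
    "KC2-3": "KC",
    "KC3": "KC",
    "KC3-4": "KC",
    "KC4": "KC",
    "corneal surgery": "myopic postoperative cornea",
}

# BAD display color -> 3-group comparison category.
BAD_TO_GROUP = {
    "white": "normal",
    "yellow": "suspect",
    "red": "early KC plus KC",
}


def map_tkc(output):
    """Return the PIRSS category for a TKC output, or None if unmapped."""
    return TKC_TO_PIRSS.get(output)
```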
The receiver operator characteristic curves were plotted using Python, version 3.7 (Python Software Foundation) with packages of matplotlib 2.2.3 and scikit-learn 0.19.2. The 2-sided 95% CIs were Wilson score intervals for accuracy, sensitivity, and specificity, and were DeLong intervals for the area under receiver operator characteristic curve. The 2 proportion comparisons were tested with the McNemar test, analyzed using R, version 3.5.1 (R Foundation for Statistical Computing), with packages of Hmisc_4.2-0, pROC_1.15.3, and stats (base package). All statistical tests were 2-sided with a significance level of .05.
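As a worked example of the Wilson score interval, the sketch below computes the 2-sided 95% CI for an accuracy of 95 correct out of 100 images, which is the count implied by the reported 95.0% test accuracy on the 100-image test data set. The function name is hypothetical, and the study itself computed its intervals in R rather than with this code.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default: 95% CI)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


lower, upper = wilson_interval(95, 100)
print(f"{lower:.1%} to {upper:.1%}")  # prints "88.8% to 97.8%"
```

The result agrees with the interval reported for PIRSS on the test data set (95% CI, 88.8%-97.8%).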
We developed a model to be used for classifying corneal types for patients wanting to undergo refractive surgery. The model achieved a total detection accuracy of 94.7% (95% CI, 93.3%-95.8%) on the validation data set. The areas under the receiver operator characteristic curves were above 0.99 on average. The performances for each category are presented in Table 1. We named our AI system, which is centered on the model, the Pentacam InceptionResNetV2 Screening System (PIRSS).
Twenty reviewers were invited to classify the 100 corneal tomographic images in the test data set. The total mean accuracies of the reviewers were as follows: senior refractive surgeons, 92.8%; student refractive surgeons, 85.6%; senior nonrefractive surgeons, 68.2%; and students not studying refractive surgery, 55.8% (eFigure 2 in the Supplement). Our model achieved an overall accuracy comparable to that of senior ophthalmologists who are refractive surgeons in our clinic (95.0%; 95% CI, 88.8%-97.8% vs 92.8%; 95% CI, 91.2%-94.4%; P = .72) (Table 2). The overall accuracies achieved by the 5 reviewers in the senior refractive surgeon group were 93.0%, 94.0%, 94.0%, 91.0%, and 92.0% (eFigure 3 in the Supplement). All reviewers had relatively poor performance for the suspect and early KC categories, since the decision was also a clinical dilemma. However, when presented with data that were not clear on the suspect and early KC diagnoses, PIRSS obtained similar sensitivity to the senior refractive surgeon group (suspect: 80.0% vs 83.0%, P = .92; early KC: 95% vs 87%, P = .60).
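The group mean for the senior refractive surgeons follows directly from the 5 individual accuracies listed above, as this short check shows (variable names are illustrative):

```python
from statistics import mean

# Overall accuracies (%) of the 5 senior refractive surgeons, as reported.
senior_accuracies = [93.0, 94.0, 94.0, 91.0, 92.0]

group_mean = mean(senior_accuracies)
print(f"{group_mean:.1f}%")  # prints "92.8%"
```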
After equating the scales in the TKC comparison (Table 2), we found that the overall accuracy of the TKC classifier was 81% (vs 95% for PIRSS; P < .001): the TKC system correctly identified only 4 of 20 (20.0% accuracy) suspect images. For the other discrimination categories, TKC behaved similarly to PIRSS (96.3%; 95% CI, 89.5%-98.7% vs 98.8%; 95% CI, 93.3%-99.9%; P = .48) owing to similar reference standards. BAD achieved a total accuracy of 86.2% (95% CI, 77.0%-92.1%), and PIRSS achieved a total accuracy of 93.7% (95% CI, 86.2%-97.3%) (P = .72) (Table 2). The false-positive rate of the suspect category was 10.0% in BAD and 1.7% in PIRSS. BAD misinterpreted 5 normal corneas as suspected irregular corneas among 6 false-positive cases (eFigure 4 and eFigure 5 in the Supplement).
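The false-positive rates for the suspect category can be reproduced from the counts in the BAD comparison: 80 images, of which 20 are truly suspect, leaves 60 nonsuspect images. BAD's 6 false-positive cases are stated in the text; the single PIRSS false positive is inferred here from the reported 1.7% and is an assumption rather than a reported count.

```python
def false_positive_rate(false_positives, negatives):
    """Fraction of truly negative cases that were flagged as positive."""
    if negatives <= 0:
        raise ValueError("negatives must be positive")
    return false_positives / negatives


non_suspect = 80 - 20  # images whose reference label is not "suspect"

bad_fpr = false_positive_rate(6, non_suspect)    # 6 cases stated in the text
pirss_fpr = false_positive_rate(1, non_suspect)  # 1 case inferred from 1.7%
print(f"BAD {bad_fpr:.1%}, PIRSS {pirss_fpr:.1%}")  # prints "BAD 10.0%, PIRSS 1.7%"
```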
We developed the output categories normal cornea, suspected irregular cornea, early-stage KC, KC, and myopic postoperative cornea, using this type of classification to offer therapeutic guidance. Candidates with normal corneas in both eyes are eligible for refractive surgery if the residual corneal thickness is consistent with the safety criteria. Risks are associated with laser vision correction for individuals with suspected irregular cornea in 1 or both eyes. An eye with early-stage KC should be observed and given crosslinking at a proper time. An eye classified as KC indicates a moderate or advanced case, and early, active intervention is needed.
There are many terms that describe at-risk corneas. These corneas can be termed suspected KC because the condition may manifest as an increasing K value, inferior-superior dioptric asymmetry, or posterior surface elevation. Forme fruste KC (FFKC)36 is the normal contralateral eye of patients with clinically diagnosed unilateral KC. Since both eyes of patients with unilateral KC have the same genetic makeup, the less affected eye is known to have KC. The contralateral eye that has no clinical findings except for specific topographic or tomographic changes should also carry the diagnosis of FFKC.37 However, in some studies, FFKC was used to indicate the unusual shape of both eyes. Another term is subclinical KC, which indicates that no abnormalities are noted in a slitlamp examination. Currently, there is a lack of consensus in defining these cases, except for the universally accepted tenet that unilateral KC is rare. In our AI model, suspected irregular cornea was the category used to define high-risk corneas with suspicious corneal morphologic abnormalities, which not only reflected the morphologic differences compared with normal corneas, but also indicated the probability that there was no KC. Without disturbing the biomechanical stability due to corneal thinning, a person with suspected irregular corneas in both eyes may not develop KC during their lifetime. A label of KC suspect might cause concern for an individual wanting to undergo refractive surgery who might suspect that they have a serious inherited disease.
Our deep learning model is composed of multiple processing layers originating from a neural network. For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations.38 Deep convolutional neural networks have yielded breakthroughs in image processing and are widely used in medical imaging; ophthalmic photography is an important link. Deep learning is useful in diagnosing or making clinical decisions related to cataracts, glaucoma, age-related macular degeneration, and diabetic retinopathy. Classic convolutional neural network models include LeNet, AlexNet, the visual geometry group network, Xception, Inception, and ResNet. The InceptionResNetV2 algorithm, which yields a better performance result, was chosen in this study.
In previous studies, researchers used significant corneal parameters to form many intelligent indices to perform screening (Table 3).10-17,19-23,25,39 These previous AI tools were based on only specific parameters or small training data sets. In contrast, we chose to use heat maps containing all information related to the cornea, with a relatively large amount of data. When faced with a corneal map, surgeons in a busy clinic usually do not have time for a detailed study that involves repeated comparison and consideration. It appears we need a medical system or equipment that can automatically offer standardized judgments to reduce both the workload of surgeons and the risk of misclassification.
This study represents a potentially useful development in translational medicine. A web service with PIRSS is being prepared (eFigure 6 in the Supplement). The PIRSS model also offers benefits for a refractive surgeon with less experience or for ophthalmologists who are not refractive surgeons but want to obtain guidance from Pentacam HR tomographic scans. In addition, patients can upload their images to our service to receive advice. A larger sample size is needed to improve the performance of PIRSS, and we look forward to increasing our resources and the number of specialists to help build the model.
PIRSS has limitations. First, if future versions of the Pentacam software change the way that the heat maps are generated or organize the composite image differently, the model may not work properly. Therefore, for the time being, the images put into the system have to meet the criteria we set, and we need to continually refine the algorithm. Second, this model did not provide fully quantitative results in the case of high interindividual variability. Because the database population was younger people (most aged 18-40 years), we did not consider age and sex in this AI model. Although the normal cornea becomes steeper and shifts from with-the-rule to against-the-rule astigmatism with increasing age,40 corneal astigmatism is found to be stable until middle age. In younger people, there was barely a sex-related difference in the corneal curvature or astigmatism patterns.41 However, there was an interference factor in the evaluation criterion, as the corneal diameter may affect the tomographic appearance. In the posterior elevation map, an abnormal figure is more likely to be present in a particular small-diameter cornea with other common indicators. We think that for diagnostic integrity, slitlamp examinations, refractive errors, and family history should be considered.
Moreover, for refractive surgery candidates with a normal cornea, laser vision correction is not entirely safe. It appears the reasons for accidental corneal ectasia need further assessment. Postoperative education should include instruction in avoidance of eye rubbing and allergen prevention. Technology is no substitute for clinical understanding, and clinical expertise remains essential. Artificial intelligence can use only the information currently available. Many further developments are needed before we are more reliant on AI technology in corneal tomographic-based evaluation and refractive candidate screening. Furthermore, biomechanical assessment has been suggested to enhance the ability to screen for KC. An AI model combined with a biomechanical index or images could be a powerful supplement for clinical decision-making.
The findings of this study suggest that PIRSS is able to classify images to offer corneal information and preliminarily identify at-risk corneas. This AI system has the potential for generalized clinical applications and provides new opportunities for screening individuals for refractive surgery.
Accepted for Publication: February 9, 2020.
Corresponding Author: Haotian Lin, MD (firstname.lastname@example.org), and Quan Liu, MD (email@example.com), Zhongshan Ophthalmic Center, Sun Yat-sen University, Seven Jinsui Road, Guangzhou 510060, China.
Published Online: March 26, 2020. doi:10.1001/jamaophthalmol.2020.0507
Author Contributions: Drs Xie and Zhao contributed equally to this work. Drs Xie and Zhao had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Xie, Zhao, Wu, Haotian Lin, Q. Liu.
Acquisition, analysis, or interpretation of data: Xie, Zhao, X. Yang, Y. Yang, Huang, F. Liu, Xu, L. Lin, Haiqin Lin, Feng, Haotian Lin.
Drafting of the manuscript: Xie, Zhao, Haiqin Lin, Feng, Haotian Lin.
Critical revision of the manuscript for important intellectual content: Zhao, X. Yang, Wu, Y. Yang, Huang, F. Liu, Xu, L. Lin, Haotian Lin, Q. Liu.
Statistical analysis: Zhao.
Obtained funding: Haotian Lin, Q. Liu.
Administrative, technical, or material support: Xie, Zhao, F. Liu, Xu, L. Lin, Feng, Haotian Lin, Q. Liu.
Supervision: X. Yang, Wu, Y. Yang, Haotian Lin, Q. Liu.
Conflict of Interest Disclosures: None reported.
Funding/Support: The research received funding through grants 2018YFC0116500 from the National Key R&D Program of China, 31671000 from the Natural Science Foundation of China, 201804020007 from the Guangzhou Science and Technology Planning Project, 81822010 from the National Natural Science Foundation of China, 2018B010109008 from the Science and Technology Planning Projects of Guangdong Province, and 2017TX04R031 from the Guangdong Science and Technology Innovation Leading Talents.
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.