eTable. Global retinal nerve fiber layer thickness, standard automated perimetry mean deviation and deep learning probability of glaucoma in the test sample
eFigure 1. Example of a left eye with perimetric glaucoma. Spectral-Domain Optical Coherence Tomography (SDOCT) B-scan (top left) and adjacent class activation map (heatmaps) (top middle) highlight the regions of the B-scan that had the greatest weight in the deep learning (DL) algorithm’s classification decision
eFigure 2. Example of a left eye with pre-perimetric glaucoma
eFigure 3. Example of a right eye from a healthy control. Spectral-Domain Optical Coherence Tomography (SDOCT) B-scan (top left) and adjacent class activation map (heatmaps) (top middle) highlight the regions of the B-scan that had the greatest weight in the deep learning (DL) algorithm’s classification decision
Thompson AC, Jammal AA, Berchuck SI, Mariottoni EB, Medeiros FA. Assessment of a Segmentation-Free Deep Learning Algorithm for Diagnosing Glaucoma From Optical Coherence Tomography Scans. JAMA Ophthalmol. 2020;138(4):333–339. doi:10.1001/jamaophthalmol.2019.5983
Does a segmentation-free deep learning algorithm using the entire circle B-scan image from optical coherence tomography perform better than conventional retinal nerve fiber layer thickness parameters for detecting glaucomatous damage?
In this cross-sectional study of 1154 eyes of 635 individuals, the deep learning algorithm had a greater area under the curve than global and sector retinal nerve fiber layer parameters, and this advantage appeared to be greatest in early disease.
These findings suggest a deep learning algorithm using the entire B-scan may be better able to detect glaucomatous disease than conventional retinal nerve fiber layer parameters from optical coherence tomography.
Conventional segmentation of the retinal nerve fiber layer (RNFL) is prone to errors that may affect the accuracy of spectral-domain optical coherence tomography (SD-OCT) scans in detecting glaucomatous damage.
To develop a segmentation-free deep learning (DL) algorithm for assessment of glaucomatous damage using the entire circle B-scan image from SD-OCT.
Design, Setting, and Participants
This cross-sectional study at a single institution used data from SD-OCT images of eyes with glaucoma (perimetric and preperimetric) and normal eyes. The data set was randomly split at the patient level into a training (50%), validation (20%), and test data set (30%). Data were collected from March 2008 to April 2019, and analysis began April 2018.
A convolutional neural network was trained to discriminate glaucomatous from normal eyes using the SD-OCT circle B-scan without segmentation lines.
Main Outcomes and Measures
The ability to discriminate glaucoma from healthy eyes was evaluated by comparing the area under the receiver operating characteristic curve and sensitivity at 80% or 95% specificity for the DL algorithm’s predicted probability of glaucoma vs conventional RNFL thickness parameters given by SD-OCT software. The performance was also assessed in preperimetric glaucoma, as well as by visual field severity using Hodapp-Parrish-Anderson criteria.
A total of 20 806 SD-OCT images from 1154 eyes of 635 individuals (612 eyes [53%] with glaucoma and 542 normal eyes [47%]) were included. The mean (SD) age at SD-OCT scan was 70.8 (10.4) years in individuals with glaucoma and 55.8 (14.1) years in controls. There were 187 women (53.3%) in the glaucoma group and 165 (59.8%) in the control group. Of 612 eyes with glaucoma, 432 (70.4%) had perimetric and 180 (29.6%) had preperimetric glaucoma. The DL algorithm had a significantly higher area under the receiver operating characteristic curve than global RNFL thickness (0.96 vs 0.87; difference = 0.08 [95% CI, 0.04-0.12]) and each RNFL thickness sector for discriminating between glaucoma and controls (all P < .001). At 95% specificity, the DL algorithm (81%; 95% CI, 64%-97%) was more sensitive than global RNFL thickness (67%; 95% CI, 58%-76%). The areas under the receiver operating characteristic curve were also significantly greater for the DL algorithm compared with RNFL thickness at each stage of disease, especially preperimetric and mild perimetric glaucoma.
Conclusions and Relevance
A segmentation-free DL algorithm performed better than conventional RNFL thickness parameters for diagnosing glaucomatous damage on OCT scans, especially in early disease. Future studies should investigate how such an approach contributes to diagnostic decisions when combined with other relevant clinical information, such as risk factors and perimetry results.
In clinical practice, glaucoma is usually diagnosed by a combined analysis of several clinical parameters, including risk factors, such as age and intraocular pressure; tests for evaluation of structural damage to the optic nerve; and visual function assessment with perimetry. Among tests for structural evaluation,1,2 spectral-domain optical coherence tomography (SD-OCT) is the most commonly used, providing objective quantification of damage to the optic nerve head and retinal nerve fiber layer (RNFL).3 To provide quantitative RNFL thickness measurements, conventional SD-OCT software applies segmentation algorithms to delineate the RNFL and extract its thickness. Although SD-OCT RNFL thickness measurements are generally accurate for diagnosing glaucoma,3,4 they may fail in the presence of segmentation errors.5,6 Such errors have been reported in 20% to more than 40% of scans5 and significantly affect the diagnostic accuracy of SD-OCT in glaucoma.5,6
Even in the absence of segmentation errors, the interpretation of conventional SD-OCT RNFL printouts may be difficult given the presence of a large number of summary parameters, in addition to maps and plots. The evaluation of multiple parameters increases the risk of making a type I error, ie, finding an abnormality just by chance. This has led to the phenomenon of red disease in which some patients receive a diagnosis of glaucoma based on SD-OCT in the absence of true disease.7
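The multiplicity problem can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes each parameter is independent and has a 5% false-positive rate in healthy eyes, and the parameter count of 7 is a hypothetical example. Real RNFL parameters are positively correlated, so the true inflation is typically smaller, though still above the nominal 5%.

```python
def prob_any_false_positive(n_params: int, alpha: float = 0.05) -> float:
    """Probability that a healthy eye is flagged on at least one of n_params
    independent parameters, each with a false-positive rate of alpha."""
    return 1.0 - (1.0 - alpha) ** n_params


# Hypothetical parameter counts: a single summary value vs 7 values
# (for example, a global value plus 6 sectors).
for n in (1, 7):
    print(n, round(prob_any_false_positive(n), 3))
# 1 0.05
# 7 0.302
```

Under these assumptions, roughly 3 in 10 healthy eyes would show at least one "abnormal" parameter, which is the mechanism behind red disease.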
Recent advances in artificial intelligence have led to the development of deep learning (DL) algorithms that can accurately detect complex patterns in images, achieving levels of accuracy in image classification tasks that can sometimes surpass those of humans.8-12 Deep learning algorithms can be trained to analyze an entire SD-OCT image, potentially providing more information related to the presence of glaucomatous damage than individual SD-OCT parameters. Segmentation-free analysis of the SD-OCT image may also eliminate the need for manual refinement of conventionally segmented retinal layers. By interpreting the whole image, use of DL algorithms may further minimize false positives, or red disease, that arise when clinicians assess multiple individual parameters.
The purpose of this study was to develop a segmentation-free DL algorithm to assess glaucomatous structural damage using the whole peripapillary SD-OCT scan image and to compare its performance to that of conventional RNFL thickness parameters.
This cross-sectional study used data from the Duke Glaucoma Repository, a database of electronic research and medical records developed by the Duke University Vision, Imaging and Performance Laboratory. The study protocol adhered to the tenets of the Declaration of Helsinki13 and was conducted in accordance with the Health Insurance Portability and Accountability Act on approval by the Duke University institutional review board. A waiver of informed consent was granted owing to the retrospective nature of this research.
The database included information on ophthalmic diagnoses, medical history, and results from comprehensive ophthalmic examination including visual acuity, intraocular pressure, slitlamp biomicroscopy, gonioscopy, and dilated fundus examination. In addition, stereoscopic optic disc photographs (Nidek 3DX) and Spectralis SD-OCT (version 22.214.171.124.; Heidelberg Engineering) images and associated data were reviewed. Standard automated perimetry (SAP) using the 24-2 test pattern and Swedish interactive thresholding algorithm (Carl Zeiss Meditec) was included if the test was reliable, with fewer than 33% fixation losses and fewer than 15% false-positive errors. All eyes with glaucoma had primary open-angle glaucoma based on open angles on gonioscopy, clinical examination, and grading of stereophotographs and visual fields. Patients with other ocular or systemic diseases that could adversely affect the optic nerve or visual field were excluded. Eyes with a refractive error of 6.0 diopters or more in either direction (+6.0 diopters or greater, or −6.0 diopters or less) were also excluded.
Two experienced graders masked to the participant’s identity and any other test information graded the photographs for the presence of signs of glaucomatous optic neuropathy as well as for change over time. Disagreements between graders were resolved by a third experienced grader. Eyes were categorized as having perimetric glaucoma if they had evidence of glaucomatous optic neuropathy (ie, cupping, diffuse or focal rim thinning, optic disc hemorrhage, or RNFL defects) and a reproducible visual field defect on at least 2 consecutive SAP tests, defined as a pattern standard deviation with P < .05 or a glaucoma hemifield test outside normal limits. In addition, eyes with glaucomatous optic neuropathy whose contralateral eye had evidence of perimetric glaucoma were also included but categorized as preperimetric glaucoma. Eyes that had a history of documented optic disc progression on stereophotographs (ie, progressive rim thinning or enlargement of RNFL defects) in the absence of visual field loss were also categorized as having preperimetric glaucoma.
Eyes with perimetric glaucoma were further classified into mild, moderate, and severe visual field loss by applying the Hodapp-Parrish-Anderson criteria.14 Healthy control eyes had to have a normal optic disc stereophotograph with no evidence of glaucomatous optic neuropathy, ocular hypertension (ie, intraocular pressure >21 mm Hg), or SAP abnormality in either eye. A normal SAP test result was required to have mean deviation and pattern standard deviation with P > .05 and a glaucoma hemifield test within normal limits.
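The visual field rules above can be summarized as a few small predicates. This is a minimal sketch: the `SAPTest` record and its field names are hypothetical, and `hpa_stage_md_only` is a deliberate simplification that uses only the mean deviation cutoffs of the Hodapp-Parrish-Anderson criteria (mild, better than −6 dB; moderate, −6 to −12 dB; severe, worse than −12 dB), not the full criteria.

```python
from dataclasses import dataclass


@dataclass
class SAPTest:
    fixation_loss_rate: float   # proportion, 0 to 1
    false_positive_rate: float  # proportion, 0 to 1
    md_db: float                # mean deviation, dB
    md_p: float                 # P value attached to MD
    psd_p: float                # P value attached to PSD
    ght_within_normal_limits: bool


def is_reliable(t: SAPTest) -> bool:
    # Inclusion rule from the text: <33% fixation losses and <15% false positives.
    return t.fixation_loss_rate < 0.33 and t.false_positive_rate < 0.15


def is_normal(t: SAPTest) -> bool:
    # Normal SAP: MD and PSD with P > .05 and GHT within normal limits.
    return t.md_p > 0.05 and t.psd_p > 0.05 and t.ght_within_normal_limits


def hpa_stage_md_only(md_db: float) -> str:
    # Simplification: MD cutoffs only. The full Hodapp-Parrish-Anderson
    # criteria also count depressed test points and check the central
    # 5 degrees, so this version can under-stage some fields.
    if md_db > -6.0:
        return "mild"
    if md_db >= -12.0:
        return "moderate"
    return "severe"


# Hypothetical test result.
example = SAPTest(0.10, 0.05, -4.4, 0.01, 0.001, False)
print(is_reliable(example), is_normal(example), hpa_stage_md_only(example.md_db))
# True False mild
```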
Circumpapillary 12° scans were acquired using Spectralis SD-OCT (version 126.96.36.199.; Heidelberg Engineering).15 Corneal curvature and axial length measurements were entered into the instrument’s software. The Spectralis Anatomic Positioning System (Heidelberg Engineering) was used to adjust for eye movements. All of the images were manually reviewed for image quality, scan centration, and artifacts. Those with signal strength less than 15 dB or with artifacts such as inversion or clipping of the image were excluded.
The accuracy of segmentation was reviewed by a reading center. Segmentation errors were corrected whenever possible; if correction was not possible, the image was discarded. Global and sectoral RNFL thickness parameters were automatically computed by the SD-OCT software.
We trained a segmentation-free DL algorithm to assess glaucomatous damage from the raw SD-OCT peripapillary B-scan image (ie, without segmentation lines). The algorithm was trained to differentiate glaucomatous from normal eyes, as defined above, and provide a probability of glaucoma as output. Spectral-domain optical coherence tomography images were randomly separated at the participant level into a training (50%), validation (20%), and test sample (30%). This approach prevented data leakage, and the resulting biased estimates of test performance, by ensuring that no data from any participant were present in both the training and test samples.
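A participant-level split can be sketched by shuffling participant identifiers rather than images, so every image of a participant inherits that participant's subset. The function name, the toy identifiers, and the fixed seed are hypothetical; because the fractions apply to participants, the image-level proportions are only approximately 50/20/30.

```python
import random
from collections import defaultdict


def split_by_participant(image_ids, participant_of, fractions=(0.5, 0.2, 0.3), seed=0):
    """Assign images to train/validation/test so that every image of a given
    participant ends up in the same subset."""
    participants = sorted({participant_of[i] for i in image_ids})
    random.Random(seed).shuffle(participants)
    n_train = round(fractions[0] * len(participants))
    n_val = round(fractions[1] * len(participants))
    subset_of = {}
    for rank, p in enumerate(participants):
        if rank < n_train:
            subset_of[p] = "train"
        elif rank < n_train + n_val:
            subset_of[p] = "validation"
        else:
            subset_of[p] = "test"  # the remainder, about fractions[2]
    split = defaultdict(list)
    for img in image_ids:
        split[subset_of[participant_of[img]]].append(img)
    return dict(split)


# Hypothetical toy data: 10 participants with 3 images each.
participant_of = {f"img{k}": f"pt{k // 3}" for k in range(30)}
splits = split_by_participant(list(participant_of), participant_of)
members = {name: {participant_of[i] for i in imgs} for name, imgs in splits.items()}
print({name: len(pts) for name, pts in members.items()})
```

Splitting images directly would let near-identical scans of the same eye appear in both training and test data, which is the leakage the text describes.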
The images used in the present study did not contain segmentation lines, so that the DL algorithm could identify which features were most relevant to predict the presence of glaucoma without relying on conventional segmentation. The SD-OCT B-scans were preprocessed by first downsampling the images to 496 × 496 pixels and then scaling the pixel values to range from 0 to 1. Data augmentation increased the heterogeneity of the images through random lighting adjustments of image balance and contrast of up to 5%, random horizontal image flips, and random image rotations of up to 10°. These subtle image transformations were applied only to the training set; they helped to prevent overfitting and encouraged the DL algorithm to learn the most relevant features of each image.16
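The preprocessing and augmentation steps can be sketched in NumPy as follows. This is an approximation under stated assumptions, not the study's pipeline: nearest-neighbour resampling and min-max scaling are choices made here for brevity, "lighting" is modelled as a brightness shift plus a contrast scaling, random rotation is omitted, and the 496 × 768 input is a synthetic stand-in (the raw scan dimensions are not stated in the text).

```python
import numpy as np


def preprocess(bscan: np.ndarray, size: int = 496) -> np.ndarray:
    """Resample a 2-D B-scan to size x size and rescale intensities to [0, 1]."""
    rows = np.linspace(0, bscan.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, bscan.shape[1] - 1, size).round().astype(int)
    resized = bscan[np.ix_(rows, cols)].astype(np.float32)  # nearest-neighbour resampling
    lo, hi = resized.min(), resized.max()
    if hi == lo:
        return np.zeros_like(resized)
    return (resized - lo) / (hi - lo)


def augment(img: np.ndarray, rng: np.random.Generator, max_lighting: float = 0.05) -> np.ndarray:
    """Training-only augmentation: random horizontal flip plus brightness and
    contrast changes of up to +/-5%. Random rotation is omitted for brevity."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    brightness = rng.uniform(-max_lighting, max_lighting)
    contrast = 1.0 + rng.uniform(-max_lighting, max_lighting)
    mean = img.mean()
    return np.clip((img - mean) * contrast + mean + brightness, 0.0, 1.0)


rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(496, 768)).astype(np.uint8)  # synthetic stand-in image
x = preprocess(raw)
x_train = augment(x, rng)  # applied to training images only
print(x.shape, x_train.shape)
# (496, 496) (496, 496)
```

Keeping `augment` separate from `preprocess` makes it easy to apply the random transformations to the training set only, as the text specifies.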
A residual deep convolutional neural network architecture (ResNet34), previously trained on the ImageNet data set, was used for the DL algorithm.17,18 Training was performed by first training only the final 2 layers while all earlier layers remained frozen. Subsequently, all layers were unfrozen, and the network was fine-tuned with differential learning rates and the Adam optimizer. Gradient-weighted class activation maps were computed over the SD-OCT images to help identify the parts of each image that were most important for the DL algorithm’s classification.19,20
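The two-stage schedule with differential learning rates can be expressed as a per-layer-group plan. The text does not give the number of layer groups or any learning-rate values, so the 3 groups and the rates below are hypothetical; geometric spacing between a minimum and a maximum rate is one common convention for differential learning rates, assumed here rather than taken from the study.

```python
from typing import List, Tuple


def differential_lrs(lr_min: float, lr_max: float, n_groups: int) -> List[float]:
    """Learning rates spaced geometrically from lr_min (earliest, most generic
    layer group) to lr_max (final, most task-specific layer group)."""
    if n_groups == 1:
        return [lr_max]
    ratio = (lr_max / lr_min) ** (1.0 / (n_groups - 1))
    return [lr_min * ratio ** k for k in range(n_groups)]


def two_stage_plan(n_groups: int, n_trainable_first: int, lr_first: float,
                   lr_min: float, lr_max: float) -> Tuple[List[float], List[float]]:
    """Per-group learning rates for both stages; 0.0 marks a frozen group."""
    n_frozen = n_groups - n_trainable_first
    stage1 = [0.0] * n_frozen + [lr_first] * n_trainable_first  # pretrained body frozen
    stage2 = differential_lrs(lr_min, lr_max, n_groups)          # everything unfrozen
    return stage1, stage2


# Hypothetical values: 3 layer groups, the final 2 trained first.
stage1, stage2 = two_stage_plan(3, 2, 1e-3, 1e-5, 1e-3)
print(stage1)
print(stage2)
```

The rationale is that early ImageNet-pretrained layers encode generic edges and textures that need only small adjustments, whereas later layers must adapt more to OCT images. In a deep learning framework, each group's rate would be handed to the Adam optimizer, for example as separate parameter groups.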
Receiver operating characteristic (ROC) curves were used to evaluate the diagnostic accuracies of the different parameters investigated in the study. A Probit ROC regression model with maximum likelihood estimator was used to adjust for the potentially confounding effects of age at the time of scan acquisition.21-23 The area under the ROC curve (AUC) was used to summarize diagnostic accuracy, with 1.0 representing perfect discrimination and 0.5 representing chance discrimination. The difference in the AUC of 2 curves was compared using a Wald test based on the bootstrap covariance.22 In addition, sensitivities at fixed specificities of 80% and 95% were calculated.
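The quantities used in this section can be illustrated with a short sketch. `empirical_auc` is the nonparametric (Mann-Whitney) AUC and `sensitivity_at_specificity` picks a cutoff from the control distribution. Neither is the age-adjusted probit ROC regression used in the study. `binormal_auc` shows the standard result that a probit (binormal) ROC curve with intercept a and slope b has AUC = Φ(a / √(1 + b²)); the age adjustment itself is not reproduced. All scores below are hypothetical.

```python
import math
from statistics import NormalDist


def empirical_auc(cases, controls):
    """Mann-Whitney estimate of P(case score > control score), ties counted as 1/2."""
    wins = sum(1.0 if c > h else 0.5 if c == h else 0.0 for c in cases for h in controls)
    return wins / (len(cases) * len(controls))


def sensitivity_at_specificity(cases, controls, specificity):
    """Choose the cutoff so that at least `specificity` of controls score <= cutoff,
    then return the fraction of cases scoring above it."""
    ranked = sorted(controls)
    cutoff = ranked[max(math.ceil(specificity * len(ranked)) - 1, 0)]
    return sum(c > cutoff for c in cases) / len(cases)


def binormal_auc(a, b):
    """AUC implied by a probit (binormal) ROC curve ROC(t) = Phi(a + b * Phi^-1(t))."""
    return NormalDist().cdf(a / math.sqrt(1.0 + b * b))


# Hypothetical scores (higher = more likely glaucoma). For RNFL thickness,
# where thinner means more suspicious, negate the values first.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.1, 0.2, 0.3, 0.5]
print(empirical_auc(cases, controls))                     # 0.9375
print(sensitivity_at_specificity(cases, controls, 0.75))  # 1.0
print(sensitivity_at_specificity(cases, controls, 1.0))   # 0.75
print(round(binormal_auc(1.0, 1.0), 3))                   # 0.76
```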
To maximize the data for the study, we included in the analyses all images and the corresponding diagnosis (ie, normal vs preperimetric vs perimetric glaucoma) that were available for each eye included in the study at the time of imaging. To account for the correlation between observations from the same eye, a bootstrap resampling procedure was used to derive 95% CIs and P values, where the eye-level clusters were considered as the units of resampling. This procedure is commonly used to account for the presence of multiple correlated measurements within the same participant.21 Deep learning models were implemented using Keras (version 2.1.4.; MIT), an open-source Python library. Statistical analyses used Stata (version 15, StataCorp). The α level (type I error) was set at .05. Analysis began April 2019.
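A minimal sketch of an eye-level cluster bootstrap for comparing 2 AUCs follows. Whole eyes, with all their images, are resampled with replacement, and the bootstrap standard deviation of the paired AUC difference serves as its standard error; this equals √(Var₁ + Var₂ − 2Cov) from the bootstrap covariance, which then feeds a Wald test. It uses the empirical AUC rather than the age-adjusted probit model, and the data, `n_boot`, and seeds are hypothetical.

```python
import random
from statistics import NormalDist, stdev


def auc(cases, controls):
    wins = sum(1.0 if c > h else 0.5 if c == h else 0.0 for c in cases for h in controls)
    return wins / (len(cases) * len(controls))


def auc_difference(records):
    """records: (label, score_a, score_b) per image; label 1 = glaucoma, 0 = control."""
    pos = [r for r in records if r[0] == 1]
    neg = [r for r in records if r[0] == 0]
    return (auc([r[1] for r in pos], [r[1] for r in neg])
            - auc([r[2] for r in pos], [r[2] for r in neg]))


def eye_bootstrap_wald(eyes, n_boot=200, seed=0):
    """Resample whole eyes with replacement so that all images of an eye stay
    together; the bootstrap SD of the paired difference is used as its SE."""
    rng = random.Random(seed)
    ids = list(eyes)
    observed = auc_difference([r for e in ids for r in eyes[e]])
    diffs = []
    for _ in range(n_boot):
        sample = [r for e in rng.choices(ids, k=len(ids)) for r in eyes[e]]
        if {r[0] for r in sample} == {0, 1}:  # both classes are needed for an AUC
            diffs.append(auc_difference(sample))
    se = stdev(diffs)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(observed / se)))  # Wald test
    half = NormalDist().inv_cdf(0.975) * se
    return observed, (observed - half, observed + half), p


# Hypothetical data: 40 eyes with 3 images each; score_a separates the groups
# perfectly, score_b overlaps between them.
gen = random.Random(1)
eyes = {}
for e in range(40):
    label = e % 2
    eyes[f"eye{e}"] = [(label, label + gen.uniform(0, 0.5), label + gen.uniform(-1, 1))
                       for _ in range(3)]
diff, ci, p = eye_bootstrap_wald(eyes)
print(round(diff, 3), [round(x, 3) for x in ci], round(p, 4))
```

Resampling individual images instead of eyes would treat correlated scans as independent and produce CIs that are too narrow.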
The data set included 20 806 RNFL circle B-scans from SD-OCT from 1154 eyes of 635 participants, divided into training and validation (14 466 [70%]) and test (6340 [30%]) samples. The test sample consisted of 6340 SD-OCT scans acquired in 348 eyes of 191 participants. The mean (SD) age at SD-OCT scan was 70.8 (10.4) years in individuals with glaucoma and 55.8 (14.1) years in controls. There were 187 women (53.3%) in the glaucoma group and 165 (59.8%) in the control group. Table 1 displays the demographic and clinical characteristics of the participants and eyes in the training/validation vs test samples.
Table 2 reports AUCs and sensitivities at 80% or 95% specificity for the DL algorithm’s predicted probability of glaucoma and RNFL thickness parameters. The DL algorithm had a significantly higher AUC than global RNFL thickness (0.96 vs 0.87; difference = 0.08 [95% CI, 0.04-0.12]) and each of the RNFL sectors for discriminating between glaucoma and controls (all P < .001). In addition, the DL algorithm was more sensitive at 80% specificity (94% [95% CI, 87%-100%]) and 95% specificity (81% [95% CI, 64%-97%]) than global or sectoral RNFL thickness parameters (Table 2). The Figure, A, shows the ROC curves for the DL segmentation-free algorithm vs global RNFL thickness for discriminating glaucomatous from healthy eyes.
The eTable in the Supplement shows global RNFL thickness measurements, SAP mean deviation, and DL probability of glaucoma for the different diagnostic categories. As expected, SAP mean deviation was lower (worse) in perimetric glaucoma (median, −4.41 dB; interquartile range, −10.13 dB to −2.09 dB) than in preperimetric glaucoma (median, −0.30 dB; interquartile range, −1.32 dB to 0.43 dB), and both groups had lower values than normal eyes (median, 0.22 dB; interquartile range, −0.64 dB to 1.12 dB). The mean (SD) DL probability of glaucoma was 0.87 (0.24) in perimetric glaucoma, 0.75 (0.28) in preperimetric glaucoma, and 0.17 (0.24) in healthy eyes.
Table 3 shows the performance of the DL algorithm for detection of preperimetric glaucoma, as well as detection of perimetric glaucoma stratified by mild, moderate, and severe visual field loss by Hodapp-Parrish-Anderson criteria. The AUC for the DL probability of glaucoma was significantly greater than that of global RNFL thickness for discriminating preperimetric glaucoma from healthy eyes (0.92 vs 0.83; difference = 0.09 [95% CI, 0.03-0.16]; P = .002). The Figure, B, shows the ROC curves for discriminating preperimetric glaucoma from healthy eyes for the DL algorithm and global RNFL thickness. At 95% specificity, the DL algorithm had a sensitivity of 70% compared with only 49% for global RNFL thickness. The DL algorithm also had a significantly larger AUC for detection of mild, moderate, and severe glaucoma.
eFigure 1 in the Supplement shows an example eye with perimetric glaucoma that exhibited a superior arcuate visual field defect with corresponding inferior rim loss. The conventional SD-OCT inferior temporal thickness parameter is outside normal limits. The class activation heat map overlying the SD-OCT circle B-scan shows in red the area that had the greatest association with the algorithm prediction, which, as expected, corresponded to the inferior temporal region. eFigure 2 in the Supplement shows an example eye with preperimetric glaucoma. Although the conventional SD-OCT RNFL thickness parameters were mostly in the normal range, with only a borderline global RNFL thickness, the DL segmentation-free algorithm estimated a probability of 1.0 that the eye had glaucoma. eFigure 3 in the Supplement shows that the superior temporal and inferior temporal portions of the scan were the most important areas influencing the algorithm’s prediction in a healthy eye.
We developed a DL algorithm to estimate the probability of glaucomatous damage from evaluation of the entire circular B-scan from SD-OCT. The algorithm had greater accuracy for detecting structural glaucomatous damage compared with conventional RNFL thickness parameters, notably for eyes with preperimetric glaucoma and mild visual field defects.
Other groups have also used DL to detect glaucoma from OCT data.9,11 For example, Asaoka et al9 demonstrated that early perimetric glaucoma could be accurately diagnosed using a DL algorithm trained with SD-OCT parameters extracted from the macula. However, their method still required conventional segmentation to extract thicknesses of the RNFL and ganglion cell complex. In a small study, Muhammad et al10 trained a neural network to extract features from maps derived from conventional automated segmentation of wide-field swept-source OCT, which were then used in a random forest model to predict glaucomatous damage.11 Similar to the work of Asaoka et al,8,9 Muhammad et al’s approach10 also required conventional segmentation and, therefore, would still be problematic in the presence of segmentation errors. In contrast, our approach used raw B-scans without requiring any segmentation of the retinal layers. Because segmentation errors are common in OCT scans, a segmentation-free approach is likely to provide more robust results when applied in clinical practice.
Besides outperforming all conventional RNFL thickness parameters for detecting glaucoma, the DL approach presented in this study may have additional advantages. The single probabilistic output may afford a simpler interpretation, whereas integrating the many parameters given by the conventional SD-OCT printout may sometimes be difficult. In addition, the use of multiple parameters may increase the incidence of false-positive test results, as commonly seen in cases of red disease. The class activation maps may also help to highlight the areas of the scan that had the greatest contribution to the algorithm’s output (eFigures 1 and 2 in the Supplement). Interestingly, the maps seemed to include layers other than the RNFL, which may also be important in assessing glaucomatous damage. However, these maps have limited spatial resolution because they are derived from the final convolutional layers, whose feature maps are heavily downsampled relative to the input image; this limits their accuracy in pinpointing areas of damage.19,20
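The resolution limit of gradient-weighted class activation maps can be seen directly in how they are computed. In the sketch below, each channel of the last convolutional layer is weighted by the spatial mean of its gradient, the weighted sum is passed through a ReLU and normalized, and the coarse map is upsampled to image size. For a 496 × 496 input, a standard ResNet34's last convolutional block produces 512 channels on roughly a 16 × 16 grid, so each cell spans about 31 × 31 input pixels. The activations and gradients here are synthetic stand-ins rather than outputs of a trained model. Nearest-neighbour upsampling is used to make the blocks visible; the bilinear upsampling typical of Grad-CAM implementations smooths the blocks but adds no detail.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray, out_hw: tuple) -> np.ndarray:
    """activations, gradients: shape (K, h, w) from the last convolutional layer,
    where gradients are d(glaucoma score) / d(activations)."""
    weights = gradients.mean(axis=(1, 2))  # one weight per channel
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # ReLU(sum_k w_k A_k)
    if cam.max() > 0:
        cam = cam / cam.max()
    # Nearest-neighbour upsampling: each coarse cell becomes a uniform block,
    # so the map cannot localize anything finer than the h x w grid.
    rows = np.arange(out_hw[0]) * cam.shape[0] // out_hw[0]
    cols = np.arange(out_hw[1]) * cam.shape[1] // out_hw[1]
    return cam[np.ix_(rows, cols)]


rng = np.random.default_rng(0)
acts = rng.random((512, 16, 16))        # synthetic stand-in for last-layer activations
grads = rng.normal(size=(512, 16, 16))  # synthetic stand-in for their gradients
heat = grad_cam(acts, grads, (496, 496))
print(heat.shape, len(np.unique(heat, axis=0)))
```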
This study has limitations. Although we used an independent test set for final assessment of diagnostic accuracy, external validation in populations from other clinical settings is desirable. It should be noted that although the ROC curve areas were statistically significantly different between the DL model and the conventional OCT parameters, some overlap was seen in the 95% CIs. Of note, in the overall comparison with global RNFL thickness, the difference in ROC curve areas in relation to the DL model had a lower limit of the 95% CI of 0.04 (Table 2). As ROC curve areas range from 0.5 to 1.0, this number would represent approximately 8% of the range, still a meaningful difference. As more data and studies accumulate, future meta-analyses could be done to obtain even more precise CIs around the point estimates of differences in diagnostic accuracy, helping clarify further the clinical relevance of DL applications on OCT data. In addition, it should be noted that the diagnosis of glaucoma is not based on the results of a single test, but rather on a combined interpretation of information on risk factors, such as age and intraocular pressure, and results of structural and functional tests. Therefore, it remains to be seen how the incorporation of such an algorithm would affect clinical diagnosis when combined with these other pieces of information that are acquired in clinical practice, as well as for assessing change over time.
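The 8% figure follows from expressing the lower confidence limit of the AUC difference as a fraction of the range between chance (0.5) and perfect (1.0) discrimination:

```latex
\frac{0.04}{1.0 - 0.5} = \frac{0.04}{0.5} = 0.08 = 8\%
```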
In summary, we developed a segmentation-free DL algorithm that can predict the probability of glaucomatous structural damage from the SD-OCT circle B-scan. The algorithm performed better than global and sectoral RNFL thickness parameters for discriminating glaucomatous from control eyes, especially in cases of preperimetric or early perimetric glaucoma. Application of this DL algorithm in a clinical setting may improve the accuracy and sensitivity of SD-OCT for diagnosing glaucoma, while obviating the need for error-prone segmentation of retinal layers.
Corresponding Author: Felipe A. Medeiros, MD, PhD, Duke Eye Center, Department of Ophthalmology, Duke University, 2351 Erwin Rd, Durham, NC 27710 (firstname.lastname@example.org).
Accepted for Publication: December 2, 2019.
Published Online: February 13, 2020. doi:10.1001/jamaophthalmol.2019.5983
Author Contributions: Drs Thompson and Medeiros had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Thompson, Jammal, Berchuck, Medeiros.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Thompson, Jammal, Mariottoni, Medeiros.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: All authors.
Obtained funding: Medeiros.
Administrative, technical, or material support: Medeiros.
Conflict of Interest Disclosures: Dr Thompson is a recipient of the Heed Ophthalmic Fellowship. Dr Jammal reports other support from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior outside the submitted work. Dr Medeiros reports grants from National Institutes of Health/National Eye Institute, Carl Zeiss Meditec, and Google during the conduct of the study; nonfinancial support (equipment) from Heidelberg Engineering during the conduct of the study; and personal fees from Bausch + Lomb, Merck, Sensimed, Topcon, Reichert, Novartis, Allergan, Galimedix Therapeutics, Stealth BioTherapeutics, and Biogen outside the submitted work. No other disclosures were reported.
Funding/Support: This work is supported in part by National Institutes of Health/National Eye Institute (grants EY027651 [Dr Medeiros], EY029885 [Dr Medeiros], and EY021818 [Dr Medeiros]).
Role of the Funder/Sponsor: The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.