Original Investigation
July 2018

Automated Diagnosis of Plus Disease in Retinopathy of Prematurity Using Deep Convolutional Neural Networks

Author Affiliations
  • 1 Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown
  • 2 Department of Ophthalmology, Casey Eye Institute, Oregon Health and Science University, Portland
  • 3 Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago
  • 4 Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts
  • 5 Massachusetts General Hospital and Brigham and Women’s Hospital Center for Clinical Data Science, Boston
  • 6 Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland
JAMA Ophthalmol. 2018;136(7):803-810. doi:10.1001/jamaophthalmol.2018.1934
Key Points

Question  Can an algorithm based on deep learning achieve expert-level performance at diagnosing plus disease in retinopathy of prematurity?

Finding  In this technology evaluation study including 5511 retinal photographs, using 5-fold cross-validation, the algorithm achieved mean areas under the receiver operating characteristic curve of 0.94 and 0.98 for the diagnoses of normal and plus disease, respectively. On an independent test set of 100 images, the algorithm achieved 91% accuracy and a quadratic-weighted κ coefficient of 0.92, outperforming 6 of 8 retinopathy of prematurity experts.

Meaning  These findings suggest the proposed algorithm can objectively diagnose plus disease with a proficiency comparable with that of human experts.


Importance  Retinopathy of prematurity (ROP) is a leading cause of childhood blindness worldwide. The decision to treat is primarily based on the presence of plus disease, defined as dilation and tortuosity of retinal vessels. However, clinical diagnosis of plus disease is highly subjective and variable.

Objective  To implement and validate an algorithm based on deep learning to automatically diagnose plus disease from retinal photographs.

Design, Setting, and Participants  A deep convolutional neural network was trained using a data set of 5511 retinal photographs. Each image was previously assigned a reference standard diagnosis (RSD) of normal, pre–plus disease, or plus disease, based on consensus of image grading by 3 experts and clinical diagnosis by 1 expert. The algorithm was evaluated by 5-fold cross-validation and tested on an independent set of 100 images. Images were collected from 8 academic institutions participating in the Imaging and Informatics in ROP (i-ROP) cohort study. The deep learning algorithm was tested against 8 ROP experts, each of whom had more than 10 years of clinical experience and more than 5 peer-reviewed publications about ROP. Data were collected from July 2011 to December 2016 and analyzed from December 2016 to September 2017.
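
As an illustration of the 5-fold cross-validation named above, the following is a minimal sketch of how images might be divided into 5 folds while keeping the class mix similar in each. The abstract does not say how the folds were formed, so the stratification, the function name `stratified_folds`, and the toy label counts are all assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds with similar class proportions.

    Hypothetical helper for illustration; not the study's actual procedure.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of this class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Toy labels (0 = normal, 1 = pre-plus, 2 = plus) with a class imbalance
# loosely resembling the one reported; these are not the study data.
toy_labels = [0] * 820 + [1] * 150 + [2] * 30
folds = stratified_folds(toy_labels, k=5)

for i, held_out in enumerate(folds):
    n_plus = sum(toy_labels[j] == 2 for j in held_out)
    print(f"fold {i}: {len(held_out)} held-out images, {n_plus} graded plus")
```

In each round, one fold would serve as the held-out set and the remaining four as training data, so every image is evaluated exactly once.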

Exposures  A deep learning algorithm trained on retinal photographs.

Main Outcomes and Measures  Receiver operating characteristic analysis was performed to evaluate performance of the algorithm against the RSD. Quadratic-weighted κ coefficients were calculated for ternary classification (ie, normal, pre–plus disease, and plus disease) to measure agreement with the RSD and 8 independent experts.
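
The two outcome measures named above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the study's analysis code: the function names `auc` and `quadratic_weighted_kappa` are hypothetical, grades are assumed to be coded as the integers 0 (normal), 1 (pre–plus disease), and 2 (plus disease), and the area under the curve is computed by the pairwise-comparison (Mann-Whitney) formulation.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve as the probability that a positive case
    scores higher than a negative case (ties count as half)."""
    wins = sum((p > q) + 0.5 * (p == q) for p in scores_pos for q in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def quadratic_weighted_kappa(rater_a, rater_b, n_classes=3):
    """Cohen's kappa with quadratic disagreement weights for ordinal grades.

    Grades must be integers in range(n_classes). Undefined (division by
    zero) when both raters assign one identical grade to every case.
    """
    n = len(rater_a)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(n_classes)]
    col_totals = [sum(observed[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement by two grades is penalized four times as
            # heavily as disagreement by one grade.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected
```

The quadratic weighting is what makes the coefficient suited to an ordinal scale: confusing normal with plus disease lowers agreement more than confusing either with pre–plus disease.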

Results  Of the 5511 included retinal photographs, 4535 (82.3%) were graded as normal, 805 (14.6%) as pre–plus disease, and 171 (3.1%) as plus disease, based on the RSD. Mean (SD) area under the receiver operating characteristic curve statistics were 0.94 (0.01) for the diagnosis of normal (vs pre–plus disease or plus disease) and 0.98 (0.01) for the diagnosis of plus disease (vs normal or pre–plus disease). For diagnosis of plus disease in an independent test set of 100 retinal images, the algorithm achieved a sensitivity of 93% with 94% specificity. For detection of pre–plus disease or worse, the sensitivity and specificity were 100% and 94%, respectively. On the same test set, the algorithm achieved a quadratic-weighted κ coefficient of 0.92 compared with the RSD, outperforming 6 of 8 ROP experts.
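
The two operating points reported above (plus disease vs the rest, and pre–plus disease or worse vs normal) correspond to two cutoffs on the same 3-level ordinal scale. A minimal sketch follows, assuming grades coded 0 (normal), 1 (pre–plus disease), and 2 (plus disease); the function name `sensitivity_specificity` and the toy grades are hypothetical and are not the study's data.

```python
def sensitivity_specificity(reference, predicted, threshold):
    """Treat grade >= threshold as positive and compare predictions with
    the reference standard. Returns (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        ref_positive = ref >= threshold
        pred_positive = pred >= threshold
        if ref_positive and pred_positive:
            tp += 1
        elif ref_positive:
            fn += 1
        elif pred_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Toy grades for 8 images, for illustration only.
reference = [0, 0, 0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 0, 1, 2, 2, 2]

# threshold=2: plus disease vs normal or pre-plus disease
plus_sens, plus_spec = sensitivity_specificity(reference, predicted, threshold=2)
# threshold=1: pre-plus disease or worse vs normal
any_sens, any_spec = sensitivity_specificity(reference, predicted, threshold=1)
print(f"plus: sensitivity={plus_sens:.2f}, specificity={plus_spec:.2f}")
print(f"pre-plus or worse: sensitivity={any_sens:.2f}, specificity={any_spec:.2f}")
```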

Conclusions and Relevance  This fully automated algorithm diagnosed plus disease in ROP with accuracy comparable with or better than that of human experts. The algorithm has potential applications in disease detection, monitoring, and prognosis in infants at risk of ROP.