Images are from an infant with a gestational age of 26 0/7 weeks and a birth weight of 830 g. The retinopathy of prematurity vascular severity score (range, 1-9) was 1.04 in A, 1.44 in B, and 1.36 in C.
Images are from an infant with a gestational age of 26 0/7 weeks and a birth weight of 783 g. The retinopathy of prematurity vascular severity score (range, 1-9) was 1.04 in A, 5.00 in B, and 8.64 in C.
P < .001 by Wilcoxon rank sum test with Bonferroni correction. Dots indicate medians; and boxes, IQRs.
P < .001 for all comparisons.
Controlling for birth weight and gestational age, multivariable logistic regression found the rate of change in the ROP vascular severity score was independently associated with future treatment. Error bars indicate SDs.
Taylor S, Brown JM, Gupta K, et al. Monitoring Disease Progression With a Quantitative Severity Scale for Retinopathy of Prematurity Using Deep Learning. JAMA Ophthalmol. 2019;137(9):1022–1028. doi:10.1001/jamaophthalmol.2019.2433
Can a quantitative measurement of retinopathy of prematurity severity be tracked over time to identify disease progression?
In this cohort study of 871 infants with 5255 clinical examinations, a quantitative retinopathy of prematurity vascular severity score developed using an automated deep learning–based plus disease algorithm identified differences in the mean severity of eyes progressing to treatment-requiring retinopathy of prematurity compared with eyes that did not require treatment using only a posterior pole photograph.
Tracking quantitative measurements of retinopathy of prematurity severity may be an effective method of identifying patients at risk for disease progression and in need of future retinopathy of prematurity treatment.
Importance
Retinopathy of prematurity (ROP) is a leading cause of childhood blindness worldwide, but clinical diagnosis is subjective and qualitative.
Objective
To describe a quantitative ROP severity score derived using a deep learning algorithm designed to evaluate plus disease and to assess its utility for objectively monitoring ROP progression.
Design, Setting, and Participants
This retrospective cohort study included images from 5255 clinical examinations of 871 premature infants who met the ROP screening criteria of the Imaging and Informatics in ROP (i-ROP) Consortium, which comprises 9 tertiary care centers in North America, from July 1, 2011, to December 31, 2016. Data analysis was performed from July 2017 to May 2018.
Exposures
A deep learning algorithm was used to assign a continuous ROP vascular severity score from 1 (most normal) to 9 (most severe) at each examination based on a single posterior photograph compared with a reference standard diagnosis (RSD) simplified into 4 categories: no ROP, mild ROP, type 2 ROP or pre-plus disease, or type 1 ROP. Disease course was assessed longitudinally across multiple examinations for all patients.
Main Outcomes and Measures
Mean ROP vascular severity score progression over time compared with the RSD.
Results
A total of 5255 clinical examinations from 871 infants (mean [SD] gestational age, 27.0 [2.0] weeks; 493 [56.6%] male; mean [SD] birth weight, 949  g) were analyzed. The median severity scores for each category were as follows: 1.1 (interquartile range [IQR], 1.0-1.5) (no ROP), 1.5 (IQR, 1.1-3.4) (mild ROP), 4.6 (IQR, 2.4-5.3) (type 2 and pre-plus), and 7.5 (IQR, 5.0-8.7) (treatment-requiring ROP) (P < .001). When the long-term differences in the median severity scores across time between the eyes progressing to treatment and those who did not eventually require treatment were compared, the median score was higher in the treatment group by 0.06 at 30 to 32 weeks, 0.75 at 32 to 34 weeks, 3.56 at 34 to 36 weeks, 3.71 at 36 to 38 weeks, and 3.24 at 38 to 40 weeks postmenstrual age (P < .001 for all comparisons).
Conclusions and Relevance
The findings suggest that the proposed ROP vascular severity score is associated with category of disease at a given point in time and clinical progression of ROP in premature infants. Automated image analysis may be used to quantify clinical disease progression and identify infants at high risk for eventually developing treatment-requiring ROP. This finding has implications for quality and delivery of ROP care and for future approaches to disease classification.
Early identification and management of retinopathy of prematurity (ROP) in infants born with low birth weights lead to improved long-term visual prognosis.1-3 The American Academy of Pediatrics, American Academy of Ophthalmology, and American Association for Pediatric Ophthalmology and Strabismus have established guidelines for ROP screening in preterm infants that rely heavily on data from the Early Treatment for ROP (ETROP) study and the Cryotherapy for ROP (CRYO-ROP) study and on nomenclature from the International Classification of ROP (ICROP).1-5 These data suggest that appropriate screening criteria (with high sensitivity for severe ROP) and timely intervention can greatly reduce the incidence of blindness. Current guidelines allow bedside ophthalmoscopic screening as well as telemedical screening using digital fundus imaging and remote interpretation, with follow-up and management based on ophthalmoscopic or image-based disease classification.5
Retinopathy of prematurity evolves on a relatively predictable timeline, with the level of severity progressing along a clinical continuum from mild to severe disease.1-3,6 On the basis of the ETROP trial, treatment of type 1 disease is known to improve visual outcomes compared with treatment at the level of CRYO-ROP threshold disease, but there was no benefit for treatment of less severe type 2 disease.2
However, ROP diagnosis is subjective and qualitative. Therefore, disease classification for the same infant may vary significantly among examiners,7-12 which may lead to real-world treatment differences.2,7 With the exception of zone I, stage 3 disease, the only difference between type 1 and type 2 disease (and thus the need for treatment) is the diagnosis of plus disease, which according to ICROP is an assessment of the arterial tortuosity and venous dilation of the retinal vessels compared with a historical photograph.4 The Benefits of Oxygen Saturation Targeting II trial found evidence of differences in type 1 ROP diagnosis among investigators in the trial based primarily on the difference in plus disease diagnosis.13 The lack of objective metrics of ROP disease severity and the presence of this same bias in ETROP and CRYO-ROP limit our ability to understand whether these differences in treatment severity may explain differences in visual outcomes or rates of retinal detachment among centers in the real world.
Recent advances in deep learning (DL) have demonstrated promise for automated and objective diagnosis of plus disease in ROP.14-16 Brown et al14 found that a fully automated plus disease classifier developed on data from the Imaging and Informatics in ROP (i-ROP) study, called i-ROP DL, was able to classify plus disease on a 3-level scale (no, pre-plus, and plus) with comparable or better accuracy than clinical experts. As a convolutional neural network, the i-ROP DL system generates hidden features for classifying plus disease based on a single image that presumably correlate with clinical disease features of arterial tortuosity and venous dilation but that are hidden in the deep network. Subsequent work by Redd et al16 found that a quantitative ROP vascular severity score developed using the i-ROP DL system correlated with ICROP disease severity in a cross-sectional evaluation of the i-ROP screening examinations and demonstrated promise as a screening tool in an ROP screening program. However, this technology has not been applied on a large scale to determine whether a continuous vascular severity score might be able to complement telemedicine or ophthalmoscopic disease screening to improve the objectivity of ROP screening, diagnosis, and management.15 In this study, we addressed this gap in knowledge by retrospectively applying this ROP vascular severity score to the prospective i-ROP cohort data to understand whether this score might be used to track changes over time in disease severity and identify patients at risk of progressing to severe treatment-requiring ROP (TR-ROP).
The i-ROP consortium is composed of 9 tertiary referral centers that screen high volumes of at-risk infants for ROP. The present analysis was approved by the Oregon Health & Science University Institutional Review Board, and the overall study was approved by each of the 9 participating institutions (Columbia University, Cornell University, University of Illinois at Chicago, William Beaumont Hospital, Children’s Hospital Los Angeles, Cedars-Sinai Medical Center, University of Miami, and Asociacion para Evitar la Ceguera en Mexico). All institutions abided by the tenets of the Declaration of Helsinki,17 and written informed consent was obtained from parents of all infants enrolled.
Deidentified images from clinical examinations performed from July 1, 2011, to December 31, 2016, were reviewed. All images were obtained using a commercially available camera (RetCam, Natus Medical Inc). Each study eye examination was assigned a reference standard diagnosis (RSD) including all categories of ROP classification: zone (I-III), stage (1-5), and plus (plus, pre-plus, or no plus). The RSD was based on combined findings from ophthalmoscopic and image-based examinations by multiple experts using methods previously published.18 On the basis of the ICROP classification, the examinations were grouped into disease subcategories for each eye examination (by eye): no ROP, mild ROP, type 2 ROP or pre-plus, or type 1 ROP. The cohort was split into 2 groups for this analysis: eyes that developed TR-ROP and eyes that did not. In most cases, eyes in the TR-ROP group had type 1 ROP; however, previous analysis of this data set demonstrated that approximately 10% of the time, patients were treated for less than type 1 ROP based on clinical judgment of the examining ophthalmologist.19 Images were excluded if 2 of 3 image graders labeled them unacceptable for diagnosis or if the clinical diagnosis was stage 4 or 5 ROP with retinal detachment. Data analysis was performed from July 2017 to May 2018.
The i-ROP DL system was used to analyze each image in the data set and classify the probability of plus disease, pre-plus disease, and no disease using previously published methods based on a single posterior-pole fundus image.14 On the basis of these 3 probabilities, an automated ROP vascular severity score was then assigned to each image from 1 (normal retinal vasculature) to 9 (worst retinal vasculature, eg, severe plus disease) using methods previously described ([1 × probability of no disease] + [5 × probability of pre-plus disease] + [9 × probability of plus disease]).16 The disease course was then assessed between the 2 groups throughout the period of inpatient ROP screening by grouping the examinations into five 2-week intervals between 30 and 40 weeks of postmenstrual age (PMA).
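The weighted sum described above can be illustrated with a minimal Python sketch. The function name, the tolerance check, and the example probabilities are hypothetical and are not taken from the i-ROP DL implementation; only the weights (1, 5, and 9) come from the text.

```python
def vascular_severity_score(p_normal, p_pre_plus, p_plus, tol=1e-6):
    """Weighted sum of three class probabilities on the 1-9 scale.

    Weights 1, 5, and 9 follow the formula in the text; the check that
    the probabilities sum to 1 is an added assumption.
    """
    total = p_normal + p_pre_plus + p_plus
    if abs(total - 1.0) > tol:
        raise ValueError("class probabilities should sum to 1")
    return 1 * p_normal + 5 * p_pre_plus + 9 * p_plus


# Hypothetical probability triples, chosen only for illustration.
for probs in [(0.99, 0.01, 0.00), (0.00, 1.00, 0.00), (0.00, 0.09, 0.91)]:
    print(probs, round(vascular_severity_score(*probs), 2))
```

Because the three probabilities sum to 1, the score is bounded by 1 (all probability on no disease) and 9 (all probability on plus disease), and an image classified as pre-plus with certainty scores exactly 5.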
Changes in ROP vascular severity score were calculated by looking at the 4 intervals among these 5 time points. Multivariable logistic regression was performed to identify whether early change (before 35 weeks PMA) in the ROP vascular severity score was independently associated with the future development of TR-ROP (with P < .05 denoting statistical significance). Statistical analysis was performed using MATLAB 2014A (MathWorks). Descriptive statistics, 2-tailed, unpaired t test, Kruskal-Wallis test, Wilcoxon rank sum test with Bonferroni correction, and 2-way analysis of variance were used to analyze for an association between the ROP vascular severity score and ROP progression.
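One way to picture the 4 interval changes among the 5 time points is the short Python sketch below. The text does not specify how per-week rates were derived, so dividing each difference by the 2-week interval width is an assumption, as are the function name and the use of `None` for a missing examination.

```python
from typing import List, Optional

INTERVAL_WEEKS = 2.0  # assumed spacing between consecutive time points


def rates_of_change(
    scores: List[Optional[float]],
    weeks_between: float = INTERVAL_WEEKS,
) -> List[Optional[float]]:
    """Per-week change between consecutive time points.

    `scores` holds one severity score per time point (None if the eye was
    not examined in that interval). Five time points yield four rates.
    """
    rates: List[Optional[float]] = []
    for earlier, later in zip(scores, scores[1:]):
        if earlier is None or later is None:
            rates.append(None)  # rate is undefined without both scores
        else:
            rates.append((later - earlier) / weeks_between)
    return rates


# Hypothetical eye with no examination in the last interval.
print(rates_of_change([1.0, 1.5, 4.0, 6.5, None]))
```

The resulting rates could then serve as covariates, alongside birth weight and gestational age, in a logistic regression model fitted with a statistics package.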
A total of 5255 clinical examinations from 871 infants (mean [SD] gestational age, 27.0 [2.0] weeks; 493 [56.6%] male; mean [SD] birth weight, 949  g) were analyzed. In total, 91 eyes (5.4% of the cohort) progressed to TR-ROP. The mean (SD) gestational age was 25.1 (1.2) weeks for those who required treatment and 27.2 (2.0) weeks for those who did not require treatment (P < .001 for both comparisons). The mean (SD) birth weight was 682 (195) g in those who required treatment compared with 964 (297) g in those who did not (P < .001). Treatment occurred at a mean (SD) of 37.7 (3.1) weeks PMA. Figure 1 and Figure 2 display a representative example demonstrating the progression of images, RSD classifications, and ROP vascular severity score over time for 2 patients: 1 who developed TR-ROP and 1 who did not.
Figure 3 shows the distribution of all the ROP vascular severity scores for the entire population of patients (cross-sectionally) compared with the RSD disease classification. The median ROP vascular severity score of examinations with an RSD of no ROP was 1.1 (IQR, 1.0-1.5) compared with 1.5 (IQR, 1.1-3.4) for eyes with mild ROP, 4.6 (IQR, 2.4-5.3) for eyes with type 2 ROP and/or pre-plus disease, and 7.5 (IQR, 5.0-8.7) for eyes with TR-ROP (P < .001 for differences in ROP severity scores between all categorizations by Wilcoxon rank sum test with Bonferroni correction).
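The per-category summary above (median with IQR, followed by pairwise comparisons) can be sketched in Python as follows. The scores are hypothetical, the choice of the inclusive quartile method is an assumption, and the 0.05 family-wise threshold used for the Bonferroni adjustment is also an assumption; the rank sum tests themselves are omitted because they require a statistics package.

```python
from itertools import combinations
from statistics import quantiles


def median_and_iqr(scores):
    """Return (median, Q1, Q3); the inclusive quartile method is assumed."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return q2, q1, q3


# Hypothetical scores per RSD category, for illustration only.
scores_by_category = {
    "no ROP": [1.0, 1.0, 1.1, 1.2, 1.5],
    "mild ROP": [1.1, 1.3, 1.5, 2.0, 3.4],
    "type 2 or pre-plus": [2.4, 4.0, 4.6, 5.0, 5.3],
    "type 1 ROP": [5.0, 7.0, 7.5, 8.0, 8.7],
}

for category, scores in scores_by_category.items():
    print(category, median_and_iqr(scores))

# Four categories give 6 pairwise comparisons; Bonferroni divides the
# family-wise threshold (0.05 assumed here) by that count.
pairs = list(combinations(scores_by_category, 2))
bonferroni_alpha = 0.05 / len(pairs)
print(len(pairs), bonferroni_alpha)
```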
Figure 4 displays the distribution of ROP vascular severity scores in cohorts that eventually progressed to TR-ROP vs those who did not. Overall, population-level differences between the 2 cohorts were seen early in follow-up and increased progressively over time. When the long-term differences in the median severity scores across time between the eyes requiring treatment and those that did not eventually require treatment were compared, the median score was higher in the treatment group by 0.06 at 30 to 32 weeks, 0.75 at 32 to 34 weeks, 3.56 at 34 to 36 weeks, 3.71 at 36 to 38 weeks, and 3.24 at 38 to 40 weeks postmenstrual age (P < .001 for all comparisons). The median vascular severity score increased with PMA even in patients who did not progress to TR-ROP, with a higher median score of 1.87 at 38 to 40 weeks PMA compared with a median score of 1.02 at 30 to 32 weeks PMA (P < .001). This finding suggests that PMA should be factored into any image-based risk prediction models and that the convolutional neural network may be associated with PMA based on vascular features.
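The interval-wise comparison above can be sketched in Python: examinations are grouped into the five 2-week PMA intervals between 30 and 40 weeks, and the median score of eyes that did not progress is subtracted from that of eyes that did. The tuple layout, the function names, and the half-open handling of interval boundaries are assumptions; the data are hypothetical.

```python
from collections import defaultdict
from statistics import median

BIN_EDGES = [30, 32, 34, 36, 38, 40]  # weeks PMA, five 2-week intervals


def pma_bin(pma_weeks):
    """Label such as '30-32' for PMA in [30, 40); None outside that range."""
    for low, high in zip(BIN_EDGES, BIN_EDGES[1:]):
        if low <= pma_weeks < high:  # half-open bins are an assumption
            return f"{low}-{high}"
    return None


def median_difference_by_bin(exams):
    """exams: iterable of (pma_weeks, score, progressed_to_treatment)."""
    grouped = defaultdict(lambda: {True: [], False: []})
    for pma_weeks, score, progressed in exams:
        label = pma_bin(pma_weeks)
        if label is not None:
            grouped[label][progressed].append(score)
    # Report a difference only where both groups have examinations.
    return {
        label: median(groups[True]) - median(groups[False])
        for label, groups in grouped.items()
        if groups[True] and groups[False]
    }


# Hypothetical examinations, for illustration only.
exams = [
    (30.5, 1.1, True), (31.0, 1.0, False),
    (35.0, 5.0, True), (35.5, 6.0, True), (34.2, 1.5, False),
    (41.0, 9.0, True),  # outside 30-40 weeks, ignored
]
print(median_difference_by_bin(exams))
```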
Figure 5 shows the rate of change of the ROP vascular severity score over time. In this analysis, we found that eyes that developed TR-ROP had a higher mean rate of change in the ROP vascular severity score (range, 0.40-1.32 points per week) compared with eyes that never developed TR-ROP (range, 0.08-0.20 point per week). The early separation of vascular severity scores between the 2 cohorts suggests that this information may be useful for determining which eyes may be at highest risk for progression to TR-ROP. In multivariable logistic regression controlling for birth weight and gestational age, the rate of change of the ROP vascular severity score at both 32 and 34 weeks was independently associated with future TR-ROP.
We report the results of an automated ROP vascular severity score applied to the i-ROP multi-institution cohort study. The key findings of this study are as follows. First, a quantitative ROP vascular severity score may be derived from DL methods and was associated with clinical disease severity. Second, with the use of this technique, ROP severity can be tracked over time and used to identify patients with TR-ROP. Third, the rate of change in the ROP vascular severity score may be independently associated with disease worsening and have implications for modeling future disease risk.
Brown et al14 found that DL methods can classify plus disease with comparable or better accuracy than experts. The first key finding of this study suggests that we may extend that work to demonstrate that an ROP vascular severity score calculated from the DL algorithm can quantify severity differences among clinical disease levels (Figure 3). This has important implications for introducing an element of objectivity into the diagnosis of ROP, which currently remains based on the subjective interpretation of qualitative disease features. This finding has implications for research, teaching, and patient care. Heterogeneity in plus disease diagnosis has existed in, to our knowledge, all the randomized clinical trials in ROP, which limits generalizability of the findings to specific examiners and may account for heterogeneity in outcomes.1-3,13 Use of this severity score in clinical trials (or applied retrospectively to existing clinical trials) could provide insight into the current practice patterns of treating physicians as they relate to clinical trial outcomes. A better understanding of the objective level of disease severity being labeled as plus disease will lead to more standardized application of treatment guidelines by trainees in the future. There may be other states, such as pulmonary hypertension or the use of nitrous oxide, which elevate the ROP vascular severity score. In the future, incorporating an objective severity score into the evaluation of patients with ROP could, like other objective measurements of disease severity (eg, blood pressure), be put into the clinical context for decision making.
Furthermore, there are multiple lines of evidence indicating that regional variation exists in the application of clinical trial data in ROP.2,7,9-11,13,20 Specifically, because of the subjectivity of plus disease diagnosis and evidence of systematic bias between examiners, infants with the same image-graded level of disease may be treated differently by different physicians.13 One future application of the ROP vascular severity score would be to prospectively analyze objective treatment thresholds to identify the appropriate level of plus disease to guide treatment. In this study, the mean score at the time of treatment was 7. Gupta et al21 have reported that a higher ROP vascular severity score at the time of treatment was associated with a risk of laser treatment failure (needing >1 laser treatment). Future work may identify whether management should be guided by an objective level of vascular severity (an objective metric of plus disease) or whether other factors, such as the extent of extraretinal neovascularization (as was previously incorporated in the CRYO-ROP threshold disease definition), should return to the management paradigms.
The second key study finding is that ROP vascular severity score may be used to track clinical disease progression. Although there have been multiple reports of using computer-based image analysis to classify disease at a specific point in time, few articles15,22,23 have looked at objectively monitored changes in disease severity over time. We found that the ROP vascular severity score increased in all infants during development (Figure 4). These quantitative methods may provide insights about the normal process of vascularization and identify infants at risk for developing more aggressive disease. The severity score of infants who would subsequently develop TR-ROP was different at every time point, with differences emerging by 34 weeks PMA (Figure 4). This finding suggests that information contained in the image at a specific point in time may identify patients at higher future risk for developing severe TR-ROP. This finding may have important clinical applications for prognosis, delivery of care, and disease management using traditional bedside examination or telemedicine.5,6,24,25
The third key finding is that changes in quantitative ROP vascular severity score between images over time may help identify patients who eventually require treatment for ROP. This finding appears to be an independent risk factor beyond clinical factors; demographic factors, such as birth weight and gestational age; and ROP vascular severity score at individual points in time. Although some work has considered how practitioners may or may not factor in change in retinal vascular appearance over time,22,23 this clinical judgment is not part of existing classification schemas or educational paradigms. Moreover, current screening paradigms do not factor this information into referral, treatment, or follow-up guidelines.5 Further analyses of these and other data using DL may increase understanding of the natural progression of ROP and clinical phenotypic characteristics associated with disease progression. A future randomized clinical trial could then assess benefit of earlier intervention compared with ETROP type 1 disease.
There have been a number of attempts to develop risk models in ROP, with 2 main objectives: (1) to reduce the number of screening examinations by identifying patients at lowest risk of having disease and (2) to identify patients at highest risk of progressing to treatment.26-30 Redd et al16 demonstrated that, when applied retrospectively to the i-ROP cohort, this ROP vascular severity score may be associated with a reduction in the burden of ophthalmoscopic examinations by 80%, with an area under the receiver operating characteristic curve of 0.95 for detection of type 1 disease. The ETROP trial also used the Risk Management Model for ROP 2 (RM-ROP2) to randomize only those infants at highest risk of progression to earlier treatment.29 In this study, the mean rate of change in ROP vascular severity score was higher in eyes progressing to TR-ROP as early as 32 weeks (ie, between the first and second screening examinations), which is 5 weeks before the mean PMA of treatment in this cohort. These pretreatment differences in rate of change between the 2 groups were significant at all time points analyzed and were also independent of birth weight or gestational age at earlier PMAs. These data suggest that the rate of change of the ROP vascular severity score may provide added prognostic value to the clinical demographic factors included in most risk models, which could lead to closer follow-up for high-risk infants and provide an objective measurement of disease risk progression to guide clinical decision making.
There are a number of limitations and opportunities for future improvements in this analysis. The study was cross-sectional and separated into only 2 cohorts. In the future, it would be beneficial to better understand change in DL features between individual patients over time and in patients who develop mild and moderate disease vs no ROP. The technology is currently only available in research studies under institutional review board approval; thus, the real-world consequences of these findings are currently limited. Further work needs to be performed to demonstrate the association between image quality, field of view, and camera system with the ROP severity score. Finally, this study was performed in a North American cohort of infants. It will be necessary to validate this system in larger patient cohorts from developed countries. Furthermore, it will be necessary to perform similar analysis in cohorts outside North America to assess the generalizability of the study findings, especially in low- and middle-income countries where ROP is epidemic and oxygen regulation is known to produce a more aggressive phenotype.
Deep learning using convolutional neural networks is a rapidly developing method of automated image analysis in a variety of medical specialties, including ophthalmology.31-33 However, the precise way in which this technology will be incorporated into existing practice paradigms or change existing practice paradigms remains to be seen.34 In this study, an automated ROP vascular severity score obtained from posterior-pole fundus images in patients with ROP effectively distinguished disease progression in infants undergoing screening for ROP. With further validation, these results may have broad implications for the potential role of similar technologies in ROP screening, diagnosis, and management.
Accepted for Publication: April 14, 2019.
Corresponding Author: Michael F. Chiang, MD, Casey Eye Institute, Oregon Health & Science University, 3375 SW Terwilliger Blvd, Portland, OR 97239 (email@example.com).
Published Online: July 3, 2019. doi:10.1001/jamaophthalmol.2019.2433
Author Contributions: Drs Taylor and Brown contributed equally to this work. Drs Kalpathy-Cramer and Chiang supervised this work equally. Dr Chiang had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Taylor, Campbell, Erdogmus, Ioannidis, Kim, Kalpathy-Cramer, Chiang.
Acquisition, analysis, or interpretation of data: Taylor, Brown, Gupta, Campbell, Ostmo, Chan, Dy, Erdogmus, Kim, Kalpathy-Cramer, Chiang.
Drafting of the manuscript: Taylor, Gupta, Campbell, Ioannidis, Chiang.
Critical revision of the manuscript for important intellectual content: Taylor, Brown, Gupta, Campbell, Ostmo, Chan, Dy, Erdogmus, Kim, Kalpathy-Cramer, Chiang.
Statistical analysis: Taylor, Gupta, Campbell, Chiang.
Obtained funding: Erdogmus, Ioannidis, Kalpathy-Cramer, Chiang.
Administrative, technical, or material support: Brown, Campbell, Ostmo, Chiang.
Supervision: Campbell, Chan, Dy, Erdogmus, Kalpathy-Cramer, Chiang.
Conflict of Interest Disclosures: Dr Campbell reported receiving grants from Genentech and personal fees from Allergan outside the submitted work. Dr Chan reported receiving personal fees from Alcon, Allergan, Beyeonics, Visunex, and Genentech outside the submitted work and is a cofounder of Paire Health. Dr Dy reported receiving grants from the National Science Foundation during the conduct of the study. Dr Ioannidis reported receiving grants from the National Science Foundation, Defense Advanced Research Projects Agency, and the US Department of Defense outside the submitted work. Dr Kalpathy-Cramer reported receiving grants from the National Institutes of Health and the National Science Foundation during the conduct of the study and receiving personal fees from Infotech outside the submitted work. Dr Chiang reported receiving grants from the National Institutes of Health, National Science Foundation, and Genentech; receiving nonfinancial support from Clarity Medical Systems; and receiving personal fees from Novartis and Inteleretina outside the submitted work. Drs Campbell, Ioannidis, Kalpathy-Cramer, and Chiang report a patent pending for DeepROP. No other disclosures were reported.
Funding/Support: This project was supported by grants R01EY19474, K12EY027720, and P30EY10572 from the National Institutes of Health; grants SCH-1622679, SCH-1622542, and SCH-1622536 from the National Science Foundation; and unrestricted departmental funding and a Career Development Award (Dr Campbell) from Research to Prevent Blindness.
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
The Imaging and Informatics in Retinopathy of Prematurity (i-ROP) Research Consortium: Oregon Health & Science University (Portland, Oregon): Michael F. Chiang, MD, Susan Ostmo, MS, Sang Jin Kim, MD, PhD, Kemal Sonmez, PhD, J. Peter Campbell, MD, MPH. University of Illinois at Chicago (Chicago, Illinois): RV Paul Chan, MD, Karyn Jonas, RN. Columbia University (New York, New York): Jason Horowitz, MD, Osode Coki, RN, Cheryl-Ann Eccles, RN, Leora Sarna, RN. Weill Cornell Medical College (New York, New York): Anton Orlin, MD. Bascom Palmer Eye Institute (Miami, Florida): Audina Berrocal, MD, Catherin Negron, BA. William Beaumont Hospital (Royal Oak, Michigan): Kimberly Denser, MD, Kristi Cumming, RN, Tammy Osentoski, RN, Tammy Check, RN, Mary Zajechowski, RN. Children’s Hospital Los Angeles (Los Angeles, California): Thomas Lee, MD, Evan Kruger, BA, Kathryn McGovern, MPH. Cedars Sinai Hospital (Los Angeles, California): Charles Simmons, MD, Raghu Murthy, MD, Sharon Galvis, NNP. LA Biomedical Research Institute (Los Angeles, California): Jerome Rotter, MD, Ida Chen, PhD, Xiaohui Li, MD, Kent Taylor, PhD, Kaye Roll, RN. Massachusetts General Hospital (Boston, Massachusetts): Jayashree Kalpathy-Cramer, PhD. Northeastern University (Boston, Massachusetts): Deniz Erdogmus, PhD, Stratis Ioannidis, PhD. Asociacion para Evitar la Ceguera en Mexico (APEC) (Mexico City, Mexico): Maria Ana Martinez-Castellanos, MD, Samantha Salinas-Longoria, MD, Rafael Romero, MD, Andrea Arriola, MD, Francisco Olguin-Manriquez, MD, Miroslava Meraz-Gutierrez, MD, Carlos M. Dulanto-Reinoso, MD, Cristina Montero-Mendoza, MD.
Meeting Presentation: This paper was presented at the Annual Meeting of the Association for Research in Vision and Ophthalmology; May 2, 2018; Honolulu, Hawaii.