A, Standard published photograph of plus disease. B, A representative fundus image, where the white dashed line demarcates the approximate field of view of the standard published photograph. C, Results of manual segmentation of retinal vessels. D, Circular crops with the radius of 1 to 6 disc diameters for subsequent analysis.
A, Standard photograph. B, Representative wide-angle example of plus disease. C, The same retinal image as in part B, cropped to the field of view of the standard photograph. D, Study image with plus disease, with venous tortuosity out of proportion to arterial tortuosity.
Campbell JP, Ataer-Cansizoglu E, Bolon-Canedo V, et al. Expert Diagnosis of Plus Disease in Retinopathy of Prematurity From Computer-Based Image Analysis. JAMA Ophthalmol. 2016;134(6):651–657. doi:10.1001/jamaophthalmol.2016.0611
Published definitions of plus disease in retinopathy of prematurity (ROP) reference arterial tortuosity and venous dilation within the posterior pole based on a standard published photograph. One possible explanation for limited interexpert reliability for a diagnosis of plus disease is that experts deviate from the published definitions.
To identify vascular features used by experts for diagnosis of plus disease through quantitative image analysis.
Design, Setting, and Participants
A computer-based image analysis system (Imaging and Informatics in ROP [i-ROP]) was developed using a set of 77 digital fundus images, and the system was designed to classify images compared with a reference standard diagnosis (RSD). System performance was analyzed as a function of the field of view (circular crops with a radius of 1-6 disc diameters) and vessel subtype (arteries only, veins only, or all vessels). Routine ROP screening was conducted from June 29, 2011, to October 14, 2014, in neonatal intensive care units at 8 academic institutions, with a subset of 73 images independently classified by 11 ROP experts for validation. The RSD was compared with the majority diagnosis of experts.
Main Outcomes and Measures
The primary outcome measure was the percentage of accuracy of the i-ROP system classification of plus disease, with the RSD as a function of the field of view and vessel type. Secondary outcome measures included the accuracy of the 11 experts compared with the RSD.
Accuracy of plus disease diagnosis by the i-ROP computer-based system was highest (95%; 95% CI, 94%-95%) when it incorporated vascular tortuosity from both arteries and veins and with the widest field of view (6–disc diameter radius). Accuracy was 90% or less when using only arterial tortuosity and 85% or less using a 2– to 3–disc diameter view similar to the standard published photograph. Diagnostic accuracy of the i-ROP system (95%) was comparable to that of 11 expert physicians (mean 87%, range 79%-99%).
Conclusions and Relevance
Experts in ROP appear to consider findings from beyond the posterior retina when diagnosing plus disease and consider tortuosity of both arteries and veins, in contrast with published definitions. It is feasible for a computer-based image analysis system to perform comparably with ROP experts, using manually segmented images.
Retinopathy of prematurity (ROP) is a vasoproliferative disease affecting premature infants. Since the 1980s, clinical diagnosis has been standardized using the International Classification for Retinopathy of Prematurity (ICROP).1,2 Diagnostic cut points identifying severe ROP requiring treatment have been proposed and tested.3 As a result of multicenter clinical trials, it is known that severe ROP may be successfully treated if it is diagnosed early.3,4 Nevertheless, in spite of these advances in diagnosis and treatment, ROP continues to be a leading cause of childhood blindness worldwide.5,6
The Early Treatment for Retinopathy of Prematurity multicenter clinical trial showed that plus disease is the most important ICROP parameter for identifying ROP requiring treatment.3 Plus disease is defined as arterial tortuosity and venous dilation in the posterior pole greater than that of a standard published photograph selected by expert consensus during the 1980s.1 Thus, accurate and consistent diagnosis of plus disease is critical to ensure that infants at risk for blindness receive the appropriate screening and treatment. Since that time, a newer pre-plus category has been defined by the revised ICROP as retinal vascular abnormalities that are insufficient for plus disease but have more arterial tortuosity and venous dilation than normal.1
There are numerous limitations in the definition of plus disease. Studies have found that clinical diagnosis of plus disease is subjective and varies among experts.7,8 The definition of ICROP explicitly states that plus disease refers only to arterial tortuosity and venous dilation within the posterior pole vessels, and the standard published photograph displays only a very narrow-angle retinal view of 2 to 3 disc diameters (DDs) (Figure 1A). However, previous work suggests that experts consider additional retinal features (such as venous tortuosity) and larger retinal fields of view during real-world clinical diagnosis.9-11 Better understanding of the retinal vascular abnormalities that characterize plus disease will lead to improved clinical diagnosis, education,12-14 and methods for automated, computer-based diagnosis.15,16
Our purpose is to identify quantitative retinal vascular features that correlate with diagnosis of plus disease by ROP experts. We have developed a computer-based image analysis system (Imaging and Informatics in ROP [i-ROP]) and have demonstrated that it can accurately identify plus disease and pre-plus disease.17 In this study, we use i-ROP to correlate quantitative vascular features with a reference standard diagnosis (RSD) defined by consensus of image reviews by 3 expert ROP image graders combined with the clinical diagnosis.
Question What does computer-based image analysis demonstrate regarding how experts classify plus disease?
Findings This study used a computer-based image analysis program to determine which vascular features best correlated with expert diagnosis of plus disease in retinopathy of prematurity. The computer-based image analysis system accurately classified 95% of images only when analyzing with the widest field of view and including tortuosity information from both arteries and veins.
Meaning These findings suggest that experts may use information outside of the technical definition of plus disease and that computer-based image analysis systems may perform as well as human graders.
We developed a database of 77 wide-angle retinal images acquired during routine clinical care and established an RSD for each image using previously published methods that combine interpretations and the clinical diagnosis from 3 expert graders (independent, masked gradings from 2 ophthalmologists and 1 ROP study coordinator: M.F.C., R.V.P.C, and S.O.).18 When the majority of the experts independently agreed with the clinical diagnosis (made at bedside using indirect ophthalmoscopy by the examining physician), this consensus became the RSD. If there was disagreement, the image was discussed among the group for consensus to determine the RSD. Among these 77 images, 14 had an RSD of plus disease, 16 had pre-plus disease, and 47 were normal. Each image was manually segmented by one of us (S.N.P.) to identify retinal vessels for computer-based analysis and cropped into circles in a range of sizes based on methods described previously (Figure 1).17 This study was approved by the Institutional Review Board at Oregon Health & Science University, and followed the tenets of the Declaration of Helsinki. Written informed consent was obtained from parents of infants enrolled in this study.
We developed the i-ROP computer-based image analysis system.17 We used 11 previously described measurements of dilation and tortuosity and designed the i-ROP system to identify the vascular features that best classified normal, pre-plus, and plus disease images into the correct categories compared with the RSD.17,19 From each measurement algorithm, we obtained a set of distribution values for each image and analyzed them with a 2-component gaussian mixture model, which represents the image features extracted from the vasculature as a probability distribution (the 2 components representing the straight and tortuous segments, respectively) rather than as summary statistics, such as the mean and median. Finally, we compared the ordered probability distributions for each image with the RSD to design the system to correctly classify the images as plus, pre-plus, or normal. We examined several algorithms to quantify vascular tortuosity and found that the most accurate algorithm used an acceleration function, a point-based feature defined as the second derivative of the best-fit curve at each point on the vascular tree.17
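The acceleration feature and 2-component mixture model described above can be sketched as follows. This is an illustrative Python reconstruction, not the authors' i-ROP implementation; the polynomial "best-fit curve," the toy vessel centerlines, and the use of scikit-learn's GaussianMixture are all assumptions for demonstration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def acceleration_features(points, n_components=2):
    """Point-based tortuosity sketch: fit a smooth curve to a sampled
    vessel centerline, take the magnitude of its second derivative
    (acceleration) at each sample, and summarize those values with a
    2-component gaussian mixture (one component for the straight
    segments, one for the tortuous segments)."""
    t = np.linspace(0.0, 1.0, len(points))      # arc-length-like parameter
    px = np.polyfit(t, points[:, 0], deg=5)     # best-fit curve for x(t)
    py = np.polyfit(t, points[:, 1], deg=5)     # best-fit curve for y(t)
    ax = np.polyval(np.polyder(px, 2), t)       # second derivative of x(t)
    ay = np.polyval(np.polyder(py, 2), t)       # second derivative of y(t)
    accel = np.hypot(ax, ay)                    # acceleration magnitude per point
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(accel.reshape(-1, 1))
    # Ordered component means: (straight-segment mean, tortuous-segment mean)
    return np.sort(gmm.means_.ravel())

# Toy vessels for illustration: a wavy centerline vs a nearly straight one
t = np.linspace(0.0, 1.0, 200)
wavy = np.column_stack([t, 0.05 * np.sin(2 * np.pi * t)])
straight = np.column_stack([t, 0.001 * t])
```

Comparing the larger (tortuous) component mean separates the two toy vessels cleanly; the published system then compares these ordered distributions against the RSD to tune the 3-level classification.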
Agreement of the RSD (plus, pre-plus, or normal) with findings from computer-based image analysis was examined over a range of circular image crop sizes from a 1-DD radius to a 6-DD radius (Figure 1) and then when considering arteries only, veins only, and all vessels. To improve the external validity of image analysis, we used a leave-one-out cross-validation process, which repeatedly trains the system on all but one image and tests it on the image left out.17 We then used a jackknife variance estimate to calculate the standard deviation and 95% CIs for the performance of the system for each subset of values; statistical significance was determined with the t test function in Stata, version 11.0 (StataCorp).
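A minimal sketch of the leave-one-out and jackknife steps is given below, in Python, with a generic logistic regression classifier and synthetic 2-class data standing in for the i-ROP model and image features (both are assumptions for illustration only).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

def loo_correct(X, y):
    """Leave-one-out cross-validation: train on every image but one,
    classify the held-out image, and repeat for each image; returns a
    0/1 correctness indicator per image."""
    hits = []
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        hits.append(float(clf.predict(X[test_idx])[0] == y[test_idx][0]))
    return np.asarray(hits)

def jackknife_mean_ci(values, z=1.96):
    """Jackknife variance estimate for the mean: recompute the mean with
    each observation left out, then form an approximate 95% CI."""
    n = len(values)
    loo_means = np.array([np.delete(values, i).mean() for i in range(n)])
    var = (n - 1) / n * np.sum((loo_means - loo_means.mean()) ** 2)
    m = values.mean()
    se = np.sqrt(var)
    return m, (m - z * se, m + z * se)

# Toy, well-separated 2-class data in place of real image features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
acc, ci = jackknife_mean_ci(loo_correct(X, y))
```

The accuracy and CI returned here play the role of the per-configuration accuracies and 95% CIs reported in Table 1.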
Finally, to determine whether the RSD was adequately representative of the general population of physicians, a subset of 73 of the 77 wide-angle retinal images was separately classified by 11 clinical experts as part of a larger set of 100 images (4 images [all of which were unanimously diagnosed as plus disease by all experts, clinical diagnosis, and the i-ROP system] were excluded from the larger data set for methodological reasons) from June 29, 2011, to October 14, 2014. All 11 experts were experienced in ROP examination and treatment and had all participated as experts in ROP research studies funded by the National Institutes of Health and/or published at least 2 peer-reviewed articles on ROP. To examine the classification performance of the i-ROP computer-based image analysis system, we calculated the tabular confusion matrix for the plus disease classification of the RSD compared with the i-ROP system for each of the 77 images.
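The tabular confusion matrix comparing the RSD with the system's 3-level classification can be computed as in the Python sketch below; the example labels are hypothetical and do not reproduce the study's actual counts.

```python
import numpy as np

CLASSES = ["normal", "pre-plus", "plus"]

def confusion_matrix(reference, predicted):
    """Rows index the reference standard diagnosis (RSD); columns index
    the system's classification; cell (i, j) counts images with RSD
    CLASSES[i] classified as CLASSES[j]."""
    idx = {c: i for i, c in enumerate(CLASSES)}
    m = np.zeros((len(CLASSES), len(CLASSES)), dtype=int)
    for r, p in zip(reference, predicted):
        m[idx[r], idx[p]] += 1
    return m

def accuracy(m):
    """Fraction of images on the diagonal (agreement with the RSD)."""
    return np.trace(m) / m.sum()

# Hypothetical labels for illustration only
rsd = ["normal", "normal", "normal", "pre-plus", "plus"]
system = ["normal", "normal", "pre-plus", "pre-plus", "plus"]
```

Off-diagonal cells show exactly which categories are confused; for example, the study reports that no plus disease images were classified as normal and vice versa (Table 3), which corresponds to zeros in the two corner cells.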
The performance of i-ROP computer-based analysis improved with increased field of view: accuracy compared with the RSD generally improved from a 1-DD radius field of view (accuracy, 64%; 95% CI, 61%-67% when considering all vessels) to a 3-DD radius field of view (approximate field of view of the standard photograph; accuracy, 85%; 95% CI, 83%-86%) to a 6-DD radius field of view (accuracy, 95%; 95% CI, 94%-95%; P < .001 vs 3-DD) (Table 1). Findings also demonstrated improved i-ROP system performance when considering all vessels (accuracy, 95%; 95% CI, 94%-95% with a 6-DD radius field of view) rather than only arteries (accuracy, 87%; 95% CI, 86%-88% with a 6-DD radius field of view; P < .001) or only veins (accuracy, 79%; 95% CI, 77%-81% with a 6-DD radius field of view; P < .001). None of the dilation features performed better than 80% accuracy.17 Examples of study images demonstrating these points are shown in Figure 2.
Table 2 displays the accuracy of the i-ROP system in diagnosing plus disease compared with the RSD. For an additional comparison, the diagnostic accuracy of 11 clinical experts was also determined using a subset of 73 of 77 images. Among the 73 images, accuracy of the 11 clinical experts ranged from 79% to 99% (mean, 87%), and accuracy of the i-ROP image analysis system was 95% (69 of 73 images). When the diagnosis selected by the majority of the 11 experts was compared with the RSD, the accuracy of the clinical experts was 97% (71 of 73 images).
Table 3 provides details about diagnostic performance of the i-ROP system on the full data set of 77 images, including the 4 images that were diagnosed incorrectly compared with the RSD. The i-ROP system performed similarly to the 3 expert image graders who collectively determined the RSD for the 77 images, who individually agreed with the RSD in 72 (94%), 74 (96%), and 71 (92%) images.
This study examines the retinal features used by ROP experts while classifying plus disease by correlating their diagnoses with prespecified quantitative parameters from computer-based image analysis. Key results of this study are that ROP experts appear to consider findings from beyond the central retina when diagnosing plus disease and consider tortuosity of both arteries and veins, and it is feasible for a computer-based image analysis system to perform comparably to ROP experts, using manually segmented images.
These findings suggest that physicians incorporate information beyond the ICROP definition of plus disease, which is based only on abnormality within the posterior pole vessels. Although the 2005 revised ICROP classification provided examples of wide-field fundus images of pre-plus disease, there continues to be only a single standard published photograph of plus disease, which displays a very narrow field of view (Figure 1A and Figure 2A).1 In our study, a computer-based machine learning system for identifying plus disease was more accurate when evaluating larger-angle fields of view (Table 1). Similarly, a previous study has demonstrated that the accuracy of experts decreases when fundus images are cropped closer to the field of view of the standard photograph (Figure 2B).10 Taken together, these findings suggest that the mid-peripheral and peripheral vessels contain information that physicians use to inform their diagnosis but that was not included in the original definition of plus disease, which is consistent with previously published literature.9-11 Similarly, in a different study using qualitative research methods to analyze the diagnostic process of experts, we found that some experts specifically cited peripheral retinal vascular features as being useful for the diagnosis of plus disease.9
Our findings also show that diagnosis of plus disease by experts incorporates tortuosity of both arteries and veins rather than only the arteries as suggested in the ICROP definition.1 In fact, a previous study has shown that veins in the standard published photograph (Figure 1A and Figure 2A) actually have greater tortuosity than the arteries.20 This outcome is consistent with findings by Wilson et al,21 who demonstrated that quantitative arterial and venous tortuosity both increased with worsening clinical stage of ROP, and both increased regardless of whether 4 or 8 vessels were analyzed. This finding may be useful from a clinical perspective because distinguishing between retinal arteries and veins in infants with ROP can be difficult for a computer system or even for expert physicians.22 Furthermore, results of this study are consistent with other ROP algorithms and suggest that retinal vascular tortuosity is more correlated with expert diagnosis than venous dilation, although this finding may result in part because quantitative analysis of vascular dilation is dependent on external factors, such as image magnification.23
Because this study finds that expert diagnosis of plus disease is not consistent with published ICROP definitions and there is some interexpert diagnostic variability, it may be that individual experts are weighting particular retinal features differently to arrive at their diagnosis.8,9 This finding has implications for ROP education given that previous work has shown that trainees often perform poorly in clinical diagnosis.10,12-14,24 An important step toward improving clinical diagnosis and education would be through dissemination of wider-angle representative retinal images of ROP. To foster that development, we are creating a website (http://www.i-rop.com) that provides examples of such images. We acknowledge that it may be problematic to change the definition of plus disease ex post facto in terms of extrapolating prognostic information to a population that may be different than the original population, but we believe it would be helpful for all experts to use the same validated parameters. In the original Cryotherapy for Retinopathy of Prematurity study when using only the criterion standard of binocular indirect ophthalmoscopy, experts disagreed on the need for treatment based on a diagnosis of threshold disease in 12% of cases.25,26
These study results confirm that computer-based image analysis systems can perform similarly to physicians and provide objective measurements of the vascular features labeled plus disease and pre-plus disease. In addition, our findings raise the question as to whether our attempts at developing computer-based image analysis systems ought to be constrained by existing clinical definitions (eg, arterial tortuosity and venous dilation) or whether, as in this case, they should model real-world physician behavior and be designed to fit the data. As described above, our study results are consistent with previous work suggesting that distinguishing between arteries and veins is not necessary for computer-based image analysis development and that quantitative measurements of vascular dilation are less accurate than measurements of tortuosity for the diagnosis of plus disease.16,21 From this analysis, we cannot conclude that experts are ignoring dilation features; for example, it may be that measurements of tortuosity are more continuous as severity of plus disease increases, whereas venous dilation correlates strongly only at one or both ends of the spectrum of plus disease, and therefore tortuosity works better as a single quantitative variable. This theory may partially explain why other image analysis systems, such as ROPTool (FocusROP), have lower receiver operating characteristic curves with dilation measures, or it may be related to the imprecision and technical challenges of measuring vascular dilation on RetCam (Clarity Medical Systems) images.22,23,27
Although it is difficult to directly compare the performance of the i-ROP system with prior systems, such as ROPTool, given the use of manual segmentation, different populations studied, and different end points for comparison, the i-ROP system appears to have at least 2 key advantages over previously reported systems, which have been discussed in detail in a prior publication.17 Most important, given the high interexpert variability for the diagnosis of plus disease, the i-ROP system was trained against a rigorously developed RSD as opposed to a single clinical examiner’s impression. Second, the i-ROP system is the first reported system to accurately incorporate pre-plus disease into the semi-automated classification. Using the i-ROP system, no eyes with plus disease were classified as normal, and no normal eyes were classified as having plus disease (Table 3). In comparison, published results using ROPTool have reported a sensitivity of plus disease detection (against a clinical examiner and without identification of pre-plus disease) between 71% and 86%27,28 and, in a recent publication by Abbey et al,23 a sensitivity of 91% for any vascular abnormality (≥1 on a scale of 0-16).
The advantage of the continuous scale by Abbey et al23 is that it avoids the inherent limitation in all studies involving computer-based diagnosis, which is the identification of a criterion standard.15 In our study, the reference standard was developed by combining the image-based diagnoses of 3 experts in ROP image grading combined with the actual clinical diagnosis. The majority vote of the 11 physicians agreed with the RSD in 71 of 73 (97%) cases, suggesting that the external validity of our RSD is high, and therefore the implications of these findings are likely generalizable to the larger population of ROP physicians. Although the majority consensus of the 11 physicians performed well with the reference standard, there was variability even among experts in this group, which may imply different thresholds for each physician, attention to different vascular features, or imprecision in the classification of the physicians. The clinical relevance of this variability is unknown, and these questions will be avenues of future research.
There are several additional limitations to this study. First, the i-ROP system was trained using manually segmented images, which limits its immediate clinical utility. For the purposes of designing the machine learning software, we wanted to avoid the potential noise caused by automated segmentation algorithms; however, we are working on using the information gleaned from this work to develop a fully automated system. Second, despite the cross-validation procedures, these data will be more convincing when they are validated against other data sets. With only 77 images in the data set, it is possible that different parameters would be found to be more important in other image sets. Third, we did not analyze whether a combination of image features would have worked better than individual features, so we did not compare arterial tortuosity and venous dilation with our current system, which uses only a measurement of tortuosity. It may be that some combination of image features actually performs better than the current i-ROP system. Last, there is variability in the routine use of photographic diagnosis during routine clinical care, and some of the variability of the expert consensus may be owing to unfamiliarity with this technique and not represent true clinical differences.
These findings have important implications for quality of care, education, and delivery of care with emerging technologies, such as computer-based decision support and telemedicine.29-31 Physicians and trainees can use these findings to improve standardization of plus disease diagnosis in ROP. Using the i-ROP system, it may be possible to develop a fully automated classification system for plus disease that performs as well as experts at 3-level classification.
Submitted for Publication: November 30, 2015; accepted February 8, 2016.
Corresponding Author: Michael F. Chiang, MD, Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, 3375 SW Terwilliger Blvd, Portland, OR 97239 (firstname.lastname@example.org).
Published Online: April 14, 2016. doi:10.1001/jamaophthalmol.2016.0611.
Author Contributions: Dr Chiang had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Campbell, Ataer-Cansizoglu, Bolon-Canedo, Patel, Martinez-Castellanos, Jonas, Chan, Chiang.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Campbell, Bolon-Canedo, Patel, Chan, Chiang.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Campbell, Ataer-Cansizoglu, Bozkurt, Erdogmus, Patel, Martinez-Castellanos.
Obtained funding: Erdogmus, Kalpathy-Cramer, Chiang.
Administrative, technical, or material support: Ataer-Cansizoglu, Kalpathy-Cramer, Patel, Reynolds, Shapiro, Repka, Drenser, Ostmo, Jonas, Chiang.
Study supervision: Erdogmus, Kalpathy-Cramer, Ferrone, Martinez-Castellanos, Chan, Chiang.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Chiang reported serving as an unpaid member of the Scientific Advisory Board for Clarity Medical Systems and as a consultant for Novartis. Dr Reynolds reported serving as a consultant for Novartis. No other conflicts were reported.
Funding/Support: This study was supported by grants R01 EY19474, P30 EY010572 (Drs Chiang and Campbell), and R21 EY022387 (Drs Erdogmus, Chiang, and Kalpathy-Cramer) from the National Institutes of Health, unrestricted departmental funding from Research to Prevent Blindness (Drs Campbell, Reynolds, Repka, Chan, and Chiang and Mss Patel, Ostmo, and Jonas), the St Giles Foundation (Dr Chan), and the iNsight Foundation (Dr Chan and Ms Jonas).
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Group Information: The Imaging and Informatics in ROP (i-ROP) Research Consortium includes Michael F. Chiang, MD, Susan Ostmo, MS, Kemal Sonmez, PhD, and J. Peter Campbell, MD, MPH (Oregon Health & Science University, Portland, OR); R.V. Paul Chan, MD, and Karyn Jonas, RN (Cornell University, New York, NY); Jason Horowitz, MD, Osode Coki, RN, Cheryl-Ann Eccles, RN, and Leora Sarna, RN (Columbia University, New York, NY); Audina Berrocal, MD, and Catherin Negron, BA (Bascom Palmer Eye Institute, Miami, FL); Kimberly Drenser, MD, Kristi Cumming, RN, Tammy Osentoski, RN, and Tammy Check, RN (William Beaumont Hospital, Royal Oak, MI); Thomas Lee, MD, Evan Kruger, BA, and Kathryn McGovern, MPH (Children’s Hospital Los Angeles, Los Angeles, CA); Charles Simmons, MD, Raghu Murthy, MD, and Sharon Galvis, NNP (Cedars Sinai Hospital, Los Angeles, CA); Jerome Rotter, MD, Ida Chen, PhD, Xiaohui Li, MD, and Kaye Roll, RN (LA Biomedical Research Institute, Los Angeles, CA); Jayashree Kalpathy-Cramer, PhD (Massachusetts General Hospital, Boston, MA); Deniz Erdogmus, PhD (Northeastern University, Boston, MA); and Maria Ana Martinez-Castellanos, MD, Samantha Salinas-Longoria, MD, Rafael Romero, MD, and Andrea Arriola, MD (Asociacion para Evitar la Ceguera en Mexico (APEC), Mexico City, Mexico).