Figure. Examples of discrepant classifications. A, Classified as type 1 ROP by ophthalmoscopy (zone I, stage 3, with plus disease), whereas all telemedicine graders classified it as mild retinopathy (zone II, stage 1-2, without plus disease). B, Classified as no retinopathy by ophthalmoscopy (zone III, stage 0, no plus disease), but classified as mild retinopathy according to the reference standard diagnosis (zone II, stage 2, no plus disease). C, Classified as type 2 ROP by 1 telemedicine grader (zone I, stage 2, preplus disease), whereas the ophthalmoscopic and reference standard diagnoses classified it as type 1 ROP (zone I, stage 3, with plus disease). D, Classified as type 1 ROP by 1 telemedicine grader (zone I, stage 3, no plus disease), compared with the reference standard classification of type 2 ROP (zone I, stage 2, no plus disease).
Biten H, Redd TK, Moleta C, et al. Diagnostic Accuracy of Ophthalmoscopy vs Telemedicine in Examinations for Retinopathy of Prematurity. JAMA Ophthalmol. 2018;136(5):498–504. doi:10.1001/jamaophthalmol.2018.0649
Key Points

Question
Is ophthalmoscopy or telemedicine more accurate in diagnosing clinically significant retinopathy of prematurity (ROP) when compared with a reference standard diagnosis?
Findings
In this multicenter study, each method was slightly more accurate for different components of retinopathy of prematurity (zone, stage, and plus disease), but there was no statistically significant difference in their ability to detect clinically significant retinopathy of prematurity. Both methods demonstrated high interexaminer variability.
Meaning
Telemedicine is as effective as ophthalmoscopy in identifying clinically significant retinopathy of prematurity, but both methods demonstrate high interexaminer variability; future studies should use a consensus reference rather than ophthalmoscopy as the criterion standard.
Importance
Examinations for retinopathy of prematurity (ROP) are typically performed using binocular indirect ophthalmoscopy. Telemedicine studies have traditionally assessed the accuracy of telemedicine compared with ophthalmoscopy as a criterion standard. However, it is not known whether ophthalmoscopy is truly more accurate than telemedicine.
Objective
To directly compare the accuracy and sensitivity of ophthalmoscopy vs telemedicine in diagnosing ROP using a consensus reference standard.
Design, Setting, and Participants
This multicenter prospective study, conducted between July 1, 2011, and November 30, 2014, at 7 neonatal intensive care units and academic ophthalmology departments in the United States and Mexico, included 281 premature infants who met the screening criteria for ROP.
Exposures
Each examination consisted of 1 eye undergoing binocular indirect ophthalmoscopy by an experienced clinician followed by remote image review of wide-angle fundus photographs by 3 independent telemedicine graders.
Main Outcomes and Measures
Results of both examination methods were combined into a consensus reference standard diagnosis. The agreement of both ophthalmoscopy and telemedicine was compared with this standard, using percentage agreement and weighted κ statistics.
Results
Among the 281 infants in the study (127 girls and 154 boys; mean [SD] gestational age, 27.1 [2.4] weeks), a total of 1553 eye examinations were classified using both ophthalmoscopy and telemedicine. Ophthalmoscopy and telemedicine each had similar sensitivity for zone I disease (78% [95% CI, 71%-84%] vs 78% [95% CI, 73%-83%]; P > .99 [n = 165]), plus disease (74% [95% CI, 61%-87%] vs 79% [95% CI, 72%-86%]; P = .41 [n = 50]), and type 2 or worse ROP (stage 3, zone I, or plus disease: 86% [95% CI, 80%-92%] vs 79% [95% CI, 75%-83%]; P = .10 [n = 251]), but ophthalmoscopy was slightly more sensitive in identifying stage 3 disease (85% [95% CI, 79%-91%] vs 73% [95% CI, 67%-78%]; P = .004 [n = 136]).
Conclusions and Relevance
No difference was found in overall accuracy between ophthalmoscopy and telemedicine for the detection of clinically significant ROP, although, on average, ophthalmoscopy had slightly higher accuracy for the diagnosis of zone III and stage 3 ROP. With the caveat that there was variable accuracy between examiners using both modalities, these results support the use of telemedicine for the diagnosis of clinically significant ROP.
Retinopathy of prematurity (ROP) is a leading cause of childhood blindness worldwide,1-4 and its effect on public health continues to grow as advances in perinatal medicine allow for improved survival of premature infants.5-7 Retinopathy of prematurity is amenable to screening interventions, as it is detectable before it causes loss of vision, and prompt recognition and treatment can delay or reverse adverse outcomes.8-11 As a result, the American Academy of Pediatrics, American Academy of Ophthalmology, American Association for Pediatric Ophthalmology and Strabismus, and American Association of Certified Orthoptists have issued a joint policy statement detailing guidelines for ROP screening.12 The consensus statement specifies that all infants who meet the screening criteria should undergo dilated retinal examination using binocular indirect ophthalmoscopy.
Unfortunately, a lack of access to trained ophthalmologists with experience diagnosing ROP via ophthalmoscopy prevents many premature infants from receiving adequate screening, both in developed and underdeveloped countries.13-17 Telemedical screening via remote review of dilated ophthalmoscopic images has been proposed as a substitute for ophthalmoscopy to address this gap, and the use of telemedicine as a substitute for bedside ophthalmoscopy in real-world diagnosis of ROP is increasing.18-21
Prior studies have demonstrated that telemedicine is highly accurate in identifying clinically significant (type 2 or worse) ROP.22-31 These studies have established the accuracy of telemedicine as a screening tool using ophthalmoscopy as the reference standard. However, prior work has suggested that there may be significant variability in ROP categorization via ophthalmoscopy, even among experts who are highly experienced in the disease.32 Furthermore, numerous studies have suggested that critical aspects of ROP diagnosis, such as identification of plus disease and zone I disease, have significant variability among different experts.33-39 By definition, a criterion standard must have complete accuracy and consensus.40 This definition raises questions about the design of prior studies that examined the accuracy of telemedicine for ROP examination. To our knowledge, little published literature has directly compared the accuracy of ophthalmoscopy with that of telemedicine for ROP diagnosis, without assuming that ophthalmoscopy is the criterion standard.31,35 This fact is important not only to better understand the accuracy of ROP diagnosis but also to improve the design of future studies involving emerging diagnostic technologies across other ophthalmic diseases.
The purpose of this study is to directly compare the accuracy of ophthalmoscopy with that of telemedicine for ROP diagnosis in a large data set. To our knowledge, this is the first study to have examined this question in patients with ROP. This comparison is done by developing a consensus reference standard diagnosis (RSD) and by comparing both telemedicine and ophthalmoscopy with this new reference standard to determine the relative accuracy and sensitivity of each for diagnosing ROP.40-42
The study was conducted as part of the multicenter prospective Imaging and Informatics in ROP study. All data were collected prospectively from 7 participating institutions: Oregon Health & Science University, Weill Cornell Medical College, University of Miami, Columbia University Medical Center, Children’s Hospital Los Angeles, Cedars-Sinai Medical Center, and Asociación para Evitar la Ceguera en México. Inclusion criteria were infants who were either admitted to a participating neonatal intensive care unit or were transferred to a participating center for specialized ophthalmic care between July 1, 2011, and November 30, 2014; met published criteria for a screening examination for ROP; and had parents who provided written informed consent for data collection. Images were deidentified for analysis and were labeled only with birth weight, gestational age, and postmenstrual age at the time of examination. This study was conducted in accordance with Health Insurance Portability and Accountability Act guidelines; prospectively obtained institutional review board approval from Oregon Health & Science University, Weill Cornell Medical College, University of Miami, Columbia University Medical Center, Children’s Hospital Los Angeles, Cedars-Sinai Medical Center, and Asociación para Evitar la Ceguera en México; and adhered to the tenets of the Declaration of Helsinki.43
In accordance with current guidelines for ROP screening at each institution, all infants underwent serial dilated ophthalmoscopic examinations by a participating ophthalmologist (R.V.P.C. and M.F.C.). The clinical examination findings were obtained using ophthalmoscopy and documented according to the international classification of ROP.44 Findings at each examination were incorporated into an overall disease category, based on specifications from the Multicenter Trial of Cryotherapy for Retinopathy of Prematurity Study10 and from the Early Treatment for Retinopathy of Prematurity (ETROP) study9: (1) no ROP; (2) mild ROP, defined as ROP less than type 2 disease; (3) type 2 ROP (zone I, stage 1 or 2, without plus disease; or zone II, stage 3, without plus disease; or any ROP less than type 1 but with preplus disease); and (4) type 1 ROP or ROP requiring treatment (zone I, any stage, with plus disease; zone I, stage 3, without plus disease; or zone II, stage 2 or 3, with plus disease). Examinations were performed by experienced, board-certified ophthalmologists (R.V.P.C. and M.F.C.) who had undergone specialty training in either pediatric ophthalmology or vitreoretinal surgery, and all were either principal investigators or certified investigators in the ETROP study or had published more than 10 peer-reviewed articles on ROP.
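Because the overall disease categories above are explicit rules over zone, stage, and plus status, the mapping can be expressed directly in code. The sketch below is an illustrative rendering of the definitions as quoted from the CRYO-ROP and ETROP specifications; the function name and the encodings for zone, stage, and plus status are ours, not the study's:

```python
def rop_category(zone, stage, plus):
    """Map one eye's findings to the study's overall disease category.

    zone: 1-3 (for zones I-III); stage: 0-3; plus: "none", "preplus", or "plus".
    Illustrative sketch of the published definitions only, not a clinical tool.
    """
    # Type 1 (treatment-requiring) ROP
    if ((zone == 1 and plus == "plus") or                   # zone I, any stage, with plus
            (zone == 1 and stage == 3) or                   # zone I, stage 3, without plus
            (zone == 2 and stage in (2, 3) and plus == "plus")):
        return "type 1 ROP"
    # Type 2 ROP
    if ((zone == 1 and stage in (1, 2) and plus == "none") or
            (zone == 2 and stage == 3 and plus == "none") or
            (stage > 0 and plus == "preplus")):             # any ROP < type 1 with preplus
        return "type 2 ROP"
    return "mild ROP" if stage > 0 else "no ROP"
```

The type 1 checks must come first: zone I, stage 3 disease is type 1 regardless of plus status, so testing the type 2 clauses first would misclassify it.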
Retinal images were captured by an ophthalmologist or trained photographer after the clinical examination using a wide-angle camera (RetCam; Clarity Medical Systems). Deidentified clinical and image data were uploaded to a secure web-based database system developed by us. Cases with clinical diagnoses of stage 4 or 5 ROP were excluded to focus on identification of the onset of clinically significant disease. For analysis of the diagnostic accuracy of ophthalmoscopy, owing to small numbers, we excluded 2 participating examiners with fewer than 50 examinations. Thus, the accuracy of ophthalmoscopy was evaluated for 5 participating clinicians.
Three experts independently conducted remote, dilated ophthalmoscopic image review and interpretation of all images via a secure socket layer–encrypted web-based module developed by us. Two of the three experts (R.V.P.C. and M.F.C.) had more than 10 years of clinical ROP experience and more than 50 ROP-related publications. The third expert (S.O.) was a nonphysician ROP study coordinator who had previously helped validate a published computer-based ROP severity scale with very high accuracy and intraexpert reliability.45 Images were graded according to the same criteria as ophthalmoscopic examinations.
The overall RSD was developed by integration of the telemedicine diagnoses of all 3 image readers with the ophthalmoscopic diagnosis of the examining ophthalmologist using previously published methods.46 In cases of discrepancy between image-based and clinical diagnoses, the 3 image readers and a moderator (K.J.) reviewed all medical records to reach a consensus for the overall reference standard. If no consensus could be obtained owing to lack of confirmatory information in photographs (such as an ophthalmoscopic diagnosis of zone III not clearly seen in photographs), preference was given to the ophthalmoscopic diagnosis. The rationale for this reference standard is that a more accurate diagnosis may be possible by combining ophthalmoscopic and telemedicine findings, and that such a standard may be feasible and applicable in rigorous clinical research settings. Previous work has shown that this approach for developing an RSD can provide higher accuracy and intergrader agreement than either telemedicine or ophthalmoscopy alone.46
We compared ophthalmoscopic and telemedicine diagnoses against the RSD as the criterion standard for each ordinal subcategory of zone (I-III), stage (0-3), plus (none, preplus, or plus), and overall disease category (no ROP, mild ROP, type 2 ROP, or type 1 ROP). This agreement was also calculated for clinically significant binary classifications of zone (zone I vs not), stage (stage 3 vs not), vascular morphologic characteristics (plus disease vs not), and presence of ROP warranting referral to an ophthalmologist (type 2 or worse ROP). All agreements were reported as absolute agreement (compared using a t test) and as a weighted κ statistic for chance-adjusted agreement. Interpretation of the κ statistic used a commonly accepted scale (0-0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-1.00, near perfect agreement). To analyze the homogeneity of the distributions of categorical data, descriptive statistics and χ² tests were used. No adjustment was made for including both eyes of an individual in some cases. We used Excel 2011 (Microsoft Corp) for data management and Stata/SE, version 11 (StataCorp) for all statistical analysis. P < .05 was considered significant. All reported P values were 2-sided.
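As a concrete illustration of chance-adjusted agreement over ordinal categories, a weighted κ for two graders can be computed as below. This is a sketch assuming linear disagreement weights (the article does not state which weighting scheme was used), and the function names are ours:

```python
def weighted_kappa(a, b, k):
    """Linearly weighted kappa for two raters over ordinal categories 0..k-1.

    Disagreement weight |i - j| / (k - 1); kappa = 1 - observed/expected
    weighted disagreement. Sketch only: the article does not specify whether
    linear or quadratic weights were used.
    """
    n = len(a)
    obs = [[0.0] * k for _ in range(k)]
    for i, j in zip(a, b):
        obs[i][j] += 1.0 / n
    pa = [sum(row) for row in obs]                              # rater A marginals
    pb = [sum(obs[i][j] for i in range(k)) for j in range(k)]   # rater B marginals
    w = lambda i, j: abs(i - j) / (k - 1)
    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * pa[i] * pb[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected

def agreement_label(kappa):
    """Map kappa onto the interpretation scale quoted in the text."""
    for cutoff, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                          (0.80, "substantial")]:
        if kappa <= cutoff:
            return label
    return "near perfect"
```

Because the weights penalize near-miss disagreements (eg, stage 2 vs stage 3) less than distant ones (stage 0 vs stage 3), weighted κ is the natural choice for ordinal gradings such as zone, stage, and plus status.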
A total of 281 infants met the eligibility criteria for this study and underwent serial examinations in accordance with current guidelines for ROP screening (mean, 3.7 examinations per infant; range, 1-14).12,32 The mean (SD) gestational age was 27.1 (2.4) weeks, and 127 infants (45.2%) were female. A total of 127 infants (45.2%) were white, 74 (26.3%) were Asian, 73 (26.0%) were African American, 4 (1.4%) were Hispanic, and 3 (1.1%) were of undisclosed ethnicity. All screening sessions included an evaluation of each eye, for a total of 1576 study eye examinations. Twenty-three eye examinations were excluded owing to a clinical diagnosis of stage 4 or 5 ROP, yielding 1553 total study eye examinations for comparison.
The distribution of examination findings and disease severity for both telemedicine and ophthalmoscopy is described in Table 1. The RSD identified the presence of ROP in 913 examinations (58.8%); this included mild ROP in 512 examinations (33.0%), type 2 or preplus disease in 313 examinations (20.2%), and type 1 ROP in 88 examinations (5.7%). All telemedicine graders appeared to identify preplus disease more frequently than ophthalmoscopic examiners. In addition, zone III disease was more frequently diagnosed by ophthalmoscopy than telemedicine.
Table 2 describes the accuracy of individual telemedicine graders and ophthalmoscopic examiners compared with the RSD for each ordinal subcategory of zone, stage, plus disease, and overall disease category. There was statistically significant intergrader variability in diagnostic accuracy, regardless of examination method (Figure). This variability among graders was statistically significant for all categories of examination findings and disease classification except for zone among telemedicine graders. As a group, examiners using telemedicine were slightly more accurate than those using ophthalmoscopy in identifying normal vs preplus vs plus disease (92% vs 88%; P < .001) compared with the RSD. However, ophthalmoscopy was more accurate in identifying zone (91% [95% CI, 89%-92%] vs 88% [95% CI, 87%-89%]; P = .009), stage (88% [95% CI, 86%-89%] vs 75% [95% CI, 74%-77%]; P < .001), and category (84% [95% CI, 82%-85%] vs 77% [95% CI, 76%-79%]; P < .001).
Table 3 displays sensitivity for the detection of clinically significant disease (ie, type 2 or worse ROP). There were no statistically significant differences between telemedicine and ophthalmoscopy in detecting zone I disease (78% [95% CI, 73%-83%] vs 78% [95% CI, 71%-84%]; P > .99), plus disease (79% [95% CI, 72%-86%] vs 74% [95% CI, 61%-87%]; P = .41), or type 2 or worse disease (79% [95% CI, 75%-83%] vs 86% [95% CI, 80%-92%]; P = .10). However, ophthalmoscopy did have a statistically significantly higher sensitivity for the detection of stage 3 disease (85% [95% CI, 79%-91%]) vs telemedicine (73% [95% CI, 67%-78%]; P = .004).
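The article does not state how its confidence intervals were computed, but a standard Wald interval for a proportion reproduces the reported stage 3 figures for ophthalmoscopy (85%; 95% CI, 79%-91%; n = 136), suggesting something close to it was used. In the sketch below, the function name is ours and the true-positive count is inferred from the reported sensitivity:

```python
import math

def sensitivity_ci(tp, n, z=1.96):
    """Sensitivity (tp true positives among n reference-standard positives)
    with a Wald 95% CI, clamped to [0, 1]. Illustrative sketch only; the
    article does not specify its exact CI method."""
    p = tp / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

# Stage 3 by ophthalmoscopy: 85% sensitivity among n = 136 examinations
# with stage 3 disease per the reference standard (tp inferred as 0.85 * 136).
p, lo, hi = sensitivity_ci(round(0.85 * 136), 136)
```

Rounding (lo, hi) to two decimals recovers the reported interval of 79% to 91%; a Wilson or exact binomial interval would give similar, slightly asymmetric bounds at these sample sizes.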
This study analyzed the accuracy and sensitivity of telemedicine grading of dilated fundus imaging vs binocular indirect ophthalmoscopy for ROP examination, compared with a consensus reference diagnosis. Key findings were: (1) there was no statistically significant difference in the sensitivity of ophthalmoscopy vs telemedicine to detect clinically significant (type 2 or worse) ROP; (2) ophthalmoscopy had slightly higher accuracy than telemedicine for detecting zone III and stage 3 ROP; and (3) there was statistically significant interobserver variability in the accuracy of ROP classification, regardless of examination method.
In this study, each examination method was more accurate or sensitive than the other for particular aspects of the ROP examination. Specifically, ophthalmoscopy was slightly more accurate in identifying zone, stage, and category of ROP. With respect to disease stage in particular, on average, ophthalmoscopy had greater accuracy for identifying stage 3 disease. It may be that the stereopsis afforded by ophthalmoscopic examination allows better visualization of the 3-dimensional nature of fibrovascular proliferation into the vitreous that occurs in this stage. However, image grader 3 demonstrated comparable accuracy using telemedicine. Image grader 3 is a nonophthalmologist who has been trained exclusively on fundus images, which suggests that there may be nonstereoscopic cues in the images that a trained grader can use to achieve similar accuracy. In addition, ophthalmoscopy was much more accurate for detecting zone III disease in this study. In fact, telemedicine graders almost never identified zone III disease, which is likely owing to the inability to visualize the far temporal retina via wide-field fundus photography in infants. This finding indicates that in some cases ophthalmoscopy may be required to detect pathologic characteristics, although in the absence of plus disease, zone III disease is unlikely to be clinically significant.9
There was no difference in sensitivity between ophthalmoscopy and telemedicine for detecting type 2 or worse ROP, which would typically require referral to an ophthalmologist. This finding is consistent with similar studies in which ophthalmoscopy was used as the criterion standard for telemedicine comparison.22-31 Based on these findings, the above-mentioned differences between ophthalmoscopy and telemedicine should not preclude implementation of screening programs using telemedicine in situations in which ophthalmoscopy is not readily available, particularly considering the unmet need for screening of at-risk infants.
As in this study, others have previously reported on the marked intergrader variability in diagnosing plus disease, even among highly experienced readers.33-36 Diagnosis of plus disease is based on interpretation of venous dilation and arteriolar tortuosity. These are both continuous variables, which are then transformed into a categorical outcome (no plus disease, preplus disease, or plus disease) based on comparison with a single reference standard photograph.10 This makes diagnosis of plus disease inherently subjective.47 Furthermore, it has been shown that in classifying plus disease, examiners focus on different pathologic features and have different interpretations of the same features.37 Consideration of plus disease as a spectrum of disease and quantification using a continuous scoring system rather than a categorical outcome may allow for more accurate diagnosis in the future.38 Limiting the subjective component of diagnosis of plus disease using computer-based image assessment may also allow greater diagnostic uniformity.48 Several computer-based image assessment tools have shown efficacy in diagnosing plus disease in ROP.49-51 We suggest that future validation of these computerized image assessment programs should use a consensus reference diagnosis as the criterion standard comparison to establish validity.
This study has several limitations. Remote image interpretation was performed by only 3 image readers, which may limit generalizability. However, this group was composed of 2 ophthalmologist experts in ROP and 1 nonophthalmologist trained to recognize features of ROP. In some cases, the telemedicine grader and ophthalmoscopic examiner for an examination were the same ophthalmologist. To minimize recall bias, images were generally reviewed several months after acquisition and no patient data, except demographic information such as gestational age, postmenstrual age, and birth weight, were visible during image reading. Because the total of 1553 examinations includes serial assessments of the same infant and the 2 eyes of each infant were regarded as separate study participants, these observations are not truly independent and there was no statistical adjustment to account for this. However, all identifying data were removed and each eye was separately assessed on telemedical review, which should not bias the results to favor ophthalmoscopy or telemedicine preferentially. Finally, a single ophthalmoscopic examination by an expert was performed for each participant. Therefore, the reference standard was based on only 1 ophthalmoscopic examination combined with 3 telemedicine examinations, and the accuracy and reliability of ophthalmoscopic examination are difficult to analyze in this study. Ophthalmoscopic examination has long been considered the criterion standard for ROP diagnosis, and we did not think there was any practical way for infants to undergo multiple masked sequential ophthalmoscopic examinations for this study because of concerns about infant safety.52
Overall, this study demonstrates that binocular indirect ophthalmoscopy and telemedicine using remote review of dilated ophthalmoscopic imaging possess similar sensitivity for detecting type 2 or worse ROP, and that both are limited by interobserver variability. Future investigations of screening and diagnostic modalities in ROP should use a consensus reference diagnosis as the criterion standard to improve validity.
Accepted for Publication: February 5, 2018.
Corresponding Author: Michael F. Chiang, MD, Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, 3375 SW Terwilliger Blvd, Portland, OR 97239 (firstname.lastname@example.org).
Published Online: April 5, 2018. doi:10.1001/jamaophthalmol.2018.0649
Author Contributions: Drs Biten and Redd contributed equally to this article. Dr Chiang had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Biten, Campbell, Jonas, Chan, Chiang.
Acquisition, analysis, or interpretation of data: Biten, Redd, Moleta, Ostmo, Jonas, Chan, Chiang.
Drafting of the manuscript: Biten, Redd, Moleta, Campbell, Ostmo, Chan.
Critical revision of the manuscript for important intellectual content: Biten, Redd, Moleta, Campbell, Jonas, Chan, Chiang.
Obtained funding: Chiang.
Administrative, technical, or material support: Redd, Moleta, Ostmo, Jonas, Chiang.
Study supervision: Campbell, Chan, Chiang.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Chan reported serving as a consultant for Visunex Medical Systems. Dr Chiang reported serving as an unpaid member of the Scientific Advisory Board for Clarity Medical Systems and as a consultant for Novartis. No other disclosures were reported.
Funding/Support: This work was supported by grants R01 EY19474 and P30 EY010572 from the National Institutes of Health; grants SCH-1622679, SCH-1622542, and SCH-1622536 from the National Science Foundation; and unrestricted departmental funding from Research to Prevent Blindness.
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Group Information: The members of the Imaging & Informatics in Retinopathy of Prematurity (ROP) Research Consortium include Michael F. Chiang, MD, Susan Ostmo, MS, Sang Jin Kim, MD, PhD, Kemal Sonmez, PhD, and J. Peter Campbell, MD, MPH (Oregon Health & Science University, Portland); R.V. Paul Chan, MD, and Karyn Jonas, MS, RN (University of Illinois at Chicago); Jason Horowitz, MD, Osode Coki, RN, Cheryl-Ann Eccles, RN, and Leora Sarna, RN (Columbia University, New York, New York); Anton Orlin, MD (Weill Cornell Medical College, New York, New York); Audina Berrocal, MD, and Catherin Negron, BA (Bascom Palmer Eye Institute, Miami, Florida); Kimberly Denser, MD, Kristi Cumming, RN, Tammy Osentoski, RN, Tammy Check, RN, and Mary Zajechowski, RN (William Beaumont Hospital, Royal Oak, Michigan); Thomas Lee, MD, Evan Kruger, BA, and Kathryn McGovern, MPH (Children’s Hospital Los Angeles, Los Angeles, California); Charles Simmons, MD, Raghu Murthy, MD, and Sharon Galvis, NNP (Cedars-Sinai Hospital, Los Angeles, California); Jerome Rotter, MD, Ida Chen, PhD, Xiaohui Li, MD, Kent Taylor, PhD, and Kaye Roll, RN (LA Biomedical Research Institute, Los Angeles, California); Jayashree Kalpathy-Cramer, PhD, Ken Chang, BS, and Andrew Beers, BS (Massachusetts General Hospital, Boston); Deniz Erdogmus, PhD, and Stratis Ioannidis, PhD (Northeastern University, Boston, Massachusetts); and Maria Ana Martinez-Castellanos, MD, Samantha Salinas-Longoria, MD, Rafael Romero, MD, Andrea Arriola, MD, Francisco Olguin-Manriquez, MD, Miroslava Meraz-Gutierrez, MD, Carlos M. Dulanto-Reinoso, MD, and Cristina Montero-Mendoza, MD (Asociación para Evitar la Ceguera en México, Mexico City).