Experts were asked to provide a diagnosis (plus, pre-plus, or neither) and annotate key findings while being videotaped “thinking aloud” to describe their reasoning. Videotapes were transcribed, coded, and analyzed to examine qualitative diagnostic process. A, Diagnosed as plus disease by expert 1. “…looks like a very low gestational age baby; it’s taken quite a long time to get to this stage. There is a lot of arterial tortuosity [annotated]; there is a little bit of venous congestion in the superior temporal and superior nasal quadrant, more in the superior half of the retina [annotated]. By definition, I think this has to be plus, because it’s 2 quadrants at least, and even the other quadrants aren’t normal….” B, Diagnosed as pre-plus disease by expert 2. “…there is a lot of tortuosity of the arteries; the veins are about 2 to 1. This could either be a pre-plus eye or a normal variant, depending on a quick look to the periphery. Curiously, there is a lot of tortuosity down here [annotated]; it looks like there is disease up here [annotated].” C, Diagnosed as neither by expert 4. “…vessels seem to be branching excessively in that region [superonasal area annotated] and some increased tortuosity [superotemporal area annotated] as well, and this vein looks too fat [superotemporal area annotated]. If all the quadrants were like this quadrant [superotemporal], then it would be at least pre-plus and verging on plus, but since it’s only 1 quadrant that’s highly questionable, would not classify it as plus. I could see why some would call it pre-plus…I would call it no plus.”
Plus Disease in Retinopathy of Prematurity
Hewing NJ, Kaufman DR, Chan RVP, Chiang MF. Plus disease in retinopathy of prematurity: qualitative analysis of diagnostic process by experts. JAMA Ophthalmol. Published online May 23, 2013. doi:10.1001/jamaophthalmol.2013.135.
Hewing NJ, Kaufman DR, Chan RVP, Chiang MF. Plus Disease in Retinopathy of Prematurity: Qualitative Analysis of Diagnostic Process by Experts. JAMA Ophthalmol. 2013;131(8):1026-1032. doi:10.1001/jamaophthalmol.2013.135
Copyright 2013 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Importance
Plus disease is the most important parameter that characterizes severe treatment-requiring retinopathy of prematurity, yet diagnostic agreement among experts is imperfect and the precise factors involved in clinical diagnosis are unclear. This study is designed to address these gaps in knowledge by analyzing cognitive aspects of the plus disease diagnostic process by experts.
Objective
To examine the diagnostic reasoning process of experts for plus disease in retinopathy of prematurity using qualitative research techniques.
Design
Cognitive walk-through, with qualitative analysis of videotaped expert responses and quantitative analysis of expert diagnoses.
Setting
Experimental setting in which experts were videotaped while reviewing study data.
Participants
A panel of international retinopathy of prematurity experts who had the experience of using qualitative retinal features as their primary basis for clinical diagnosis.
Methods
Six experts were video recorded while independently reviewing 7 wide-angle retinal images from infants with retinopathy of prematurity. Experts were asked to explain their diagnostic process in detail (think-aloud protocol), mark findings relevant to their reasoning, and diagnose each image (plus vs pre-plus vs neither). Subsequently, each expert viewed the images again while being asked to examine arteries and veins in isolation and answer specific questions. Video recordings were transcribed and reviewed. Diagnostic process of experts was analyzed using a published cognitive model.
Main Outcome and Measures
Interexpert and intraexpert agreement.
Results
Based on the think-aloud protocol, 5 of 6 experts agreed on the same diagnosis in 3 study images and 3 of 6 experts agreed in 3 images. When experts were asked to rank images in order of severity, the mean correlation coefficient between pairs of experts was 0.33 (range, −0.04 to 0.75). All experts considered arterial tortuosity and venous dilation while reviewing each image. Some considered venous tortuosity, arterial dilation, peripheral retinal features, and other factors. When experts were asked to rereview images to diagnose plus disease based strictly on definitions of sufficient arterial tortuosity and venous dilation, all but 1 expert changed their diagnosis compared with the think-aloud protocol.
Conclusions and Relevance
Diagnostic consistency in plus disease is imperfect. Experts differ in their reasoning process, retinal features that they focus on, and interpretations of the same features. Understanding these factors may improve diagnosis and education. Future research defining more precise diagnostic criteria may be warranted.
Retinopathy of prematurity (ROP) is a vasoproliferative disease affecting low-birth-weight infants. An International Classification for ROP (ICROP) has been developed to standardize clinical diagnosis.1 Multicenter randomized trials such as the Cryotherapy for Retinopathy of Prematurity (CRYO-ROP) and Early Treatment for Retinopathy of Prematurity studies have found that severe ROP may be treated successfully by laser photocoagulation or cryotherapy.2,3 Pharmacological treatments are being studied.4 Despite these advances, ROP continues to be a leading cause of childhood blindness throughout the world.
Plus disease is a critical component of the ICROP system and is defined as arterial tortuosity and venous dilation in the posterior pole greater than or equal to that of a standard photograph selected by expert consensus during the 1980s.1,2 More recently, the revised ICROP system defines pre-plus disease as vascular abnormalities insufficient for plus disease but with more arterial tortuosity and venous dilation than normal.1 Presence of plus disease is a necessary feature for threshold disease and a sufficient feature for type 1 ROP, both of which have been shown to warrant prompt treatment. Therefore, accurate diagnosis of plus disease is essential.
However, there are limitations regarding the definition of plus disease. Studies have found diagnostic inconsistency, even among experts.5-7 The standard published photograph has a larger magnification and narrower field of view than clinical evaluation tools such as indirect ophthalmoscopy and wide-angle retinal imaging, and this difference in perspective may cause difficulty for ophthalmologists.8,9 Vessels in the standard published photograph have varying degrees of tortuosity and dilation, creating uncertainty regarding which vessels to focus on during examination. Finally, although plus disease is defined solely from arteriolar tortuosity and venous dilation within the posterior pole, it is possible that other vascular features or the rate of vascular change are relevant for diagnosis.9,10 Better understanding of the examination features characterizing plus disease may improve diagnostic accuracy and consistency.
It has been our observation that many ophthalmologists who trained within the past 25 years, after dissemination of ICROP and the CRYO-ROP study findings,1,2 perform ROP examination predominantly by classifying the zone, stage, and presence of plus disease based on venous dilation and arteriolar tortuosity of the posterior vessels. The premise of this study is that reliance only on this classification system, without attention to description of rich underlying retinal features, may oversimplify the characterization of clinically significant findings. This study is designed to encode detailed qualitative thoughts of experts during plus disease diagnosis, using research methods from cognitive informatics.11 The overall goals are to ascertain levels of agreement as well as to better understand underlying reasons for diagnostic discrepancy among experts and to obtain more precise information about specific retinal features of plus disease.
This study was approved by the institutional review boards at Columbia University and Oregon Health & Science University. Informed consent was obtained from all expert participants, and waiver of consent was obtained for use of deidentified retinal images.
We assembled a panel of international ROP experts who had the experience of using qualitative retinal features as their primary basis for clinical diagnosis. In our view, this could be accomplished by identifying experts who had practiced ophthalmology before publication of the CRYO-ROP findings,2 participated as CRYO-ROP principal investigators, or participated on national ROP standards committees. The rationale was that this would identify a small number of experts with the background and perspective to articulate their underlying qualitative reasons for diagnosis.
A set of 7 wide-angle retinal images (RetCam; Clarity Medical Systems) was captured from premature infants during routine clinical ROP care. Each image showed the posterior retina and reflected some degree of vascular abnormality in our opinion. Images were printed on high-resolution photograph paper (Kodak) in a 5-in × 7-in format.
No additional information such as birth weight, systemic findings, or postmenstrual age was provided. This was to ensure that experts focused only on retinal features, without potential confounding factors. Neither the standard photograph nor any definitions of plus or pre-plus disease were provided to experts. This was to simulate a real-world examination scenario and avoid biasing expert opinions. We believed that study experts would be intimately familiar with these definitions through previous experiences and through contributing to the creation of those definitions in many cases.
This study was conducted in 2 rounds, in which each study expert was asked a series of scripted questions (eAppendix in Supplement) by one of us (N.J.H.): (1) Round 1 (“think-aloud protocol”). The 7 retinal images were shown individually and in the same order to each expert, who was asked to diagnose each image as either plus disease, pre-plus disease, or neither plus nor pre-plus. Each expert was asked to verbalize thoughts while reviewing the image, explain the process that led to the final diagnosis, and annotate the most important findings on the printed image using a marking pen. Finally, each expert was asked to rate the degree of confidence in the diagnosis for each image (certain, somewhat certain, or uncertain). Experts were encouraged by the observer (N.J.H.) to verbalize all of their thoughts but were not otherwise interrupted or coached during the think-aloud protocol. (2) Round 2 (“specific questions”). The 7 study images were displayed again in the same order, and each expert was asked a series of specific questions about each image. For each image, experts were asked whether the arteriolar tortuosity was sufficient for plus disease, whether the venous dilation was sufficient for plus disease, and whether the overall image reflected plus disease, pre-plus disease, or neither. Experts were asked to rank the 7 images in order of increasing arteriolar tortuosity, increasing venous dilation, and increasing overall severity of vascular abnormality. Additional specific questions were custom-tailored to each image regarding features used by experts to identify severe ROP, perceptions about the nature and location of vascular abnormalities, and other diagnostic heuristics (eAppendix in Supplement).
Each of the expert sessions was recorded using a video camera (Handycam; Sony). A digital recorder (GarageBand; Apple) was used as a backup. The video camera was directed to record the retinal images and hands of each expert. Personal features of experts were not recorded, and experts were identified only by a study number.
In round 1 (think-aloud protocol), digital files were processed using video editing software (iMovie; Apple). All video and audio files were manually transcribed for analysis. A modified protocol of the Hassebrock coding scheme was used to analyze the transcribed files.12 The scheme was designed to analyze medical reasoning and coding of verbal think-aloud protocols. Interexpert agreement was examined based on overall diagnosis provided after the think-aloud protocol. Specific examples were identified to represent differences in underlying qualitative diagnostic rationale among experts.
In round 2 (specific questions), interexpert agreement was examined by calculating correlation coefficients among each pair of experts who were asked to rank the 7 retinal images from least to most severe based on arterial tortuosity alone, venous dilation alone, and overall severity of vascular abnormalities related to plus disease. A published scale was used to interpret correlation coefficients: 0 to 0.30, small correlation; 0.31 to 0.50, medium correlation; and 0.51 to 1.00, strong correlation.13
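The pairwise ranking comparison described above can be sketched in a few lines of Python. The text does not state which correlation coefficient was used; this sketch assumes Spearman rank correlation without ties, which is a natural choice for ranked data. The expert labels and rankings are hypothetical and are not study data, and the function names are illustrative only.

```python
from itertools import combinations
from statistics import mean


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def interpret_correlation(r):
    """Label a coefficient using the scale quoted in the text.

    Assumption: negative values, which the quoted scale does not cover,
    are labeled by their magnitude.
    """
    magnitude = abs(r)
    if magnitude <= 0.30:
        return "small"
    if magnitude <= 0.50:
        return "medium"
    return "strong"


def pairwise_summary(rankings):
    """Return (per-pair coefficients, mean, min, max) across all expert pairs."""
    pairs = {
        (a, b): spearman_rho(rankings[a], rankings[b])
        for a, b in combinations(sorted(rankings), 2)
    }
    values = list(pairs.values())
    return pairs, mean(values), min(values), max(values)


# Hypothetical rankings of 7 images (1 = least severe, 7 = most severe).
example_rankings = {
    "expert_A": [1, 2, 3, 4, 5, 6, 7],
    "expert_B": [2, 1, 3, 4, 5, 7, 6],
    "expert_C": [1, 3, 2, 5, 4, 6, 7],
}

pairs, mean_rho, low, high = pairwise_summary(example_rankings)
for pair, rho in pairs.items():
    print(pair, round(rho, 2), interpret_correlation(rho))
print("mean", round(mean_rho, 2), "range", round(low, 2), "to", round(high, 2))
```

With 6 experts there are 15 pairs, so the mean and range reported for each ranking task summarize 15 coefficients.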
Intraexpert agreement in plus disease diagnosis was calculated. As described earlier, each expert initially provided a diagnosis (plus, pre-plus, or neither) in round 1 while “thinking aloud” to explain their rationale. Each expert then provided a diagnosis in round 2 after responding to a series of questions about specific image features. Absolute intraexpert agreement and κ statistic were calculated for each expert using these diagnoses. A published scale was used to interpret κ values: 0 to 0.20, slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, near-perfect agreement.14
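As an illustration of the intraexpert calculation, the following Python sketch computes absolute agreement and an unweighted Cohen κ for one expert across the two rounds, then labels κ with the scale quoted above. The text does not specify whether a weighted or unweighted κ was used, so the unweighted form is an assumption, and the diagnoses shown are hypothetical rather than study data.

```python
from collections import Counter


def absolute_agreement(first, second):
    """Fraction of images given the same diagnosis in both rounds."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohen_kappa(first, second):
    """Unweighted Cohen kappa for two sets of categorical ratings."""
    n = len(first)
    observed = absolute_agreement(first, second)
    counts_1, counts_2 = Counter(first), Counter(second)
    # Chance agreement from the marginal frequency of each category.
    expected = sum(
        counts_1[label] * counts_2[label] for label in set(counts_1) | set(counts_2)
    ) / (n * n)
    if expected == 1:
        return 1.0  # both rounds used a single identical category
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Label kappa using the scale quoted in the text.

    Assumption: negative values fall outside that scale and are
    reported as "below scale".
    """
    if kappa < 0:
        return "below scale"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "near-perfect"


# Hypothetical diagnoses by one expert for 7 images in each round.
round_1 = ["plus", "plus", "pre-plus", "neither", "pre-plus", "neither", "plus"]
round_2 = ["plus", "pre-plus", "pre-plus", "neither", "plus", "neither", "plus"]

kappa = cohen_kappa(round_1, round_2)
print(absolute_agreement(round_1, round_2), round(kappa, 4), interpret_kappa(kappa))
```

In this hypothetical case the expert agrees with themselves on 5 of 7 images, and κ is lower than the raw agreement because part of that agreement is expected by chance.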
Finally, data from rounds 1 and 2 were analyzed to identify qualitative retinal features contributing to the ROP diagnostic process by experts and identify the relationship among individual retinal features and overall diagnosis.
Six ROP experts participated: 5 of 6 (83%) were principal investigators in the CRYO-ROP and/or Early Treatment for Retinopathy of Prematurity studies, 5 of 6 (83%) published 5 or more peer-reviewed ROP articles, 5 of 6 (83%) practiced ophthalmology before publication of initial CRYO-ROP findings, and 5 of 6 (83%) contributed to expert consensus activities such as selection of the standard published photograph, development of ICROP, or creation of screening guidelines.15,16 All experts met 2 or more of these criteria.
Table 1 summarizes interexpert agreement in plus disease diagnosis among the 6 experts based on the round 1 think-aloud protocol. In particular, 5 of 6 experts (83%) agreed on the same diagnosis in 3 images, 4 of 6 experts (67%) agreed in 1 image, and 3 of 6 (50%) agreed in 3 images. Several images were diagnosed differently by experts. For example, image 2 was diagnosed as plus disease by 3 of 6 experts (50%), pre-plus disease by 1 of 6 experts (17%), and neither by 2 of 6 experts (33%).
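The per-image agreement figures above amount to counting how many experts chose the most common diagnosis. A minimal Python sketch follows, using the diagnosis counts reported in the text for images 2 and 5. The order of experts within each list is arbitrary, and the function name is illustrative only.

```python
from collections import Counter


def modal_agreement(diagnoses):
    """Return (most common diagnosis, experts choosing it, total experts)."""
    label, count = Counter(diagnoses).most_common(1)[0]
    return label, count, len(diagnoses)


# Counts taken from the text; assignment to individual experts is arbitrary.
image_2 = ["plus"] * 3 + ["pre-plus"] * 1 + ["neither"] * 2
image_5 = ["plus"] * 2 + ["pre-plus"] * 3 + ["neither"] * 1

for name, diagnoses in (("image 2", image_2), ("image 5", image_5)):
    label, count, total = modal_agreement(diagnoses)
    print(f"{name}: {count} of {total} ({count / total:.0%}) chose {label}")
```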
In round 2, experts were asked to rank all 7 images in order of increasing overall severity of vascular abnormality related to plus disease, in order of increasing arterial tortuosity alone, and in order of increasing venous dilation alone. The correlation in ordering of arterial tortuosity among pairs of experts was strong (mean [range] correlation coefficient, 0.89 [0.80 to 1.00]), whereas there was only a small correlation in ordering of venous dilation (mean [range] correlation coefficient, 0.27 [−0.04 to 1.00]) (Table 2).
To examine differences in underlying diagnostic process that may have led to discrepancies among experts, transcripts of think-aloud protocols were examined and compared. For example, image 5 was diagnosed as plus disease by 2 of 6 experts (33%), pre-plus by 3 of 6 experts (50%), and neither by 1 of 6 experts (17%). The Figure displays examples of differences in key retinal features that were discussed and annotated in that image by 3 different experts.
Table 3 summarizes intraexpert agreement between plus disease diagnosis provided in the think-aloud protocol from round 1 and the diagnosis provided after responding to a series of specific questions about image features in round 2. Absolute intraexpert agreement ranged from 4 of 7 (57%, 1 expert) to 7 of 7 (100%, 1 expert), and κ ranged from 0.30 (fair agreement, 1 expert) to 1.00 (perfect agreement, 1 expert).
In specific questions of round 2, experts were asked to characterize arterial tortuosity sufficient or insufficient for plus disease, characterize venous dilation sufficient or insufficient for plus disease, and provide an overall diagnosis (Table 4). Five individual ratings were excluded because an expert provided no response about arterial tortuosity or venous dilation; therefore, there were 37 total ratings by the 6 experts for the 7 images. In 5 of 37 ratings (14%), there was inconsistency between the expert diagnostic process and published definitions of plus disease (which requires both sufficient arterial tortuosity and venous dilation).1 In another 5 of 37 ratings (14%), there was inconsistency with the published definition of pre-plus disease (which requires arterial tortuosity and venous dilation that is insufficient for plus disease).1
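The consistency check described above can be expressed as a small rule. The following Python sketch is one possible reading of the published definitions as summarized in this paragraph, not the authors' coding procedure. It assumes that a plus diagnosis is flagged when either feature is rated insufficient, that a pre-plus diagnosis is flagged when both features are rated sufficient, and that a diagnosis of neither is not checked. Ratings with a missing feature response are excluded, as in the text.

```python
from typing import Optional


def is_consistent(
    tortuosity_sufficient: Optional[bool],
    dilation_sufficient: Optional[bool],
    diagnosis: str,
) -> Optional[bool]:
    """Return True/False for consistency, or None if the rating is excluded."""
    if tortuosity_sufficient is None or dilation_sufficient is None:
        return None  # no response for a feature: rating excluded
    both_sufficient = tortuosity_sufficient and dilation_sufficient
    if diagnosis == "plus":
        return both_sufficient
    if diagnosis == "pre-plus":
        return not both_sufficient
    return True  # "neither" is not checked in this sketch (assumption)


# Hypothetical ratings: (tortuosity sufficient, dilation sufficient, diagnosis).
ratings = [
    (True, True, "plus"),
    (True, False, "plus"),
    (True, True, "pre-plus"),
    (None, True, "plus"),
]

results = [is_consistent(*rating) for rating in ratings]
included = [result for result in results if result is not None]
print(len(included), "ratings included;", included.count(False), "inconsistent")

# Bookkeeping from the text: 6 experts x 7 images, minus 5 excluded ratings.
print(6 * 7 - 5)
```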
Table 5 displays retinal features considered by experts in plus disease diagnosis during the think-aloud protocol of round 1. In addition to retinal features mentioned in the published definition of plus disease (sufficient arterial tortuosity and venous dilation within ≥2 quadrants of the central retina),1 experts cited many different features such as venous tortuosity, arterial dilation, peripheral retinal features, and vascular branching.
To our knowledge, this is the first study using qualitative research methods to examine the process of plus disease diagnosis by ROP experts. Key findings are that: (1) there are inconsistencies in plus disease diagnosis among experts, (2) some diagnostic discrepancies may occur because experts are considering different retinal features, and (3) the current concept of plus disease as arteriolar tortuosity and venous dilation within the posterior pole appears oversimplified based on expert behavior.
Our results regarding interexpert disagreement in plus disease diagnosis support findings from previous studies involving image-based diagnosis5,6 and from a previous study showing that certified CRYO-ROP experts performing unmasked ophthalmoscopic examinations to confirm presence of threshold disease disagreed with the first expert diagnosis in 12% of cases.7 One demonstration of interexpert inconsistency in the current study is shown in Table 1. Another demonstration is summarized in Table 2, showing that correlation among experts for ranking severity of arterial tortuosity (mean correlation coefficient, 0.89) was much higher than correlation for ranking severity of venous dilation (mean correlation coefficient, 0.27) or overall vascular severity (mean correlation coefficient, 0.33). This suggests that arterial tortuosity is easier for experts to recognize and order visually. Conversely, venous dilation may be more difficult to identify visually, more subjective, or perhaps more difficult to represent using wide-angle images. There are several possible reasons explaining the low interexpert correlation for overall severity, including that there are differences in retinal features considered by different experts. Other possibilities are that there are differences in the interpretation of the same retinal features by different experts or that the significance of particular features is weighted differently among experts (Figure). A final demonstration of variability is summarized in Table 3, showing intraexpert differences in plus disease diagnosis using different methods. Taken together, these findings suggest that there are significant inconsistencies and that experts appear to consider different retinal features and interpret the same features differently.
The traditional definition of plus disease was created by expert consensus during the 1980s and has been used for major multicenter trials.2,3 However, another key finding from the current study is that experts consider retinal features beyond arterial tortuosity and venous dilation within the posterior pole when diagnosing plus disease. As shown in Table 5, experts explicitly mentioned many additional factors while explaining their diagnostic rationale during the think-aloud protocol. These included retinal features such as venous tortuosity and vascular branching, as well as anatomic factors such as peripheral retinal appearance and macular features, none of which are described in the published definition of plus disease.1,2 Furthermore, as shown in Table 4, there were 10 of 37 study ratings in which expert diagnoses of plus or pre-plus disease were inconsistent with published definitions.1 Overall, these findings suggest that plus disease diagnosis is considerably more complex than current rules, which combine arterial tortuosity and venous dilation in the posterior pole, and that experts do not appear to consider the same retinal features even when examining the same images.
Qualitative cognitive research methods have been used to characterize complex processes pertaining to visual diagnosis in fields such as dermatology, pathology, and radiology.17-19 The premise of this study is that current ROP management strategies are based on an international classification system,1 along with diagnosis and treatment guidelines resulting from groundbreaking multicenter trials.2,3 By nature, this translates the qualitative nuances of retinal examination into discrete evidence-based rules. Findings from the current study support the notion that plus disease diagnosis is oversimplified by these rules involving only central arterial tortuosity and venous dilation. In particular, most study experts had practice experience before publication of these definitions and rules and might therefore have greater insight about ROP diagnosis based on qualitative retinal characteristics. Follow-up research to encode the diagnostic methods and heuristics used by these experts may improve standardization and education in ROP care.20,21 Evidence-based protocols provide enormous benefits through guidelines to improve clinical management, and methods from this study can complement these protocols by providing additional information about subtle diagnostic factors.
Computer-based image analysis is an emerging method for improving accuracy and reproducibility of plus disease diagnosis using quantitative retinal vascular parameters.8,10,22-35 Development of these systems requires identification of the relevant vascular features to analyze (eg, arterial tortuosity); definition of algorithms for quantifying these features; selection of the appropriate vessels for analysis (eg, all vessels, worst vessels); and combination of individual feature values into an overall diagnosis.34 Currently, there are no standard methods for performing most of these tasks. This study provides information about the diagnostic process used by experts, which may eventually provide a scientific basis for developing computer algorithms that better mimic expert diagnosis.
Several additional study limitations should be noted: (1) Wide-angle retinal images were used for expert review, rather than ophthalmoscopic examinations. This may have biased findings to the extent that examiners were less familiar with image-based diagnosis. However, all study experts had experience with ROP imaging and multiple studies have shown that image-based ROP diagnosis agrees closely with ophthalmoscopic diagnosis.36-44 We felt that image review was the best study design to allow multiple experts to analyze the exact same retinal features. (2) Retinal images were reviewed by experts with no clinical information. This may have affected study findings to the extent that experts interpret retinal findings in the context of clinical data. However, the purpose of this study was to understand the significance of qualitative retinal features and the expert diagnostic process, not to simulate the process of ophthalmoscopic examination. (3) The number of study experts was limited. This may affect the generalizability of study findings to the extent that these 6 academic experts may not be representative of the larger group of clinical ROP specialists. However, the foundation of qualitative research is to collect detailed verbal descriptions to portray varying perspectives about complex phenomena.45 (4) This study focused only on identifying factors relevant to plus disease diagnosis. Other potentially relevant factors, such as location of retinal disease, were not explicitly asked about but were noted if mentioned by experts (Table 5). New research, such as studies relating vascular appearance with zone, may be useful.
In summary, this study suggests that agreement in plus disease diagnosis among experts is imperfect and that there are differences in the underlying diagnostic reasoning process and the retinal features examined. This study provides evidence that plus disease diagnosis is based on multiple factors that may depend on the specific examiner. Updated definitions based on detailed analysis of expert behavior, using qualitative research methods such as those used in this study, may lead to improved diagnostic accuracy and standardization. This may have implications for future definitions of plus disease, education and consistency of care, and development of computer-based diagnostic tools.
Corresponding Author: Michael F. Chiang, MD, Departments of Ophthalmology, Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, 3375 Terwilliger Blvd SW, Portland, OR 97239 (firstname.lastname@example.org).
Submitted for Publication: October 7, 2012; final revision received January 8, 2013; accepted January 9, 2013.
Published Online: May 23, 2013. doi: 10.1001/jamaophthalmol.2013.135
Author Contributions: Dr Chiang had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Conflict of Interest Disclosures: Dr Chiang is an unpaid member of the scientific advisory board for Clarity Medical Systems.
Funding/Support: This work was supported by grant EY19474 from the National Institutes of Health (Dr Chiang), the Dr Werner Jackstaedt Foundation (Dr Hewing), the Friends of Doernbecher Foundation (Dr Chiang), unrestricted departmental funding from Research to Prevent Blindness (Drs Chan and Chiang), and the St Giles Foundation (Dr Chan).
Previous Presentation: Portions of this study were presented at the 2012 ARVO Annual Meeting; May 9, 2012; Ft Lauderdale, Florida.
Additional Contributions: We are very grateful to the 6 experts who generously agreed to participate in this study.