Figure 1. 
Cross-tabulation of film and digital gradings of final Early Treatment Diabetic Retinopathy Study scale based on person-level of 310 subjects with gradable dual image types. κ = 0.44, SE = 0.03, 95% confidence interval = 0.38-0.5; weighted κ = 0.7, SE = 0.02, 95% confidence interval = 0.65-0.74; weights are 1 for complete agreement, 0.75 for 1-step, 0.5 for 2-step, and 0 for all other disagreement.


Figure 2. 
Cross-tabulation of film and digital gradings of final Early Treatment Diabetic Retinopathy Study scale based on eye level of 310 subjects with gradable dual-image types (n = 628). Level 60 (scars of photocoagulation for proliferative diabetic retinopathy [DR] or severe nonproliferative DR without residual new vessels) and level 61 (mild retinal new vessels, with or without photocoagulation scars) are shown separately here rather than being pooled (into mild proliferative DR) as they are when change on the scale is calculated. κ = 0.52, SE = 0.02, 95% confidence interval = 0.47-0.57; weighted κ = 0.74, SE = 0.02, 95% confidence interval = 0.71-0.78; weights are 1 for complete agreement, 0.75 for 1-step, and 0 for all other disagreement.


Table 1. 
Clinical Characteristics of the 310 DCCT/EDIC Subjects With Gradable Digital and Film Photographs in the Digital-Film Ancillary Study
Table 2. 
Reliability of Digital-Film Photography Grading in EDIC (N = 310)
Table 3. 
Logistic Regression of DCCT Treatment Effect on Risk of Any Degree of PDR Based on Film vs Digital Photography at EDIC Years 14 Through 16 Among the Participants Free of PDR at DCCT Closeout After Adjustment for the Other Risk Factors (N = 302)
Table 4. 
Logistic Regression of DCCT Treatment Effect on Risk of Various Retinopathy Categories Based on Film vs Digital Photography at EDIC Years 14 Through 16 Among the Participants Free of Respective Complications at DCCT Closeout After Adjustment for the Other Risk Factors
Table 5. 
Reliability of Film Photography Grading in DCCT and EDIC
1. The Diabetes Control and Complications Trial Research Group. The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus. N Engl J Med. 1993;329(14):977-986.
2. The effect of intensive diabetes treatment on the progression of diabetic retinopathy in insulin-dependent diabetes mellitus: the Diabetes Control and Complications Trial. Arch Ophthalmol. 1995;113(1):36-51.
3. Diabetes Control and Complications Trial Research Group. Progression of retinopathy with intensive versus conventional treatment in the Diabetes Control and Complications Trial. Ophthalmology. 1995;102(4):647-661.
4. Epidemiology of Diabetes Interventions and Complications (EDIC) Research Group. Epidemiology of Diabetes Interventions and Complications (EDIC): design, implementation, and preliminary results of a long-term follow-up of the Diabetes Control and Complications Trial cohort. Diabetes Care. 1999;22(1):99-111.
5. The Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Research Group. Retinopathy and nephropathy in patients with type 1 diabetes four years after a trial of intensive therapy. N Engl J Med. 2000;342(6):381-389.
6. White NH, Sun W, Cleary PA, et al. Prolonged effect of intensive therapy on the risk of retinopathy complications in patients with type 1 diabetes mellitus: 10 years after the Diabetes Control and Complications Trial. Arch Ophthalmol. 2008;126(12):1707-1715.
7. Writing Team for the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Research Group. Sustained effect of intensive treatment of type 1 diabetes mellitus on development and progression of diabetic nephropathy: the Epidemiology of Diabetes Interventions and Complications (EDIC) study. JAMA. 2003;290(16):2159-2167.
8. Martin CL, Albers J, Herman WH, et al; DCCT/EDIC Research Group. Neuropathy among the diabetes control and complications trial cohort 8 years after trial completion. Diabetes Care. 2006;29(2):340-344.
9. Nathan DM, Cleary PA, Backlund JY, et al; Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) Study Research Group. Intensive diabetes treatment and cardiovascular disease in patients with type 1 diabetes. N Engl J Med. 2005;353(25):2643-2653.
10. Donner A, Eliasziw M. A goodness-of-fit approach to inference procedures for the kappa statistic: confidence interval construction, significance-testing and sample size estimation. Stat Med. 1992;11(11):1511-1519.
11. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005;85(3):257-268.
12. Diabetic retinopathy study. Report number 6: design, methods, and baseline results: report number 7: a modification of the Airlie House classification of diabetic retinopathy: prepared by the Diabetic Retinopathy. Invest Ophthalmol Vis Sci. 1981;21(1, pt 2):1-226.
13. Early Treatment Diabetic Retinopathy Study Research Group. Grading diabetic retinopathy from stereoscopic color fundus photographs: an extension of the modified Airlie House classification: ETDRS report number 10. Ophthalmology. 1991;98(5)(suppl):786-806.
14. Hubbard LD, Danis RP, Neider MW, et al; Age-Related Eye Disease 2 Research Group. Brightness, contrast, and color balance of digital versus film retinal images in the age-related eye disease study 2. Invest Ophthalmol Vis Sci. 2008;49(8):3269-3282.
15. Gardner TW, Sander B, Larsen ML, et al. An extension of the Early Treatment Diabetic Retinopathy Study (ETDRS) system for grading of diabetic macular edema in the Astemizole Retinopathy Trial. Curr Eye Res. 2006;31(6):535-547.
16. Early Treatment Diabetic Retinopathy Study Research Group. Fundus photographic risk factors for progression of diabetic retinopathy: ETDRS report number 12. Ophthalmology. 1991;98(5)(suppl):823-833.
17. Shoukri M. Measures of Interobserver Agreement. Boca Raton, FL: Chapman & Hall/CRC; 2004.
18. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70(4):213-220.
19. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159-174.
20. Bhapkar VP. A note on the equivalence of two test criteria for hypotheses in categorical data. J Am Stat Assoc. 1966;61:228-235. doi:10.2307/2283057
21. McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12(2):153-157.
22. Feinstein AR, Cicchetti DV. High agreement but low kappa I: the problems of two paradoxes. J Clin Epidemiol. 1990;43(6):543-549.
23. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10:101-120. doi:10.2307/3001666
24. Lin DY, Blumenkranz MS, Brothers RJ, Grosvenor DM. The sensitivity and specificity of single-field nonmydriatic monochromatic digital fundus photography with remote image interpretation for diabetic retinopathy screening: a comparison with ophthalmoscopy and standardized mydriatic color photography. Am J Ophthalmol. 2002;134(2):204-213.
25. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76:378-382. doi:10.1037/h0031619
26. Bursell SE, Cavallerano JD, Cavallerano AA, et al; Joslin Vision Network Research Team. Stereo nonmydriatic digital-video color retinal imaging compared with Early Treatment Diabetic Retinopathy Study seven standard field 35-mm stereo color photos for determining level of diabetic retinopathy. Ophthalmology. 2001;108(3):572-585.
27. Fransen SR, Leonard-Martin TC, Feuer WJ, Hildebrand PL; Inoveon Health Research Group. Clinical evaluation of patients with diabetic retinopathy: accuracy of the Inoveon diabetic retinopathy-3DT system. Ophthalmology. 2002;109(3):595-601.
28. Rudnisky CJ, Tennant MT, Weis E, Ting A, Hinz BJ, Greve MD. Web-based grading of compressed stereoscopic digital photography versus standard slide film photography for the diagnosis of diabetic retinopathy. Ophthalmology. 2007;114(9):1748-1754.
29. Li HK, Hubbard LD, Danis RP, et al. Digital versus film fundus photography for research grading of diabetic retinopathy severity. Invest Ophthalmol Vis Sci. 2010;51(11):5846-5852.
30. Reimers JL, Gangaputra S, Esser B, et al. Green channel vs color retinal images for grading diabetic retinopathy in DCCT/EDIC [ARVO abstract 2285]. Invest Ophthalmol Vis Sci. 2010;51:e-Abstract 2285. doi:10.1167/iovs.10-6303
Clinical Sciences
June 13, 2011

Comparison of Digital and Film Grading of Diabetic Retinopathy Severity in the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Study

Larry D. Hubbard, MAT; Wanjie Sun, MS; Patricia A. Cleary, MS; Ronald P. Danis, MD; Dean P. Hainsworth, MD; Qian Peng, MS; Ruth A. Susman, BS; Lloyd Paul Aiello, MD, PhD; Matthew D. Davis, MD; Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Study Research Group

Author Affiliations: Department of Ophthalmology and Visual Sciences, University of Wisconsin, Madison (Drs Danis and Davis, Mr Hubbard, and Mss Peng and Susman); Biostatistics Center, George Washington University, Washington, DC (Mss Sun and Cleary); Department of Ophthalmology, University of Missouri, Columbia (Dr Hainsworth); and Joslin Diabetes Center, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts (Dr Aiello).

Arch Ophthalmol. 2011;129(6):718-726. doi:10.1001/archophthalmol.2011.136
Abstract

Objective  To compare diabetic retinopathy (DR) severity as evaluated by digital and film images in a long-term multicenter study, as the obsolescence of film forced the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Study (DCCT/EDIC) to transition to digital after 25 years.

Methods  At 20 clinics from 2007 through 2009, 310 participants with type 1 diabetes with a broad range of DR were imaged, per the Early Treatment Diabetic Retinopathy Study (ETDRS) protocol, with both film and digital cameras. Severity of DR was assessed centrally from film and tonally standardized digital images. For retinopathy outcomes with greater than 10% prevalence, we had 85% or greater power to detect an agreement κ of 0.7 or lower from our target of 0.9.

Results  Comparing DR severity, digital vs film yielded a weighted κ of 0.74 for eye level and 0.73 for patient level (“substantial”). Overall, digital grading did not systematically underestimate or overestimate severity (McNemar bias test, P = .14). For major DR outcomes (≥3-step progression on the ETDRS scale and disease presence at ascending thresholds), digital vs film κ values ranged from 0.69 to 0.96 (“substantial” to “nearly perfect”). Agreement was 86% to 99%; sensitivity, 75% to 98%; and specificity, 72% to 99%. Major conclusions were similar with digital vs film gradings (odds reductions with intensive diabetes therapy for proliferative DR at EDIC years 14 to 16: 65.5% digital vs 64.3% film).

Conclusion  Digital and film evaluations of DR were comparable for ETDRS severity levels, DCCT/EDIC design outcomes, and major study conclusions, indicating that switching media should not adversely affect ongoing studies.

Long-term multicenter studies such as the Diabetes Control and Complications Trial (DCCT)/Epidemiology of Diabetes Interventions and Complications (EDIC) require consistent measurements of key outcome parameters over time and across clinics, especially when technology evolves during the study. The DCCT (1983-1993) demonstrated that intensive therapy aimed at maintaining blood glucose levels as close to normal as possible substantially reduced the risk of development and/or progression of diabetic retinopathy (DR) and other microvascular complications compared with conventional therapy.1-3 The EDIC (1994-2016 [ongoing]), an observational follow-up study of the DCCT cohort,4 demonstrated that the differences in DR and other microvascular (and macrovascular) outcomes between the former intensive and conventional treatment groups persisted for at least 10 years after the DCCT despite the loss of glycemic separation after the clinical trial ended.5-9 Since the inception of the DCCT in 1983, recording of retinal images, from which DR status and progression are evaluated, has inexorably moved from film to digital. Commercial digital fundus camera systems have markedly improved in quality, have been widely adopted by clinics, and offer substantial convenience and economy compared with film cameras.

Changing retinal imaging methods in the DCCT/EDIC, while perhaps unavoidable, might alter study analysis results and conclusions. Although several cross-sectional studies have reported that digital systems provide results that are similar to the film “gold standard,” most represent single-center experience and some lack a wide range of retinopathy severity. Therefore, the DCCT/EDIC Research Group undertook a formal due-diligence ancillary study to gauge the effect on retinal outcomes of switching from film to digital photography. In addition to examining conventional measures of agreement between digital and film grading results, we were also able to evaluate retrospectively the degree to which DCCT/EDIC primary study outcomes and conclusions might be altered by transitioning between the different imaging media.

Methods
Study design

This was a masked, cross-sectional study comparing film and digital imaging for assessing DR. Sample size calculations10,11 indicated that, for outcomes with 10% or greater prevalence, 300 subjects would provide 85% or greater power to detect a κ of 0.7 or lower compared with our target κ of 0.9. The target and alternative κ were based on the test/retest κ on film photographs in the DCCT/EDIC.2,6

Subjects

Twenty DCCT/EDIC centers certified for both film and digital imaging (of the 28 clinical centers) studied 319 subjects with type 1 diabetes at their regular visits; 9 subjects (2.8%) were excluded because they had ungradable digital (n = 6) and/or film (n = 5) photography sets in one or both eyes.

Inclusion and exclusion criteria for the DCCT have been published previously.1 Clinical characteristics in the 310 subjects included in the study are given in Table 1 at DCCT baseline (1983-1989), EDIC baseline, and at the time of the digital-to-film transition study (EDIC years 14-16). Comparison of the 310 participants with the remaining 1131 persons enrolled in the DCCT showed no important differences except that more nonparticipants were male, from the secondary cohort, and had higher mean hemoglobin A1c levels during DCCT (eTable), largely because 6.9% who had died and 6.2% who were inactive were included as substudy nonparticipants. Because the primary focus of this article is not on treatment effect, this imbalance does not introduce bias to most digital-film comparisons.

DCCT/EDIC data collection

Retinopathy was assessed by standard film fundus photography in the whole cohort every 6 months during DCCT, in approximately one-quarter of the cohort each year during EDIC, and in the entire cohort at EDIC years 4 and 10.6 Reproducibility of the film grading procedure and its stability over time were evaluated in each study by annual masked regrading of a sample of images (both eyes of each subject) that included a broad spectrum of DR severity. During DCCT, there were 7 annual replicate gradings of 42 and, later, 60 subjects; during EDIC, there were 10 annual replicate gradings of 50 subjects.4

Fundus photography procedure

Both film and digital photography used the standard 7-field, nonsimultaneous stereoscopic, 30° color procedure established by the Diabetic Retinopathy Study,12 as modified by the Early Treatment Diabetic Retinopathy Study (ETDRS).13 Sets of fundus photographs of both eyes included central views of disc and macula, adjacent views of each of the 4 major vascular arcades, and an adjacent view just temporal to the macula. Although recent studies of macular edema have shifted the disc and temporal-to-macula fields slightly to include the center of the macula, DCCT/EDIC has retained the original ETDRS definitions of fields 1 and 3.

Film photographs were taken on Zeiss FF2-4 fundus cameras (Carl Zeiss Meditec, Inc, Oberkochen, Germany) (or approved alternatives) by certified photographers. Digital images were obtained using camera systems with a minimum of 3 megapixels; 19 of 20 clinics had 5-megapixel or higher systems. Clinics were required to submit images taken of nonstudy volunteers to obtain reading center certification of photographers and digital camera systems.

Fundus image handling and display

At the clinic, film photographs were mounted in plastic sheets in approximate anatomic position and digital photographs were indexed as “proof sheets,” with personal identifying information removed except for study identification number. At the reading center, all digital images were loaded for unified handling into the Topcon IMAGEnet system (Topcon Medical Imaging Inc, Paramus, New Jersey) and were JPEG-compressed at the IMAGEnet “maximum” quality setting, with an average compression ratio of approximately 20:1.

Film sets were retroilluminated on a standard light box (6500 K color temperature) and viewed with the Donaldson stereo viewer (George Davco, Holbrook, Massachusetts). Digital images were displayed on calibrated 20.5-in liquid crystal display monitors (γ = 2.2; color temperature, 6500 K; luminance, 110-170 candelas per m²) and viewed with handheld stereo viewers (Screen-Vu Stereoscope; PS Manufacturing, Portland, Oregon).

Imposition on images of the ETDRS macular grid and measurements of distances/areas were done in film by superimposing grids and measuring circles printed on transparent acetate stock and in digital by superimposing a digital version of the grid and by using the standard distance and planimetry tools of the digital system. For stereo viewing, gridding, and measurement, graders invoked the IMAGEnet stereo analyzer function. For digital images, grids and measuring tools were scaled for each camera, according to the spatial calibration factor established by the reading center at the time of system certification.

Image illumination, contrast, and color balance were controlled in film by specifying acceptable film emulsions (Kodak Ektachrome Professional ASA [Kodak Inc, Rochester, New York] or equivalent) and development processes (E-6 process by a Kodak Q-certified laboratory). Digital image tonal characteristics were optimized via the standardized enhancement model published by the Age-Related Eye Disease Study 2.14 An automated processor computed the luminance histogram for each of the red/green/blue color channels, and the curves for each channel were adjusted via algorithm to conform to a model image derived from exemplars.
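The per-channel adjustment described above can be illustrated with generic histogram matching. This is only a sketch of the general idea, under the assumption that each channel is a flat list of 8-bit values; it is not the Age-Related Eye Disease Study 2 enhancement model itself, whose details are not given here, and the function names are hypothetical.

```python
def cumulative_distribution(values, levels=256):
    """Cumulative fraction of pixels at or below each intensity level."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running / total)
    return cdf


def matching_curve(source_values, reference_values, levels=256):
    """Monotone lookup curve mapping source intensities toward the reference.

    For each source level, pick the lowest reference level whose cumulative
    fraction is at least the source's cumulative fraction.
    """
    src_cdf = cumulative_distribution(source_values, levels)
    ref_cdf = cumulative_distribution(reference_values, levels)
    curve = []
    j = 0
    for i in range(levels):
        while j < levels - 1 and ref_cdf[j] < src_cdf[i]:
            j += 1
        curve.append(j)
    return curve


def standardize_channel(source_values, reference_values):
    """Apply the matching curve to one color channel (red, green, or blue)."""
    curve = matching_curve(source_values, reference_values)
    return [curve[v] for v in source_values]


# Hypothetical example: a dark channel pulled toward a brighter model channel.
dark_channel = [10, 20, 30, 40]
model_channel = [110, 120, 130, 140]
adjusted = standardize_channel(dark_channel, model_channel)
```

In practice the same curve-fitting step would be run once per color channel, with the reference taken from whatever model image the reading center has adopted.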

Quality of both film and digital images was rated by the graders, based on proper field definition, crisp focus, and stereo effect. Graders assigned an image confidence score of high, adequate, or inadequate for answers to the main DR questions as affected by image quality.

Diabetic retinopathy grading procedure

Certified graders evaluated each eye using the ETDRS classifications of DR abnormalities, diabetic macular edema,12,13,15 and overall DR severity.16 Data were entered into computerized forms, with checks for internal consistency and completeness. The grading program included independent assessments of each eye by 2 graders (from a pool of 6), with adjudication of substantial differences by a senior grader (from a pool of 3). Grading of film and digital images of each eye was separated by a minimum of 2 weeks (in most cases, several months) to minimize any memory effect. Another senior grader not involved in the original grading compared film and digital images side-by-side, with knowledge of the grades from both, to explore possible reasons for differences in grading between the two media.

Grading and outcomes

Diabetic retinopathy severity at the eye level was assigned one of the following ETDRS levels: 10 (including levels 14 and 15, ie, eyes without microaneurysms but with cotton-wool spots or retinal hemorrhages, respectively), 20, 35, 43, 47, 53, 61 (including level 60, ie, panretinal photocoagulation scars without extant proliferative DR), 65, 71, 75, 81, and 85.15 The ETDRS person-level combines eye results (worse eye emphasized method) as previously done in the DCCT/EDIC.3

To estimate the effect of digital/film grading differences on DCCT/EDIC design outcomes, we collapsed grading scales into dichotomous categories of particular interest to the study: any retinopathy (including microaneurysms only, ie, level 20 or worse in either eye), mild nonproliferative DR (NPDR) or worse (≥35 in either eye), moderate NPDR or worse (≥43 in either eye), moderately severe NPDR or worse (≥47 in either eye), severe NPDR or worse (≥53 in either eye), proliferative DR (PDR) (≥60/61 in either eye), and Diabetic Retinopathy Study high-risk characteristics or worse (≥71 in either eye). Proliferative DR is the primary EDIC retinopathy outcome after EDIC year 10. Retinopathy progression in the DCCT was defined as an increase of 3 or more steps on the ETDRS person scale from DCCT baseline. Further retinopathy progression in EDIC was defined as 3 or more steps progression from DCCT closeout. Progression of DR at the dual imaging visit was used to compare the outcomes from digital vs film images.
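As a rough illustration of how the eye-level grades collapse into these dichotomous categories, the sketch below applies each "in either eye" threshold to the worse of the two eye levels. The names `THRESHOLDS` and `dichotomize` are hypothetical, and the code assumes eye levels are supplied as the numeric ETDRS codes listed above; it does not attempt the 3-step progression outcome, which depends on the person-level scale.

```python
# Thresholds taken from the category definitions in the text.
THRESHOLDS = [
    ("any retinopathy", 20),
    ("mild NPDR or worse", 35),
    ("moderate NPDR or worse", 43),
    ("moderately severe NPDR or worse", 47),
    ("severe NPDR or worse", 53),
    ("PDR", 60),
    ("DRS high-risk characteristics or worse", 71),
]


def dichotomize(left_eye_level, right_eye_level):
    """Return presence/absence of each category for one person.

    "In either eye" is equivalent to testing the worse (higher) eye level,
    because the numeric ETDRS codes increase with severity.
    """
    worse = max(left_eye_level, right_eye_level)
    return {name: worse >= cutoff for name, cutoff in THRESHOLDS}


# Hypothetical subject: level 35 in one eye, level 10 in the other.
flags = dichotomize(35, 10)
```

Running the same function on the film grades and on the digital grades of each subject yields the paired yes/no outcomes that the agreement statistics are computed from.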

Diabetic macular edema was analyzed as the presence or absence of ETDRS clinically significant macular edema (CSME). Center-involved diabetic macular edema was insufficiently prevalent in our population for reliable comparison between media.

Preliminary test of grading performance on digital images prior to standardized enhancement

After grading the digital images of 98 eyes (49 subjects) without standardized enhancement for tonal characteristics, the reading center performed a preliminary comparison of ETDRS retinopathy severity levels between digital and film gradings. There appeared to be a systematic difference between results from the two media, with higher DR severity levels in some eyes on film compared with digital images (data not shown). Standardized enhancement (optimization) was then applied to these digital images, and they were independently regraded. The reduction in systematic differences between the two media achieved by optimization was substantial. Therefore, all digital images were optimized prior to being graded.

Statistical analysis

Agreement between film and digital gradings on ordinal DR categories was analyzed by cross-tabulation and by rates of exact and near agreement. Cohen κ statistics, both unweighted17 and weighted,18 were calculated for multistep ordinal scales. A weight of 1 was assigned for exact agreement, 0.75 for 1-step difference on eye and patient scales, and 0.5 for 2-step differences on the patient scale. For 2-step or greater differences on the eye scale or 3-step or greater differences on the patient scale, the weight 0 was applied. We used guidelines for interpretation of κ proposed by Landis and Koch: 0.0-0.20 indicates slight; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, almost perfect.19 The Bhapkar test of marginal homogeneity20 was used to assess the agreement between film and digital in marginal distribution of the ordinal ETDRS scale. The McNemar overall bias test21 was used to test for systematic overestimation or underestimation between film and digital gradings.
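The weighting scheme and interpretation guidelines above can be sketched in a few lines. This is an illustrative implementation of Cohen's weighted κ from a square cross-tabulation, not the software used in the study; the function names are hypothetical, the example table is made up, and the "poor" label for negative κ comes from Landis and Koch's original scale rather than from the text above.

```python
def weighted_kappa(table, weight):
    """Cohen's weighted kappa for a square cross-tabulation.

    table[i][j] is the number of subjects (or eyes) placed in category i by
    film and category j by digital; weight(d) is the agreement credit given
    to a difference of d steps.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            w = weight(abs(i - j))
            observed += w * table[i][j] / n
            expected += w * row_totals[i] * col_totals[j] / (n * n)
    return (observed - expected) / (1.0 - expected)


def person_scale_weight(steps):
    """1 for exact agreement, 0.75 for 1 step, 0.5 for 2 steps, else 0."""
    return {0: 1.0, 1: 0.75, 2: 0.5}.get(steps, 0.0)


def eye_scale_weight(steps):
    """1 for exact agreement, 0.75 for 1 step, else 0."""
    return {0: 1.0, 1: 0.75}.get(steps, 0.0)


def unweighted(steps):
    """Credit only exact agreement (ordinary Cohen kappa)."""
    return 1.0 if steps == 0 else 0.0


def landis_koch(kappa):
    """Verbal label for kappa on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Made-up 3-category table, for illustration only.
example = [[20, 5, 0],
           [4, 30, 6],
           [1, 5, 29]]
kappa_w = weighted_kappa(example, person_scale_weight)
label = landis_koch(kappa_w)
```

Passing `unweighted` instead of one of the step-weight functions gives the ordinary κ from the same table, which makes the effect of partial credit for near agreement easy to see.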

Film/digital agreement on dichotomous DCCT/EDIC DR categories was evaluated by prevalence, agreement rate, sensitivity, specificity, false-positive and false-negative rates, and Cohen unweighted κ, using film as the reference standard. For prevalence rates close to 0 or 1, Cohen κ was not reported because of its unreliability owing to substantial imbalance in the distribution of marginal totals.22
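For a single dichotomous category, these measures all follow from the 2 × 2 table of film (reference) against digital. The sketch below is a minimal illustration with made-up counts; the function name and argument names are hypothetical.

```python
def dichotomous_agreement(both_positive, film_only, digital_only, both_negative):
    """Agreement measures for one yes/no category, with film as reference.

    film_only    = positive on film, negative on digital (false negatives)
    digital_only = negative on film, positive on digital (false positives)
    """
    n = both_positive + film_only + digital_only + both_negative
    film_pos = both_positive + film_only
    film_neg = digital_only + both_negative
    digital_pos = both_positive + digital_only
    digital_neg = film_only + both_negative

    agreement = (both_positive + both_negative) / n
    # Chance-expected agreement from the marginal totals.
    expected = (film_pos * digital_pos + film_neg * digital_neg) / (n * n)
    return {
        "prevalence_film": film_pos / n,
        "agreement": agreement,
        "sensitivity": both_positive / film_pos,
        "specificity": both_negative / film_neg,
        "false_negative_rate": film_only / film_pos,
        "false_positive_rate": digital_only / film_neg,
        "kappa": (agreement - expected) / (1.0 - expected),
    }


# Made-up counts, for illustration only.
result = dichotomous_agreement(both_positive=40, film_only=5,
                               digital_only=10, both_negative=45)
```

When film prevalence approaches 0 or 1, one of the two denominators (`film_pos` or `film_neg`) becomes very small, which is the situation in which κ and the corresponding rate become unstable.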

To assess the effect of switching from film to digital images, separate multivariate logistic regression models were constructed within each image type comparing the glycemic treatment effect (odds reduction of the former intensive therapy compared with conventional therapy) on several DR outcomes, especially risk of further 3-step DR progression during EDIC (our primary retinopathy outcome through EDIC year 10) and risk of onset of PDR during EDIC (our primary retinopathy outcome after year 10). These models adjusted for the same covariates as our published Weibull proportional hazard model, including primary or secondary cohort (no retinopathy or retinopathy at DCCT baseline), diabetes duration at DCCT baseline, hemoglobin A1c levels at DCCT eligibility, and retinopathy levels at DCCT closeout.6
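The treatment effect can be expressed either as an odds ratio (conventional vs intensive) or as an odds reduction with intensive therapy; the two are related by a simple reciprocal. The sketch below shows only that conversion, with hypothetical function names and an arbitrary example value; it does not reproduce the adjusted regression models.

```python
def odds_reduction(or_conventional_vs_intensive):
    """Odds reduction with intensive therapy, as a fraction between 0 and 1.

    The odds ratio for intensive vs conventional is the reciprocal of the
    conventional vs intensive odds ratio, and the reduction is 1 minus that.
    """
    return 1.0 - 1.0 / or_conventional_vs_intensive


def odds_ratio_from_reduction(reduction):
    """Inverse conversion: conventional vs intensive odds ratio."""
    return 1.0 / (1.0 - reduction)


# Arbitrary example: an odds ratio of 2.0 corresponds to a 50% odds reduction.
example_reduction = odds_reduction(2.0)
```

Comparing the film-based and digital-based models on either scale gives the same picture, since the conversion is monotone.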

To evaluate historical reproducibility of film photography during DCCT/EDIC, Fleiss κ among multiple raters17 was used to calculate κ for DR dichotomous categories, using data from annual replicate gradings on the quality control image samples. Reliability of the digital-film grading across clinics was analyzed via the Cochran test of homogeneity.23
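One common form of a Cochran-type homogeneity test weights each clinic's κ by the inverse of its variance and refers the weighted sum of squared deviations to a χ² distribution with (number of clinics − 1) degrees of freedom. The sketch below assumes that form; the exact implementation used in the study is not described, the function names are hypothetical, and the clinic values are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via a series for the regularized
    lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    h = x / 2.0
    term = math.exp(-h + a * math.log(h) - math.lgamma(a + 1.0))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= h / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)


def cochran_homogeneity(kappas, standard_errors):
    """Return (Q, degrees of freedom, P value) for equality of kappas."""
    weights = [1.0 / (se * se) for se in standard_errors]
    pooled = sum(w * k for w, k in zip(weights, kappas)) / sum(weights)
    q = sum(w * (k - pooled) ** 2 for w, k in zip(weights, kappas))
    df = len(kappas) - 1
    return q, df, chi2_sf(q, df)


# Made-up per-clinic kappas and standard errors, for illustration only.
q, df, p = cochran_homogeneity([0.78, 0.70, 0.85, 0.74],
                               [0.06, 0.08, 0.05, 0.07])
```

A large P value here indicates no detectable difference in agreement among clinics, although with small per-clinic samples the test has limited power.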

Results
Comparison of digital vs film gradings of DR severity

Figure 1 compares film and digital gradings on the ETDRS person-level scale. There were at least 12 persons in each of the lower retinopathy severity categories (from no retinopathy, level 10 = 10, through moderately severe NPDR in the worse eye, level 47 < 47) and in the 3 mildest PDR categories (levels 60 < 60, 60 = 60, and 65 < 65) but only 0 to 3 in the more severe NPDR (levels 47 = 47 through 53 = 53) and PDR categories (levels 65 < 65 through 71 = 71). There was exact agreement in 51% of subjects, agreement within 1 level in 82%, and agreement within 2 levels in 95% (DR progression is worsening of ≥3 levels). Weighted κ was 0.73 (95% confidence interval, 0.68-0.77), representing substantial agreement between digital and film gradings. The McNemar test of overall bias did not show significant systematic difference between gradings (film higher in 27% and lower in 22%; P = .14). The Bhapkar test of marginal homogeneity indicated a borderline significant imbalance between the marginal distributions of film vs digital gradings (P = .08; eFigure 1).

The corresponding analysis using ETDRS eye-level scale is shown in Figure 2. To gain power, we used all eyes with gradable film and digital photographs (N = 628, including those with gradable photographs in only 1 eye). Agreement rates were 63% for exact agreement and 94% for agreement within 1 step. Weighted κ for agreement was 0.74 (95% confidence interval, 0.71-0.78). Gradings showed more severe DR with film than with digital (film higher in 141 eyes and digital higher in 92, P = .001 by McNemar test), and there was significant marginal heterogeneity (P = .002 by Bhapkar test; eFigure 2). The most noteworthy differences were in the 106 eyes placed in level 10 by 1 or both image types (film higher in 36 and digital higher in 14; P = .002) and in the 122 eyes in level 43 by 1 or both image types (film higher in 56 and digital higher in 31; P = .004).
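The direction-of-disagreement comparison above can be checked with a simple McNemar-type calculation on the discordant counts. The sketch below assumes the uncorrected χ² form with 1 degree of freedom; whether the study applied a continuity correction or an exact method is not stated, and the function name is hypothetical.

```python
import math


def mcnemar_bias(film_higher, digital_higher):
    """Chi-square (1 df, no continuity correction) and two-sided P value
    for whether one medium is graded higher more often than the other."""
    discordant = film_higher + digital_higher
    chi2 = (film_higher - digital_higher) ** 2 / discordant
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Eye-level counts from the text: film higher in 141 eyes, digital in 92.
chi2, p_value = mcnemar_bias(141, 92)
```

Under these assumptions the result is a χ² of roughly 10.3 with a P value close to the reported .001.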

Side-by-side review of a sample of these cases post hoc by a senior grader confirmed that small, subtle microaneurysms, intraretinal microvascular abnormalities, and retinal new vessels were sometimes more difficult to detect in digital color images than in film, even after tonal enhancement.

Comparison of digital vs film gradings of diabetic macular edema

In this study, clinically significant diabetic macular edema occurred in only 6% to 7% of subjects and 6% to 7% of eyes, providing insufficient power for reliable analyses. However, agreement rates on presence or absence were 94% for subjects and 96.8% for eyes; digital was higher in 5.3% and film higher in 4.3% (McNemar bias test, P = .56); and marginal totals were not significantly different (Bhapkar test of marginal homogeneity, P = .59).

Agreement on DCCT/EDIC DR outcomes based on digital vs film images

Table 2 presents the agreement on dichotomous DCCT/EDIC DR categories determined from digital vs film images. In these categories, digital vs film κ ranged from 0.69 to 0.96, agreement proportion was 86% to 99%, sensitivity was 75% to 98%, and specificity was 72% to 99%. Agreement on the presence of any degree of PDR (including scars of prior photocoagulation treatment of it, with or without residual new vessels), the primary EDIC retinopathy outcome, was very good, leading to high sensitivity (96%-98%), specificity (99%), and κ (0.95-0.96) for the PDR category. This result may be explained in part by panretinal photocoagulation scars, easily detected in images of either type in 25 of the 35 patients with mild proliferative DR. Proliferation consisting solely of early new vessels is sometimes more difficult to detect in digital than film images, although there was agreement on presence in 8 of 10 such eyes. Results for the severe NPDR (or worse) category could not be accurately determined because only 1 of the 310 participants was classified as having severe NPDR, and only using film (Figure 1). Similarly, the low sensitivity observed for CSME (50%) is of uncertain significance owing to low prevalence. There were very few subjects with no retinopathy in either eye (10 by film only, 5 by digital only, and 13 by both; Figure 1). Thus, the low specificity observed for the “any retinopathy” threshold (72%) is not statistically reliable.

Table 3 presents the agreement between digital and film grading regarding the effect of former DCCT treatment assignment (conventional vs intensive glycemic control) on the risk of any degree of PDR, at the dual-imaging visit, among the 302 participants free of PDR at DCCT closeout. Multivariate logistic regression revealed an almost identical treatment effect from film and digital gradings. Adjusted odds ratios (ORs) for risk of PDR, conventional vs intensive, were 1.7 for film (95% confidence interval, 0.7-4.1; P = .27) and 1.7 for digital (95% confidence interval, 0.7-4.1; P = .22). Models were adjusted for primary or secondary cohort (no retinopathy or retinopathy at DCCT baseline), diabetes duration at DCCT baseline, hemoglobin A1c levels at DCCT eligibility, and retinopathy levels at DCCT closeout.
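A reader can check roughly how an odds ratio, its confidence interval, and its P value relate by working on the log-odds scale. The sketch below back-calculates an approximate Wald P value from a rounded OR and 95% confidence interval. The function name is hypothetical, and the calculation assumes a symmetric Wald interval on the log scale. Because the published figures are rounded to one decimal place, the result can only approximate the reported P values.

```python
from math import erfc, log, sqrt


def wald_p_from_ci(odds_ratio: float, lower: float, upper: float,
                   z_crit: float = 1.96) -> float:
    """Approximate two-sided Wald P value from an OR and its 95% CI."""
    # Standard error of log(OR), recovered from the width of the interval.
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(odds_ratio) / se
    # Two-sided tail area of the standard normal distribution.
    return erfc(abs(z) / sqrt(2))


# Rounded values reported in the text: OR 1.7, 95% CI 0.7-4.1.
p_approx = wald_p_from_ci(1.7, 0.7, 4.1)
print(f"approximate P = {p_approx:.2f}")
```

With these rounded inputs the approximation falls between the two reported values (P = .27 for film and P = .22 for digital). This is consistent with the two gradings giving estimates that differ only beyond the first decimal place.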

Additional multivariate logistic regression models on other retinopathy categories (Table 4) showed similar results. Adjusted ORs of conventional vs intensive treatment are comparable between film and digital at various levels: for further 3-step or greater progression, film OR was 1.6 (P = .07) vs digital, 1.5 (P = .10); for mild NPDR or worse, film OR was 1.5 (P = .22) vs digital, 1.5 (P = .02); and for moderate NPDR or worse, film OR was 1.7 (P = .09) vs digital, 1.8 (P = .06). The greater-than-3-step progression from DCCT at baseline shows the largest discrepancy between image media, with adjusted ORs of 1.9 for film (P = .05) and 1.5 for digital (P = .18).

Reliability of κ across clinics

Comparison of κ for the dichotomous DCCT/EDIC DR outcomes across clinics via Cochran test of homogeneity24 showed no significant difference among the 20 clinics from the United States and Canada (eFigure 3).
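One common form of a homogeneity test for several independent estimates weights each estimate by the inverse of its variance and sums the squared deviations from the pooled value. The sketch below applies that form to clinic-level κ values. It is an assumption that this matches the exact procedure used in the study; the function name and the clinic values are hypothetical.

```python
from typing import Sequence, Tuple


def cochran_q(kappas: Sequence[float],
              std_errors: Sequence[float]) -> Tuple[float, float, int]:
    """Return (pooled kappa, Q statistic, degrees of freedom).

    Under homogeneity, Q is approximately chi-square distributed with
    (number of clinics - 1) degrees of freedom.
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * k for w, k in zip(weights, kappas)) / sum(weights)
    q = sum(w * (k - pooled) ** 2 for w, k in zip(weights, kappas))
    return pooled, q, len(kappas) - 1


# Hypothetical clinic-level estimates (NOT from the study).
pooled, q, df = cochran_q([0.72, 0.80, 0.68, 0.75], [0.08, 0.10, 0.09, 0.07])
print(f"pooled kappa = {pooled:.2f}, Q = {q:.2f}, df = {df}")
```

The P value is then obtained by referring Q to a chi-square distribution with the stated degrees of freedom.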

Historical reproducibility of grading DR from film in DCCT/EDIC

Weighted κ statistics for reproducibility on the ordinal ETDRS scale derived from film gradings in annual quality control exercises ranged from 0.72 to 0.84 in the DCCT2 and from 0.69 to 0.80 in the EDIC—values somewhat greater than the κ of 0.70 from the film vs digital comparison (Figure 1) using the same weighting scheme. For most dichotomous outcomes there were similar differences; for 3-step or greater progression, presence of mild NPDR or worse, and presence of moderate NPDR or worse, κ values ranged from 0.80 to 0.93 in DCCT and EDIC (Table 5), while corresponding values for film vs digital comparisons ranged from 0.69 to 0.77 (Table 2). In contrast, the film vs film quality control exercises produced lower κ values than the film vs digital comparison study for presence of PDR and presence of severe NPDR or worse, as might be expected in quality control sets selected to include eyes in level 53 and to minimize eyes with photocoagulation scars.
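The weighting scheme referred to here gives full credit for exact agreement and partial credit for near misses (per the Figure 1 legend: 1 for complete agreement, 0.75 for 1-step, 0.5 for 2-step, and 0 for all other disagreement). A minimal sketch of a weighted κ under that scheme follows. The function name and the cross-tabulation are hypothetical, and the sketch treats adjacent rows of the table as 1 step apart, which is an assumption about how the scale levels are laid out.

```python
from typing import Dict, Optional, Sequence


def weighted_kappa(table: Sequence[Sequence[int]],
                   step_weights: Optional[Dict[int, float]] = None) -> float:
    """Weighted kappa for a square cross-tabulation of two gradings.

    step_weights maps the step difference |i - j| to an agreement weight;
    any difference not listed receives a weight of 0.
    """
    if step_weights is None:
        step_weights = {0: 1.0, 1: 0.75, 2: 0.5}
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted observed agreement
    expected = 0.0  # weighted agreement expected by chance
    for i in range(k):
        for j in range(k):
            w = step_weights.get(abs(i - j), 0.0)
            observed += w * table[i][j] / n
            expected += w * row_tot[i] * col_tot[j] / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 4-level cross-tabulation (NOT from the study).
example = [
    [30, 8, 2, 0],
    [6, 40, 9, 1],
    [1, 7, 35, 5],
    [0, 2, 4, 20],
]
print(f"weighted kappa   = {weighted_kappa(example):.2f}")
print(f"unweighted kappa = {weighted_kappa(example, {0: 1.0}):.2f}")
```

Passing a weight map that contains only the exact-agreement entry reduces the calculation to the unweighted κ, which makes the effect of partial credit easy to see on the same table.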

Comment

From the DCCT/EDIC perspective, the most important finding of this substudy is that, in the subset of subjects with dual images, the effects of DCCT intensive (relative to conventional) treatment on most measures of retinopathy progression were reasonably similar when assessed from digital compared with film images (Tables 3 and 4). For assessment of retinopathy severity level along the multistep ETDRS scale, agreement between gradings from film and digital images was also substantial (κ = 0.70) but appeared to be slightly lower than corresponding film vs film comparisons in the DCCT (κ = 0.72-0.84) and the more contemporaneous EDIC (κ = 0.69-0.80).

The comparability of grading digital vs film images for classification of DR severity has been described previously by others.24,26-29 While some previous studies used the full ETDRS 7SF (7 standard field) imaging procedure,27,29 others modified it by reducing the number of 30° fields or substituting wide-angle fields, switching to monochrome rather than color, dispensing with stereoscopic effect (in peripheral fields, or entirely), and/or using nonmydriatic (via dark adaptation) rather than pharmacologic pupillary dilation.24,26,28 Many of these studies were primarily oriented toward screening programs for the purpose of referring persons with clinically important retinopathy to ophthalmologic care rather than conducting clinical trials or epidemiological studies. Most of these articles concluded that the comparability between film and digital grading was adequate to justify adoption of the digital medium for various clinical purposes. Thanks to these precedent studies, we were made aware of the limitations in emerging digital practice and were able to address some of these difficulties.

The DCCT/EDIC digital vs film ancillary study is the first formal comparison to be reported by an ongoing, multicenter clinical trial or epidemiological study. Several of our study design and implementation features may have enhanced the comparability between film and digital imaging for DR assessment: modern digital fundus cameras with higher spatial resolution, photographers and camera systems certified for digital performance, full ETDRS 7SF stereo imaging, standardized tonal enhancement of digital images to filmlike standard, and certified graders at a central reading center experienced in evaluating DR for many years with film and for the past few years with digital images.

A weakness of our study was the small number of cases with severe NPDR, severe PDR, and mild PDR in the absence of photocoagulation scars, resulting in lower power to examine differences between digital and film in these categories. We recruited all subjects within a specified time period rather than recruiting a stratified sample, and these levels are infrequent in our subjects. In most populations, severe NPDR is rare, being an acute stage through which eyes pass relatively quickly on their way to developing PDR.15

For retinopathy studies requiring discrimination between all of the individual levels on the ETDRS severity scale, we emphasize that we found worse performance currently with digital images at 2 points on the DR scale. For the presence of any retinopathy (driven at the lower end by microaneurysms only), digital specificity was 72% and its false-positive rate was 28%. For moderate NPDR (levels 43 and 47, driven mostly by intraretinal microvascular abnormalities), digital sensitivity was 75% and its false-negative rate was 25%. Our more recent work suggests that supplementing the view of the full-color image with the monochromatic green channel (the latter extracted from the former) improves performance of digital photography.30 The green channel view maximizes the contrast of DR abnormalities against the retinal pigment epithelial background compared with the full-color view.

For studies that require evaluation of macular edema from fundus photography rather than optical coherence tomography, we must also caution that sensitivity for detecting CSME with digital images appeared to be lower than with film, although this condition was too infrequent in our sample to draw robust conclusions. Our digital vs film results for CSME suggest high specificity (98%) but low sensitivity (50%) and a high false-negative rate (50%). Of note, most present-day clinical trials in ophthalmology now study diabetic macular edema primarily with optical coherence tomography, which measures retinal thickening objectively rather than with grading of stereo color photographs (as done historically). However, the DCCT/EDIC has not yet elected to add optical coherence tomographic examination, given the low incidence of CSME in our cohort. Work is ongoing at the reading center to improve grading of macular edema from digital photographs.

Given our ancillary study's finding of overall comparability of digital vs film gradings for evaluation of DR severity, the DCCT/EDIC Research Group and its external advisory committee voted in 2009 to approve the switch from film to digital imaging. At present, all 28 clinics have changed to digital photography.

In the context of a multicenter, long-term study, we found that ETDRS severity levels (the major DCCT/EDIC retinopathy outcomes) and our study conclusions drawn from them are comparable when DR is graded from digital rather than film images. Overall, these results support transition from the film to the digital imaging medium for research documentation of diabetic retinopathy.

Correspondence: Larry D. Hubbard, MAT, Department of Ophthalmology and Visual Sciences, University of Wisconsin, Madison, 8010 Excelsior Dr, Ste 100, Madison, WI 53717-0568 (hubbard@rc.opth.wisc.edu).

Submitted for Publication: August 30, 2010; final revision received August 30, 2010; accepted October 5, 2010.

Financial Disclosure: The authors report contributions from Abbott, Animas, Aventis, BD Bioscience, Bayer (donated one time in 2008), Can-AM, Eli Lilly, Lifescan, Medtronic Minimed, Omron, Roche, and OmniPod to the trial, not attributed to any individual author.

Group Information: A complete list of participants in the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Study research group was published in Arch Ophthalmol. 2008;126(12):1713.

Funding/Support: This study is supported by contracts with the Division of Diabetes, Endocrinology, and Metabolic Diseases of the National Institute of Diabetes and Digestive and Kidney Diseases (DK 034818), the National Eye Institute, the National Institute of Neurological Disorders and Stroke, the General Clinical Research Centers Program, the Clinical and Translational Science Awards Program, the National Center for Research Resources, and by Genentech through a Cooperative Research and Development Agreement with the National Institute of Diabetes and Digestive and Kidney Diseases.

References
1. The Diabetes Control and Complications Trial Research Group. The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus. N Engl J Med. 1993;329(14):977-986.
2. The effect of intensive diabetes treatment on the progression of diabetic retinopathy in insulin-dependent diabetes mellitus: the Diabetes Control and Complications Trial. Arch Ophthalmol. 1995;113(1):36-51.
3. Diabetes Control and Complications Trial Research Group. Progression of retinopathy with intensive versus conventional treatment in the Diabetes Control and Complications Trial. Ophthalmology. 1995;102(4):647-661.
4. Epidemiology of Diabetes Interventions and Complications (EDIC) Research Group. Epidemiology of Diabetes Interventions and Complications (EDIC). Design, implementation, and preliminary results of a long-term follow-up of the Diabetes Control and Complications Trial cohort. Diabetes Care. 1999;22(1):99-111.
5. The Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Research Group. Retinopathy and nephropathy in patients with type 1 diabetes four years after a trial of intensive therapy. N Engl J Med. 2000;342(6):381-389.
6. White NH, Sun W, Cleary PA, et al. Prolonged effect of intensive therapy on the risk of retinopathy complications in patients with type 1 diabetes mellitus: 10 years after the Diabetes Control and Complications Trial. Arch Ophthalmol. 2008;126(12):1707-1715.
7. Writing Team for the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications Research Group. Sustained effect of intensive treatment of type 1 diabetes mellitus on development and progression of diabetic nephropathy: the Epidemiology of Diabetes Interventions and Complications (EDIC) study. JAMA. 2003;290(16):2159-2167.
8. Martin CL, Albers J, Herman WH, et al; DCCT/EDIC Research Group. Neuropathy among the diabetes control and complications trial cohort 8 years after trial completion. Diabetes Care. 2006;29(2):340-344.
9. Nathan DM, Cleary PA, Backlund JY, et al; Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) Study Research Group. Intensive diabetes treatment and cardiovascular disease in patients with type 1 diabetes. N Engl J Med. 2005;353(25):2643-2653.
10. Donner A, Eliasziw M. A goodness-of-fit approach to inference procedures for the kappa statistic: confidence interval construction, significance-testing and sample size estimation. Stat Med. 1992;11(11):1511-1519.
11. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005;85(3):257-268.
12. Diabetic retinopathy study. Report number 6: design, methods, and baseline results: report number 7: a modification of the Airlie House classification of diabetic retinopathy: prepared by the Diabetic Retinopathy. Invest Ophthalmol Vis Sci. 1981;21(1, pt 2):1-226.
13. Early Treatment Diabetic Retinopathy Study Research Group. Grading diabetic retinopathy from stereoscopic color fundus photographs: an extension of the modified Airlie House classification: ETDRS report number 10. Ophthalmology. 1991;98(5)(suppl):786-806.
14. Hubbard LD, Danis RP, Neider MW, et al; Age-Related Eye Disease 2 Research Group. Brightness, contrast, and color balance of digital versus film retinal images in the age-related eye disease study 2. Invest Ophthalmol Vis Sci. 2008;49(8):3269-3282.
15. Gardner TW, Sander B, Larsen ML, et al. An extension of the Early Treatment Diabetic Retinopathy Study (ETDRS) system for grading of diabetic macular edema in the Astemizole Retinopathy Trial. Curr Eye Res. 2006;31(6):535-547.
16. Early Treatment Diabetic Retinopathy Study Research Group. Fundus photographic risk factors for progression of diabetic retinopathy. ETDRS report number 12. Ophthalmology. 1991;98(5)(suppl):823-833.
17. Shoukri M. Measures of Interobserver Agreement. Boca Raton, FL: Chapman & Hall/CRC; 2004.
18. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70(4):213-220.
19. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159-174.
20. Bhapkar VP. A note on the equivalence of two test criteria for hypotheses in categorical data. J Am Stat Assoc. 1966;61:228-235. doi:10.2307/2283057
21. McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12(2):153-157.
22. Feinstein AR, Cicchetti DV. High agreement but low kappa I: the problems of two paradoxes. J Clin Epidemiol. 1990;43(6):543-549.
23. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10:101-120. doi:10.2307/3001666
24. Lin DY, Blumenkranz MS, Brothers RJ, Grosvenor DM. The sensitivity and specificity of single-field nonmydriatic monochromatic digital fundus photography with remote image interpretation for diabetic retinopathy screening: a comparison with ophthalmoscopy and standardized mydriatic color photography. Am J Ophthalmol. 2002;134(2):204-213.
25. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76:378-382. doi:10.1037/h0031619
26. Bursell SE, Cavallerano JD, Cavallerano AA, et al; Joslin Vision Network Research Team. Stereo nonmydriatic digital-video color retinal imaging compared with Early Treatment Diabetic Retinopathy Study seven standard field 35-mm stereo color photos for determining level of diabetic retinopathy. Ophthalmology. 2001;108(3):572-585.
27. Fransen SR, Leonard-Martin TC, Feuer WJ, Hildebrand PL; Inoveon Health Research Group. Clinical evaluation of patients with diabetic retinopathy: accuracy of the Inoveon diabetic retinopathy-3DT system. Ophthalmology. 2002;109(3):595-601.
28. Rudnisky CJ, Tennant MT, Weis E, Ting A, Hinz BJ, Greve MD. Web-based grading of compressed stereoscopic digital photography versus standard slide film photography for the diagnosis of diabetic retinopathy. Ophthalmology. 2007;114(9):1748-1754.
29. Li HK, Hubbard LD, Danis RP, et al. Digital versus film fundus photography for research grading of diabetic retinopathy severity. Invest Ophthalmol Vis Sci. 2010;51(11):5846-5852.
30. Reimers JL, Gangaputra S, Esser B, et al. Green channel vs color retinal images for grading diabetic retinopathy in DCCT/EDIC [ARVO abstract 2285]. Invest Ophthalmol Vis Sci. 2010;51:e-Abstract 2285. doi:10.1167/iovs.10-6303