Lee SC, Lee ET, Wang Y, Klein R, Kingsley RM, Warn A. Computer Classification of Nonproliferative Diabetic Retinopathy. Arch Ophthalmol. 2005;123(6):759-764. doi:10.1001/archopht.123.6.759
To propose methods for computer grading of the severity of 3 early lesions, namely, hemorrhages and microaneurysms, hard exudates, and cotton-wool spots, and classification of nonproliferative diabetic retinopathy (NPDR) based on these 3 types of lesions.
Using a previously developed computer diagnostic system, we determined the number of lesions of each of the 3 early types and the size of each lesion in the standard photographs. Computer classification criteria were developed for the levels of individual lesions and for NPDR. The criteria were evaluated using 430 fundus images with normal retinas or any degree of retinopathy and 361 fundus images with no retinopathy or the 3 early lesions only. The results were compared with those of the graders at the University of Wisconsin Ocular Epidemiology Reading Center and an ophthalmologist.
Main Outcome Measures
Agreement rates in the classification of NPDR between the computer system and human experts.
In determining the severity levels of individual lesions, the agreement rates between the computer system and the reading center were 82.6%, 82.6%, and 88.3% using the 430 images and 85.3%, 87.5%, and 93.1% using the 361 images, respectively, for hemorrhages and microaneurysms, hard exudates, and cotton-wool spots. When the “questionable” category was excluded, the corresponding agreement rates were 86.5%, 92.3%, and 91.0% using the 430 images and 89.7%, 96.3%, and 97.4% using the 361 images. In classifying NPDR, the agreement rates between the computer system and the ophthalmologist were 81.7% using the 430 images and 83.5% using the 361 images.
The proposed criteria for computer classification produced results that are comparable with those provided by human experts. With additional research, this computer system could become a useful clinical aid to physicians and a tool for screening, diagnosing, and classifying NPDR.
Nonproliferative diabetic retinopathy (NPDR) is characterized by structural damage to small retinal blood vessels, causing them to dilate, leak, or rupture. Visible retinal lesions include microaneurysms, at first alone but later accompanied by 1 or more of the following: dot or blot hemorrhages, hard exudates (HE), soft exudates or cotton-wool spots (CWS), intraretinal microvascular abnormalities (IRMA), and venous beading. Nonproliferative diabetic retinopathy can be classified into mild (or early), moderate, and severe stages depending on the severity of the lesions.
Early diagnosis and accurate staging are essential prerequisites for effective treatment of diabetic retinopathy and for reducing the risk of visual disability. Advances in the computer analysis of retinal images have major potential to make the detection and selection of at-risk patients more efficient and cost-effective. Digital fundus imaging makes computer diagnosis and classification of diabetic retinopathy possible. A few studies have investigated whether computer systems can be as reliable as humans in classifying and grading diabetic retinopathy, or whether computers can improve or aid human diagnosis and classification.1-5
Computer diagnostic systems probably will not replace the current screening modalities, as suggested by the American Diabetes Association.6 They will, however, provide a modality that greatly reduces subjective human variation and the cost of the current screening process and therefore facilitate a more objective and economical method of diagnosis. In addition, the technology offered by the computer system will likely be an aid to human experts to produce diagnostic results of higher quality and accuracy.
We have developed a computer system using image-processing and pattern recognition techniques to detect 3 early lesions of NPDR, namely, retinal hemorrhages and microaneurysms (HMA), HE, and CWS, from color fundus slides or real-time images directly transmitted from the fundus camera to the computer. The computer system was developed for fundus photographs taken at 45° and between standard fields 1 and 2 (referred to as “1-2 field,” with both the optic disc and macula present). It can also be applied to real-time images. The diagnostic criteria of a local retinal specialist (R.M.K.) and fundus images from a large number of American Indians of southwestern Oklahoma were used to develop the system. The performance of the system was found to be closely comparable with that of a general ophthalmologist (A.W.), a retinal specialist, and certified graders at the University of Wisconsin Ocular Epidemiology Reading Center (UWOERC).7
Of the various available methods for the classification of diabetic retinopathy, grading of fundus photographs using Standard Photographs (SPs) has become an accepted method in clinical studies. The 7 standard fields used in the Early Treatment Diabetic Retinopathy Study (ETDRS) and the modified Airlie House classification system8 are commonly used to classify the level of severity of retinopathy. In 1986, an alternative method of categorizing diabetic retinopathy into 8 levels was proposed.9 However, in many epidemiologic studies, only 1 fundus photograph is taken of a field of the retina, and therefore classification criteria involving 7 fields are not applicable and must be modified.
We propose a method for computer classification of severity levels of HMA, HE, and CWS and for classification of NPDR. This method is based on a modification of the ETDRS criteria. The performance of this computer classification system is evaluated by comparing the computer results with those of the UWOERC and a general ophthalmologist.
Using image-processing and pattern recognition techniques including noise removal and image normalization, a computer vision system for the diagnosis of 3 early diabetic retinopathy lesions, HMA, HE, and CWS, was developed.7 The system detects these lesions on the basis of the color contrast between lesions and the retinal background. The diagnosis also depends on the shape of the lesion. The development of this computer system was part of an epidemiologic study of eye disease called Vision Keepers.7 Participants in Vision Keepers were American Indians who resided in southwestern Oklahoma. Informed consent was obtained from all participants to allow their fundus photographs to be used in the development of a computer vision system for the diagnosis of retinopathy. The study was approved by the University of Oklahoma Health Sciences Center Institutional Review Board.
Fundus photographs were taken with a 45° camera (CR5-45 NM; Canon, Tokyo, Japan) and included SP fields 1 and 2, with both the disc and macula present. These photographs were used as training images in the development of the computer system. To process a photograph, the 35-mm slide was first scanned and digitized using the Nikon LS1000 scanner (Melville, NY). The digitized images were 512 × 512 pixels (screen resolution), and the digitization resolution was 10 000 pixels per inch. The digitized image was then processed using the computer vision system. It has been shown that the diagnostic results obtained from the computer system are comparable with those obtained by human experts7 in agreement rates, sensitivity, and specificity. We have expanded the capability of the computer system to classify the individual lesions and NPDR into levels of severity.
The grading criteria used by the UWOERC for the 3 early retinal lesions are based on the ETDRS protocol10,11 and SPs 2A and 3, as follows:
• HMA: the levels are 0, none; 1, questionable; 2, <SP 2A; and 3, ≥SP 2A.
• HE: the levels are 0, none; 1, questionable; 2, <SP 3; and 3, ≥SP 3.
• CWS: the levels are 0, none; 1, questionable; and 2, present.
The “questionable” category was used only when the grader thought that the likelihood of the lesion being present was 50% or greater but less than 90%. When an abnormality was definitely present but its nature was uncertain, the grader assigned the grade of “questionable” for the lesion considered to be most likely and “none” for the lesions considered less likely.
The classification criteria used in this study were based on a single 45° photographic field, whereas the ETDRS classification requires 7 standard 30° fields. The ETDRS classification criteria were therefore modified. The major difference was that the ETDRS mid-nonproliferative levels were collapsed into 2 levels owing to the inability to count the number of fields involved with specific lesions. The ETDRS classification gives weight to multiple fields having a lesion present, but the criteria used in this study categorize retinopathy severity based on the lesion amount present in 1 broader field (R.K. and S. Meuer, written communication, 2004). Although it is difficult to compare the 2 methods directly, the scale in this study has been used in many large epidemiologic studies, for example, the Third National Health and Nutrition Examination Survey.12
To develop a set of computer classification criteria similar to these criteria, we digitized the SPs, identified the 3 lesions using the developed computer system, and measured the size of each HMA and HE in pixels. The total lesion area of all HMAs in SP 2A was found to be 1717 pixels, and the total area of all HEs in SP field 3 was 86 pixels. The SPs were taken using a Zeiss camera (Carl Zeiss, Oberkochen, Germany) at 30°. The mean optic disc diameter of the 30° photographs was 4.7 mm; for photographs taken by the Canon 45° camera, the mean optic disc diameter was 3.4 mm, as measured on the film by the UWOERC. Therefore, the 45° photographs were at approximately 72.3% of the magnification of the 30° photographs. Using this fraction, the 30°-to-45° area conversion ratio was 0.723² ≈ 0.523. We also measured the diameter of the optic disc of the digitized 30° SPs and the digitized 45° images from our study population, the American Indians. The mean diameter was found to be 111 pixels for the 30° SPs and 80 pixels for the 45° photographs. The resultant resolution was approximately 19 μm per pixel. The diameter conversion ratio was 0.7207 and the area conversion ratio was approximately 0.52, which was in agreement with the conversion ratio estimated by the UWOERC. We used 0.52 as the scaling factor in converting lesion areas in the 30° SPs to a 45° equivalent.
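The conversion arithmetic above can be reproduced in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the inputs are the optic disc diameters and SP lesion areas reported in the text. It relies on the fact that areas scale with the square of the linear magnification ratio.

```python
# Sketch: convert lesion areas in 30-degree standard photographs (SPs)
# to 45-degree equivalents. Function names are hypothetical; the inputs
# are the optic disc diameters and SP lesion areas reported in the text.

def area_ratio(disc_diameter_45: float, disc_diameter_30: float) -> float:
    """Area ratio between the two fields: the linear ratio squared."""
    linear_ratio = disc_diameter_45 / disc_diameter_30
    return linear_ratio ** 2


def to_45_equivalent(area_30_px: float, scale: float = 0.52) -> int:
    """Scale a 30-degree lesion area (pixels) to a 45-degree equivalent."""
    return round(area_30_px * scale)


film_ratio = area_ratio(3.4, 4.7)      # disc diameters on film, in mm
digital_ratio = area_ratio(80, 111)    # disc diameters after digitization, in pixels

print(f"film: {film_ratio:.3f}, digital: {digital_ratio:.3f}")
# film: 0.523, digital: 0.519
print(to_45_equivalent(1717), to_45_equivalent(86))  # SP 2A HMA, SP 3 HE
# 893 45
```

Both the film-based and the pixel-based estimates round to the 0.52 factor, which reproduces the 893-pixel and 45-pixel thresholds used in the grading criteria.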
Using this scaling factor, the total area of HMA in SP 2A and the total area of HE in SP 3 were found to be equivalent to 893 pixels and 45 pixels, respectively, in a 45° photograph. On the basis of these converted values, we propose the following criteria for computer grading:
• HMA: 0, none detected; 1, questionable; 2, HMA detected and the total area is <893 pixels; and 3, HMA detected and the total area is ≥893 pixels.
• HE: 0, none detected; 1, questionable; 2, HE detected and the total area is <45 pixels; and 3, HE detected and the total area is ≥45 pixels.
• CWS: 0, none detected; 1, questionable; and 2, present.
The “questionable” category was assigned when the lesions were very small and therefore only suggestive of NPDR. For HE and CWS, the classification also depended on whether HMA was present. For HMA, very small lesions (area ≤4 pixels) were classified as questionable HMA; if the image contained only questionable HMA, it was classified as showing questionable HMA. If the image had no HMA or only questionable HMA, it was classified as showing questionable HE even if HE was detected by the computer system. If the image was classified as showing definite HMA and none of the detected HE in the image was larger than 4 pixels, the image was classified as showing questionable HE. For CWS, if 1 or more CWS were detected by the computer system and the image had no HMA or only questionable HMA, the image was classified as showing questionable CWS.
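The lesion-level rules can be expressed as a short decision procedure. This is a minimal sketch, assuming that the detector returns the pixel area of each detected HMA and HE and a count of detected CWS; the `Detections` container and the function names are hypothetical. Summing all detected HMA areas (including questionable ones) into the total is also an assumption, because the text does not say whether lesions of ≤4 pixels count toward the total.

```python
from dataclasses import dataclass, field

QUESTIONABLE_MAX_PX = 4   # lesions with area <= 4 pixels count as questionable
HMA_SP2A_PX = 893         # total HMA area of SP 2A, 45-degree equivalent
HE_SP3_PX = 45            # total HE area of SP 3, 45-degree equivalent


@dataclass
class Detections:
    """Hypothetical per-image detector output (areas in pixels)."""
    hma_areas: list[int] = field(default_factory=list)
    he_areas: list[int] = field(default_factory=list)
    cws_count: int = 0


def grade_hma(d: Detections) -> int:
    """0 none, 1 questionable, 2 below SP 2A, 3 at or above SP 2A."""
    if not d.hma_areas:
        return 0
    if all(a <= QUESTIONABLE_MAX_PX for a in d.hma_areas):
        return 1  # only very small HMA were detected
    # Assumption: the total includes every detected HMA, small ones too.
    return 2 if sum(d.hma_areas) < HMA_SP2A_PX else 3


def grade_he(d: Detections) -> int:
    """0 none, 1 questionable, 2 below SP 3, 3 at or above SP 3."""
    if not d.he_areas:
        return 0
    if grade_hma(d) < 2:
        return 1  # HE without definite HMA is downgraded to questionable
    if all(a <= QUESTIONABLE_MAX_PX for a in d.he_areas):
        return 1  # definite HMA, but no HE larger than 4 pixels
    return 2 if sum(d.he_areas) < HE_SP3_PX else 3


def grade_cws(d: Detections) -> int:
    """0 none, 1 questionable, 2 present."""
    if d.cws_count == 0:
        return 0
    return 1 if grade_hma(d) < 2 else 2
```

For example, under these rules an image with HMA areas of 120 and 40 pixels, HE areas of 30 and 20 pixels, and 1 CWS would be graded 2, 3, and 2 for HMA, HE, and CWS, respectively.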
This set of criteria was used to grade the level of severity of the 3 early retinal lesions in 430 images. None of these selected slides were used in developing the computer system. These images were selected from more than 1000 American Indians in southwestern Oklahoma who participated in an epidemiologic study of eye disease called Vision Keepers. In this study, participants were given a funduscopic examination by a general ophthalmologist using indirect ophthalmoscopy. In addition, for each participant, 2 fundus photographs were taken of each eye using a Canon 45° CR5 camera and focusing on the area between fields 1 and 2, including the optic disc and macula. The better image of each eye was sent to the UWOERC for grading. The overall rate of diabetic retinopathy in this cohort was 23%. The quality of the 430 selected images was considered gradable by the UWOERC. These images were selected with the constraint that approximately 23% of the participants would be classified as showing diabetic retinopathy. Of the 430 images, 361 had no lesions or 1 or more of the 3 lesions, HMA, HE, and CWS, but no other advanced lesions as diagnosed by the UWOERC. The results from the UWOERC included grading of individual lesions into the levels described previously. We compared the computer-grading results from the entire 430 images and the 361 images with those from the UWOERC. In addition, because the criteria for the “questionable” category used by the UWOERC involved judgment of the graders based on their experience and because those used by the computer system were largely based on the size of the lesion and whether definite HMA was present, we also compared the results while ignoring the questionable category.
In addition to individual lesions, we developed a set of computer classification criteria for NPDR. In the Vision Keepers study, NPDR was classified into 4 categories: none, early, moderate, and severe. The classification criteria used by the study ophthalmologist were modifications of the ETDRS definitions10 in consultation with study advisors at the UWOERC and the National Eye Institute. The resulting classification criteria for NPDR were as follows:
1. No NPDR: No retinal lesions are observed.
2. Early NPDR: At least 1 microaneurysm and definition not met for 3 and 4 below.
3. Moderate NPDR: Presence of hemorrhages, HE, IRMA, CWS, or venous beading and definition not met for 4 below.
4. Severe NPDR: Twenty or more microaneurysms and hemorrhages in each of the 4 midperipheral quadrants (the 4 quadrants are formed by drawing a horizontal line and a vertical line intersecting at the center of the optic disc of a 45° retinal photograph, which is centered between the optic disc and the macula13), or venous beading in any 2 quadrants, or IRMA in any 1 quadrant, and there is no retinal neovascularization or vitreous hemorrhage.
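The quadrant rule can be sketched as follows, assuming that lesion centroids and the optic disc center are available in image pixel coordinates. The function names, the quadrant labels, and the tie-break for points lying exactly on a dividing line are assumptions of this sketch, and it does not restrict counting to the midperipheral region.

```python
from collections import Counter
from typing import Iterable, Tuple

QUADRANTS = ("upper-left", "upper-right", "lower-left", "lower-right")


def quadrant(x: float, y: float, disc_x: float, disc_y: float) -> str:
    """Quadrant of a point relative to lines through the optic disc center.

    Image coordinates: x increases to the right, y increases downward.
    Points lying exactly on a dividing line go to the right/lower side,
    an arbitrary tie-break chosen for this sketch.
    """
    side = "right" if x >= disc_x else "left"
    level = "lower" if y >= disc_y else "upper"
    return f"{level}-{side}"


def count_by_quadrant(centroids: Iterable[Tuple[float, float]],
                      disc_center: Tuple[float, float]) -> Counter:
    """Count lesion centroids falling in each of the 4 quadrants."""
    disc_x, disc_y = disc_center
    counts = Counter({q: 0 for q in QUADRANTS})
    for x, y in centroids:
        counts[quadrant(x, y, disc_x, disc_y)] += 1
    return counts


def twenty_or_more_in_each(counts: Counter, threshold: int = 20) -> bool:
    """True if every quadrant holds at least `threshold` lesions."""
    return all(counts[q] >= threshold for q in QUADRANTS)
```

Because the dividing lines pass through the disc center rather than the image center, the 4 quadrants are generally unequal in area within a photograph centered between the disc and the macula.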
The computer system can detect the 3 early lesions, HMA, HE, and CWS, as well as their locations and sizes in pixels. For each of the 3 lesions, the diagnosis can be “yes,” “no,” or “questionable.” Because we were able to identify only 3 types of lesions for NPDR, for the purpose of this article, we propose the following computer classification criteria:
No NPDR: No retinal lesions are detected.
Questionable NPDR: None of the lesions detected by the computer is definite, or HE and/or CWS are detected without the presence of HMA.
Early NPDR: At least 1 HMA without the presence of HE or CWS, and the total HMA area is <893 pixels (less than the level of SP 2A).
Moderate NPDR: HMA is detected with HE and/or CWS also present, or the total HMA area is ≥893 pixels (greater than or equal to the level of SP 2A).
Severe NPDR: Presence of 20 or more HMA in each of the 4 midperipheral quadrants.
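The computer criteria can be read as a decision procedure over the lesion grades. The sketch below is one hypothetical encoding, assuming lesion grades coded as in the lesion-level criteria (HMA and HE 0 to 3, CWS 0 to 2) and HMA counts already tallied per quadrant. Treating only grades of 2 or higher as "definite" or "present" and checking the severe criterion before the moderate one are choices of this sketch, not details stated in the text.

```python
from typing import Sequence

SEVERE_HMA_PER_QUADRANT = 20


def classify_npdr(hma: int, he: int, cws: int,
                  hma_per_quadrant: Sequence[int]) -> str:
    """Map lesion grades and per-quadrant HMA counts to an NPDR category.

    hma, he: grades 0-3 (0 none, 1 questionable, 2 below SP, 3 at/above SP)
    cws: grade 0-2 (0 none, 1 questionable, 2 present)
    hma_per_quadrant: HMA counts in the 4 quadrants around the optic disc
    """
    if hma == he == cws == 0:
        return "no NPDR"
    if hma < 2:
        # Only questionable lesions, or HE/CWS without definite HMA.
        return "questionable NPDR"
    if (len(hma_per_quadrant) == 4 and
            all(n >= SEVERE_HMA_PER_QUADRANT for n in hma_per_quadrant)):
        return "severe NPDR"
    # Assumption: only definite (grade >= 2) HE or CWS count as "present".
    # HMA grade 3 means the total HMA area is >= 893 pixels (SP 2A level).
    if hma == 3 or he >= 2 or cws >= 2:
        return "moderate NPDR"
    return "early NPDR"
```

Under this encoding, an image graded HMA 2 with no HE or CWS is early NPDR, while the same image with a CWS grade of 2 becomes moderate NPDR.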
The same 430 images were processed using the computer system and these classification criteria. This set of criteria was not exactly the same as that used by the UWOERC, and therefore we could not compare the computer classification results with those of the UWOERC. We compared the classification results of the 430 images and the 361 images that were obtained from the computer system with those of the general ophthalmologist. The overall agreement rate and κ statistic were computed for each comparison. A κ value higher than 0.40, 0.60, and 0.75 indicates, respectively, moderate, good, and excellent agreement.
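The agreement statistics can be sketched as follows. This is a minimal illustration, assuming unweighted Cohen's κ (the text does not specify a weighting scheme) and assuming that "ignoring the questionable category" means dropping every image that either rater graded as questionable; the function names are hypothetical, and the labels follow the thresholds stated above.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def agreement_and_kappa(a: Sequence[Hashable],
                        b: Sequence[Hashable]) -> Tuple[float, float]:
    """Overall agreement rate and unweighted Cohen's kappa for 2 raters."""
    if len(a) != len(b) or not a:
        raise ValueError("need 2 non-empty rating lists of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category frequencies.
    p_chance = sum(count_a[k] * count_b[k]
                   for k in count_a.keys() | count_b.keys()) / n ** 2
    if p_chance == 1:
        return p_observed, float("nan")  # kappa undefined: 1 category only
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


def drop_questionable(a, b, questionable=1) -> Tuple[List, List]:
    """Keep only images that neither rater graded as questionable."""
    kept = [(x, y) for x, y in zip(a, b) if questionable not in (x, y)]
    return [x for x, _ in kept], [y for _, y in kept]


def describe_kappa(kappa: float) -> str:
    """Labels from the thresholds given in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    return "below moderate"
```

κ corrects the raw agreement rate for the agreement expected by chance from each rater's category frequencies, which is why a high agreement rate can coexist with a modest κ when most images fall into a single category.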
We first report the results for individual lesions. Table 1 gives the comparisons of computer classification results from the 430 images with those of the UWOERC readers. For HMA, the overall agreement rate was 82.6% with a κ statistic of 0.63, indicating good agreement. Fourteen (3.3%) and 10 images (2.3%), respectively, were classified by the UWOERC grader and the computer system as showing questionable (level 1) HMA. The false-positive rate of the computer system (excluding level 1) for HMA was 7.7% (22/284), and the false-negative rate (excluding level 1) was 18.9% (25/132). When the “questionable” category was ignored, the agreement rate was 86.5% and the κ statistic was 0.70, indicating good agreement between the 2 diagnostic methods. Major causes of misclassification were that (1) the HMA was too small, (2) the color intensity of the HMA was too close to the background or was low contrast, (3) the HMA was adjacent to a vessel, and (4) the shape of the HMA was long and thin, so that the computer system could mistake it for a vessel.
For HE, the overall agreement rate was 82.6% and the κ statistic was 0.55. Twenty-two (5.1%) and 38 images (8.8%), respectively, were classified by the UWOERC grader and the computer system as showing questionable HE. The computer system categorized a total of 47 images as showing questionable, level 2, or level 3 HE, whereas the UWOERC found them free of HE; among these, 30 were classified as questionable by the computer. Most of the other misclassifications were due to the computer’s misidentifying drusen, light reflections, and artifacts as HE. On the other hand, UWOERC graders classified a total of 58 images as showing level 2 or level 3 HE, and the computer system misclassified only 3 (5.2%) as showing level 0 or level 1 HE. When the “questionable” category was excluded, the overall agreement rate and the κ statistic increased to 92.3% and 0.74, respectively, and the false-positive rate was only 5.3% (17/320), indicating excellent agreement.
The agreement rate was 88.3% between the classifications of CWS severity level by the computer and those of the UWOERC graders. The κ statistic was 0.65. The UWOERC graders gave 18 “questionable” diagnoses (4.2%), and the computer system gave 15 (3.5%). Of the 340 images classified by the UWOERC as showing no CWS, 21 were categorized as either questionable (11 images) or CWS-positive (10 images), for a false-positive rate of 6.2%. Of the 71 images with a classification of CWS by the UWOERC, the computer system missed 15 (21.1%). When the “questionable” category was omitted, the agreement between UWOERC graders and the computer system was excellent, with an agreement rate of 91.2% and a κ statistic of 0.78. The primary reason for computer misclassification was the vagueness of the CWS or too little contrast between CWS and the background.
For the 361 images that had the 3 early lesions only or no lesions at all, the agreement rates (Table 2) were about 3 to 5 percentage points higher (85.3%, 87.5%, and 93.1%, respectively, for HMA, HE, and CWS) than those in Table 1. However, the κ statistic did not improve except for CWS (0.60, 0.38, and 0.90, respectively, for HMA, HE, and CWS). When the “questionable” category was excluded, both the agreement and the κ statistic improved considerably. The agreement rates were 89.7%, 96.3%, and 97.4%, and the κ statistics were 0.69, 0.63, and 0.71, respectively, for HMA, HE, and CWS.
Table 3 gives the results for the classification of NPDR in the 430 images from the study ophthalmologist and the computer system. In this table, we have excluded 16 images that were classified as showing questionable NPDR by the computer system because the ophthalmologist did not use the “questionable” category in her diagnosis. The overall agreement rate was 81.2%, and the κ statistic was 0.67. Of the 284 images that were categorized by the ophthalmologist as showing no NPDR, 27 were classified by the computer as showing early NPDR and 6 as moderate NPDR, for an 11.6% overall false-positive rate. Of the 79 images categorized as showing early NPDR, 26 (32.9%) were classified as showing no NPDR by the computer. However, in the moderate and severe groups, the computer made very few mistakes. Of the 361 images that had no lesions or the 3 early lesions only, 11 were classified as showing questionable NPDR by the computer system. Omitting these 11 images, Table 4 shows a slightly higher agreement rate of 83.5%.
Using pattern recognition and image-processing techniques in medical diagnosis has been successful in many areas. One example is the machine interpretation of electrocardiograms. This technique has been a useful aid to physicians to provide an expert interpretation, particularly when cardiologists are not readily available. Diabetic retinopathy is another disease that might benefit from machine- or computer-aided diagnosis, particularly because of possible interobserver and intraobserver variation. As reported earlier,7 the agreement rates among a retinal specialist, a general ophthalmologist, and graders at a fundus reading center in diagnosing HMA, HE, and CWS ranged between 85% and 89% when classification consisted of yes/questionable/no categories and between 91% and 96% when classification consisted of only yes/no categories. Similar variation may exist in the classification of severity of individual lesions and NPDR.
Since the computer system to recognize the 3 early retinal lesions, HMA, HE, and CWS, was developed, we have attempted to expand its use to classify these 3 lesions into levels of severity. The human classification has been based on a comparison of the image under study with the SPs. Because the images were digitized into pixels before they were processed by the computer system, we also digitized the SPs and then identified the areas of various lesions. To determine whether the lesion in the image under study was more severe than in the SPs, we proposed to compare the total areas of lesions. In converting the lesion area in a 30° image to a 45° equivalent, we used a scaling factor of 0.52. The factor is an estimate. However, it was interesting to find that the magnification fraction using digital images (0.7207) was very close to that obtained by measuring the film (0.7230).
In addition to the 4 levels of NPDR defined by the ETDRS (none, early, moderate, and severe), we propose an additional level, “questionable NPDR,” which includes cases in which none of the lesions detected by the computer is definite or in which HE and/or CWS are detected without the presence of HMA. If the computer system is used alone, patients with a computer diagnosis of questionable or more severe NPDR should be referred to an ophthalmologist for further examination. Patients with a computer diagnosis of moderate or severe NPDR must be referred more urgently for confirmation and treatment.
In this study, we compared the diagnostic results of the computer with those of human graders and a general ophthalmologist using 2 sets of images: 430 images with no lesions or any retinal lesions, and 361 images with no lesions or the 3 early lesions only, in the absence of other, more severe lesions characterizing NPDR. For the 430 images, the overall agreement rates for the levels of severity and the classification of NPDR were between 82.6% and 92.3%, and the κ statistics were between 0.55 and 0.78. The agreement rates and κ statistics were higher when the “questionable” category was excluded. These agreement rates were good, given that the graders at the UWOERC and the ophthalmologist often used the presence of other lesions, or of lesions outside SP fields 1 and 2, to grade or diagnose a specific lesion, whereas the computer considered only the 3 early lesions inside SP fields 1 and 2. For the 361 images with the 3 early lesions only or no lesions, the agreement rates were between 85.3% and 97.4%.
Both the computer diagnosis and human grading are highly reproducible. The computerized method is deterministically reproducible if the same digitized images are processed or if the images are transmitted to the computer directly without going through a scanner. Agreement between pairs of graders for lesions typical of diabetic retinopathy was 95% to 97%, and κ statistics ranged from 0.72 to 0.86, indicating excellent agreement.12
Our data indicate that the classification differences between the computer system and the UWOERC graders were due to the “questionable” category more than whether the image had only the 3 early lesions or other, more severe lesions. There are no uniform definitions for questionable lesions that can be used by both human experts and the computer system. The criteria used by graders were qualitative and involved subjective judgment. The criteria used by the computer system considered the small size (≤4 pixels) of the lesion and whether definite HMA was present (for HE and CWS only). The selection of 4 pixels as the cutoff point was based on the suggestion that the size of the smallest clinically visible microaneurysm is 30 μm,3 or a 4-pixel square in a circular image of 576 pixels in diameter.14 Our images were in a circular field of 512 pixels in diameter, and therefore we considered that HMA of 4 pixels or less in area was questionable. We have also considered using lesion color or its contrast with the retinal background to determine if a lesion was questionable. However, there was no uniform human standard for the color or contrast, so we chose to use the size without considering these factors. The difference in criteria used by the graders and computer made it difficult to compare the results from these 2 diagnostic modalities. To improve the diagnostic capability of the computer system and to facilitate comparisons, a set of universally accepted quantitative and/or objective criteria is needed to diagnose these questionable lesions.
Another problem is how to improve the accuracy of computer classification to reduce the rates of false-positive and false-negative diagnoses. This problem appears to be more difficult than in the case of electrocardiograms, in which only the processing and tracking of the shape of the electrocardiographic signals need to be studied. With fundus images, it involves not only the shape but also the color of the lesions and the color contrast between lesions, vessels, and background. Considerable variations exist in the color and levels of contrast of the lesions in fundus images from one patient to another. However, the human eye can distinguish lesions from normal retinal vessels and the retinal background despite these variations. To improve the capability of a computer system to diagnose and classify diabetic retinopathy, more research is needed on how human experts make diagnostic decisions and how human intelligence can be translated into computer algorithms. In addition, data on other, more severe lesions such as IRMA, venous beading, and new vessels need to be incorporated into the computer vision system and used in the classification criteria. More research is being conducted to further improve the algorithms in computer diagnosis and classification of diabetic retinopathy, including these clinically important lesions.
In conclusion, this article describes a method for a computer system using new software to grade the severity of 3 types of early lesions (HMA, HE, and CWS) of diabetic retinopathy and a method to classify NPDR based on these 3 lesions only. Classification results from the computer system were compared with those of graders from an established fundus reading center and a general ophthalmologist. Agreement rates were good to excellent. Our results show that the computer system could become a useful aid for physicians and fundus image graders and a tool for large studies, such as screening and epidemiologic studies, in diagnosing early lesions and classifying early diabetic retinopathy. For physicians and fundus image graders, the computerized diagnosis can serve as a second opinion. The computer system could be particularly helpful when ophthalmologists are not readily available and eye examinations are performed or fundus photographs are read by physicians in other specialties. Thus, computer-assisted diagnosis of the presence and severity of diabetic retinopathy may be beneficial to those with a high risk of this disease, especially in areas where ophthalmologic expertise is lacking.
Correspondence: Samuel C. Lee, PhD, School of Electrical and Computer Engineering, University of Oklahoma, 202 W Boyd St, Norman, OK 73019 (firstname.lastname@example.org).
Submitted for Publication: October 28, 2003; final revision received March 29, 2004; accepted April 29, 2004.
Financial Disclosure: None.
Funding/Support: This study was supported by grant EY-09898 from the National Eye Institute, Bethesda, Md.
Additional Information: This work was performed at the Center for American Indian Health Research, University of Oklahoma Health Sciences Center.
Acknowledgment: We thank the Vision Keepers participants for allowing us to use their fundus photographs in this study. We also thank Dana Russell, Stacy Meuer, Michelle Roberts, and Cathy Morales for their assistance.