Venn diagram showing the number of images classified as unacceptable from the GDx VCC scanning laser polarimeter (Laser Diagnostic Technologies, Inc, San Diego, Calif), HRT II confocal scanning laser ophthalmoscope (Heidelberg Retina Tomograph; Heidelberg Engineering, Dossenheim, Germany), and Stratus OCT optical coherence tomograph (Carl Zeiss Meditec, Inc, Dublin, Calif).
Receiver operating characteristic curves of the best parameters from the GDx VCC scanning laser polarimeter, HRT II confocal scanning laser ophthalmoscope, and Stratus OCT optical coherence tomograph. Manufacturers are provided in the legend to Figure 1. Bathija function refers to the linear discriminant function used by Bathija et al.
Medeiros FA, Zangwill LM, Bowd C, Weinreb RN. Comparison of the GDx VCC Scanning Laser Polarimeter, HRT II Confocal Scanning Laser Ophthalmoscope, and Stratus OCT Optical Coherence Tomograph for the Detection of Glaucoma. Arch Ophthalmol. 2004;122(6):827-837. doi:10.1001/archopht.122.6.827
Copyright 2004 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
To compare the abilities of current commercially available versions of 3 optical imaging techniques, scanning laser polarimetry with variable corneal compensation (GDx VCC), confocal scanning laser ophthalmoscopy (HRT II [Heidelberg Retina Tomograph]), and optical coherence tomography (Stratus OCT), to discriminate between healthy eyes and eyes with glaucomatous visual field loss.
We included 107 patients with glaucomatous visual field loss and 76 healthy subjects of a similar age. All individuals underwent imaging with a GDx VCC, HRT II, and fast retinal nerve fiber layer scan with the Stratus OCT as well as visual field testing within a 6-month period. Receiver operating characteristic curves and sensitivities at fixed specificities (80% and 95%) were calculated for parameters reported as continuous variables. Diagnostic categorization (outside normal limits, borderline, or within normal limits) provided by each instrument after comparison with its respective normative database was also evaluated, and likelihood ratios were reported. Agreement on categorization between methods (weighted κ) was assessed.
After the exclusion of subjects with unacceptable images, the final study sample included 141 eyes of 141 subjects (75 with glaucoma and 66 healthy control subjects). Mean ± SD mean deviation of the visual field test result for patients with glaucoma was −4.87 ± 3.9 dB, and 70% of these patients had early glaucomatous visual field damage. No statistically significant difference was found between the areas under the receiver operating characteristic curves (AUCs) for the best parameters from the GDx VCC (nerve fiber indicator, AUC = 0.91), Stratus OCT (retinal nerve fiber layer inferior thickness, AUC = 0.92), and HRT II (linear discriminant function, AUC = 0.86). Abnormal results for each of the instruments, after comparison with their normative databases, were associated with strong positive likelihood ratios. Chance-corrected agreement (weighted κ) among the 3 instruments ranged from moderate to substantial (0.50-0.72).
The AUCs and the sensitivities at high specificities were similar among the best parameters from each instrument. Abnormal results (as compared with each instrument's normative database) were associated with high likelihood ratios and large effects on posttest probabilities of having glaucomatous visual field loss. Calculation of likelihood ratios may provide additional information to assist the clinician in diagnosing glaucoma with these instruments.
Changes in the structural appearance of the optic nerve head and retinal nerve fiber layer (RNFL) often precede the development of visual field loss in glaucoma.1-3 Thus, detection of optic nerve head and RNFL damage is crucial for the diagnosis of glaucoma in its early stages. Until recently, structural evaluation in glaucoma has been subjective, with primarily qualitative descriptions of change. With the emergence of optical imaging instruments, assessment of the optic nerve head and RNFL is objective, providing quantitative information.
Confocal scanning laser ophthalmoscopy, scanning laser polarimetry, and optical coherence tomography are technologies that make use of the different properties of light and different characteristics of retinal tissue to obtain their measurements.4-10 Studies have compared the ability of earlier versions of these technologies to differentiate patients with glaucoma from healthy subjects.4,11-24 However, each of these technologies has recently undergone significant hardware and software improvements, and issues that were once limitations for a given technique may no longer be relevant.
For scanning laser polarimetry, the introduction of variable corneal compensation in the GDx VCC (Laser Diagnostic Technologies, Inc, San Diego, Calif) has resulted in improved diagnostic accuracy compared with the earlier version of this instrument, which used fixed corneal compensation.25-28 For optical coherence tomography, the new Stratus OCT (Carl Zeiss Meditec, Inc, Dublin, Calif) includes several improvements compared with the original OCT, including better resolution, an increased number of A-scans, and a reduced need for pupil dilation.29 Also, the Stratus OCT provides information on the probability of abnormality of patient examination results after comparison with an internal normative database. For confocal scanning laser ophthalmoscopy, the HRT II Heidelberg Retina Tomograph (Heidelberg Engineering, Dossenheim, Germany) is designed specifically for imaging of the optic nerve head, with almost completely automatic image acquisition and improved diagnostic accuracy.11,30
The purpose of this study was to compare, in 1 study population, the ability of current commercially available versions of these 3 technologies to differentiate between healthy eyes and eyes with glaucomatous visual field loss.
This observational cross-sectional study included 183 eyes of 183 patients (107 patients with glaucoma and 76 healthy control subjects). All patients were evaluated at the Hamilton Glaucoma Center, University of California, San Diego, from April 2002 to November 2003. These patients were included in a prospective longitudinal study designed to evaluate optic nerve structure and visual function in glaucoma (Diagnostic Innovations in Glaucoma Study). All patients who met the inclusion criteria were enrolled in the current study. Informed consent was obtained from all participants. The University of California, San Diego, Human Subjects Committee approved all protocols, and the methods described adhered to the tenets of the Declaration of Helsinki.
Each subject underwent a comprehensive ophthalmologic examination including a review of the medical history, best-corrected visual acuity measurement, slitlamp biomicroscopy, intraocular pressure measurement using Goldmann applanation tonometry, gonioscopy, dilated fundoscopic examination using a 78-D lens, stereoscopic optic disc photography, and automated perimetry using the 24-2 Swedish Interactive Threshold Algorithm (Carl Zeiss Meditec, Inc). To be included, subjects had to have a best-corrected visual acuity of 20/40 or better in the affected eye, spherical refraction within ±5.0 diopters (D) and cylinder correction within ±3.0 D, and open angles on gonioscopy. Eyes with coexisting retinal disease, uveitis, or nonglaucomatous optic neuropathy were excluded from this investigation. One eye of each patient was randomly selected for inclusion in the study.
Normal control eyes had an intraocular pressure of 21 mm Hg or less (with no history of increased intraocular pressure) and a normal visual field test result. A normal visual field was defined as a mean deviation and pattern standard deviation within 95% confidence limits and a glaucoma hemifield test31 result within normal limits. Normal control eyes also had a healthy appearance of the optic disc and RNFL (no diffuse or focal rim thinning, cupping, optic disc hemorrhage, or RNFL defects), as evaluated by clinical examination. Eyes were classified as glaucomatous if they had repeated (2 consecutive) abnormal visual field test results, defined as a pattern standard deviation outside the 95% normal confidence limits or a glaucoma hemifield test result outside normal limits, regardless of the appearance of the optic disc.
All subjects underwent ocular imaging with the GDx VCC, HRT II, and Stratus OCT. For each subject, all ocular imaging and visual field examinations were completed within 6 months.
All patients underwent imaging using a commercially available scanning laser polarimeter (GDx VCC; software version 5.0.1; Laser Diagnostic Technologies, Inc). The general principles of scanning laser polarimetry have been described in detail elsewhere.17 The GDx VCC is a modified scanning laser polarimetry system with variable corneal compensation. Images of the ocular fundus are formed by scanning the beam of a near infrared laser (780 nm) in a raster pattern. The scan raster covers an image field 40° horizontally and 20° vertically, including both the parapapillary and macular regions of the eye. With the GDx VCC, the method of variable corneal polarization compensation, as described by Zhou and Weinreb,25 has been automated and has replaced the original fixed corneal compensator. The variable corneal compensator in this system consists of 2 identical linear retarders in rotating mounts so that both the retardation and axis of the unit can be adjusted according to requirements. To measure eye-specific corneal polarization axis and magnitude, scanning laser polarimetry images of the macula are first acquired without compensation (the retardation is set to 0). The uniform radial birefringence of the Henle fiber layer in the macula is used as an "intraocular polarimeter," and both the Henle fiber layer and corneal retardation can be determined from the macular retardation profile. Next, corneal birefringence–compensated scanning laser polarimetry images are obtained using the appropriate eye-specific corneal polarization axis and magnitude values by adjusting the variable corneal compensation retarders. The GDx VCC measures retardation in nanometers. To simplify communications, retardation values are converted into thickness values (micrometers) using a fixed conversion factor of 0.67 nm/µm.32
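The fixed conversion from retardation to thickness can be illustrated with a minimal Python sketch. Only the factor of 0.67 nm/µm comes from the text; the function name and the input validation are illustrative assumptions and do not describe the instrument's software.

```python
# Conversion factor quoted in the text: 0.67 nm of retardation per micrometer of RNFL.
NM_PER_UM = 0.67


def retardation_to_thickness_um(retardation_nm: float) -> float:
    """Convert a retardation value (nm) to an estimated RNFL thickness (µm).

    Hypothetical helper: divides by the fixed factor described in the text.
    """
    if retardation_nm < 0:
        raise ValueError("retardation must be non-negative")
    return retardation_nm / NM_PER_UM


if __name__ == "__main__":
    # Example input chosen for illustration only.
    print(retardation_to_thickness_um(67.0))
```

Because the factor is fixed, thickness values reported this way are a linear rescaling of the measured retardation rather than an independent measurement.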
In this study, a baseline image was automatically created from 3 images obtained for each subject. Assessment of GDx VCC image quality was performed by an experienced examiner masked to the subject's identity and results from the other tests. The assessment was based on the appearance of the reflectance image, presence of residual anterior segment retardation, and presence of an atypical pattern of retardation. To be classified as good quality, an image required a focused and evenly illuminated reflectance image with a centered optic disc. To be acceptable, the baseline image also had to have a residual anterior segment retardation of 15 nm or less and an atypical scan score less than 25. The atypical scan score is a measure provided by the GDx VCC standard software, which indicates the presence of atypical patterns of retardation that can generate spurious RNFL thickness measurements. Twenty-four (13%) of 183 patients had unacceptable GDx VCC scans and were excluded from further analysis. The Venn diagram in Figure 1 shows the number of unacceptable images from each instrument.
The GDx VCC software calculates summary parameters based on quadrants that are defined as temporal (335° to 24°), superior (25° to 144°), nasal (145° to 214°), or inferior (215° to 334°). The GDx VCC parameters investigated in this study were superior ratio (superior quadrant thickness/temporal quadrant thickness), inferior ratio, superior/nasal ratio, superior maximum (mean of the 1500 thickest points in the superior quadrant), inferior maximum, superior average, inferior average, normalized superior area (area under the temporal-superior-nasal-inferior-temporal [TSNIT] curve in the superior quadrant), normalized inferior area, maximum modulation ([thickest quadrant − thinnest quadrant]/thinnest quadrant), ellipse modulation, ellipse average (TSNIT average), ellipse standard deviation (TSNIT standard deviation), and nerve fiber indicator (NFI). The NFI is calculated using a support vector machine algorithm based on several RNFL measures (Michael Sinai, PhD, Laser Diagnostic Technologies, written communication, March 2003) and assigns a number from 0 to 100 to each eye. The higher the NFI, the greater the likelihood that the patient has glaucoma. For each of these parameters, receiver operating characteristic (ROC) curves were constructed and sensitivities at fixed specificities (≥80% and ≥95%) were reported.
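As an illustration of the quadrant definitions and the maximum modulation formula given above, the following Python sketch maps a whole-degree angle to its GDx VCC quadrant and computes the modulation from four quadrant values. The function names, the restriction to integer degrees, and the example thickness values are assumptions made for the sketch; this is not the manufacturer's code.

```python
# Quadrant limits in degrees, as listed in the text (inclusive on both ends).
GDX_QUADRANTS = {
    "temporal": (335, 24),   # wraps through 0 degrees
    "superior": (25, 144),
    "nasal": (145, 214),
    "inferior": (215, 334),
}


def gdx_quadrant(angle_deg: int) -> str:
    """Return the quadrant name for a whole-degree angle (assumed convention)."""
    angle = angle_deg % 360
    for name, (start, end) in GDX_QUADRANTS.items():
        if start <= end:
            if start <= angle <= end:
                return name
        elif angle >= start or angle <= end:  # range that wraps past 359 degrees
            return name
    raise ValueError(f"angle {angle_deg} not covered by any quadrant")


def maximum_modulation(quadrant_values: dict) -> float:
    """(thickest quadrant - thinnest quadrant) / thinnest quadrant."""
    thickest = max(quadrant_values.values())
    thinnest = min(quadrant_values.values())
    return (thickest - thinnest) / thinnest


if __name__ == "__main__":
    # Hypothetical quadrant thickness values in micrometers.
    example = {"temporal": 20.0, "superior": 60.0, "nasal": 30.0, "inferior": 70.0}
    print(gdx_quadrant(100), maximum_modulation(example))
```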
For the parameters TSNIT average, superior average, inferior average, and TSNIT standard deviation, the GDx VCC printout also provides probability measures of abnormality based on comparison with an internal normative database containing information on 540 normal eyes. In the GDx VCC printout, each color represents a different probability of the parameter being outside normal limits, with red having the highest probability (P<.005), followed by yellow (P<.01), light blue (P<.02), and dark blue (P<.05). For this study, a parameter was considered outside normal limits if P<.005 (red), borderline if P<.05 (yellow, light blue, or dark blue), and within normal limits if P>.05 (green). We evaluated the diagnostic categorization (outside normal limits, borderline, or within normal limits) provided by the GDx VCC after comparison with its normative database and reported likelihood ratios (LRs) for each parameter. For the parameter of NFI, no probability measure of abnormality is currently provided in the printout. However, the cutoffs suggested by the manufacturer are 0 to 30 for within normal limits, 31 to 50 for borderline, and 51 to 100 for outside normal limits (Michael Sinai, PhD, written communication, December 2003). In our study, we investigated the diagnostic ability of the NFI using the manufacturer's suggested cutoffs as well as other arbitrarily selected cutoffs. Interval LRs were calculated for the NFI.
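The categorization rules described for the GDx VCC can be sketched in Python as follows. The cutoffs are those stated in the text; the function names are hypothetical, and treating a P value of exactly .05 as within normal limits is an assumption, since the text does not specify that boundary.

```python
def categorize_p_value(p: float) -> str:
    """Map a normative-database P value to the study's three categories."""
    if p < 0.005:
        return "outside normal limits"   # red on the printout
    if p < 0.05:
        return "borderline"              # yellow, light blue, or dark blue
    return "within normal limits"        # green; P == .05 placed here by assumption


def categorize_nfi(nfi: float) -> str:
    """Apply the manufacturer-suggested NFI cutoffs quoted in the text."""
    if not 0 <= nfi <= 100:
        raise ValueError("NFI is defined on the range 0 to 100")
    if nfi <= 30:
        return "within normal limits"
    if nfi <= 50:
        return "borderline"
    return "outside normal limits"


if __name__ == "__main__":
    print(categorize_p_value(0.01), categorize_nfi(42))
```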
The HRT II uses a diode laser (670-nm wavelength) to sequentially scan the retinal surface in the horizontal and vertical directions at multiple focal planes. Using confocal scanning principles, a 3-dimensional topographic image is constructed from a series of optical image sections at consecutive focal planes.33 The topographic image determined from the acquired 3-dimensional image consists of 384 × 384 (147 456 total) pixels, each of which is a measurement of retinal height at its corresponding location. For every patient, 3 topographic images were obtained, combined, and automatically aligned to make a single mean topographic image used for analysis. Magnification errors were corrected using patients' corneal curvature measurements. An experienced examiner outlined the optic disc margin on the mean topographic image while viewing stereoscopic photographs of the optic disc. Good-quality images required focused reflectance with a standard deviation no greater than 50 µm. Fifteen (8%) of the 183 patients had unacceptable topographic images and were excluded from further analysis (Figure 1).
Topographic parameters included with HRT II software and investigated in this study were disc area, cup area, rim area, cup/disc area ratio, rim/disc area ratio, cup volume, rim volume, mean cup depth, maximum cup depth, height variation contour, cup shape measure, mean RNFL thickness, RNFL cross-sectional area, horizontal cup/disc ratio, vertical cup/disc ratio, and 2 linear discriminant functions, from Mikelberg et al34 (Mikelberg function) and Bathija et al4 (Bathija function). These parameters have been described in detail elsewhere. For each of these parameters, ROC curves were constructed and sensitivities at fixed specificities (≥80% and ≥95%) were reported. All of these parameters except for the 2 linear discriminant functions and the horizontal and vertical cup/disc ratios were further examined using sectors categorized as temporal superior (45° to 90°), nasal superior (91° to 135°), nasal (136° to 225°), nasal inferior (226° to 270°), temporal inferior (271° to 315°), and temporal (316° to 44°).
The software for the HRT II also incorporates the Moorfields regression analysis,11 which is a comparison of the subject's rim area with a predicted rim area for a given disc area and age, based on confidence limits of a regression analysis derived from 112 normal eyes of white subjects. Each sector is classified as normal if the measurement falls within the 95% confidence interval (CI), borderline if the measurement falls between the 95% and 99.9% CI, and outside normal limits if the measurement falls lower than the 99.9% CI. The Moorfields regression analysis also provides results for the global rim area as well as a final classification. A normal classification requires the Moorfields regression analysis of all sectors and the global rim area to be within normal limits. A borderline classification occurs when at least 1 of the sectors or the global rim area is borderline, and an outside normal limits result occurs when at least 1 sector or the global rim area is outside normal limits. The LRs were calculated for each possible diagnostic categorization (within normal limits, borderline, and outside normal limits) of the global and sectorial results as well as the final classification of Moorfields regression analysis.
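The way the final Moorfields classification is derived from the sector and global results can be sketched as below. The function name and the string labels are hypothetical. The sketch assumes that an outside-normal-limits result takes precedence over a borderline result when both occur in the same eye, which the text implies but does not state.

```python
from typing import Sequence

WITHIN = "within normal limits"
BORDERLINE = "borderline"
OUTSIDE = "outside normal limits"


def moorfields_final_classification(sector_results: Sequence[str], global_result: str) -> str:
    """Combine sector and global results into one final category.

    Assumed precedence: outside normal limits > borderline > within normal limits.
    """
    results = list(sector_results) + [global_result]
    unknown = [r for r in results if r not in (WITHIN, BORDERLINE, OUTSIDE)]
    if unknown:
        raise ValueError(f"unrecognized categories: {unknown}")
    if OUTSIDE in results:
        return OUTSIDE
    if BORDERLINE in results:
        return BORDERLINE
    return WITHIN


if __name__ == "__main__":
    # Six sectors, one borderline; global rim area within normal limits.
    sectors = [WITHIN, WITHIN, BORDERLINE, WITHIN, WITHIN, WITHIN]
    print(moorfields_final_classification(sectors, WITHIN))
```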
The commercially available optical coherence tomograph, the Stratus OCT, was used to assess parapapillary RNFL thickness measurements. Optical coherence tomography uses the principles of low-coherence interferometry and is analogous to ultrasound B-mode imaging, but it uses light instead of sound to acquire high-resolution images of ocular structures.8 A low-coherence near infrared (840-nm) light beam is directed onto a partially reflective mirror (beam splitter) that creates 2 light beams, a reference and a measurement beam. The measurement beam is directed onto the subject's eye and is reflected from intraocular microstructures and tissues according to their distance, thickness, and reflectivity. The reference beam is reflected from the reference mirror at a known, variable position. Both beams travel back to the partially reflective mirror, recombine, and are transmitted to a photosensitive detector. The pattern of interference is used to provide information regarding distance and thickness of the retinal structures. Bidimensional images are created by successive longitudinal scanning in transverse directions.
The fast RNFL algorithm was used to obtain RNFL thickness measurements with the Stratus OCT. Three images were acquired from each subject, with each image consisting of 256 A-scans along a 3.4-mm-diameter circular ring around the optic disc. A baseline image was automatically created using the Stratus OCT software. Quality assessment of Stratus OCT scans was determined by an experienced examiner masked to the subject's identity and the results of the other tests. Good-quality scans had to have focused images from the ocular fundus, an adequate signal-to-noise ratio, and the presence of a centered circular ring around the optic disc. Nineteen (10%) of 183 patients had unacceptable Stratus OCT scans and were excluded from further analysis (Figure 1).
Parapapillary RNFL thickness parameters automatically calculated by existing Stratus OCT software (version 3.1) and evaluated in this study were average thickness (360° measurement), temporal quadrant thickness (316° to 45°), superior quadrant thickness (46° to 135°), nasal quadrant thickness (136° to 225°), inferior quadrant thickness (226° to 315°), and thickness for each of 12 clock-hour positions with the 3-o'clock position as nasal, 6-o'clock position as inferior, 9-o'clock position as temporal, and 12-o'clock position as superior. Other parameters evaluated included superior maximum (Smax) (thickest point in the superior quadrant), inferior maximum (Imax) (thickest point in the inferior quadrant), and relational parameters such as Imax/Smax, Smax/Imax, Imax/temporal average thickness (Imax/Tavg), Smax/nasal average thickness (Smax/Navg), and difference between the thickest and thinnest points along the measurement circle (Max-Min). For each of these parameters, ROC curves were constructed and sensitivities at fixed specificities (≥80% and ≥95%) were reported.
For each parameter, the Stratus OCT software provides a classification (within normal limits, borderline, or outside normal limits) based on comparison with an internal normative database of 328 eyes. A parameter is classified as outside normal limits if its value falls lower than the 99.9% CI of the healthy, age-matched population. A borderline result indicates that the value is between the 95% and 99.9% CI, and a within-normal-limits result indicates that the value is within the 95% CI. The LRs were calculated for each parameter and each possible diagnostic categorization, as provided by the Stratus OCT software.
We used t tests to evaluate optic nerve head and RNFL measurement differences between glaucomatous and normal eyes. Results of statistical significance were also provided after Bonferroni correction based on the number of comparisons within each analysis (GDx VCC, HRT II, and Stratus OCT).
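The Bonferroni correction divides the overall significance level by the number of comparisons within an analysis. A minimal Python sketch follows, assuming the overall level of .05 used in this study; the function name is illustrative.

```python
def bonferroni_alpha(overall_alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold after Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return overall_alpha / n_comparisons


if __name__ == "__main__":
    # Numbers of comparisons reported for each instrument in the Results.
    for n in (15, 17, 25):
        print(n, round(bonferroni_alpha(0.05, n), 3))
```

With 15, 17, and 25 comparisons, the rounded thresholds agree with the α values of .003, .003, and .002 reported for the GDx VCC, HRT II, and Stratus OCT analyses.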
The ROC curves were used to describe the ability of each parameter from each instrument to differentiate glaucomatous from normal eyes. The ROC curve shows the trade-off between sensitivity and 1 − specificity. An area under the ROC curve (AUC) of 1.0 represents perfect discrimination, whereas an AUC of 0.5 represents chance discrimination. The method of DeLong et al35 was used to compare AUCs. Sensitivities at fixed specificities were compared using the McNemar test.
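To make the two summary measures concrete, the sketch below computes an empirical AUC (as the proportion of diseased-healthy pairs that are correctly ordered, with ties counted as one half) and the best sensitivity achievable at a minimum specificity. It is a pure-Python illustration on made-up scores, not the statistical software used in the study, and it does not implement the DeLong or McNemar comparisons. It assumes higher scores indicate disease; for thickness parameters, where lower values are more abnormal, the scores would be negated first.

```python
from typing import Sequence


def empirical_auc(diseased: Sequence[float], healthy: Sequence[float]) -> float:
    """Fraction of (diseased, healthy) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def sensitivity_at_specificity(
    diseased: Sequence[float], healthy: Sequence[float], min_specificity: float
) -> float:
    """Highest sensitivity among cutoffs whose specificity is >= min_specificity.

    A score is called positive when it is strictly greater than the cutoff.
    """
    best = 0.0
    for cutoff in sorted(set(diseased) | set(healthy)):
        specificity = sum(h <= cutoff for h in healthy) / len(healthy)
        if specificity >= min_specificity:
            sensitivity = sum(d > cutoff for d in diseased) / len(diseased)
            best = max(best, sensitivity)
    return best


if __name__ == "__main__":
    # Hypothetical scores for illustration only (not study data).
    healthy_scores = [10, 20, 25, 30, 35]
    diseased_scores = [28, 40, 55, 60, 70]
    print(empirical_auc(diseased_scores, healthy_scores))
    print(sensitivity_at_specificity(diseased_scores, healthy_scores, 0.80))
```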
Diagnostic categorization (outside normal limits, borderline, or within normal limits) provided by each instrument after comparison with its respective normative database was also evaluated, and LRs were reported. An LR is defined as the probability of a given test result in those with disease divided by the probability of that same test result in those without the disease.36,37 Once determined, an LR can be incorporated directly into the calculation of posttest probability of disease by using a formulation of the Bayes theorem.38 The LR for a given test result indicates how much that result will raise or lower the probability of disease. A value of 1 means that the test provides no additional information, and ratios higher or lower than 1 increase or decrease the likelihood of disease. A classification of the effect of LRs of different magnitudes on the posttest probability of disease has been suggested and was used in our study.36 According to this classification, LRs higher than 10 or lower than 0.1 would be associated with large effects on posttest probability, LRs from 5 to 10 or from 0.1 to 0.2 would be associated with moderate effects, LRs from 2 to 5 or from 0.2 to 0.5 would be associated with small effects, and LRs closer to 1 would be insignificant. The 95% CIs for LRs were calculated according to the method proposed by Simel et al.39
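The LR definition, its use in the odds form of the Bayes theorem, and the magnitude classification described above can be sketched in Python as follows. The counts and the pretest probability in the example are hypothetical. The confidence interval uses the standard log-transformation formula commonly attributed to Simel et al; that this matches the authors' exact computation is an assumption, as is the handling of values that fall exactly on a category boundary.

```python
import math


def likelihood_ratio(n_result_diseased, n_diseased, n_result_healthy, n_healthy):
    """P(result | disease) / P(result | no disease)."""
    p_diseased = n_result_diseased / n_diseased
    p_healthy = n_result_healthy / n_healthy
    if p_healthy == 0:
        return math.inf if p_diseased > 0 else math.nan
    return p_diseased / p_healthy


def lr_confidence_interval(a, n1, c, n2, z=1.96):
    """Log-method CI for an LR; a of n1 diseased and c of n2 healthy have the result.

    Requires a > 0 and c > 0.
    """
    lr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return lr * math.exp(-z * se_log), lr * math.exp(z * se_log)


def posttest_probability(pretest_probability, lr):
    """Odds form of the Bayes theorem: posttest odds = pretest odds x LR."""
    if math.isinf(lr):
        return 1.0
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


def lr_effect(lr):
    """Classify the LR magnitude using the thresholds quoted in the text."""
    if lr > 10 or lr < 0.1:
        return "large"
    if lr >= 5 or lr <= 0.2:
        return "moderate"
    if lr >= 2 or lr <= 0.5:
        return "small"
    return "insignificant"


if __name__ == "__main__":
    # Hypothetical counts: 30 of 75 diseased and 2 of 66 healthy eyes share a result.
    lr = likelihood_ratio(30, 75, 2, 66)
    print(lr, lr_confidence_interval(30, 75, 2, 66))
    # Hypothetical pretest probability of 50%.
    print(posttest_probability(0.5, lr), lr_effect(lr))
```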
Chance-corrected agreement on categorization between different instruments was assessed using a weighted κ approach, with quadratic weighting assignment as proposed by Fleiss.40 This method allows for differences in the importance of disagreement, assuming that disagreement between adjacent categories (eg, between normal and borderline or borderline and outside normal limits) is not as important as that between distant categories (eg, between normal and outside normal limits). Strength of agreement was categorized according to the method proposed by Landis and Koch41: less than 0 was poor, 0 to 0.20 was slight, 0.21 to 0.40 was fair, 0.41 to 0.60 was moderate, 0.61 to 0.80 was substantial, and 0.81 to 1.00 was almost perfect.
P<.05 was considered statistically significant. Statistical analyses were performed using SPSS version 10.0 (SPSS Inc, Chicago, Ill) and S-PLUS 2000 (Mathsoft Inc, Seattle, Wash) statistical software.
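A quadratic-weighted κ can be computed from a square cross-classification table as sketched below, together with the Landis and Koch labels listed above. This is a generic textbook formulation written for illustration; the example table is hypothetical, and the sketch does not compute the standard errors reported in the Results.

```python
from typing import Sequence


def quadratic_weighted_kappa(table: Sequence[Sequence[float]]) -> float:
    """Weighted kappa with quadratic disagreement weights.

    table[i][j] = number of eyes placed in ordered category i by one
    instrument and category j by the other.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = ((i - j) / (k - 1)) ** 2  # 0 on the diagonal, 1 at the extremes
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    return 1.0 - observed / expected


def landis_koch(kappa: float) -> str:
    """Strength-of-agreement label following the ranges quoted in the text."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical 3 x 3 table: within normal limits, borderline, outside normal limits.
    example = [[40, 5, 1], [4, 10, 3], [2, 6, 30]]
    kappa = quadratic_weighted_kappa(example)
    print(kappa, landis_koch(kappa))
```

Quadratic weights penalize a disagreement between within normal limits and outside normal limits four times as heavily as a disagreement between adjacent categories, which reflects the rationale given in the text.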
After the exclusion of subjects with unacceptable images (Figure 1), the final study sample included 141 eyes of 141 subjects (75 patients with glaucoma and 66 healthy control subjects). There was no statistically significant difference between the mean ± SD ages of patients with glaucoma and healthy subjects (mean ± SD, 68 ± 10 years vs 65 ± 8 years, respectively; P = .06 using a t test). The mean ± SD mean deviation of the glaucomatous eyes on the visual field test nearest the imaging date was −4.87 ± 3.9 dB. According to the grading scale for severity of visual field defects developed by Hodapp et al,42 53 patients (70%) were classified as having early visual field defects, 11 patients (15%) had moderate defects, and 11 patients (15%) had severe visual field defects.
Table 1 presents the mean values of GDx VCC parameters in glaucomatous and normal eyes. After Bonferroni correction (α = .003; 15 comparisons), statistically significant differences were found for all parameters except symmetry. Table 1 also shows the ROC curve areas and sensitivities at fixed specificities. The 3 GDx VCC parameters with the largest AUCs were NFI (0.91), inferior normalized area (0.86), and TSNIT average (0.85). The AUC for NFI was significantly higher than those for inferior normalized area (P = .04) and TSNIT average (P = .004).
Table 2 presents LRs with their 95% CIs for the GDx VCC parameters after comparison with the instrument's normative database. For all parameters, outside-normal-limits results were associated with large effects on the posttest probability of disease. Borderline results were associated with small to moderate effects, whereas within-normal-limits results were associated with small effects on the posttest probability of disease. The LRs of the overall GDx VCC classification were also evaluated. For this classification, an outside-normal-limits result was considered to be the presence of an NFI greater than 50 or any other parameter outside normal limits. The LR of an outside-normal-limits result in the GDx VCC overall classification was infinity. For a borderline result (NFI between 31 and 50 or any other parameter that was borderline), the LR was 1.60 (95% CI, 1.02-2.50). A within-normal-limits result (NFI ≤30 and all parameters within normal limits) had an LR of 0.24 (95% CI, 0.14-0.40).
Interval LRs were also calculated for the parameter of NFI using arbitrarily selected cutoffs (Table 3); results of 0 to 15 and greater than 50 were associated with large effects on the posttest probability of disease, whereas the other test ranges were associated with small effects.
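Interval LRs extend the LR definition to ranges of a continuous parameter: for each range, the proportion of glaucomatous eyes with a value in that range is divided by the proportion of healthy eyes in the same range. The Python sketch below illustrates the calculation. The scores and all bin limits are hypothetical (the study's full set of cutoffs appears only in Table 3), and the handling of empty bins is an assumption.

```python
import math
from typing import Dict, Sequence, Tuple


def interval_likelihood_ratios(
    diseased: Sequence[float],
    healthy: Sequence[float],
    bins: Sequence[Tuple[float, float]],
) -> Dict[Tuple[float, float], float]:
    """LR for each inclusive (low, high) bin of a continuous score."""
    ratios = {}
    for low, high in bins:
        p_diseased = sum(low <= x <= high for x in diseased) / len(diseased)
        p_healthy = sum(low <= x <= high for x in healthy) / len(healthy)
        if p_healthy == 0:
            # No healthy eye in the bin: infinite LR if any diseased eye falls in it.
            ratios[(low, high)] = math.inf if p_diseased > 0 else math.nan
        else:
            ratios[(low, high)] = p_diseased / p_healthy
    return ratios


if __name__ == "__main__":
    # Hypothetical NFI-like scores and bins, for illustration only.
    diseased_scores = [10, 35, 40, 55, 60, 70, 80, 90]
    healthy_scores = [5, 8, 12, 14, 20, 25, 33, 45]
    example_bins = [(0, 15), (16, 30), (31, 50), (51, 100)]
    for bin_limits, lr in interval_likelihood_ratios(
        diseased_scores, healthy_scores, example_bins
    ).items():
        print(bin_limits, lr)
```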
Table 4 presents the mean values of HRT II parameters in glaucomatous and normal eyes. After Bonferroni correction (α = .003; 17 comparisons), statistically significant differences were found for all parameters except disc area and height variation contour. Table 4 also indicates ROC curve areas and sensitivities at fixed specificities. The 3 HRT II parameters with the largest AUCs were the Bathija function (0.86), the Mikelberg function (0.83), and the vertical cup/disc ratio (0.83). There were no statistically significant differences in ROC curve areas for these parameters (P>.05 for all comparisons). The analysis of HRT II parameters by sector did not result in higher ROC curve areas, with the parameter of temporal inferior rim/disc area ratio having the largest ROC curve area (0.81).
Table 5 presents LRs with their 95% CIs for the HRT II Moorfields regression analysis. Global and sectorial results outside normal limits were generally associated with large effects on the posttest probability of disease. Borderline results were associated with small to moderate effects, whereas within-normal-limits results were associated with small effects. An outside-normal-limits result in the overall HRT II classification (ie, the Moorfields regression analysis classification) was associated with a large effect on the posttest probability of disease (LR = 19.4), whereas borderline (LR = 0.88) and within-normal-limits (LR = 0.35) results were associated with small changes in the probability of disease.
We evaluated interval LRs for the HRT II parameter with the largest AUC, the Bathija function. Several cutoffs were arbitrarily created for this parameter, and the interval LRs are indicated in Table 3. Values for the Bathija function greater than 1.0 or smaller than −1.0 were associated with large effects on posttest probabilities of disease, whereas the other test results had small effects on the probability of disease.
Table 6 presents the mean values of Stratus OCT parameters in glaucomatous and normal eyes. After Bonferroni correction (α = .002; 25 comparisons), statistically significant differences were found for all parameters except thickness at 8 o'clock, thickness at 9 o'clock, temporal thickness, Imax/Smax, Smax/Tavg, and Smax/Navg. Table 6 also indicates ROC curve areas and sensitivities at fixed specificities. The 3 Stratus OCT parameters with the largest AUCs were inferior thickness (0.92), average thickness (0.91), and Imax (0.91). There were no statistically significant differences in ROC curve areas for these parameters (P>.05 for all comparisons).
Table 7 presents LRs with their 95% CIs for the Stratus OCT parameters after comparison with the instrument's normative database. For the overall Stratus OCT classification, an outside-normal-limits result was considered to be the presence of any quadrant outside normal limits. The LR of an outside-normal-limits result in the Stratus OCT overall classification was 43.1 (95% CI, 32.2-57.8). For a borderline result (any quadrant with a borderline result), the LR was 0.88 (95% CI, 0.44-1.77). A within-normal-limits result (all quadrants within normal limits) had an LR of 0.28 (95% CI, 0.17-0.44).
We evaluated interval LRs for the Stratus OCT parameter with the largest AUC, inferior thickness. Several cutoffs were arbitrarily created for this parameter, and the interval LRs are indicated in Table 3. Inferior thickness values less than or equal to 70 µm or greater than 130 µm were associated with large effects on the posttest probability of disease. Values ranging from 71 µm to 90 µm were also associated with large effects, whereas the other test results were associated with small or insignificant effects on posttest probabilities of disease.
No statistically significant difference was found between AUCs for the best parameters from the GDx VCC (NFI, AUC = 0.91), Stratus OCT (inferior thickness, AUC = 0.92), and HRT II (Bathija function, AUC = 0.86) (P>.05 for all comparisons). Figure 2 shows the ROC curves for the best parameters from each instrument.
At specificities of at least 95%, no statistically significant differences were found among parameters with the highest sensitivities from each instrument: GDx VCC NFI (sensitivity, 61%), Stratus OCT average thickness (sensitivity, 71%), and HRT II Bathija function (sensitivity, 59%; P>.05 for all comparisons). At specificities of at least 80%, a statistically significant difference was found between the HRT II parameter with the highest sensitivity (Mikelberg function; sensitivity, 73%) and those from the GDx VCC (NFI; sensitivity, 87%; P = .05) and Stratus OCT (inferior thickness; sensitivity, 89%; P = .01). No statistically significant difference was found between parameters with the highest sensitivities from the GDx VCC and Stratus OCT.
Agreement on diagnostic categorization between pairs of instruments was also evaluated. For this analysis, the overall classification of each instrument was used as described previously. The GDx VCC and Stratus OCT overall classifications agreed in 89% of cases, with a substantial chance-corrected agreement (κ = 0.72 [0.08]). The GDx VCC and HRT II overall classifications agreed in 81% of cases, with a moderate chance-corrected agreement (κ = 0.50 [0.08]). The Stratus OCT and HRT II overall classifications agreed in 81% of cases, with a moderate chance-corrected agreement (κ = 0.55 [0.08]).
To our knowledge, this is the first study to provide a comparison, using the same population, of the diagnostic accuracies of 3 instruments: the GDx VCC scanning laser polarimeter, HRT II confocal scanning laser ophthalmoscope, and Stratus OCT optical coherence tomograph. Each instrument represents the current commercially available version of a different technology for evaluation of the optic nerve head and RNFL in glaucoma.
Several measures of diagnostic accuracy were provided in our study, including ROC curve areas, sensitivities at fixed specificities, and LRs. For parameters reported as continuous variables, no statistically significant differences in ROC curves were found among the best parameters of the 3 instruments.
Previous studies have compared ROC curve areas derived from measures obtained using older versions of these technologies. Zangwill et al19 and Greaney et al43 reported that ROC curve areas were similar among the best parameters from the GDx Nerve Fiber Analyzer (Laser Diagnostic Technologies, Inc), OCT 2000 (Carl Zeiss Meditec, Inc), and HRT I (Heidelberg Engineering). In the study by Zangwill and colleagues, the best parameters from the OCT 2000 and HRT I had higher sensitivities than the best parameter from the GDx Nerve Fiber Analyzer. At 96% specificity, the best parameter from the GDx Nerve Fiber Analyzer (a linear discriminant function combining several parameters of the instrument) had a sensitivity of only 32%. The introduction of the GDx with variable corneal compensation, the GDx VCC, has reportedly resulted in improved diagnostic accuracy as compared with scanning laser polarimetry using fixed corneal compensation.26 In our study, at a specificity of 97%, the sensitivity of the best parameter from the GDx VCC, NFI, was 61%. This is consistent with the improvement in diagnostic accuracy described in other studies.26,44
For optical coherence tomography, the ROC curve areas obtained in our study for the Stratus OCT were similar to those obtained with the previous versions of this technology. The AUCs for the earlier versions of the optical coherence tomograph reportedly ranged from 0.79 to 0.94 depending on the parameter and characteristics of the population evaluated.14,18,19,22,23,43 In studies evaluating the diagnostic ability of several optical coherence tomographic parameters, the RNFL thickness in the inferior region often had the best ability to discriminate healthy eyes from eyes with early to moderate glaucoma, with sensitivities between 67% and 79% for specificities of 90% or higher.18,19,22 In our study, the parameter of inferior thickness also had the highest AUC, with a sensitivity of 64% for a specificity set at 95%. Although results for the Stratus OCT in our study were no better than those reported for the previous version of the instrument, the Stratus OCT may still have advantages compared with its predecessor, including increased sampling, a reduced need for pupillary dilation, and easier image acquisition.
For the HRT II, the ROC curve areas for parameters reported as continuous variables were also similar to those reported for previous versions of the instrument.19,24 The incorporation of the Moorfields regression analysis has been demonstrated to improve the diagnostic accuracy of this instrument.11,12,30 In our study, ROC curve areas for parameters obtained from the HRT II Moorfields regression analysis were not provided because the small number of categories in these parameters may cause an underestimation of the ROC curve area.45 At high specificity (≥95%), the sensitivity of the Moorfields regression analysis overall classification (59%) was not statistically significantly different from the best parameters of the Stratus OCT and GDx VCC. Furthermore, the application of sophisticated methods of analysis of HRT data has recently been reported to improve the diagnostic ability of this instrument.46-48 Additional studies are necessary to compare these methods with measures provided by the Stratus OCT and GDx VCC.
Although sensitivity and specificity have commonly been reported as measures of diagnostic accuracy in medical studies, their clinical utility is limited.36,49,50 They reflect the probability that a particular test result is positive or negative given the presence (sensitivity) or absence (specificity) of disease. However, there is an inversion of clinical logic intrinsic to this definition; knowledge of whether the patient had the disease would clearly obviate the need for a diagnostic test. Similarly, the AUC is important for comparing the diagnostic accuracies of different tests but has little intrinsic clinical meaning. The starting points of any diagnostic process are the patient developing a constellation of symptoms and signs and the clinician integrating this information to assign a pretest probability of disease. The results of diagnostic tests are then used to modify the pretest probability of disease, yielding a new posttest probability. The direction and magnitude of this change from pretest to posttest are determined by the test's properties, particularly the LR. The LR represents the magnitude of change from a physician's initial suspicion of disease (pretest probability) to the likelihood of disease after the test result (posttest probability).36
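The conversion from pretest to posttest probability described above follows the standard odds form of Bayes' theorem: the pretest probability is converted to odds, multiplied by the LR, and converted back to a probability. The following minimal sketch illustrates this arithmetic; the function name and the 20% pretest probability are hypothetical, whereas the LR of 0.24 is the within-normal-limits value reported for the GDx VCC overall classification in this study.

```python
import math


def posttest_probability(pretest_probability, likelihood_ratio):
    """Update a pretest probability of disease with a likelihood ratio."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    if math.isinf(posttest_odds):
        # An infinite LR (result never seen in healthy subjects) gives certainty.
        return 1.0
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical pretest probability of 20%, chosen only for illustration.
print(round(posttest_probability(0.20, 0.24), 3))  # roughly 0.057
print(round(posttest_probability(0.20, 10.0), 3))  # roughly 0.714
```

With these illustrative inputs, an LR of 0.24 lowers a 20% pretest probability to about 6%, whereas an LR of 10 raises it to about 71%.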
In our study, LRs for outside-normal-limits results in all instruments were generally associated with large changes from pretest to posttest probability of glaucoma. However, LRs for within-normal-limits results were associated with small changes in probability. For the overall classifications from each instrument, the LRs of within-normal-limits results were 0.24, 0.28, and 0.35, respectively, for the GDx VCC, Stratus OCT, and HRT II. This indicates that normal results for each of these tests would induce only a small change in the pretest probability of disease; that is, they would be of limited value in excluding the presence of disease. For borderline results, the LRs were generally associated with small to moderate changes in the probability of disease. Depending on the pretest probability of disease and the clinical situation in which the test is used, even small changes in probability may be clinically relevant.
Previous studies using the HRT I have reported the results of Moorfields regression analysis in terms of sensitivity and specificity.13,30 To do so, the borderline test results had to be forced into the within-normal-limits or outside-normal-limits categories.51 This approach results in loss of valuable information and may cause distortions in interpretation of the test results when used in clinical practice. Combining borderline results with the within-normal-limits category reduces the sensitivity of the test and the importance of a within-normal-limits result, whereas combining borderline results with the outside-normal-limits category reduces the specificity of the test and the importance of an outside-normal-limits result. In contrast, LRs can be calculated for each diagnostic categorization, permitting the clinician to assess the diagnostic importance of each category. In this study, we showed that borderline results in the Moorfields regression analysis overall classification did not appreciably change the probability of disease but that an outside-normal-limits or within-normal-limits result produced a larger change in its probability.
The dichotomization of test results with continuous outcomes may also result in loss of information because results that are markedly abnormal are lumped with results that are only mildly abnormal, leading to distortion in their clinical interpretation.36,52,53 These distortions are especially exaggerated when the patient's test result is close to the established cutoff.52 Interval LRs, however, assign a specific value to each level of abnormality, and this value can be used to calculate the posttest probability of disease for a given level of the test. Interval LRs calculated for parameters from each instrument are likely to provide more clinically relevant information than what is currently available in the printout. For instance, according to the manufacturer's suggested cutoffs for the GDx VCC NFI, values from 0 to 30 should be considered normal. This range of values was associated with an LR of 0.38, inducing only a small change toward reduction in the probability of having glaucomatous visual field loss. Using the manufacturer's suggested cutoffs, a test result with an NFI of 10 would be considered similar to one with an NFI of 27. Our results demonstrate the great difference between these 2 situations. Based on the interval LRs calculated in our study, an NFI of 10 would almost exclude the presence of glaucomatous visual field loss, whereas a result of 27 would have a nearly insignificant effect on changing the pretest probability of disease. A similar analogy could be demonstrated for the Stratus OCT and HRT II results.
The usefulness of a diagnostic test is strongly influenced by the proportion of patients suspected of having the target disorder whose test results have very high (>10) or very low (<0.1) LRs and therefore have a large effect on the probability of disease. As indicated in Table 3, this proportion was 41% for the GDx VCC NFI, 51% for Stratus OCT inferior thickness, and 45% for the HRT II Bathija function. For the HRT II Moorfields regression analysis, this proportion ranged from 1% to 33% depending on the specific parameter chosen. Selection of other cutoffs or parameters may produce different results, and studies with larger sample sizes are necessary to provide more robust estimations of LRs using smaller intervals of the range of possible test values. In our study, test results for some parameters had LRs of infinity, which indicates that a particular test result was not found in any of the healthy subjects (ie, the probability of the test result in healthy subjects was 0). When evaluating the clinical importance of such parameters, it is critical to evaluate the probability of the same test result in subjects with disease.
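An interval LR is the proportion of diseased subjects whose result falls in a given interval divided by the proportion of healthy subjects whose result falls in the same interval. The sketch below illustrates this calculation, including the case in which no healthy subject falls in an interval and the LR is infinite. The function name and the counts are hypothetical and do not reproduce Table 3.

```python
import math


def interval_likelihood_ratios(diseased_counts, healthy_counts):
    """Interval LRs from per-interval counts in diseased and healthy groups.

    Each list holds the number of subjects whose test result fell in the
    corresponding interval. Returns one LR per interval.
    """
    if len(diseased_counts) != len(healthy_counts):
        raise ValueError("both groups need the same number of intervals")
    total_diseased = sum(diseased_counts)
    total_healthy = sum(healthy_counts)
    if total_diseased == 0 or total_healthy == 0:
        raise ValueError("each group must contain at least one subject")
    ratios = []
    for diseased, healthy in zip(diseased_counts, healthy_counts):
        p_diseased = diseased / total_diseased
        p_healthy = healthy / total_healthy
        if healthy == 0:
            # Result never observed in healthy subjects: LR is infinite,
            # or undefined if it was never observed in either group.
            ratios.append(math.inf if diseased > 0 else math.nan)
        else:
            ratios.append(p_diseased / p_healthy)
    return ratios


# Made-up counts for 4 intervals, most abnormal first (not study data).
diseased = [40, 30, 20, 10]
healthy = [0, 10, 40, 50]
for lr, n in zip(interval_likelihood_ratios(diseased, healthy), diseased):
    # Print the share of diseased subjects alongside each interval LR.
    print(f"LR = {lr:.2f}, diseased in interval = {n / sum(diseased):.0%}")
```

Printing the share of diseased subjects next to each LR reflects the point made above: an infinite LR matters clinically only if the corresponding result is reasonably common among subjects with disease.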
Agreement among the different instruments in our study varied from moderate to substantial. The chance-corrected agreement between the Stratus OCT and HRT II was 0.55, similar to the 0.58 index reported by Greaney et al43 when comparing earlier versions of these instruments. For the GDx VCC compared with the Stratus OCT and the GDx VCC compared with the HRT II, the agreements reported in our study were higher than those using earlier versions of these instruments.19,43 These findings are similar to those in recent reports showing an improvement in the correlation coefficients for associations between scanning laser polarimetry parameters and other measures of glaucomatous damage, such as RNFL semiquantitative photographic scores54 or OCT RNFL thickness measurements,55 when variable corneal compensation rather than fixed corneal compensation was used. Interestingly, the agreement of the GDx VCC with the Stratus OCT was higher than that between the GDx VCC and HRT II and between the Stratus OCT and HRT II, which most likely reflects the fact that both the GDx VCC and Stratus OCT measure RNFL properties, whereas the HRT II measures optic disc topography and provides only an indirect measure of the RNFL.
In conclusion, the AUCs and the sensitivities at high specificities were similar among the best parameters from each instrument. Abnormal results (as compared with each instrument's normative database) were associated with high LRs and large effects on posttest probabilities of having glaucomatous visual field loss. The calculation of interval LRs may provide additional information to assist the clinician in diagnosing glaucoma.
Corresponding author: Robert N. Weinreb, MD, Hamilton Glaucoma Center, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093-0946.
Submitted for publication January 26, 2004; final revision received March 1, 2004; accepted March 4, 2004.
This study was supported in part by the Foundation for Eye Research, Rancho Santa Fe, Calif (Dr Medeiros), and by grant EY11008 from the National Institutes of Health, Bethesda, Md (Dr Zangwill).