Figure 1. The SEs are shown on log axes for better visualization. Cases shown in eFigures 1-3 in the Supplement are represented by red symbols.
Figures 2 and 3. Agreement of P values between FDT2 and SAP is shown in the bottom panels. Squares within dotted lines indicate good agreement. The light shaded region represents P > .01. The dark shaded region represents P > .05.
Figure 4. The yellow region shows the number of locations with baseline sensitivity in the corresponding strata. The blue region shows the number of locations with statistically significant deterioration (P < .05). The green region shows the percentage of locations in each stratum with statistically significant deterioration.
eFigure 1. An example of PoPLR analysis performed on FDT2 (top) and SAP (bottom).
eFigure 2. Glaucoma patient (B) with rapid deterioration in the superior visual field that yields statistically significant overall deterioration with both FDT2 and SAP.
eFigure 3. Glaucoma patient (C) with dense visual field loss in the superior visual field.
Redmond T, O’Leary N, Hutchison DM, Nicolela MT, Artes PH, Chauhan BC. Visual Field Progression With Frequency-Doubling Matrix Perimetry and Standard Automated Perimetry in Patients With Glaucoma and in Healthy Controls. JAMA Ophthalmol. 2013;131(12):1565-1572. doi:10.1001/jamaophthalmol.2013.4382
Copyright 2013 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
A new analysis method called permutation of pointwise linear regression measures the significance of deterioration over time at each visual field location, combines the significance values into an overall statistic, and then determines the likelihood of change in the visual field. Because the outcome is a single P value, individualized to that specific visual field and independent of the scale of the original measurement, the method is well suited for comparing techniques with different stimuli and scales.
To test the hypothesis that frequency-doubling matrix perimetry (FDT2) is more sensitive than standard automated perimetry (SAP) in identifying visual field progression in glaucoma.
Design, Setting, and Participants
Patients with open-angle glaucoma and healthy controls were examined by FDT2 and SAP, both with the 24-2 test pattern, on the same day at 6-month intervals in a longitudinal prospective study conducted in a hospital-based setting. Only participants with at least 5 examinations were included.
Data were analyzed with permutation of pointwise linear regression.
Main Outcome and Measure
Permutation of pointwise linear regression is individualized to each participant, in contrast to current analyses in which the statistical significance is inferred from population-based approaches. Analyses were performed with both total deviation and pattern deviation.
Sixty-four patients and 36 controls were included in the study. The median age, SAP mean deviation, and follow-up period were 65 years, −2.6 dB, and 5.4 years, respectively, in patients and 62 years, +0.4 dB, and 5.2 years, respectively, in controls. Using total deviation analyses, statistically significant deterioration was identified in 17% of patients with FDT2, in 34% of patients with SAP, and in 14% of patients with both techniques; in controls these percentages were 8% with FDT2, 31% with SAP, and 8% with both. Using pattern deviation analyses, statistically significant deterioration was identified in 16% of patients with FDT2, in 17% of patients with SAP, and in 3% of patients with both techniques; in controls these values were 3% with FDT2 and none with SAP.
Conclusions and Relevance
No evidence was found that FDT2 is more sensitive than SAP in identifying visual field deterioration. In about one-third of healthy controls, age-related deterioration with SAP reached statistical significance.
Standard automated perimetry (SAP) is performed clinically to monitor visual field deterioration in glaucoma. Despite its wide use, the value of the technique may be limited by the greater variability of thresholds with increasing visual field damage.1-4 Furthermore, a considerable length of time (often years) of frequent examinations may be required to confidently identify deterioration.5-8
Frequency-doubling perimetry (FDT1) and its successor, frequency-doubling matrix perimetry (FDT2), were devised to offer earlier detection of glaucomatous visual field loss than SAP.9,10 The stimulus was thought to selectively stimulate magnocellular retinal ganglion cells (RGCs)10 that were believed to be lost preferentially in early glaucoma.11 Unlike SAP,12 the test-retest variability with FDT12 and FDT24,13 does not increase with decreasing sensitivity. In 2009, our group reported a higher signal-noise ratio for FDT2 compared with SAP in cross-sectional data and hypothesized that FDT2 might also be superior to SAP in detecting glaucomatous visual field deterioration over time.14
Several studies13,15-17 have shown that FDT2 performs similarly to SAP in identifying visual field damage in glaucoma, although other investigators have not concurred.18 However, the performance of the technique in identifying deterioration over time is unknown. To date, published longitudinal data are limited to FDT1, which uses a coarser test pattern and a different thresholding algorithm compared with FDT2. With event analysis, Bayer and Erb19 reported that FDT1 identified deterioration over time in more patients than SAP. Similarly, with glaucoma change probability analysis, Haymes and colleagues20 reported that deterioration was identified in more patients using FDT1 compared with SAP; however, linear regression of global and sectoral data suggested the opposite.
In this study, we compared the proportions of patients with glaucoma and healthy controls, followed up prospectively by FDT2 and SAP, who had statistically significant overall visual field deterioration. The statistical significance is derived with a new analysis method called permutation of pointwise linear regression (PoPLR).21 This method measures the significance of deterioration over time at each visual field location, combining significance values into a single overall statistic and determining the likelihood of that statistic existing because of chance alone. Because the outcome of PoPLR is a single P value independent of the scale of original measurement, the method is well suited for comparison of techniques with different stimuli and measurement scales.
The study adhered to the tenets of the Declaration of Helsinki. Ethics approval was obtained from the Capital Health Research Ethics Board, and all participants provided written informed consent before examinations. The participants in this study were drawn from an ongoing prospective longitudinal investigation on functional and structural changes in open-angle glaucoma and in normal aging,14,22 to which FDT2 testing was subsequently added. Common inclusion criteria for the longitudinal investigation (before the addition of FDT2) were a best-corrected visual acuity of +0.3 logarithm of the minimum angle of resolution (20/40) or better, a refractive error within 5-diopter (D) equivalent sphere and 3-D astigmatism, and at least 5 pairs of FDT2 and SAP examinations, with each pair conducted on the same day. Patients were included if they had a clinical diagnosis of open-angle glaucoma, a SAP mean deviation (MD) between −2 and −10 dB, optic disc damage consistent with the clinical diagnosis, and no other ocular disease. If both eyes were eligible, one eye was randomly selected as the study eye. Controls had normal eye examination findings and an intraocular pressure of less than 21 mm Hg. All participants were experienced with both tests at baseline. Patients were recruited from the clinics of the Queen Elizabeth II Health Sciences Centre, and controls were recruited from church groups or a local telephone company or were patients’ relatives.
All participants were followed up with FDT2 (Humphrey Matrix; Carl Zeiss Meditec) and SAP (Humphrey Field Analyzer 750; Carl Zeiss Meditec) in examinations on the same day every 6 months. Patients had one pair of additional examinations with each technique in the initial follow-up part of the study that was also included in the analyses.
The FDT2 technique uses sinusoidal grating stimuli (0.50 cycles per degree), each within a square window (5° × 5°) undergoing 18-Hz counterphase flicker. The zippy estimation of sequential thresholds algorithm is used to measure sensitivity.23 This algorithm is based on maximum likelihood estimation, in which a probability density function is multiplied by yes or no likelihood functions, depending on the response, to generate a new probability density function that determines the next stimulus intensity to be presented. Normally, the test terminates after 4 presentations at each location. The SAP technique was performed with the Swedish interactive thresholding algorithm standard thresholding strategy24 and a Goldmann III stimulus (0.43° diameter). While the zippy estimation of sequential thresholds procedure of FDT2 yields sensitivity values ranging from 0 to 38 dB, there are only 15 discrete, irregularly distributed levels. The Swedish interactive thresholding algorithm standard strategy of SAP yields over 40 more uniformly distributed levels. The 24-2 test pattern was used with both techniques, each testing the same number of locations. The analyses were performed with data from 52 test locations from each FDT2 and SAP visual field after exclusion of the foveal and 2 blind spot locations.
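The maximum likelihood update described above can be illustrated with a minimal sketch. It is not the Humphrey Matrix implementation: the flat prior, the logistic likelihood and its `slope`, the use of the mean of the probability density function as the next stimulus, and the names `zest_threshold` and `respond` are all assumptions made for illustration. Only the multiply-and-renormalize step and the 4 presentations per location come from the description.

```python
import math


def zest_threshold(respond, levels, n_presentations=4, slope=1.0):
    """Return a sensitivity estimate (dB) after a fixed number of presentations.

    respond(stimulus_db) -> bool is a hypothetical observer model.
    """
    pdf = [1.0 / len(levels)] * len(levels)  # flat prior: an assumption
    for _ in range(n_presentations):
        # Present the next stimulus at the mean of the current pdf (assumed rule).
        stimulus = sum(l * p for l, p in zip(levels, pdf))
        seen = respond(stimulus)
        for i, level in enumerate(levels):
            # Assumed logistic likelihood of "seen" if true sensitivity were `level`.
            p_seen = 1.0 / (1.0 + math.exp(-(level - stimulus) / slope))
            pdf[i] *= p_seen if seen else 1.0 - p_seen
        norm = sum(pdf)
        pdf = [p / norm for p in pdf]  # renormalize to obtain the new pdf
    return sum(l * p for l, p in zip(levels, pdf))


# Idealized observer who sees every stimulus at or below 30 dB.
estimate = zest_threshold(lambda s: s <= 30, list(range(39)))
```

With only 4 presentations the estimate depends heavily on the prior and the likelihood slope, so the returned value should be read as illustrative only.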
The statistical significance of differences between characteristics of patients and controls was assessed using the Mann-Whitney test. The statistical significance of visual field deterioration was determined for each participant using PoPLR. This technique, along with a formal validation, is described in detail elsewhere.21 Briefly, the objective of PoPLR is to derive a single statistic to determine whether statistically significant pointwise deterioration has occurred in the visual field. For each participant, pointwise ordinary least squares linear regression was performed, resulting in a P value for deterioration at each location. Thereafter, the P < .05 values were combined25,26 to provide a single observed statistic (Sobs). The sequence of visual field tests was then randomly reordered (or permuted), and a single statistic was derived for each permuted test sequence (Sp). The number of permuted sequences available depends on the number of actual examinations. For participants with 5 and 6 examinations, there are 120 (5 × 4 × 3 × 2 × 1) and 720 possible permuted sequences, respectively, allowing empirical null distributions of Sp to be computed with adequate precision. In participants with 7 examinations, there are 5040 possible sequences. For practical computation time, 5000 randomly selected sequences from the total available were used in participants with 7 or more examinations, while all permuted sequences were used for participants with 5 or 6 examinations. Thereafter, each participant’s Sobs was compared with the distribution of Sp derived only from his or her own data. The statistical significance (overall P value) of Sobs was determined by its position in the distribution of Sp. To permit high specificity, statistically significant visual field deterioration was defined as overall P < .01. The analyses were performed for both total deviation (TD) and pattern deviation (PD) data.
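The PoPLR steps summarized above can be sketched as follows. This is an illustration under stated assumptions, not the validated implementation described in reference 21: the combining function (a sum of −ln P over locations with P < .05), the one-sided test for a negative slope, sampling of random orderings with replacement, and all function names are assumptions. The t distribution is integrated numerically so that the sketch needs only the standard library.

```python
import itertools
import math
import random


def t_cdf(t, df, steps=200):
    """Student t CDF by Simpson's rule after substituting x = sqrt(df) * tan(theta)."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(math.pi) * math.gamma(df / 2))
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (df - 1)  # integrand at both end points
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    area = const * total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def deterioration_p(times, values):
    """One-sided P value for a negative OLS slope at one location."""
    n = len(times)
    mean_t, mean_v = sum(times) / n, sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / sxx
    sse = sum((v - mean_v - slope * (t - mean_t)) ** 2 for t, v in zip(times, values))
    if sse < 1e-12:  # perfect fit: avoid dividing by a zero standard error
        return 1e-12 if slope < 0 else 1.0
    t_stat = slope / math.sqrt(sse / (n - 2) / sxx)
    return min(1.0, max(1e-12, t_cdf(t_stat, n - 2)))


def combined_statistic(times, fields, cutoff=0.05):
    """Sum of -ln(P) over locations with P < cutoff (assumed combining function)."""
    total = 0.0
    for location in zip(*fields):  # one sensitivity series per location
        p = deterioration_p(times, location)
        if p < cutoff:
            total -= math.log(p)
    return total


def poplr(times, fields, max_random=5000, seed=0):
    """Overall P value: share of reordered sequences with S_p >= S_obs.

    fields[i] holds the location values of examination i. All orderings are
    used for 6 or fewer examinations; otherwise max_random random orderings.
    """
    s_obs = combined_statistic(times, fields)
    n = len(fields)
    if n <= 6:
        orders = itertools.permutations(range(n))
    else:
        rng = random.Random(seed)
        orders = (rng.sample(range(n), n) for _ in range(max_random))
    stats = [combined_statistic(times, [fields[i] for i in order]) for order in orders]
    return sum(s >= s_obs for s in stats) / len(stats)
```

Because the null distribution is built only from reorderings of the participant's own examinations, the smallest attainable overall P value with 5 examinations is 1/120.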
The number of patients and controls with statistically significant deterioration over time was compared between FDT2 and SAP. The statistical significance of differences in the proportion of participants identified as having deterioration with FDT2 and SAP was assessed with the McNemar test of paired proportions. For descriptive purposes, Cohen κ was used to assess the agreement between techniques in identifying deterioration.
To investigate the relationship between deterioration and baseline visual field sensitivity, the proportion of visual field locations with statistically significant deterioration over time, determined by pointwise linear regression (P < .05), was assessed across 4 strata defined by baseline sensitivity. Initially, a frequency distribution of all baseline sensitivity values for all participants (5200 values in total) was derived. To account for the unequally spaced FDT2 threshold levels, test locations were grouped according to the 26th, 54th, and 87th percentiles of baseline sensitivity to obtain approximately equal-sized strata representing low, mid, high, and very high baseline sensitivity. Within each stratum, the number and proportion of locations with statistically significant deterioration over time were calculated. The same percentile cutoffs were applied to SAP.
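The stratified tabulation described above can be sketched briefly. The nearest-rank percentile rule, the handling of values equal to a cutoff, and the names `percentile_cutoffs` and `stratify` are assumptions; the text does not specify how percentiles were computed. The input data in the usage lines are made up for illustration.

```python
def percentile_cutoffs(values, percentiles):
    """Nearest-rank percentiles (an assumed definition) of baseline sensitivity."""
    ordered = sorted(values)
    n = len(ordered)
    cutoffs = []
    for p in percentiles:
        rank = -(-p * n // 100)  # integer ceiling of p * n / 100
        cutoffs.append(ordered[max(rank, 1) - 1])
    return cutoffs


def stratify(baseline, deteriorated, cutoffs):
    """Count locations and significant deteriorations within each stratum."""
    labels = ["low", "mid", "high", "very high"]
    counts = {name: [0, 0] for name in labels}
    for sensitivity, flag in zip(baseline, deteriorated):
        # Stratum index = number of cutoffs the baseline value exceeds.
        name = labels[sum(sensitivity > c for c in cutoffs)]
        counts[name][0] += 1
        counts[name][1] += int(flag)
    return {
        name: (n, k, 100.0 * k / n if n else 0.0)
        for name, (n, k) in counts.items()
    }


# Hypothetical data: 100 locations, every tenth one flagged as deteriorating.
baseline = list(range(1, 101))
flags = [value % 10 == 0 for value in baseline]
table = stratify(baseline, flags, percentile_cutoffs(baseline, [26, 54, 87]))
```

Each entry of `table` holds the number of locations in the stratum, the number with significant deterioration, and the corresponding percentage.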
Sixty-four patients and 36 controls qualified for the study. The demographic data, as well as the baseline and follow-up summary visual field data, are given in Table 1. The rates of MD change with FDT2 and SAP and their SEs are shown in Figure 1. Patients had a steeper negative MD rate using both FDT2 and SAP compared with controls (Table 2).
The agreement between FDT2 and SAP in identifying visual field deterioration is shown for patients in Figure 2 and for controls in Figure 3. With TD, FDT2 identified deterioration in fewer patients and controls than SAP (P = .01 for both). All controls identified by FDT2 as having deterioration with TD were also identified by SAP with TD. The proportions of patients having deterioration with PD were similar between both techniques (P > .99) (Figure 2); however, only 2 patients showed deterioration with both. Deterioration with PD was identified in 1 control with FDT2 and in none with SAP. Agreement between FDT2 and SAP was moderate with TD for both patients (κ = 0.44) and controls (κ = 0.34) but was low with PD for both patients (κ = 0.03) and controls (κ = 0.00). The bottom panels in Figures 2 and 3 show the distribution of overall P values analyzed by PoPLR for FDT2 and SAP (TD and PD).
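As a worked check of the agreement statistics for controls with TD, the 2 × 2 table can be reconstructed from the reported figures: 3 controls with deterioration on FDT2, 11 on SAP, and all 3 FDT2 cases also identified by SAP, out of 36 controls. The sketch below computes Cohen κ and a McNemar P value from that table. The exact binomial form of the McNemar test is an assumption, since the text does not state which variant was used, and the function names are illustrative.

```python
from math import comb


def cohen_kappa(both, first_only, second_only, neither):
    """Cohen kappa for two binary classifications of the same participants."""
    n = both + first_only + second_only + neither
    observed = (both + neither) / n
    expected = (
        (both + first_only) * (both + second_only)
        + (neither + second_only) * (neither + first_only)
    ) / n ** 2
    return (observed - expected) / (1 - expected)


def mcnemar_exact(first_only, second_only):
    """Two-sided exact McNemar P value based on the discordant pairs."""
    n = first_only + second_only
    k = min(first_only, second_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Controls, TD: both = 3, FDT2 only = 0, SAP only = 8, neither = 25.
kappa = cohen_kappa(3, 0, 8, 25)
p_value = mcnemar_exact(0, 8)
print(round(kappa, 2), round(p_value, 3))  # prints 0.34 0.008
```

Under these assumptions the computed κ agrees with the reported value of 0.34, and the P value rounds to the reported .01.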
The distribution of pointwise baseline sensitivity values in all participants for both FDT2 and SAP (5200 for each) is shown in Figure 4. Because of only 15 possible discrete sensitivity values with FDT2, the corresponding 4 strata for FDT2 and SAP contain approximately (but not exactly) the same number of locations. Significant deterioration (P < .05) occurred at all levels of damage for FDT2 and for SAP. With both techniques, the mid-sensitivity and high-sensitivity strata spanned a narrow sensitivity range. Between corresponding strata, the number of locations with deterioration over time was always higher with SAP than with FDT2.
Three case examples are shown in eFigures 1, 2, and 3 in the Supplement (patients A, B, and C, respectively). For patient A (eFigure 1 in the Supplement), FDT2 and SAP show predominantly inferior visual field damage. Further deterioration is more apparent in the inferior field and is statistically significant with SAP but only borderline significant with FDT2. For patient B (eFigure 2 in the Supplement), FDT2 and SAP show moderate visual field loss, with rapid deterioration in the superior field. Deterioration is statistically significant with both techniques. For patient C (eFigure 3 in the Supplement), FDT2 and SAP show dense superior visual field damage, with further deterioration at only 2 locations with FDT2 and 1 location with SAP. Overall, PoPLR indicated that this deterioration was not statistically significant.
For each patient, only the TD analyses are shown. The MD rates and their SEs for patients A, B, and C are also shown in Figure 1.
This study examined visual field deterioration in glaucoma patients and healthy controls with FDT2 and SAP using a new analytical technique called PoPLR. Progression was identified in fewer glaucoma patients with FDT2 compared with SAP using both TD and PD analyses. Controls were also identified as having deterioration with both techniques but particularly by SAP with TD analysis. Agreement between techniques in the identification of deterioration was moderate with TD analysis and poor with PD analysis.
Frequency-doubling perimetry was developed in an attempt to establish a more sensitive test of early visual field loss due to glaucoma.9 It was thought that the large sinusoidal grating stimulus, with its low spatial and high temporal frequency, selectively stimulated magnocellular RGCs.10 This small subset of RGCs with their larger-diameter axons are purportedly damaged earlier in glaucoma11,27; however, this finding has not been universally confirmed.28-30 Furthermore, with direct recordings from primate retinas, it was reported in 2011 that the SAP stimulus, conventionally thought to be nonselective to the different subsets of RGCs, showed greater preferential stimulation of magnocellular RGCs over parvocellular RGCs than the frequency-doubling stimulus.31
Direct comparisons between FDT2 and SAP in measuring glaucomatous visual field damage and its progression are problematic for several reasons. Among these are differences in the stimulus area, imperfect matching of stimulus locations, and variations between thresholding algorithms (which in turn lead to variations in the number and arrangement of possible sensitivity levels analyzed by the techniques), as well as different measurement scales. Although both techniques analyze sensitivity in decibel scales that have a similar numerical range, it cannot be assumed, for example, that deterioration of 1.0 dB/y with FDT2 is equivalent to deterioration of 1.0 dB/y with SAP. Therefore, we compared FDT2 and SAP solely on the basis of the statistical significance of deterioration derived from PoPLR rather than on the magnitude (in decibels) or the rate of deterioration (in decibels per year). While it is difficult to estimate the statistical power of PoPLR in the absence of a commonly agreed on model of visual field deterioration, our group has previously demonstrated that PoPLR provides at least equal and often superior performance in detecting evidence of change compared with other techniques.21
Current progression analyses with change probability maps32 are based on the test-retest variability estimates obtained from large samples and not from the individual participant whose visual field is being evaluated. The test-retest variability at a given location is pooled across participants and is assumed to represent the true variability in an individual participant. Hence, the assumption that the magnitude of change required for statistical significance is the same for all participants is likely invalid and leads to a large range of false-positive events when patients are examined over time.33 An advantage of PoPLR is that, by using individual cutoffs rather than population-based cutoffs for the statistical significance, a more accurate assessment of an individual’s visual field over time can be made. Furthermore, because an overall P value independent of criteria and the magnitude of change defining a deteriorating visual field is used, a meaningful comparison between techniques operating on different scales, such as FDT2 and SAP, can be made.
Using TD analyses, SAP identified visual field deterioration in 22 patients (34%), twice as many as showing deterioration with FDT2 (11 patients [17%]). While low specificity (high false-positive rate) would result in a falsely high frequency of patients identified as having deterioration with SAP, this explanation is highly unlikely because, as previously reported,21 the observed false-positive rate with PoPLR closely matches the nominal significance level. Hence, at the 1% significance level, for example, approximately 1% of participants show deterioration in the permuted visual field series. Therefore, our results indicate that, with TD analyses, SAP was twice as sensitive as FDT2 in detecting visual field deterioration in patients having glaucoma. Using PD analyses, approximately equal numbers of patients having glaucoma with FDT2 and SAP were identified as having visual field deterioration. This finding suggests that the changes observed with SAP were more diffuse or widespread.
Using TD analyses, SAP identified visual field deterioration in 11 controls (31%), a frequency similar to that in patients. These changes are likely genuine; however, the magnitude of change was smaller than that in patients (average MD change of −0.06 dB/y in controls compared with −0.16 dB/y in patients). Total deviation is calculated from the decrease in sensitivity with age, which is determined from cross-sectional data in a large population sample rather than from individuals followed up over time. In reality, the decline in sensitivity with aging occurs at different rates in different individuals; therefore, TD does not accurately capture these effects in individuals. Taken together, our findings indicate that SAP can detect age-related changes in healthy individuals when statistical techniques that account for the nature of an individual participant’s data are considered. Using PoPLR, FDT2 detected change in 3 controls (8%), a figure lower than that observed with SAP. Using PD analyses, the proportion of controls having statistically significant deterioration with either technique was much lower, indicating that PoPLR with PD values is more appropriate than that with TD values in the detection of focal glaucomatous visual field deterioration.
The mid-range to high-range baseline sensitivity strata for both FDT2 and SAP (containing more than one-half of the locations in patients and controls) represented a narrow range of discrete sensitivity values (3 levels [19, 22, and 26 dB] for FDT2 and 6 levels [26-31 dB] for SAP) compared with the entire dynamic range of the instruments. At this range of sensitivity, the test-retest variability of SAP is the lowest,13 providing the best performance characteristics. This narrow range likely drives the performance of FDT2 and SAP in identifying deterioration over time, at least in patients with early glaucoma. These findings are contrary to those by Boden and colleagues,34 who reported that most deterioration occurred in more damaged regions of the visual field.
In conclusion, our findings do not support the hypothesis that FDT2 is more sensitive than SAP in identifying visual field deterioration. In about one-third of healthy controls, age-related deterioration with SAP reached statistical significance.
Submitted for Publication: November 5, 2012; final revision received March 2, 2013; accepted March 18, 2013.
Corresponding Author: Balwantray C. Chauhan, PhD, Department of Ophthalmology and Visual Sciences, Dalhousie University, 1276 South Park St, 2W Victoria, Halifax, NS B3H 2Y9, Canada (email@example.com).
Published Online: October 31, 2013. doi:10.1001/jamaophthalmol.2013.4382.
Author Contributions: Dr Chauhan had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Redmond, O’Leary, Nicolela, Artes, Chauhan.
Acquisition of data: Hutchison, Nicolela, Artes, Chauhan.
Analysis and interpretation of data: Redmond, O’Leary, Artes, Chauhan.
Drafting of the manuscript: Redmond, O’Leary, Artes, Chauhan.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Redmond, O’Leary, Artes.
Obtained funding: Chauhan.
Administrative, technical, and material support: O’Leary, Hutchison, Nicolela, Artes, Chauhan.
Study supervision: Artes, Chauhan.
Conflict of Interest Disclosures: None reported.
Funding/Support: This study was supported by the Glaucoma Research Foundation (Dr Artes) and by grant MOP-11357 from the Canadian Institutes of Health Research (Dr Chauhan).
Role of the Sponsor: The funding organizations had no role in the design or conduct of the study; in the collection, management, analysis, or interpretation of the data; or in the preparation, review, or approval of the manuscript.
Correction: This article was corrected on November 15, 2013, and also on November 27, 2013, to fix errors in Figure 1.