Distribution of stable, progressing, and improving visual field series according to the Advanced Glaucoma Intervention Study (AGIS) and different pointwise linear regression criteria. See Table 1 for explanation of the pointwise linear regression criteria. GHT indicates Glaucoma Hemifield Test.
Nouri-Mahdavi K, Caprioli J, Coleman AL, Hoffman D, Gaasterland D. Pointwise Linear Regression for Evaluation of Visual Field Outcomes and Comparison With the Advanced Glaucoma Intervention Study Methods. Arch Ophthalmol. 2005;123(2):193-199. doi:10.1001/archopht.123.2.193
Objective
To investigate pointwise linear regression (PLR) for longitudinal evaluation of visual fields and to compare results with those of the Advanced Glaucoma Intervention Study (AGIS) criteria.
Methods
We selected 509 eyes (401 patients) from the AGIS with 3 or more years of follow-up, 7 or more visual field examinations, and an AGIS reference score of 16 or lower. Visual field change at test locations was defined as a change of threshold sensitivity of 1 dB/y or higher and P≤.01. Several sets of criteria were investigated for defining change of visual field series with PLR.
Main Outcome Measures
Progression or improvement of visual field series with PLR and AGIS criteria.
Results
Mean (SD) follow-up time and baseline AGIS score were 7.4 (1.7) years and 7.7 (4.4), respectively. Pairwise agreement between AGIS and various PLR criteria ranged from 52% to 64% with the κ statistic varying between 0.22 (95% confidence interval, 0.15-0.29) and 0.30 (95% confidence interval, 0.22-0.38). One hundred thirty-eight (27%) and 151 (30%) eyes progressed (85 eyes or 17% detected by both methods) while 72 (14%) and 11 (2%) eyes improved (5 eyes or 1% detected by both methods) based on AGIS and the most rigorous PLR criteria, respectively.
Conclusions
Based on rigorous, clinically relevant criteria, PLR detects progression in a similar proportion of eyes compared with AGIS criteria. Pointwise linear regression may be superior to AGIS methods since it identifies fewer visual field series as improving.
Despite recent advances in imaging of the optic disc and nerve fiber layer, evaluation of the achromatic visual fields remains essential to follow up patients with established glaucoma over time. A major confounding factor in detecting progression of visual fields is the large magnitude of long-term fluctuation, or intertest variability. The magnitude of long-term fluctuation can be dramatic at times and exceed 15 dB at moderately to severely damaged test locations.1 Statistical and graphical methods have been used to overcome the sometimes low signal-to-noise ratio in longitudinal visual field series. However, there is still no “gold standard” for detection of visual field progression, and no technique has been demonstrated to be superior to others.2,3 The Advanced Glaucoma Intervention Study (AGIS) is a prospective, multicenter, randomized trial designed to evaluate outcomes of different management protocols in patients with glaucoma no longer controlled by maximally tolerated medications. Visual field progression in the AGIS was assessed with a defect classification system specifically developed for that study.4 To date, analyses of visual field outcomes in the AGIS have exclusively used this system. The AGIS criteria for detection of visual field progression have been demonstrated to be conservative compared with other approaches such as Glaucoma Change Probability Analysis (GCPA) and a similar defect classification system designed for the Collaborative Initial Glaucoma Treatment Study (CIGTS).3
In a study by Katz et al,3 the rate of progression was 11%, 22%, and 23% with AGIS, CIGTS, and GCPA criteria, respectively. Five eyes (8%) were identified by all 3 statistical methods as progressing. This low concordance among different techniques has also been reported by other investigators. Lee et al5 compared 6 different techniques for detection of change in a longitudinal series of visual fields. Six (13%) of 48 eyes had repeatable change based on any 1 of the 6 algorithms. All 6 eyes showed change on 2 or more algorithms, whereas only 2 eyes demonstrated change on all 6 algorithms. Vesti et al6 recently compared AGIS, CIGTS, GCPA, and pointwise linear regression (PLR) algorithms in a computer-simulated database using different levels of variability. They compared agreement among CIGTS, GCPA, and PLR methods. Under the no-variability, moderate-variability, and high-variability conditions, progression was detected by all 3 methods in 27 (35.5%), 23 (30.3%), and 17 (22.4%) patients, respectively. However, the overall agreement between the methods was between 50% and 63% depending on the variability condition. The primary objective of this report is to explore an independent approach, PLR, for the longitudinal evaluation of the AGIS visual field series and to compare the results with those of the AGIS criteria.
The AGIS design and methods, described in detail elsewhere,7,8 are summarized here. Patients aged 35 to 80 years with phakic eyes and open-angle glaucoma no longer controlled by maximally tolerated medical treatment were recruited. Eligible eyes had a best-corrected visual acuity score of at least 56 letters (Early Treatment Diabetic Retinopathy Study) and met specified criteria for combinations of consistently elevated intraocular pressure, glaucomatous visual field defect, and/or optic disc rim deterioration.7 Between 1988 and 1992, investigators at 12 participating AGIS clinical centers enrolled 789 eyes of 591 patients. Eyes were randomly assigned to be managed with one of 2 surgical intervention sequences: argon laser trabeculoplasty-trabeculectomy-trabeculectomy or trabeculectomy-argon laser trabeculoplasty-trabeculectomy. Follow-up study visits were scheduled 3 and 6 months after enrollment and every 6 months thereafter. The institutional review boards at each of the participating centers approved the AGIS protocol, and all patients provided informed consent. Data in this report are based on a database closure of March 31, 2001.
Visual field tests were conducted with a Humphrey Visual Field Analyzer I (Carl Zeiss Meditec, Dublin, Calif) with the central 24-2 threshold test, size III white stimulus, and full threshold strategy, with the foveal threshold test turned on. Visual field defect scores ranged from 0 (no defect) to 20 (advanced loss).4 Study measurements were made at baseline, 3 months after initial intervention, and at each 6-month follow-up examination. To lessen the effect of regression to and from the mean caused by the restricted lower and upper ranges on the eligibility values,9 baseline or “reference” measurements were performed after the eligibility measurements but before the first surgical intervention.
From the original pool of the recruited patients, 509 eyes of 401 patients meeting the following criteria were selected for this study:
At least 3 years of follow-up;
A minimum of 7 visual field examinations;
A reference visual field score of 16 or less; and
A reliability score of 2 or better.
The SPSS statistical software (version 11.5; SPSS Inc, Chicago, Ill) was used to perform PLR analysis. Thresholds at each of the 55 test locations for each visit were regressed as the dependent variable vs time after initial intervention as the independent variable. Data from the baseline visit were included in the regression equations. We used the 2-omitting regression algorithm, recently described by Gardiner and Crabb,10 for definition of change vs stability at each point. In summary, in this technique, a test location is considered to be progressing or improving during the follow-up period only if the regression slope is statistically and clinically significant (as defined later) in both of the following regression analyses: (1) after omitting the last threshold in a series and (2) after censoring the threshold before last for the same series.
This approach has been shown, in simulation experiments, to be more specific than using all the data points at once while maintaining a sensitivity comparable with other more stringent algorithms used for the same purpose, such as the 2 of 2 (reference 11), 2 of 3 (reference 12), or 3 of 4 (references 13 and 14) algorithms. Regression slopes were considered clinically and statistically significant if they measured 1 dB/y or more (improvement) or −1 dB/y or less (worsening) in the presence of P≤.01.
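The pointwise rule described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study. The function names are hypothetical, the P value comes from an ordinary least-squares t test evaluated by numerical integration, and a location whose 2 regressions disagree is assumed to be stable.

```python
import math

def two_sided_t_pvalue(t_stat, df, steps=2000):
    """Two-sided P value for Student t via Simpson integration of the density."""
    t = abs(t_stat)
    if t == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = 2.0 * total * h / 3.0  # P(-t < T < t)
    return min(1.0, max(0.0, 1.0 - central_area))

def slope_and_pvalue(times, values):
    """Least-squares slope (dB/y) and its two-sided P value; needs >= 3 points."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    sse = sum((v - mean_v - slope * (t - mean_t)) ** 2
              for t, v in zip(times, values))
    if sse == 0.0:  # perfect fit: no residual scatter
        return slope, (1.0 if slope == 0.0 else 0.0)
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, two_sided_t_pvalue(slope / std_err, n - 2)

def classify_location(times, thresholds, min_rate=1.0, alpha=0.01):
    """2-omitting rule: change must hold with the last point omitted AND
    with the next-to-last point omitted (assumption: otherwise 'stable')."""
    n = len(times)
    index_sets = [list(range(n - 1)), list(range(n - 2)) + [n - 1]]
    verdicts = []
    for idx in index_sets:
        slope, p = slope_and_pvalue([times[i] for i in idx],
                                    [thresholds[i] for i in idx])
        if p <= alpha and slope <= -min_rate:
            verdicts.append("worsening")
        elif p <= alpha and slope >= min_rate:
            verdicts.append("improving")
        else:
            verdicts.append("stable")
    return verdicts[0] if verdicts[0] == verdicts[1] else "stable"
```

Because each regression leaves out one of the last 2 thresholds, a single aberrant final examination cannot by itself label a location as changing.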
For evaluation of visual field series, several sets of criteria were explored for definition of change vs stability (Table 1). In cases where criteria for both progression and improvement were fulfilled, the field series was assigned to a category as follows. For the 1 point–change category, the visual field series was determined to be improving or worsening based on the predominant direction of changing test locations. In cases with an equal number of worsening and improving test locations, the direction of change for that visual field series was assumed to be “indeterminate.” For other change criteria, the status of visual field series was considered “indeterminate” if the criteria for improvement and deterioration were both met.
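The tie-breaking rules in this paragraph can be sketched as follows. The function names are hypothetical, and the criteria of Table 1 themselves are not reproduced. For the 1 point–change category, the sketch assumes that any changing location is enough to count the series as changed.

```python
def resolve_one_point_series(n_worsening, n_improving):
    """1 point-change category: follow the predominant direction of change."""
    if n_worsening == 0 and n_improving == 0:
        return "stable"  # assumption: no changing location means no change
    if n_worsening > n_improving:
        return "progressing"
    if n_improving > n_worsening:
        return "improving"
    return "indeterminate"  # equal numbers of worsening and improving locations

def resolve_other_series(meets_progression, meets_improvement):
    """Other criteria: indeterminate when both sets of criteria are met."""
    if meets_progression and meets_improvement:
        return "indeterminate"
    if meets_progression:
        return "progressing"
    if meets_improvement:
        return "improving"
    return "stable"
```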
Visual field change according to AGIS criteria was defined as the first occurrence in an eye, at 3 consecutive 6-month visits, of a change in visual field defect score of 4 or more from the baseline value. Similar AGIS criteria were used for definition of visual field progression and improvement. Changes in AGIS visual field defect score were measured from the preintervention reference values.
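As an illustration only, the AGIS rule can be sketched as a scan for the first run of 3 consecutive visits with a score change of 4 or more in the same direction. The function name and return format are hypothetical. The sketch assumes that the follow-up scores are listed in visit order with no missed 6-month visits, and it uses the scoring convention given earlier, in which a higher defect score means more loss.

```python
def first_agis_change(reference_score, followup_scores,
                      min_change=4, n_consecutive=3):
    """Return (kind, index of first visit in the confirming run), or None.

    Assumption: followup_scores are consecutive 6-month visits, in order.
    """
    run_kind, run_len = None, 0
    for i, score in enumerate(followup_scores):
        delta = score - reference_score
        if delta >= min_change:
            kind = "progression"   # higher defect score = worse field
        elif delta <= -min_change:
            kind = "improvement"
        else:
            kind = None
        if kind is not None and kind == run_kind:
            run_len += 1
        else:
            run_kind, run_len = kind, (1 if kind else 0)
        if run_len == n_consecutive:
            return run_kind, i - n_consecutive + 1
    return None
```

For example, with a reference score of 6, the follow-up scores 7, 10, 11, 9, 10, 12, 13 meet the criterion only at the last 3 visits, because the fourth visit interrupts the earlier run.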
The pairwise concordance among different PLR approaches and the AGIS methods was assessed with percentage agreement and κ statistics. The pairwise agreement according to the κ statistic is interpreted as follows15: slight, 0.20 or less; fair, 0.21 to 0.40; moderate, 0.41 to 0.60; substantial, 0.61 to 0.80; and excellent, more than 0.80. In eyes with at least 1 significant slope (slope value ≤−1 or ≥1 dB/y and P≤.01), the mean slope was calculated by averaging the significant slopes. We used the Kruskal-Wallis test to simultaneously compare the average slopes in the stable, improving, and progressing groups according to PLR criteria. The Mann-Whitney U test was then used for post hoc pairwise comparison.
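The agreement measures named above can be computed as in the following sketch, which uses the standard unweighted Cohen κ and the interpretation bands quoted in this paragraph. The function names and the example labels are hypothetical. Confidence intervals are not computed.

```python
from collections import Counter

def percent_agreement_and_kappa(labels_a, labels_b):
    """Observed agreement and unweighted Cohen kappa for two classifications."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    # Agreement expected by chance from the marginal frequencies
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return observed, (observed - expected) / (1.0 - expected)

def interpret_kappa(kappa):
    """Bands as quoted in the text: slight, fair, moderate, substantial, excellent."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "excellent"

# Hypothetical labels for 6 eyes (P = progressing, S = stable, I = improving)
agis = ["P", "P", "S", "S", "I", "S"]
plr = ["P", "S", "S", "S", "S", "S"]
agreement, kappa = percent_agreement_and_kappa(agis, plr)
```

Because κ subtracts the agreement expected by chance, a percentage agreement of about two thirds can still correspond to only fair concordance.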
The characteristics of our study sample are shown in Table 2. More than half of this subgroup of the AGIS patients were African American. Our study sample was equally split between the 2 intervention sequences (50.5% and 49.5% in argon laser trabeculoplasty-trabeculectomy-trabeculectomy and trabeculectomy-argon laser trabeculoplasty-trabeculectomy sequences, respectively). The mean (SD) follow-up time was 7.4 (1.7) years, during which a mean (SD) of 15.4 (3.9) visual field examinations were performed. The AGIS criterion for change of visual field was the most conservative for diagnosis of progression but detected improvement in a higher percentage of the eyes compared with PLR criteria (Table 3) (Figure). The most conservative PLR approach was the 2-point Glaucoma Hemifield Test (GHT) change criterion. It proved to be the most rigorous approach for detecting both worsening and improvement of visual field. One hundred thirty-eight (27%) and 151 (30%) eyes progressed while 72 (14%) and 11 (2%) eyes improved based on AGIS and the most rigorous PLR criterion, the 2-point GHT change criterion, respectively. Eighty-five eyes (17%) were detected as progressing by both methods, and 5 eyes (1%) improved by both criteria.
The pairwise percentage agreement and concordance of the explored methods are described in Table 4 and Table 5. As the PLR criteria became more stringent, better agreement with AGIS criteria occurred. However, the agreement for even the strictest PLR criterion (2-point GHT change criterion) remained fair at best (κ = 0.30; percentage agreement, 64%). Agreement among different PLR approaches was moderate to excellent. Two-point criteria had excellent pairwise agreement (κ≥0.79 for all pairwise agreements). We performed all these analyses again after inclusion of indeterminate cases in the stable group. Repeated analysis did not change the results.
The distribution of stable, improving, and progressing test locations across the study sample is presented in Table 6. There were both worsening and improving test locations in the same visual field series in 19 eyes (3.7%). Based on the criteria described in the “Methods” section, between 4 and 7 visual field series were deemed indeterminate in each of the PLR categories and were excluded from further analysis.
The average slope of threshold sensitivities over time was compared among the improving, stable, and worsening eyes. Three hundred three eyes had an average slope different from zero as defined by the criteria described in the “Methods” section. The average negative slope per eye varied between −0.37 and −6.91 dB/y, while the range of the average positive slope per eye was 0.05 to 4.53 dB/y. The mean (SD) slope was 1.28 (0.91), −0.66 (1.87), and −2.07 (0.86) dB/y in the improving, stable, and progressing groups, respectively. The 3 groups were significantly different (Kruskal-Wallis test with post hoc Mann-Whitney U tests; P<.001 for all comparisons).
The AGIS is one of the early prospective, randomized, multicenter studies sponsored by the National Eye Institute to study surgical treatment of glaucoma. Long-term follow-up of AGIS patients has created an invaluable database of a large number of patients with open-angle glaucoma with prospectively gathered clinical information available during a long follow-up period. Our inclusion criteria ensured that only patients with adequate follow-up and number of visual fields were considered for this investigation. Comparison of this subgroup of patients with the original study group shows a similar distribution for age, sex, race, and intervention sequence.7 The recruited patients underwent an average of 15 visual field examinations over a mean follow-up time of more than 7 years.
A specifically designed defect classification system (AGIS scoring system) was used as the only outcome measure for detecting visual field progression in the AGIS. The disadvantages of such an approach include the following:
The AGIS criteria for detection of visual field progression may be more conservative than other techniques such as GCPA (used in the Early Manifest Glaucoma Trial3) and the CIGTS criteria.16 This may have caused correlations of factors associated with visual field progression to be missed.
Use of a single baseline field as a reference for detecting change over time. Use of a second test after the eligibility examination as the “reference” alleviates problems related to regression to the mean but probably does not eliminate them.
On the other hand, one of the advantages of the AGIS scoring system is that it shows less variability compared with other scoring systems, such as that of the CIGTS.16 However, Vesti et al6 recently reported that, in a simulation experiment, the CIGTS criteria demonstrated comparable specificity while detecting more progressive eyes. This suggests that AGIS criteria may be more conservative than CIGTS criteria.
Pointwise linear regression analysis is a frequently used research technique for longitudinal evaluation of visual field series. It is the only independent method for longitudinal evaluation of visual fields for which commercial software (Progressor; OBF Laboratories UK Ltd, Wiltshire, England) is available. Pointwise linear regression has been shown to be the best of all curve-fitting techniques for predicting threshold sensitivities17 and equally sensitive for detection of both gradual and sudden changes.18 The minimum number of visual field examinations required for PLR to perform adequately has been cited as 5 (reference 19) or 7 to 8 or even higher (references 20 and 21). In a recent article, Gardiner and Crabb10 showed that the optimal frequency for visual field testing is 3 per year. However, the advantage of more frequent testing disappears after 3 years of follow-up, as does the rate of false-positive progression.
The large number of patients, the long follow-up, and the fairly uniform interval between consecutive visual field examinations (an average of 2 visual field examinations per year) make the AGIS database fit for linear regression models. Different studies of PLR have used disparate criteria for definition of progression. Some previous studies have considered change of a single test location adequate for defining change of the entire visual field.13,14,22,23 Our results suggest that using change at a single test location as the criterion for change of visual field series causes a high rate of false-positive results. Heijl et al24 have previously raised the same objection regarding definition of visual-field worsening based on a single location change.
In this study, progressively rigorous criteria led to fewer eyes showing either worsening or improvement on PLR (295 eyes for the single-point change criterion vs 162 eyes for the 2-point GHT change criterion) (Table 3). This is consistent with the results of other reports requiring a higher number of changing test locations to define progression or improvement of visual field series.25-27 We used 3 consecutive steps to make PLR more specific: (1) a 2-omitting strategy, (2) strict criteria for definition of change (a minimum change in threshold sensitivity of 1 dB/y along with P≤.01), and (3) exploration of different criteria for definition of visual field progression with PLR.
Some investigators have suggested spatial or temporal processing of visual field data before PLR to increase sensitivity or specificity of PLR models.28-30 Others have required confirmation of PLR findings in different orders to limit the sensitivity of the PLR to sudden 1-time changes of threshold sensitivity at later points during follow-up, including the 2 of 2 (reference 11), 2 of 3 (reference 12), and 3 of 4 (references 13 and 14) algorithms. It has been recently demonstrated in simulation experiments that use of a 2-omitting strategy makes PLR more specific compared with the algorithms mentioned earlier while maintaining nearly the same sensitivity.10
One of the more complicated issues regarding the use of PLR has been the level of the P value and the need for or lack of a numerical cutoff point for definition of change. Different approaches have used a P value ≤.05 or ≤.01 or a Bonferroni correction (based on the number of pointwise regression analyses performed on the visual field series).13,14,22,23,25-27,31-33 We did not use a Bonferroni correction in this study. It has been suggested that use of a Bonferroni correction is related to the universal null hypothesis and increases the risk of a type II error.34 Additionally, the spatial correlation of the test locations across the visual field may actually increase the risk of the type II error since a Bonferroni correction assumes the multiple comparisons are not correlated. Instead, by using a P value of ≤.01, we tried to limit the risk of type I error. It should be emphasized that the cutoff P value is mostly a function of whether the emphasis is on making the analysis more sensitive or more specific.35
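The trade-off discussed here can be made concrete with a short calculation. It assumes 55 regressions per visual field series, one for each test location described in the “Methods” section. It also assumes independent tests, which, as noted above, does not hold for spatially correlated locations. The function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test cutoff that bounds the familywise type I error at alpha."""
    return alpha / n_tests

def familywise_error_if_independent(alpha, n_tests):
    """Chance of at least 1 false-positive test, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

n_locations = 55  # assumption: one regression per test location
strict_cutoff = bonferroni_alpha(0.05, n_locations)
uncorrected_risk = familywise_error_if_independent(0.01, n_locations)
```

Under these assumptions, a Bonferroni-corrected cutoff would be below .001 per location, whereas an uncorrected cutoff of .01 would give a roughly 40% chance of at least 1 false-positive location per series. This is one reason a minimum slope and multiple-location criteria were layered on top of the P value.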
Pointwise linear regression can be used to measure the rate of change over time. Requiring a change of at least 1 dB/y increases the likelihood that the observed statistically significant changes are also clinically significant. The cutoff value of 1 dB/y appears to be about 10 times the expected rate of age-related worsening of threshold sensitivity at central test locations and is a conservative estimate.36 Unfortunately, there are no longitudinal data for the true value of the visual field decay in healthy people, and this assumption is based on cross-sectional data.36,37 Rates of progression for test locations demonstrating significant change in previous studies have varied from −0.70 to −1.39 dB/y based on various criteria.25,27,38,39 Therefore, the cutoff point of ≥1 dB/y seems reasonable. The fact that the overall mean slope for the worsening group of eyes was quite high (−2.1 dB/y) at least partially reflects the clinically significant cutoff point used in this study.
Lastly, we progressively probed more rigorous criteria for PLR to further increase the specificity of our analyses. As observed in Table 4, this resulted in increasing agreement between PLR and AGIS methods. The 2-point GHT change criterion showed the highest agreement with the AGIS criteria (64% agreement; κ = 0.30). It appears that if deteriorating test locations belong to the same GHT cluster, the change is more likely to be real rather than caused by chance alone.
An interesting finding of this study was the high rate of improvement shown with the AGIS criteria. For definition of improvement, we used criteria similar to those used for definition of worsening. This led to a rate of improvement (14%) higher than the most lenient PLR criteria (11% for 1-point change criterion). The high rate of improvement detected by the AGIS criteria has been previously reported.7,16 In one of the first reports by the AGIS investigators,4 11% of the patients improved their score by 4 or more whereas only 5% demonstrated a worsening of the same magnitude over a short period. We suspect that this is due to regression to the mean, with an additional element of learning effect in these patients who were already familiar with perimetry. Another possibility is that since AGIS criteria were not designed for detecting improvement of visual field series, they do not have a high specificity in this regard. Cataract extraction during follow-up may also have caused an inordinately high number of visual field improvements. This may have actually contributed to the fairly low rate of visual field progression with AGIS criteria in the previous AGIS studies.
Pointwise linear regression represents only 1 of many possible approaches for detection of visual field progression. Like other approaches, it suffers from shortcomings due to the nature of longitudinal visual field data such as spatial and temporal dependence of test locations across the visual field. Sophisticated multivariate regression approaches have been explored to remedy some of the problems associated with the use of PLR26,40; however, none has found widespread acceptance. Because of the lack of a “gold standard,” neither the AGIS scoring system, the PLR approach, nor any other technique can be considered best. Given the lack of external verification, various approaches may reveal different associations for progression of visual field series. If the same predictive variables are confirmed with different techniques for detection of visual field progression, it is more likely that the association is real. A shortcoming of the present study is that the effect of cataract progression and extraction could not be taken into account in a clinically meaningful way. However, since we were interested in comparing the AGIS and PLR approaches in the same cohort of patients, we do not think that this has significantly biased the results of the current study.
In conclusion, we report the performance of different PLR criteria for the detection of change in visual field series in eyes enrolled in the AGIS. As the criteria for change become more rigorous, the rate of visual field progression diminishes and reaches a low of about 30% for the most rigorous criterion (2-point GHT change). On the other hand, the PLR approaches tend to detect improvement in a lower percentage of eyes and therefore may be more appropriate for longitudinal evaluation of visual fields compared with the AGIS criteria. The concordance between the AGIS methods and different PLR criteria is only fair, and the 2 approaches agree in only about two thirds of the visual field series.
Correspondence: Joseph Caprioli, MD, Glaucoma Division, Jules Stein Eye Institute, University of California Los Angeles, 100 Stein Plaza, Los Angeles, CA 90095-7004.
Submitted for Publication: August 20, 2003; final revision received August 12, 2004; accepted September 28, 2004.
Financial Disclosure: None.
Funding/Support: This research was supported by grant R01 EY12738 from the National Institutes of Health, Bethesda, Md, and an unrestricted grant from Research to Prevent Blindness, New York, NY.
Additional Information: Dr Nouri-Mahdavi and Mr Hoffman had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.