Sample results from a single patient in whom 4 consecutive tests were performed, using the Full Threshold test. MD indicates mean deviation.
Sample results from the same patient as in Figure 1, using the SITA Standard strategy. MD indicates mean deviation.
Example of linear regression analysis of the mean deviation values from the patient shown in Figure 1 and Figure 2. Solid circles are test results obtained using the Full Threshold strategy. Open circles are results obtained using the SITA Standard test. E indicates expected value for each strategy at the midpoint in time between the last Full Threshold test and the first SITA Standard test.
Heijl A, Bengtsson B, Patella VM. Glaucoma Follow-up When Converting From Long to Short Perimetric Threshold Tests. Arch Ophthalmol. 2000;118(4):489-493. doi:10.1001/archopht.118.4.489
To study the influence of test length in automated perimetry follow-up of glaucomatous eyes and, particularly, to determine if it is possible to usefully interpret test results obtained using a testing algorithm shorter than that used for baseline testing.
Automated perimetry findings were retrospectively evaluated in 31 patients with glaucoma for whom multiple Humphrey 30-2 tests were available on the Full Threshold strategy and the SITA Standard strategy.
Variability around the mean deviation regression lines was smaller with SITA than with the Full Threshold strategy. Mean deviation values with SITA averaged about 1 dB less severe. Although localized scotomas measured in decibels were deeper on the Full Threshold strategy, number of significantly depressed points on total deviation and pattern deviation probability plot analyses did not differ significantly between the 2 strategies.
The SITA strategy showed test-retest consistency that was at least as good as that of the Full Threshold strategy. The 2 strategies produced similar results when analyzed relative to their respective normal significance limits. Generally, it is appropriate to establish a new baseline when converting from one perimetric algorithm to another. When necessary, however, results may be usefully compared if such comparisons are based on total and pattern deviation probability maps rather than on decibel values.
Computerized threshold perimeters have served to standardize perimetric testing and to aid in the general availability and adoption of efficacious and efficient techniques. Early automated testing strategies were time-consuming, and tests sometimes lasted more than 20 minutes per eye. Such long examinations resulted in considerable patient fatigue, clinical delays, and sometimes reduced patient compliance. Early efforts at reducing testing time were based on trade-offs between data quality and time expended.1-6 Recently, the SITA (Swedish Interactive Threshold Algorithm) strategies have been reported to reduce testing time without loss of useful diagnostic information,7,8 and these strategies have become available on Humphrey perimeters.
One of the complications associated with the adoption of shorter and more efficient testing methods has been a lack of understanding of how results from new strategies can be compared with clinical baselines obtained using older methods. Prolonged perimetric testing is associated with reduced threshold sensitivity, also known as visual fatigue, and thus shorter tests should be expected to generate somewhat higher sensitivity levels than longer protocols. This may complicate perimetric follow-up when changing algorithms, particularly in glaucomatous eyes, which generally exhibit even larger visual fatigue effects than those seen in healthy subjects.9-11 Yet, for practical reasons, many patients are being converted to the newer, shorter protocols, and it is important to know how the older data may be used during the conversion.
The aim of this article is to analyze test result differences encountered when converting from a longer to a shorter test in following up patients with manifest and suspect glaucoma and to see how follow-up data can best be interpreted in such situations. We chose to compare the most commonly used older algorithm, the Humphrey Full Threshold program, with the SITA Standard, which is the strategy now offered by the manufacturer as a replacement for the Full Threshold strategy.
From lists of patients followed up for glaucoma or suspect glaucoma by one of us (A.H.), 43 patients were retrospectively identified who had undergone 4 or more consecutive SITA Standard 30-2 tests, which were preceded by at least the same number of consecutive Full Threshold 30-2 tests. Experience with at least 1 computerized threshold perimetry test was required before the beginning of this series. No patients were excluded on the basis of cataract, but all eyes that had undergone cataract surgery before or during the testing series were excluded. All test series that contained 1 or more tests with false-positive or false-negative answer frequencies greater than 25% were excluded. Also excluded were all series containing at least 1 test in which the fixation loss rate exceeded 20% and there was no evidence of the physiological blind spot at any of the thresholded locations near its expected location.
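The reliability criteria above can be read as a simple filter over each test series. The following is a minimal sketch under stated assumptions, not the procedure the authors actually ran: the `FieldTest` record, its field names, and the `blind_spot_found` flag are hypothetical, and the sketch assumes that the fixation-loss rule applies only when both conditions (rate above 20% and no blind spot evidence) hold in the same test.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldTest:
    """Hypothetical record of the reliability indices of one 30-2 test."""
    false_pos_rate: float      # fraction of false-positive answers (0 to 1)
    false_neg_rate: float      # fraction of false-negative answers (0 to 1)
    fixation_loss_rate: float  # fraction of fixation losses (0 to 1)
    blind_spot_found: bool     # blind spot evident near its expected location


def is_unreliable(test: FieldTest) -> bool:
    """Apply the two exclusion rules described in the text to a single test."""
    # Rule 1: false-positive or false-negative frequency greater than 25%.
    if test.false_pos_rate > 0.25 or test.false_neg_rate > 0.25:
        return True
    # Rule 2 (assumed conjunction): fixation losses above 20% AND no
    # evidence of the physiological blind spot.
    return test.fixation_loss_rate > 0.20 and not test.blind_spot_found


def series_is_eligible(series: List[FieldTest]) -> bool:
    """A series is excluded as soon as any one of its tests is unreliable."""
    return not any(is_unreliable(t) for t in series)
```

Under this reading, a test with a high fixation loss rate is still accepted when the blind spot was demonstrably mapped, since the losses may then reflect a mislocated blind spot check rather than poor fixation.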
After application of these exclusion criteria, the remaining material consisted of 31 eyes of 31 patients, 19 of which had 4 reliable 30-2 tests on each strategy, 8 of which had 5 tests on each strategy, and 4 of which had 6 tests on each strategy. The mean age of the 31 patients was 66 years (range, 46-78 years); 12 were men, and 19 were women. Follow-up time for these test series averaged 61.7 months (range, 44-88 months); mean time between first and last Full Threshold tests was 27.3 months and between first and last SITA tests was 25.9 months. In patients with bilateral glaucoma or suspect glaucoma, one eye was randomly selected for study; in cases of unilateral glaucoma, we selected the eye having manifest glaucoma.
Visual field tests were evaluated in terms of mean deviation from age-corrected normal threshold values (MD), mean sensitivity, pattern standard deviation (PSD), average of the point-by-point pattern deviation from the age-corrected normal decibel values, and number of significantly depressed points at the P<.005 level on total and pattern deviation probability maps12 (Figure 1 and Figure 2). Two linear regression analyses were performed on each series for each of these parameters, one for Full Threshold tests and one for SITA tests. Regression slopes and residuals were calculated.
We compared results from the last Full Threshold test with the first SITA Standard test. We also calculated an expected value for each strategy based on extrapolations of each regression line to the midpoint in time between the last Full Threshold test and the first SITA Standard test (Figure 3). Thus, the Full Threshold regression line for each parameter and test series was extended forward in time, and the SITA Standard regression line was extended backward in time to the above midpoint. Therefore, the difference in expected values was taken as a time-adjusted estimate of the algorithm-induced difference in that parameter in each series. Slope residuals were calculated and compared using paired t tests.
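The midpoint extrapolation described above can be sketched in a few lines. This is an illustrative sketch only, assuming ordinary least-squares fits with time measured in years; the function names are hypothetical, and dividing the squared residuals by the number of tests (rather than by the degrees of freedom) is an assumption, since the text does not specify how the root mean square error was computed.

```python
import math
from typing import Sequence, Tuple


def fit_line(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def rms_residual(times: Sequence[float], values: Sequence[float]) -> float:
    """Root mean square of the residuals around the fitted line (assumed: divide by n)."""
    slope, intercept = fit_line(times, values)
    squared = [(v - (intercept + slope * t)) ** 2 for t, v in zip(times, values)]
    return math.sqrt(sum(squared) / len(squared))


def algorithm_difference(ft_times: Sequence[float], ft_values: Sequence[float],
                         sita_times: Sequence[float], sita_values: Sequence[float]) -> float:
    """Expected SITA value minus expected Full Threshold value at the midpoint.

    The Full Threshold line is extended forward and the SITA line backward to
    the midpoint in time between the last Full Threshold test and the first
    SITA test.
    """
    ft_slope, ft_intercept = fit_line(ft_times, ft_values)
    sita_slope, sita_intercept = fit_line(sita_times, sita_values)
    midpoint = (max(ft_times) + min(sita_times)) / 2
    expected_ft = ft_intercept + ft_slope * midpoint
    expected_sita = sita_intercept + sita_slope * midpoint
    return expected_sita - expected_ft
```

For example, with made-up mean deviation values (not study data) of -4.0, -4.5, -5.0, and -5.5 dB at years 0 to 3 on Full Threshold and -5.0, -5.5, -6.0, and -6.5 dB at years 4 to 7 on SITA, both lines have the same slope and the function returns a difference of about 1 dB at the 3.5-year midpoint. Extrapolating both lines to a common time point separates the algorithm effect from any progression that occurred between the two test series.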
Average testing time was 16.1 minutes (range, 12.7-23.1 minutes) for the Full Threshold strategy and 8.4 minutes (range, 6.4-10.4 minutes) for the SITA Standard.
Fourteen eyes lacked glaucomatous field defects at the beginning of the follow-up period, as defined by a finding of within normal limits on the Glaucoma Hemifield Test.13 Two of these eyes developed borderline classifications on the Glaucoma Hemifield Test, and 10 developed definite field loss during follow-up. Two still had normal fields at the end of the study. Thus, of the 31 eyes studied, 17 had visual field loss throughout the study, 10 developed definite loss and 2 borderline loss, and 2 maintained normal fields.
During follow-up, 10 patients developed visual acuity reductions of more than 1 line secondary to cataract. In 5 of these, visual acuity decreased at an approximately constant rate during the study. In 3 patients, visual acuity decreased more during the Full Threshold series than during the SITA follow-up, and 2 showed a larger decrease during the SITA series.
Full Threshold and SITA MD values differed significantly, as did mean sensitivity and mean pattern deviation, indicating that SITA tests showed somewhat healthier results than Full Threshold. Pattern standard deviation and the number of points that were significant at the .005 level on total and pattern deviation probability maps did not differ significantly between the 2 methods (Table 1).
Slope residuals or slope root mean square errors were significantly larger for MD and marginally larger for decibels of mean sensitivity and mean pattern deviation using the Full Threshold strategy. No differences were seen in residuals for PSD or the number of points significant at the .005 level on total and pattern deviation (Table 2). SITA MD slopes were significantly more negative than those of Full Threshold (Full Threshold slope, +0.06 dB per year; SITA Standard slope, –1.12 dB per year; P<.0001), but this might simply have been due to generally faster glaucomatous progression during the latter portion of follow-up.
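The paired comparison of per-eye residuals between strategies can be illustrated as follows. This is a minimal sketch of a standard paired t statistic, assuming one residual value per eye and per strategy; the function name is hypothetical, and no values from Table 2 are reproduced here.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired observations.

    Each position holds the two measurements for the same eye, for example
    the residual on Full Threshold and the residual on SITA Standard.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the within-eye differences (n - 1 denominator).
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1
```

Converting the statistic to a P value requires the t distribution with n - 1 degrees of freedom, which is not part of the Python standard library; a statistics package would be needed for that step. Pairing by eye removes between-eye differences in disease severity from the comparison, which matters in a sample that ranges from normal fields to established glaucomatous loss.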
Our results confirmed earlier findings that shorter test algorithms are associated with somewhat higher differential light sensitivities and somewhat shallower decibel defects in total deviation and pattern deviation.14-16 It is likely that these differences were due to reduced visual fatigue and perhaps greater patient alertness in the shorter and more quickly paced SITA tests. Such differences in average threshold value between strategies have been shown to be strongly associated with differences in test time.8 The use of probability density functions in threshold perimetry differs from the conventional staircase approach. These 2 methods could result in slight differences of estimated threshold values. Such differences can be isolated from any effects of visual fatigue in computerized simulations. Results from simulated tests indicate that differences between the SITA Standard and Full Threshold strategy are small, 0.14 dB on average.
The differences between test algorithms were small, on the other hand, when comparisons were made in terms of significance relative to the limits of normality, ie, probability maps, rather than in terms of simple numeric comparisons.
Mean deviation and mean sensitivity slope residuals were lower with SITA, despite the fact that visual field losses generally were more advanced during the SITA follow-up than during the earlier Full Threshold testing; this suggests a higher correlation among SITA follow-up threshold values than among Full Threshold results. Furthermore, there was no difference in slope residuals for number of points significant on total and pattern deviation probability plot analyses, despite the fact that the normal significance limits for SITA are narrower than those of Full Threshold. One interpretation of these findings is that the SITA Standard showed less intertest variability in patients with glaucoma than did the Full Threshold strategy.
Comparisons of PSD findings should always be made with caution, since PSD values peak for moderate field loss and are low in both normality and advanced visual field loss. We believe, however, that our findings may be viewed as indicative of the true relationship between the 2 methods, since none of the enrolled patients had or developed field loss sufficient to have gone beyond this peak.
In patients with glaucoma, testing induces visual fatigue, causing a general but nonuniform decrease in the height of the hill of vision. The presence of this effect complicates comparison of raw results (eg, threshold sensitivities, decibel defect depths, and decibel MDs) between strategies that have differing test times. The normal range of variation has been quantified for each test algorithm in terms of significance levels for deviations from age-corrected normal threshold values, and the shorter SITA Standard strategy has shown less intersubject variability in healthy subjects than has the older Full Threshold method.17 This reduction implies that shallow defects that might be within the range of normal variability on Full Threshold may well be clinically significant if found using the SITA Standard. This reduced variability in the SITA Standard appears to approximately offset the reduced decibel defect depths found by SITA, yielding analysis results that show approximately the same number of significant total and pattern deviation points on the old and new strategies, even in glaucomatous visual fields.
This fortunate result does not imply that new baselines need not be established when shifting between perimetric algorithms. Visual fatigue differs between eyes, and perimetry test results will always be more comparable if obtained with the same testing protocols. It is also possible that this comparability derives in part from the fact that all tests were obtained using the same type of instrument and under the same testing conditions. However, when patients are being converted to a more efficient algorithm, there inevitably will be a transitional period when medical judgments must be made, and will best be made, with the help of earlier tests. During these transitional periods, it is reassuring to know that useful comparisons at the level of the total and pattern deviation probability plots are possible.
This investigation was based on retrospective analysis of a limited number of patient records. Larger prospective investigations are desirable to confirm these preliminary findings.
Accepted for publication January 7, 2000.
Reprints: Anders Heijl, MD, PhD, Department of Ophthalmology, Malmö University Hospital, S-20502 Malmö, Sweden (e-mail: firstname.lastname@example.org).