Power to detect differences in normal and glaucoma change rates by study length and number of observations with fixed total sample size of 150 participants (75 per group, both eyes included). The variance components (ie, random effect and error variances) and effect sizes (ie, difference in change rates) are set equal to the model estimates derived from the 2-year trial with observations 3 months apart (effect size of 0.32 dB/year for visual field [VF] 24-2 mean deviation [MD] and 0.54 μm/year for global retinal nerve fiber layer [RNFL] thickness). Study lengths to achieve 80% power are given in blue.
Detectable VF 24-2 MD effect sizes for glaucoma studies of varying sample size (assuming both eyes are included), follow-up, and observation frequency.
Detectable RNFL thickness effect sizes for glaucoma studies of varying sample size (assuming both eyes are included), follow-up, and observation frequency.
eTable. Random Effect and Error Variance Estimates Derived From Mixed Models Fit to Data From the Two-Year Trial of Glaucoma Patients With Observations at Three, Six, and Twelve Months Intervals
eFigure. Plot of Estimated Change in VF 24-2 MD and RNFL Thickness From Baseline for Each Eye in This Study Against Time (Gray Lines), With Population-Averaged Slope (Black Solid Line) and Upper 95th and Lower 5th Percentiles (Black Dashed Lines)
Proudfoot JA, Zangwill LM, Moghimi S, et al. Estimated Utility of the Short-term Assessment of Glaucoma Progression Model in Clinical Practice. JAMA Ophthalmol. 2021;139(8):839–846. doi:10.1001/jamaophthalmol.2021.1812
Can a study design that evaluates current clinical practice assess glaucomatous change in a shorter than usual study period in patients with glaucoma?
In this cohort study of 178 eyes from 97 patients with glaucoma, with testing every 3 months, clinically relevant rates of retinal nerve fiber layer and visual field mean deviation change were detected in 18 months. Assuming moderate differences in the rates of change and testing every 3 months, sufficient power was demonstrated for glaucoma therapy trials with 18 months follow-up.
In this study using frequent testing and the rate of change in retinal nerve fiber layer thickness or visual field mean deviation as the end point, the results suggest that clinical trials of glaucoma therapy can be completed within a relatively short time frame.
Clinical trials of glaucoma therapies focused on protecting the optic nerve have required large sample sizes and lengthy follow-up to detect clinically relevant change due to its slow rate of progression. Whether shorter trials may be possible with more frequent testing and use of rate of change as the end point warrants further investigation.
To describe the design for the Short-term Assessment of Glaucoma Progression (STAGE) model and provide guidance on sample size and power calculations for shorter clinical trials.
Design, Setting, and Participants
A cohort study of patients with mild, moderate, or advanced open-angle glaucoma recruited from the Diagnostic Innovations in Glaucoma Study at the University of California, San Diego. Enrollment began in May 2012, with follow-up every 3 months for 2 years after the baseline examination. Follow-up was concluded in September 2016. Data were analyzed from July 2019 to January 2021. Visual fields (VF) and optical coherence tomography (OCT) scans were obtained at baseline and at each 3-month visit.
Glaucoma was defined as glaucomatous appearing optic discs classified by disc photographs in at least 1 eye and/or repeatable VF damage at baseline.
Main Outcomes and Measures
Longitudinal rates of change in retinal nerve fiber layer (RNFL) thickness and VF mean deviation (MD) were estimated in study designs of varying length and observation frequency. Power calculations as functions of study length, observation frequency, and sample size were performed.
In a total referred sample of 97 patients with mild, moderate, or advanced glaucoma (mean [SD] age, 69 [11.4] years; 50 [51.5%] were female; 19 [19.6%]), over the 2-year follow-up, the mean VF 24-2 MD slope was −0.32 dB/y (95% CI, −0.43 to −0.21 dB/y) and the mean RNFL thickness slope was −0.54 μm/y (95% CI, −0.75 to −0.32 μm/y). Sufficient power (80%) to detect similar group differences in the rate of change in both outcomes was attained with total follow-up between 18 months and 2 years and fewer than 300 total participants.
Conclusions and Relevance
In this cohort study, results from the STAGE model with reduction of the rate of progression as the end point, frequent testing, and a moderate effect size, suggest that clinical trials to test efficacy of glaucoma therapy can be completed within 18 months of follow-up and with fewer than 300 participants.
Glaucoma is characterized by progressive changes in the retinal nerve fiber layer (RNFL), ganglion cell layer, optic nerve head, and visual field (VF).1,2 Methods have been proposed for serial monitoring of disease progression and measuring these changes, but with no consensus on what approach is best. The US Food and Drug Administration allows visual function testing as a primary progression outcome for glaucoma.3 Structural outcomes may be considered if treatment outcomes in structural parameters are associated with treatment outcomes in function.3 However, clinical trials of glaucoma have required large sample sizes and lengthy follow-up to detect significant change because of the slow rate of progression4-9 and the variability of VF.10-13 Conducting these studies is lengthy and expensive,14 posing challenges for new glaucoma therapies to be tested in clinical trials.
Shorter trials have been reported to be possible if frequency of testing is increased.15 The United Kingdom Glaucoma Treatment Study detected differences in incidence of VF deterioration between placebo and treatment groups with 11 visits after less than 2 years of follow-up.16 Other approaches also have been proposed for shortening the duration of glaucoma clinical trials. These approaches include grouping of test visits,17 evaluating measured group differences using linear mixed-effects modeling,18 bayesian approaches19,20 and analytic algorithms or computational methods.21,22 Multivariate methods for longitudinal models also provide a framework for assessing whether a therapy has a similar relative effect across multiple concurrently measured outcomes.23 Thus, a combination of frequent testing, optimal monitoring, and analysis strategy may help improve the efficiency of clinical trials, reducing the time needed to detect disease progression in patients with glaucoma and saving patient sight-years.
Several methods of detecting and monitoring change of progressive disease in glaucoma exist and studies have compared various methods of functional and structural change.8,24-26 However, comparisons of these methods can be challenging because follow-up of patients with glaucoma with different instruments concurrently in a clinical setting is relatively infrequent and irregular.27,28 Testing at a frequency conducive for detecting progressive change in short periods of time is even more challenging but important for developing end points for use in clinical trials.
The Short-term Assessment of Glaucoma Progression (STAGE) model was designed to evaluate diagnostic tests currently used in clinical practice and determine which are the most sensitive for detecting glaucomatous change over short time periods. Patients with early, moderate, and advanced glaucoma were tested on a variety of visual function and imaging instruments every 3 months for 2 years to determine how best to identify progressive disease. This report describes the study design, presents preliminary results comparing rates of change with varying follow-up times and observation frequencies, and provides guidance on sample size and power calculations for shorter clinical trials.
Patients with open-angle glaucoma were recruited from the Shiley Eye Institute, Hamilton Glaucoma Center, University of California, San Diego and from the Diagnostic Innovations in Glaucoma study (DIGS). The University of California, San Diego institutional review board approved the study methods, which adhered to the tenets of the Declaration of Helsinki29 and the Health Insurance Portability and Accountability Act. All participants gave written informed consent to participate in the study and were compensated for their participation.
Enrollment began in May 2012, with all participants providing written informed consent by May 2013. Participants were followed up every 3 months for 2 years after a baseline examination with OCT and VF testing, with annual simultaneous optic disc stereo photographs. All follow-up was concluded in September 2016. Data were analyzed from July 2019 to January 2021. Although some of these data have been used in earlier publications,30-32 this is, to date, the first use of these data focused on clinical trial design.
Study inclusion criteria were similar to the DIGS.32-35 All participants were at least 18 years old with no history of intraocular surgery other than glaucoma or cataract surgery, secondary glaucoma or other diseases known to affect the VF (eg, pituitary lesions, demyelinating diseases, HIV, or AIDS), cognitive impairment or history of stroke, Alzheimer disease, dementia, inability to perform perimetry reliably, or life-threatening disease. Eyes had open angles as evaluated using gonioscopic examination and visual acuity equal to or better than 20/40 with less than 5.0 diopters sphere and 3.0 diopters cylinder refraction. Both eyes were included in the study unless only 1 met the inclusion criteria. Patients with diabetes were included if there was no evidence of retinopathy. Patients with glaucoma had glaucomatous appearing optic discs classified by disc photographic imaging in at least 1 eye and/or repeatable VF damage at baseline. Standard automated perimetric VF results were considered abnormal if the pattern standard deviation was triggered at the 5% level or less or the Glaucoma Hemifield Test result was outside normal limits.33 Patients were recruited to represent a mix of early, moderate, and advanced disease.
Relevant medical history was surveyed and anthropometric measurements were obtained. At baseline, each participant underwent complete ophthalmic examination, including visual acuity, slitlamp biomicroscopy, gonioscopy, pachymetry, dilated funduscopy, stereoscopic ophthalmoscopy, and simultaneous stereoscopic disc photography. Intraocular pressure was measured using Goldmann applanation tonometry.
Data were processed centrally through the Hamilton Glaucoma Center Data Coordinating Center, the Imaging Data Evaluation and Analysis (IDEA) Center, and the VisFACT (Visual Field Assessment Center) reading centers. The IDEA Center processed and reviewed the quality of simultaneous stereophotographs and images, and VisFACT processed and reviewed the quality of VF tests.
Annual simultaneous stereophotographs were graded by 2 independent graders masked to the participant’s identity according to a standard protocol using standard photographs as a reference. In cases of discrepancy in opinion, a third senior grader (F.A.M. or C.B.) adjudicated. All photographs were graded for quality and presence of glaucoma. Glaucomatous optic disc damage was defined as neuroretinal rim narrowing or notching, localized or diffuse RNFL defect, or a between-eye asymmetry of the vertical cup-disc ratio more than 0.2.
Standard automated perimetry was performed every 3 months using the 24-2 test pattern Swedish interactive thresholding algorithm (SITA standard) on the Humphrey Field Analyzer (Carl Zeiss Meditec). Only repeatable and reliable VF tests (≤33% fixation losses and false-negative results and ≤33% false-positive results) were included. According to 24-2 VF mean deviation (MD) severity, eyes were classified as early (MD greater than −6 dB), moderate (MD between −12 dB and −6 dB, inclusive), and advanced (MD less than −12 dB).36
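The severity grouping can be expressed as a small helper. This is a minimal sketch: the function name is hypothetical, and the handling of values exactly at −6 dB and −12 dB (both assigned to the moderate group) is an assumption based on the cutoffs used in the Results (early: MD better than −6 dB; advanced: MD worse than −12 dB).

```python
def classify_severity(md_db: float) -> str:
    """Classify an eye by 24-2 VF mean deviation (dB).

    Boundary handling is an assumption: values exactly at -6 dB or
    -12 dB are placed in the moderate group.
    """
    if md_db > -6.0:
        return "early"      # MD better than -6 dB
    if md_db >= -12.0:
        return "moderate"   # MD between -12 dB and -6 dB
    return "advanced"       # MD worse than -12 dB


# Example: the cohort's baseline mean MD falls in the early range.
print(classify_severity(-4.27))
```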
Spectral-domain OCT (SD-OCT) scans were obtained every 3 months using Spectralis SD-OCT software, version 220.127.116.11 (Heidelberg Engineering). Spectralis SD-OCT scans acquire a total of 1536 A-scan points from a 3.45-mm circle centered on the optic disc. Images with noncentered scans or signal strength 15 or less were excluded. Segmentation errors were corrected by IDEA center staff. Three measurements were collected at each visit and averaged for use in the longitudinal analysis.
Baseline patient-level characteristics are presented as means with 95% CIs for continuous variables and count (percentage) for categorical variables. The CIs for eye-level characteristics were computed using linear mixed-effects models.
Longitudinal rates of change in RNFL thickness and VF MD were estimated with linear mixed-effects models, with correlated intercept and slopes within eye and intercepts within participant. Fixed effects included baseline age and follow-up time. In RNFL thickness models, we included an acquisition model version as an additional fixed effect to model a potential bias introduced by a change in software version that occurred during this study. Mean, 5th percentile and 95th percentile of rates of change were derived from these models using all available data. Models were fit on subsets of the data to emulate studies of varying length and observation frequency.
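One way to write down the model described above is sketched here. The notation is ours and is an assumption for illustration (i indexes participants, j eyes within participant, k visits); the report does not give an explicit equation.

```latex
\begin{aligned}
y_{ijk} &= \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,t_{ijk}
          + u_i + b_{0ij} + b_{1ij}\,t_{ijk} + \varepsilon_{ijk},\\
u_i &\sim N(0,\sigma_u^2),\qquad
\begin{pmatrix} b_{0ij}\\ b_{1ij}\end{pmatrix}
 \sim N\!\left(\mathbf{0},
 \begin{pmatrix}\sigma_0^2 & \sigma_{01}\\ \sigma_{01} & \sigma_1^2\end{pmatrix}\right),\qquad
\varepsilon_{ijk} \sim N(0,\sigma_e^2).
\end{aligned}
```

In this notation, \beta_2 is the population mean rate of change, u_i is the participant-level random intercept, and (b_{0ij}, b_{1ij}) are the correlated eye-level random intercept and slope. For RNFL thickness, an additional fixed-effect term for the acquisition software version would be added to the first line.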
Power calculations as functions of study length, observation frequency, and sample size were performed using the methods of Liu and Liang.37 Variance component estimates were derived from longitudinal mixed models fit with data spaced 3, 6, and 12 months apart within the 2-year follow-up. All P values were 2-tailed, and significance was set at P < .05. All analyses were performed using R, version 3.6.3 (R Foundation for Statistical Computing).
The study included 178 eyes of 97 patients with glaucoma, with 87 patients (90%) completing the full 24-month follow-up. A total of 8 eyes were excluded from this study because of poor initial VF reliability or insufficient OCT quality (7 for VF only, 1 for both VF and OCT). Eighty-six patients (89%) had documented glaucoma medication during follow-up. A summary of patient, eye, and study characteristics is shown in Table 1. The baseline mean (SD) age was 69.0 (11.4) years. The baseline mean VF mean deviation (MD) and RNFL thickness were −4.27 dB (95% CI, −5.17 to −3.37 dB) and 73.6 μm (95% CI, 71.0-76.1 μm), respectively. One-hundred forty-one eyes (79.2%) of the 178 included had early glaucoma with MD better than −6.0 dB; 21 (11.8%) and 16 (9.0%) had moderate (MD between −6 dB and −12 dB) or advanced VF loss (MD worse than −12 dB), respectively.
Rates of VF 24-2 MD and RNFL thickness change for each eye over time are shown in the eFigure in the Supplement. The mean VF 24-2 MD slope was −0.32 dB/y (95% CI, −0.43 to −0.21 dB/y) and the mean RNFL thickness slope was −0.54 μm/y (95% CI, −0.75 to −0.32 μm/y). The 5th to 95th percentile of slopes was −1.81 to 0.72 dB/y for VF 24-2 MD and −2.40 to 1.67 μm/y for RNFL thickness. Results from the linear mixed-effects models fit for each outcome on subsets of the data with varying total follow-up and observation frequency are shown in Table 2. For example, at follow-up month 18 with observations at 3-month intervals in VF 24-2 MD, the change rate was −0.41 dB/y (95% CI, −0.56 to −0.26 dB/y; P < .001). At follow-up month 18 with observations at 3-month intervals in RNFL thickness, the change rate was −0.58 μm/y (95% CI, −0.83 to −0.32 μm/y; P < .001).
In VF 24-2 MD, with observations at 3-month intervals, the variance of eye-level intercepts and slopes was 12.89 and 0.33, respectively, with covariance 0.69. The patient intercept variance was 18.73 and the error variance was 1.10. In RNFL thickness, with observations at 3-month intervals, the variance of eye-level intercepts and slopes was 107.98 and 0.88, respectively, with covariance −0.15. The patient intercept variance was 127.53 and the error variance was 2.18 (eTable in the Supplement). With a predefined target effect size, these estimates can be used for mixed model–based sample size and power calculations for prospective trials of glaucoma treatments with varying follow-up and observation frequency.37 We used the variance parameters of mixed models fit on data with 3-month observation intervals as examples.
We first show the association between total study length and observation frequency with fixed sample size (150 participants, 75 per group, both eyes included) and effect size. As an example, we set the effect sizes (ie, differences in change rates between groups) for these calculations equal to the change rates observed in our cohort (0.32 dB/y and 0.54 μm/y difference in change rates for VF MD and RNFL thickness, respectively). A study to detect a difference in change rates between groups in a theoretical trial of glaucoma treatment with these effect sizes would achieve 80% power at the .05 significance level with 3 observations per participant with a minimum follow-up of 22.3 months for VF MD and 18.1 months for RNFL thickness. The same power can be achieved in a study with 9 observations per participant and 16.3 and 13.2 months of total follow-up, respectively (Figure 1).
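The trade-off between follow-up length and number of observations can be illustrated with a simplified closed-form approximation. This is a sketch under stated assumptions, not the Liu and Liang procedure used in the study: visits are equally spaced with no missing data, each eye contributes an independent slope (the fitted model has random slopes at the eye level only), 75 participants with both eyes give 150 eyes per group, and a normal approximation is used. Function names are hypothetical; the variance inputs are the 3-month-interval estimates reported above.

```python
from statistics import NormalDist


def sum_sq_time(n_obs: int, years: float) -> float:
    """Sum of squared deviations of n_obs equally spaced visit times on [0, years]."""
    times = [years * k / (n_obs - 1) for k in range(n_obs)]
    mean_t = sum(times) / n_obs
    return sum((t - mean_t) ** 2 for t in times)


def power_slope_difference(effect, slope_var, error_var, n_obs, years,
                           eyes_per_group, alpha=0.05):
    """Approximate two-sided power to detect a between-group difference in mean slope."""
    # Variance of one eye's least-squares slope: between-eye slope variance
    # plus measurement noise, which shrinks as visits spread out in time.
    var_eye_slope = slope_var + error_var / sum_sq_time(n_obs, years)
    se_diff = (2.0 * var_eye_slope / eyes_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(effect / se_diff - z_crit)


# VF MD: 0.32 dB/y effect, 3 visits over 22.3 months.
vf_power = power_slope_difference(0.32, 0.33, 1.10, 3, 22.3 / 12, 150)
# RNFL: 0.54 um/y effect, 3 visits over 18.1 months.
rnfl_power = power_slope_difference(0.54, 0.88, 2.18, 3, 18.1 / 12, 150)
print(f"VF MD power: {vf_power:.3f}, RNFL power: {rnfl_power:.3f}")
```

Under these assumptions, both designs come out close to 80% power, as do the 9-observation designs at 16.3 and 13.2 months. The error-variance term is divided by the spread of visit times, which is why adding visits or lengthening follow-up both help, while the slope-variance term can only be reduced by enrolling more eyes.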
The effect sizes associated with a range of sample sizes at 80% power and significance level .05 for VF 24-2 MD are shown in Figure 2 and for RNFL thickness are shown in Figure 3. For example, a study with a total of 300 patients (150 in each arm) and at least 3 observations during an 18-month follow-up will have 80% power at the .05 significance level to detect a difference in change rates between groups in RNFL thickness as small as 0.38 μm/y or in VF 24-2 MD as small as 0.26 dB/y. Increasing observation frequency to 5, 7, or 9 total decreases the detectable effect size for this design to 0.36, 0.33, or 0.32 μm/y for RNFL thickness and 0.24, 0.22, and 0.21 dB/y for VF 24-2 MD, respectively.
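The detectable effect sizes quoted above can be approximated with a short calculation. This is a simplified sketch rather than the authors' implementation: it assumes equally spaced visits, complete data, independent eye-level slopes (so 150 participants per arm with both eyes give 300 eyes per arm), and a normal approximation. The function name is hypothetical; the variance inputs are the reported 3-month-interval estimates.

```python
from statistics import NormalDist


def detectable_effect(slope_var, error_var, n_obs, years, eyes_per_group,
                      alpha=0.05, power=0.80):
    """Smallest between-group slope difference detectable at the given power."""
    # Equally spaced visit times from baseline to end of follow-up.
    times = [years * k / (n_obs - 1) for k in range(n_obs)]
    mean_t = sum(times) / n_obs
    sxx = sum((t - mean_t) ** 2 for t in times)
    # Standard error of the difference in mean slopes between two arms.
    se_diff = (2.0 * (slope_var + error_var / sxx) / eyes_per_group) ** 0.5
    z_total = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    return z_total * se_diff


# 300 participants (300 eyes per arm), 18 months of follow-up.
for n_obs in (3, 5, 7, 9):
    rnfl = detectable_effect(0.88, 2.18, n_obs, 1.5, 300)
    vf = detectable_effect(0.33, 1.10, n_obs, 1.5, 300)
    print(f"{n_obs} visits: RNFL {rnfl:.2f} um/y, VF MD {vf:.2f} dB/y")
```

Under these assumptions, the values agree to 2 decimal places with the figures quoted in the text, and the shrinking step between successive rows illustrates the diminishing return from adding visits at a fixed study length.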
This study provides data to support that clinical trials of glaucoma interventions using VF results as an outcome measure might be possible with the right combination of observation frequency and sample size. Reasonable effect sizes were detectable with adequate power in study designs with as little as 18 months of follow-up.
Several studies suggest specific strategies for reducing the sample size and/or duration of clinical trials of glaucoma treatment, including increasing the frequency of testing, grouping test visits, using trend-based methods with linear mixed models as opposed to event-based methods, and various other statistical approaches.17-22 Similar to Wu et al,18 the trend-based analysis performed here suggests that large sample sizes or follow-up durations are not required to detect differences in VF MD or RNFL thickness change rates. Wu et al18 found that a trend-based analysis detecting a 30% difference in mean rates of VF MD change (assuming a baseline rate of −0.57 dB/y and an effect size of 0.17 dB/y) in a simulated 2-year study with observations every 3 months required only 277 participants per group to achieve 90% power. The sample size was further reduced to 153 participants per group to detect a 40% (0.23 dB/y) difference in rates and to 99 participants to detect a 50% (0.29 dB/y) difference in rates. In contrast, 1924, 1027, and 603 patients were needed when progression was detected with event-based guided progression analysis under the same underlying treatment effects.
Using the effect sizes from Wu et al,18 we applied the methods of Liu and Liang37 with variance estimates equal to those derived from the VF MD model fit to all our available data within the 2-year follow-up. We found that totals of 223, 125, and 80 participants per group were required to obtain 90% power for treatment effects of 30%, 40%, and 50% (ie, 0.17 dB/y, 0.23 dB/y, and 0.29 dB/y), respectively. The slightly lower required sample sizes derived from our study compared with Wu et al18 may be owing to differences in both study design and sample size calculation paradigm. Although both studies had a total follow-up of 2 years with observation intervals of 3 months, Wu et al18 applied a simulation-based approach which resampled eyes for both control and treatment arms to emulate a well-matched cohort. In addition, to simulate a reduction in the mean rate of VF change, a randomly selected portion of participants in the treatment arm in Wu et al18 had progression completely halted. This assumption, though necessary to simulate slowing at a pointwise level in the absence of information on the mechanisms of a new potential treatment, may serve to create a bimodal distribution of random-slope estimates which could affect mixed-model performance. Although the required sample sizes are different, both this study and Wu et al18 show substantially fewer participants required for trend-based analyses compared with event-based analyses.
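A rough version of this sample size calculation is sketched below. It is a closed-form approximation, not the Liu and Liang method, and it assumes 9 equally spaced visits over 2 years, complete data, 2 eyes per participant with independent eye-level slopes, and the VF MD variance estimates from the 3-month-interval model (slope variance 0.33, error variance 1.10). The function name is hypothetical. Because of these simplifications, the results land near, but not exactly on, the reported 223, 125, and 80 participants per group.

```python
import math
from statistics import NormalDist


def participants_per_group(effect, slope_var, error_var, n_obs, years,
                           eyes_per_participant=2, alpha=0.05, power=0.90):
    """Approximate participants per arm needed to detect a slope difference."""
    # Equally spaced visit times from baseline to end of follow-up.
    times = [years * k / (n_obs - 1) for k in range(n_obs)]
    mean_t = sum(times) / n_obs
    sxx = sum((t - mean_t) ** 2 for t in times)
    var_eye_slope = slope_var + error_var / sxx
    z_total = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    # Eyes per arm from the usual two-sample formula, then convert to participants.
    eyes = 2.0 * z_total ** 2 * var_eye_slope / effect ** 2
    return math.ceil(eyes / eyes_per_participant)


for effect in (0.17, 0.23, 0.29):  # dB/y, the effect sizes from Wu et al
    n = participants_per_group(effect, 0.33, 1.10, n_obs=9, years=2.0)
    print(f"effect {effect} dB/y: about {n} participants per group")
```

Required sample size scales with the inverse square of the effect size, which is why moving from a 30% to a 50% treatment effect cuts the requirement by roughly two-thirds.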
Other investigators38,39 have reported decreased rates of improvement in the detection of VF progression with increased observation frequency. Specifically, assuming MD loss of −0.5 dB/y, Wu et al37 showed 80% power to detect VF progression after 7.3, 5.7, and 5.0 years with tests performed once, twice, and thrice per year, respectively. We similarly show decreased gains in study efficiency, as measured by total study duration, when increasing observation frequency for fixed effect sizes and sample sizes.
Several issues should be considered when applying the results of this study to the design of future trials. The sample size calculations presented are dependent on the rate of progression in the glaucoma population studied and the expected outcome of treatment on that rate. Estimates of the rates of VF and RNFL loss vary across clinical studies, study populations, and subtypes of glaucoma.4,40-42 In our study, the rate of VF change was modest, with a mean rate of change of −0.32 dB/y. The median rates of VF MD change in patients with glaucoma treated in clinical practice have been reported to range from −0.05 to −0.62 dB/y.4,40-42 Wide variation in reported rates of change in RNFL thickness has been summarized.43 Our estimates of RNFL change rates varied (−0.98 to −0.12 μm/y) depending on total follow-up and frequency of observation. A possible explanation for the variability in change rates is a change in software that occurred during data acquisition, which we included in the model as a fixed effect (acquisition model version). Although not ideal for interpreting the results in this study, the variability estimates in this study may be larger than they would be if the hardware and software had remained stable during the study duration and, therefore, may overestimate the required sample size for a clinical trial.
Within-participant variability is another factor in calculating power and sample size requirements in longitudinal studies. Differences in operator, software version, quality, or other factors may increase within-participant variability and obfuscate actual change in outcome measures. Variance estimates for VF MD were largely consistent. For RNFL, the negative correlation observed between random intercept and slope variances is consistent with previously reported thickness parameter measurement floors.43-45 We recommend using the variance parameters from the 3-month observation window models in conjunction with the desired detectable effect size when determining sample size or power via the methods in Liu and Liang.37
This study has limitations. First, selection bias may occur when recruiting participants who are willing and able to participate in trials with more frequent testing. Second, any missed visits or loss to follow-up will affect power.46 Third, VF tests include those with a 15% or greater false-positive threshold, and not all VF test takers were experienced. However, of the 1399 VFs used in this report, only 11 had false-positive rates between 15% and 33% and less than 10% of participants had no prior VF experience. Fourth, 10-2 VF testing or metrics such as VFI or pattern standard deviation, which are not affected by cataract, may be associated with improved VF progression detection. In addition, the large proportion of participants with early glaucoma may limit the generalizability of these results to advanced glaucoma trials.
The sample size calculations presented here require that mixed-model assumptions be reasonably met. Heavily skewed progression rates may violate these assumptions. Covariate adjustment may also affect variance term and effect size estimates, which should be incorporated into any sample size calculation. Other considerations, such as nonlinear or bilinear progression for early VF loss, may be better incorporated via simulation.
This cohort study provides data suggesting that, based on the STAGE model, clinical trials to test efficacy of glaucoma therapy may be completed within 18 months of follow-up. Depending on the target effect size, clinical trials may also be completed with sample sizes of as few as 300 participants when reduction of the rate of progression is used as the end point.
Accepted for Publication: April 19, 2021.
Published Online: June 10, 2021. doi:10.1001/jamaophthalmol.2021.1812
Corresponding Author: Robert N. Weinreb, MD, Shiley Eye Institute, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093-0946 (email@example.com).
Author Contributions: Mr Proudfoot and Dr Zangwill had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Mr Proudfoot and Dr Zangwill had equal contributions as co–first authors.
Concept and design: Proudfoot, Zangwill, Bowd, Belghith, Dirkes, Weinreb.
Acquisition, analysis, or interpretation of data: Proudfoot, Zangwill, Moghimi, Saunders, Hou, Belghith, Medeiros, Williams-Steppe, Acera, Dirkes, Weinreb.
Drafting of the manuscript: Proudfoot, Zangwill, Weinreb.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Proudfoot, Moghimi, Belghith.
Obtained funding: Zangwill, Weinreb.
Administrative, technical, or material support: Zangwill, Saunders, Hou, Medeiros, Williams-Steppe, Acera, Dirkes, Weinreb.
Supervision: Zangwill, Bowd, Medeiros, Weinreb.
Conflict of Interest Disclosures: Dr Zangwill reported grants from Heidelberg Engineering (equipment and research support), grants from Carl Zeiss Meditec (equipment and research support), nonfinancial support from Optovue Inc (research equipment), nonfinancial support from Topcon Inc (equipment support), and grants from the National Eye Institute during the conduct of the study; personal fees from Idx (consultant) outside the submitted work; in addition, Dr Zangwill had a patent for UCSD issued by Carl Zeiss Meditec. Dr Medeiros reported grants from Novartis, grants from AbbVie, grants from Carl Zeiss Meditec, personal fees from Reichert, and grants from Heidelberg Engineering during the conduct of the study. Dr Weinreb reported grants from National Eye Institute, other from Carl Zeiss Meditec (equipment support, patent), nonfinancial support from Centervue (equipment support), nonfinancial support from Genentech Research (support), nonfinancial support from Heidelberg Engineering (equipment support), nonfinancial support from Konan (research or equipment support), nonfinancial support from Optovue (equipment support), nonfinancial support from Bausch & Lomb (research support), personal fees from Aerie Pharmaceuticals (consultant), personal fees from Allergan (consultant), personal fees from Eyenovia (consultant), and other from Toromedes (patent) during the conduct of the study. No other disclosures were reported.
Funding/Support: Genentech Inc, National Institutes of Health/National Eye Institute grants EY029058, EY011008, EY019869, EY027510, and EY026574, core grant P30EY022589; an unrestricted grant from Research to Prevent Blindness (New York, NY) and participant retention incentive grants in the form of glaucoma medication at no cost from Novartis/Alcon Laboratories Inc, Allergan, Akorn, and Pfizer Inc.
Role of the Funder/Sponsor: The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.