Nichols JJ, Mitchell GL, Saracino M, Zadnik K. Reliability and Validity of Refractive Error–Specific Quality-of-Life Instruments. Arch Ophthalmol. 2003;121(9):1289-1296. doi:10.1001/archopht.121.9.1289
To evaluate the reliability and validity of the National Eye Institute Refractive Error Quality of Life Instrument (NEI-RQL-42) and the Refractive Status and Vision Profile survey (RSVP).
Eighty-one participants with good visual acuity (better than 20/30 best-corrected acuity in each eye) completed the NEI-RQL-42 and RSVP on 2 occasions. Noncycloplegic, subjective refractions and high-contrast visual acuity assessments were also performed. Statistical analyses addressed internal consistency, test-retest reliability, and validity (ie, concurrent and construct validity) of the 2 instruments.
The NEI-RQL-42, RSVP survey, subjective refraction, and visual acuity.
The internal consistency for the overall NEI-RQL-42 was excellent (Cronbach α = 0.91); and for the overall RSVP, good (Cronbach α = 0.81). Likewise, the test-retest reliability for the overall NEI-RQL-42 was excellent (intraclass correlation coefficient [ICC], 0.91; 95% limits of agreement, −9.1 to 10.1); and for the RSVP, fair (ICC, 0.76; 95% limits of agreement, −12.1 to 12.5). The NEI-RQL-42 overall score showed good concurrent validity as it correlated significantly with subjective refraction, whereas the RSVP overall score did not. The NEI-RQL-42 and RSVP showed similar construct validity in terms of refractive error discrimination, but the NEI-RQL-42 showed better construct validity when discriminating by the type of refractive correction used by patients. Between-instrument convergent and divergent validity was good.
The NEI-RQL-42 and RSVP generally have good reliability and validity in this sample of patients with refractive error. However, other factors such as content should be considered in choosing 1 of these instruments for studies of refractive error correction.
QUALITY-OF-LIFE (QOL) assessments are important in ophthalmic research. These instruments complement and enhance our understanding of a patient's visual status, supplementing the traditional clinical tests used for assessment. Examples of previous ophthalmic QOL instruments include the National Eye Institute Visual Function Questionnaire, the Visual Function–14, the Glaucoma Symptom Scale, and the Graves' Ophthalmopathy Quality of Life Survey.1-4 Recent interest in refractive surgery has led to the development of the following 2 refractive error–specific QOL surveys: the National Eye Institute Refractive Error Quality of Life instrument (NEI-RQL-42) and Refractive Status and Vision Profile survey (RSVP).5,6
Quality-of-life instruments should be reliable, valid, sensitive, and responsive if they are to be clinically useful.7 Although these psychometric properties are interrelated, they are examined in somewhat different ways. For instance, patients whose refractive error and vision have not changed should, theoretically, make similar responses on refractive error–specific QOL instruments each time they undergo assessment. To distinguish responsiveness (the ability to detect change in status over time) and sensitivity (the ability to detect differences among different patient groups) from measurement error, we should assess instrument variability when administered to clinically healthy patients on repeated occasions.8 Assessing validity consists of the accumulation of evidence over time and various studies suggesting the scales are rational and respond as predicted. Once psychometric properties are established, an instrument may be ready for use in epidemiological studies.9,10 The objectives of this study were to evaluate the reliability and validity of the NEI-RQL-42 and RSVP.
Subjects recruited for the study were required to review and sign informed consent documents, which were approved by the Biomedical Institutional Review Board of The Ohio State University, Columbus. Subjects had to be 14 years or older, and without ocular abnormality (other than refractive error) or systemic conditions that could influence refractive error stability (eg, pregnancy or diabetes). Patients with best-corrected visual acuity worse than 20/30 or who had undergone ocular surgery (eg, cataract or refractive surgery) were not enrolled in this study.
The NEI-RQL-42 was scored according to the guidelines set forth by its authors.11 The survey consists of 42 items used to develop 13 subscales, which are rescaled to a 100-point scale. Lower subscale scores indicate worse constructs. The subscales include clarity of vision, expectations, near vision, far vision, diurnal fluctuations, activity limitations, glare, symptoms, dependence on correction, worry, suboptimal correction, appearance, and satisfaction with correction. An overall score was calculated by averaging the subscales.
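The rescaling and averaging steps can be sketched as follows. This is a minimal illustration only: it assumes a simple linear rescaling and an unweighted mean of the available subscales, and it is not the instrument's published scoring key. The function names, the example response range, and the handling of missing subscales are hypothetical.

```python
def rescale_to_100(raw, low, high):
    """Linearly map a raw response in [low, high] onto a 0-100 scale.

    Assumption: a plain linear transform; the published scoring
    guidelines may recode individual items differently.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return 100.0 * (raw - low) / (high - low)


def overall_score(subscales):
    """Average the subscale scores that are present (not None).

    Assumption: missing subscales are skipped rather than imputed.
    """
    values = [v for v in subscales.values() if v is not None]
    if not values:
        raise ValueError("no subscale scores available")
    return sum(values) / len(values)


# Hypothetical respondent with three scored subscales and one missing.
example = {"glare": 50.0, "worry": 100.0, "near vision": 75.0, "symptoms": None}
example_overall = overall_score(example)
```

Because lower scores indicate worse constructs on the NEI-RQL-42 and higher scores indicate worse constructs on the RSVP, the two overall scores run in opposite directions and should not be compared without reversing one of them.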
The RSVP was also scored according to the guidelines set forth by its authors.6 This scoring system accounts for the refractive correction used by the patient when answering and scoring the questionnaire (eg, spectacles, contact lenses, or both). Forty-five questions are used to calculate 8 subscale scores, which are rescaled to a 100-point scale. Higher subscale scores indicate worse constructs. The subscales include concern, expectations, physical and social functioning, symptoms, driving, optical problems, glare, and problems with corrective lenses. An overall score is also calculated.
A noncycloplegic, subjective refraction was performed according to standard clinical techniques. The end point of the refraction was the maximum plus sphere associated with the best Snellen visual acuity. Visual acuity was then measured at 4 m using a protocol similar to that of the Early Treatment of Diabetic Retinopathy Study.12-14 High-contrast, logarithm of the minimum angle of resolution visual acuity was measured in the right and left eyes independently and binocularly, using the results from the subjective refraction. The total number of correct responses was recorded (of a total of 70 letters).
Subjects first completed 1 refractive error–specific QOL instrument (the NEI-RQL-42 or the RSVP), which was randomly assigned. Following completion of the first questionnaire, subjective refraction and visual acuity were measured. Subjects then completed the second questionnaire. Each subject was asked to return for a second visit within 30 days (±7 days) to examine the test-retest reliability of each of these outcomes. The QOL questionnaires were completed in the same order at the second visit, and a single examiner performed all measures of subjective refraction and visual acuity.
Statistical analyses were conducted using SAS, Version 8.2 software (SAS Institute Inc, Cary, NC). Descriptive statistics were used to summarize refractive error (the average spherical equivalent of both eyes) and visual acuity results (letters correct) binocularly from visit 1.
Internal consistency was assessed for each survey's subscales using the Cronbach α for data from visit 1.15 It is recommended that the α values for scales be greater than 0.70 to ensure internal consistency.16 A low α indicates that the items do not come from the same domain, as all items in each subscale should be correlated if they measure the same thing. An excessively high internal consistency (Cronbach α>0.90) could indicate that the items in the instrument are too highly correlated, with too many items measuring the same construct.
The test-retest reliability of the surveys, subjective refraction, and visual acuity measures was assessed using 2 methods.8 First, we used the 95% limits of agreement to characterize the test-retest reliability of each outcome.17 In these analyses, the mean of the differences relative to zero represents the bias between visits, and the width of the 95% limits of agreement represents the test-retest reliability of the scale.17 Second, we calculated intraclass correlation coefficients (ICC) and their 95% confidence intervals (95% CIs). It is generally recommended that the ICC exceed 0.90 if an instrument is to be used on individual patients in clinical practice and that the ICC exceed 0.70 for discriminating among groups of patients in research.18 A sample size of 80 patients provided a 95% CI lower bound of 0.70 for an ICC of 0.80.
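For readers who want to reproduce this kind of check, the Cronbach α can be computed from item-level responses with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The sketch below is illustrative only; the function name and the toy data are hypothetical, and the study itself used SAS rather than this code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach alpha for one subscale.

    items: list of k lists, one per item, each holding the responses
    of the same n respondents in the same order.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least 2 items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items must have the same number of respondents")
    # Total score per respondent across the k items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance(totals)
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Toy 2-item subscale answered by 3 respondents (made-up numbers).
alpha = cronbach_alpha([[2, 4, 6], [1, 2, 3]])
```

Under the thresholds quoted above, a subscale would be flagged when α falls below 0.70, and inspected for redundant items when α exceeds 0.90.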
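Both test-retest summaries can be sketched in a few lines. The limits of agreement follow the usual mean difference ± 1.96 SD construction. The article does not state which ICC form was used, so the one-way random-effects, single-measure form is assumed here purely for illustration; the sign convention (visit 2 minus visit 1), function names, and toy data are also assumptions.

```python
from statistics import mean, stdev


def limits_of_agreement(visit1, visit2):
    """Return (bias, lower, upper) for paired scores.

    bias is the mean of (visit2 - visit1); the limits are
    bias -/+ 1.96 * SD of the differences.
    """
    diffs = [b - a for a, b in zip(visit1, visit2)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def icc_oneway(visit1, visit2):
    """One-way random-effects, single-measure ICC for 2 visits.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), with k = 2 visits.
    Assumption: this form is chosen for illustration only.
    """
    k = 2
    n = len(visit1)
    subject_means = [(a + b) / k for a, b in zip(visit1, visit2)]
    grand_mean = mean(subject_means)
    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    msw = sum(
        (a - m) ** 2 + (b - m) ** 2
        for a, b, m in zip(visit1, visit2, subject_means)
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Made-up scores for 4 subjects on 2 visits.
v1 = [10, 20, 30, 40]
v2 = [12, 19, 33, 40]
bias, lower, upper = limits_of_agreement(v1, v2)
icc = icc_oneway(v1, v2)
```

A bias near zero with narrow limits indicates stable scores between visits, while the ICC expresses how much of the total variance is attributable to differences between subjects rather than between visits.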
Data from visit 1 of the study protocol were used to assess validity. We assessed concurrent validity by examining the correlation of each scale with clinical measures of vision, including refractive error and best-corrected binocular visual acuity. Construct validity was examined by known-groups validation and convergent/divergent validity.18 Known-groups validation was examined by testing for differences in survey scales by refractive error category (hyperopic, emmetropic, and myopic patients) and by the type of refractive correction (spectacles, contact lenses, or no correction) using the Kruskal-Wallis test. We assessed convergent and divergent validity using a multitrait, multimethod correlation matrix for the subscales and overall scores of the NEI-RQL-42 and RSVP. Because of the distribution of the subscale scores, nonparametric statistics were used (Spearman correlation coefficients and Kruskal-Wallis test). Because of the numerous correlations and comparisons in these analyses, type I errors were minimized by establishing a significance level of .01.
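The two nonparametric procedures named above can be illustrated with a dependency-free sketch. It assumes exactly 3 groups for the Kruskal-Wallis test (as in the refractive error and correction-mode comparisons), which gives 2 degrees of freedom and lets the χ² P value be written in closed form as exp(−H/2). The function names and data are hypothetical, and the study itself used SAS.

```python
import math


def average_ranks(values):
    """Rank values 1..N, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_3groups(a, b, c):
    """Tie-corrected Kruskal-Wallis H and its chi-square P value (df = 2)."""
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    start, rank_term = 0, 0.0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, math.exp(-h / 2)  # survival function of chi-square, df = 2


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    return cov / (sx * sy)


# Made-up scores for 3 small groups, fully separated.
h_stat, p_value = kruskal_wallis_3groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
significant_at_01 = p_value < 0.01
```

In this toy example the groups do not overlap at all, yet with only 3 observations per group the P value is roughly .03 and does not meet the .01 threshold adopted in this study, which shows how conservative that criterion is for small groups.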
Eighty-one patients were recruited to participate in the study; however, 2 patients were lost to follow-up, and all analyses reflect a sample size of 79. The average ± SD time elapsed between visits was 33.1 ± 13.1 days. The average ± SD age of the sample was 33.3 ± 10.8 years (range, 20.9-61.5 years), and 44 subjects were female. It included 14 hyperopic patients (range, 3.94-0.50 diopters [D]), 26 emmetropic patients (range, 0.44 to −0.38 D), and 39 myopic patients (range, −0.63 to −11.19 D). Twenty-four subjects (30%) wore spectacles; 29 (37%), contact lenses; and 26 (33%), neither mode of correction. The range of binocular visual acuity in this study was 50 to 68 letters correct (Snellen equivalent, 20/24 to 20/11, respectively) (Table 1).
The average scores for the NEI-RQL-42 and RSVP across both visits and the mean ± SD differences between visits can be found in Table 2. The only subscale for which the mean scores significantly differed between visits was the RSVP symptoms subscale; the mean of this subscale was 11.6 ± 9.9 and the mean difference between visits was –2.8 ± 9.3 (1-sample t test, P = .01).
Table 3 provides reliability estimates for the NEI-RQL-42 and RSVP. The internal consistency of the NEI-RQL-42 subscales was generally good. Three of the subscales had internal consistencies of less than 0.70, including glare (Cronbach α = 0.49), activity limitations (Cronbach α = 0.63), and appearance (Cronbach α = 0.65), and 2 subscales had excessively high reliability (expectations and near vision). Also shown are values obtained from a previous report.19 The RSVP also generally showed good internal consistency. Two of the 8 RSVP subscales had internal consistencies of less than 0.70, including glare (Cronbach α = 0.60) and problems with corrective lenses (Cronbach α = 0.52). In a previous report, none of the subscales had an internal consistency of less than 0.70.6 The NEI-RQL-42 expectations and RSVP driving subscales had excessively high reliability.
The test-retest reliability of subjective refraction and visual acuity was good. For subjective refraction, we found an ICC of 0.99 (95% CI, 0.99-1.00) and the 95% limits of agreement were –0.53 to 0.53 D. For visual acuity, the ICC was 0.76 (95% CI, 0.57-0.87) and 95% limits of agreement were –5.20 to 5.80 letters. This finding indicates that refractive error and vision were stable during the 1-month interval.
The overall test-retest reliability of the NEI-RQL-42 was excellent (Table 3), with an ICC of 0.91 (95% CI, 0.82-0.95). The 95% limits of agreement for the overall scale were –9.1 to 10.1 U. Six of the 13 subscales showed wide limits of agreement (≥±25 U), including expectations (±35 U), diurnal fluctuations (±27.2 U), glare (±36 U), dependence on correction (±36 U), worry (±32 U), and satisfaction with correction (±27 U). Four of the 13 subscales had ICC values less than 0.70, including activity limitations (ICC, 0.67; 95% CI, 0.44-0.82), glare (ICC, 0.64; 95% CI, 0.40-0.80), appearance (ICC, 0.49; 95% CI, 0.20-0.70), and satisfaction with correction (ICC, 0.68; 95% CI, 0.46-0.82).
The overall test-retest reliability of the RSVP was good, with an ICC of 0.76 (95% CI, 0.59-0.87). The 95% limits of agreement for the overall RSVP scale were –12.1 to 12.5 U. Two subscales had significant variability (≥±25 U) when we examined the 95% limits of agreement, including expectations (±41 U) and driving (±33 U). Seven of the 8 subscales had ICC values less than 0.70, including expectations (ICC, 0.63; 95% CI, 0.38-0.79), physical and social functioning (ICC, 0.65; 95% CI, 0.42-0.81), driving (ICC, 0.66; 95% CI, 0.42-0.81), symptoms (ICC, 0.64; 95% CI, 0.39-0.80), optical problems (ICC, 0.64; 95% CI, 0.40-0.80), glare (ICC, 0.60; 95% CI, 0.34-0.77), and problems with corrective lenses (ICC, 0.47; 95% CI, 0.18-0.69). Also shown in Table 3 are ICC values previously reported in the literature.6 Three subscales and the overall score had significantly worse ICC values in this study than those reported.6
On the NEI-RQL-42, 4 of the 13 subscales and the overall score significantly correlated with refractive error, including expectations (r = 0.45; P<.001), far vision (r = 0.30; P = .007), diurnal fluctuations (r = 0.30; P = .007), dependence on correction (r = 0.44; P<.001), and the overall score (r = 0.39; P = .003). We found no significant correlations between any of the NEI-RQL-42 scores and best-corrected visual acuity. On the RSVP, 2 of the 8 subscales significantly correlated with subjective refraction, including driving (r = –0.29; P = .01) and glare (r = –0.28; P = .01). Similar to the NEI-RQL-42, none of the RSVP scores correlated with best-corrected visual acuity.
As a measure of known-groups validity, we compared the instruments across the 3 refractive error groups (Table 4). For the NEI-RQL-42, we found significant differences between refractive error groups on 5 of the 13 subscales and on the overall score. For instance, hyperopic and myopic patients had much higher expectations than emmetropic patients and reported significantly more trouble than emmetropic patients with their far vision. Hyperopic and myopic patients also reported significantly more dependence on their correction than emmetropic patients, and were also more worried about their vision than emmetropic patients. Overall, hyperopic and myopic patients had significantly worse refractive error–specific QOL than emmetropic patients.
On the RSVP, 3 of the 8 subscales and the overall score differed across refractive error categories. On this instrument, hyperopic and myopic patients were more concerned with their vision than emmetropic patients. Hyperopic and myopic patients also had significantly worse physical and social functioning than emmetropic patients, and hyperopic and myopic patients reported more optical problems than emmetropic patients. Similar to the NEI-RQL-42, hyperopic and myopic patients had worse overall refractive error–specific QOL than emmetropic patients.
As a second measure of known-groups validity, both instruments were compared across the mode of refractive error correction shown in Table 5 (ie, spectacles, contact lenses, or neither). For the NEI-RQL-42, 6 of the 13 subscales and the overall score showed significant differences between the modes of correction. Those needing no correction believed that their clarity of vision was significantly better than those wearing spectacles or contact lenses. Those without correction also had fewer expectations regarding their vision than those who wore spectacles or contact lenses. Spectacle or contact lens wearers reported significantly worse far vision than those not needing correction; likewise, they also reported more diurnal fluctuations with spectacles or contact lens wear. Those without correction reported fewer activity limitations and less dependence on correction than those who wore spectacles or contact lenses. Overall, those not needing correction had significantly better refractive error–specific QOL than those who wore spectacles and contact lenses.
On the RSVP, 3 of the 8 subscales and the overall score differed significantly across modes of refractive correction. Spectacle and contact lens wearers had significantly more concern than those without refractive correction. Those without refractive correction also had fewer symptoms than spectacle and contact lens wearers. Contact lens wearers reported significantly more optical problems than spectacle wearers and those without refractive correction. Finally, contact lens wearers had significantly worse overall refractive error–specific QOL than those without refractive correction and spectacle wearers.
A multitrait, multimethod correlation matrix (Table 6) presents relations between the instruments and their scales. Of the 126 potential combinations of scales between the instruments, 86 (68%) showed significant relations. The NEI-RQL-42 suboptimal correction subscale did not converge with most RSVP subscales. The RSVP expectations subscale did not converge with any NEI-RQL-42 subscale. The RSVP problems with corrective lenses subscale did not converge with the NEI-RQL-42 activity limitations, suboptimal correction, near vision, or satisfaction with correction subscale.
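The size of this matrix can be checked with simple arithmetic. The sketch below assumes that the 126 combinations come from crossing the 13 NEI-RQL-42 subscales plus its overall score with the 8 RSVP subscales plus its overall score; the article does not spell out this breakdown, so it is an inference, and the variable names are illustrative.

```python
# Assumed breakdown: subscales plus one overall score per instrument.
nei_rql_scales = 13 + 1
rsvp_scales = 8 + 1

# Number of between-instrument scale pairs in the correlation matrix.
total_pairs = nei_rql_scales * rsvp_scales

# Count of significant relations reported in the text.
significant_pairs = 86
percent_significant = round(100 * significant_pairs / total_pairs)
```

Under that assumption the product is 126 and the reported 86 significant relations correspond to 68%, matching the figures in the text.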
Questionnaires and their scales must be reliable to be useful in research and clinical practice. Two important forms of reliability include internal consistency and test-retest reliability. Internal consistency is high when items that constitute a scale are related to each other. For the NEI-RQL-42, we found that most subscales (77% of them) had good internal consistency (Cronbach α>0.70). Three subscales had poor internal consistency, including activity limitations, glare, and appearance, which also had poor internal consistency in a previous report.11 Two subscales had very high internal consistencies, including expectations and near vision; the items in these subscales might be redundant. None of the NEI-RQL-42 subscales had internal consistencies of greater than 0.90 in a previous report. For the RSVP, we found that 78% of the subscales had good internal consistency. Two subscales had poor internal consistency, including glare and problems with corrective lenses. The driving subscale had a very high internal consistency; the items in this subscale might be considered redundant. We found much lower internal consistencies for a large number of RSVP subscales than previously reported, including expectations, symptoms, optical problems, glare, and problems with corrective lenses, as well as the overall score.6
Subjective refraction and visual acuity were stable during the course of the study. Theoretically, we would expect patients whose refractive error and visual acuity had not changed to provide similar responses on refractive error–specific QOL instruments each time they undergo assessment. The NEI-RQL-42 performed well in terms of test-retest reliability. Most scales had ICC values greater than 0.70, except activity limitations, glare, appearance, and satisfaction with correction. Six subscales had relatively wide 95% limits of agreement, including expectations, diurnal fluctuations, glare, dependence on correction, worry, and satisfaction with correction. The RSVP had worse test-retest reliability. Only 2 scales had ICC values greater than 0.70: the concern subscale and the overall score. This differs from a previous report in which all but 1 subscale (physical and social functioning) had test-retest ICC values of greater than 0.70.6 Several of our test-retest ICC values differed significantly from those of the previous report, including expectations, optical problems, problems with corrective lenses, and the overall score.6 However, the inclusion of patients with only good vision (better than 20/30 visual acuity) in this study might be considered a limitation in that the generalizability of the test-retest results is limited to healthy patient samples.
Although a scale may look correct (face validity) and may cover the right things (content validity), other important aspects of validity must be assessed (ie, concurrent and construct validity). A valid refractive error–specific QOL instrument should also theoretically be associated with clinical measures of vision, such as refractive error and visual acuity. For instance, individuals with high levels of ametropia should be very dependent on their correction, limited at times in their activities and functioning, and have more symptoms and problems with their vision than someone with low ametropia. Therefore, the degree of ametropia should be globally related to refractive error–specific QOL. This was true for the overall NEI-RQL-42 score, but not the overall RSVP score, using a conservative α of .01. Refractive error–specific QOL might also be related to best-corrected visual acuity, especially if individuals do not have visual acuity that is correctable to levels associated with good functional ability. As we limited the inclusion of individuals in this study to those with visual acuity of better than 20/30, it is not necessarily surprising that there were no correlations between the refractive error–specific QOL instruments and best-corrected visual acuity, as the 20/30 level of visual acuity is still very functional.
Another important component of validity is construct validity. An instrument measuring refractive error–specific QOL should be able to distinguish among hyperopic, myopic, and emmetropic patients. The NEI-RQL-42 and RSVP showed good construct validity in terms of their overall scores; emmetropic patients in general had fewer problems than myopic and hyperopic patients. On the NEI-RQL-42, 4 of the 13 subscales showed differences among refractive groups, including expectations, far vision, dependence on correction, and worry. In a previous report, emmetropic patients scored significantly better than myopic patients on 12 subscales and better than hyperopic patients on all 13 subscales.19 On the RSVP, 3 subscales showed differences, including concern (similar to worry on the NEI-RQL-42), optical problems, and physical and social functioning. Again, hyperopic and myopic patients scored significantly worse than emmetropic patients.
Refractive error–specific QOL instruments should also be able to distinguish between patients with varying forms of refractive correction, as each presumably has an impact on different aspects of daily living, symptoms, and functioning. On the NEI-RQL-42, we found differences between the groups on 6 of the 13 subscales, including clarity of vision, expectations, far vision, diurnal fluctuations, activity limitations, and dependence on correction, and the overall score. We detected no differences among refractive correction groups on the symptoms subscale, although it has been reported that those who wear glasses or contact lenses commonly experience symptoms.20,21 This finding may be related to the subscale's content. In a previous report, those requiring no correction scored significantly better than those wearing spectacles or contact lenses on 8 of the 13 NEI-RQL-42 subscales. The RSVP showed differences among groups on 4 of 9 scales in this report (overall score, concern, symptoms, and optical problems). Similar to our study, others have found differences between refractive correction groups using the RSVP on the overall score, symptoms, and optical problems in addition to physical and social functioning and problems with corrective lenses.
An important part of construct validity is convergent and divergent relations between scales, and we found good convergent validity between the instruments. There was a strong correlation between the individual scales from each instrument and the overall scale from the alternative instrument. The RSVP expectations subscale did not converge with any NEI-RQL-42 subscale. This subscale is composed of 2 questions that pertain to an individual's acceptance of functional but less than perfect vision. Perhaps the RSVP expectations subscale assesses an area of refractive error–specific QOL that is not assessed by the NEI-RQL-42; alternatively, this domain may not be related to refractive error–specific QOL at all. It is unclear to us why this subscale behaved in this manner.
Although both of these instruments satisfy aspects of reliability and validity to one degree or another, the investigators choosing such an instrument for studies of refractive error correction should thoroughly evaluate content of the instrument they choose in addition to the other psychometric properties of that instrument. One purpose of these instruments is to predict outcomes associated with refractive surgery, and there are clearly significant changes in the NEI-RQL-42 and RSVP subscales after refractive surgery.11,22 However, the constructs relevant to patients before and after refractive surgery are not necessarily the same as those necessary for patients successfully wearing contact lenses and spectacles, or attempting new types of these forms of visual correction. As seen in this sample of patients with a limited range of scores, floor or ceiling effects may limit the utility of these instruments. Furthermore, the content of both instruments may not be appropriate if they are to be used in trials that include wearers of spectacles and contact lenses, in which patients are presumed to be successfully wearing these modes of refractive correction.23 These types of issues will be important for future researchers to consider when selecting or developing an appropriate refractive error–specific QOL instrument.
Corresponding author: Jason J. Nichols, OD, MS, MPH, College of Optometry, The Ohio State University, 320 W 10th Ave, PO Box 182342, Columbus, OH 43218 (e-mail: email@example.com).
Submitted for publication September 5, 2002; final revision received March 11, 2003; accepted March 25, 2003.