Figure 1. Median values of gastroesophageal reflux disease (GERD) burden, symptoms, and treatment scores by study group (heavy horizontal lines). In all 3 panels, median values differ significantly between general medical patients and patients with active GERD, and between patients with active GERD and patients who underwent laparoscopic fundoplication (P<.001, Wilcoxon rank sum test). For the GERD treatment score, median scores for the general medical group and the postoperative group also differ significantly (P = .02, Wilcoxon rank sum test). Differences between these 2 groups were not statistically significant for the GERD burden (P = .95) or GERD symptoms (P = .14) scores.
Figure 2. Scale responsiveness before and 8 weeks after laparoscopic surgery for gastroesophageal reflux disease (GERD) (heavy lines represent median values).
Figure 3. Change in median gastroesophageal reflux disease (GERD) burden, symptoms, and treatment scores by response to the question "How do you feel now compared with the first time you took this survey?"
Figure 4. Median gastroesophageal reflux disease (GERD) burden scores (interquartile ranges) for patients bothered by symptoms and/or treatment.
Liu JY, Woloshin S, Laycock WS, Rothstein RI, Finlayson SRG, Schwartz LM. Symptoms and Treatment Burden of Gastroesophageal Reflux Disease: Validating the GERD Assessment Scales. Arch Intern Med. 2004;164(18):2058-2064. doi:10.1001/archinte.164.18.2058
Determining the optimal management of gastroesophageal reflux disease (GERD) requires a comprehensive assessment instrument that measures the burden of both symptoms and treatment. We developed such an instrument.
This validation study included 3 groups: patients with active GERD (n = 193), surgical patients with prior GERD (n = 197), and general medical outpatients (n = 63). All completed an initial survey, and the general medical patients and patients with active GERD were resurveyed after 2 to 6 weeks. The main outcome measures were test-retest reliability, internal consistency, discriminant validity, and responsiveness to change for 3 scales scored from 0 to 100: a GERD burden scale (measuring the overall impact of GERD on quality of life), a symptoms scale, and a treatment scale. Higher scores indicated greater disease burden.
The GERD burden, symptoms, and treatment scales all demonstrated good discriminant validity: patients in the active-GERD group had the highest scores, and scores on each scale effectively identified the patients who belonged to that group. Scores also improved substantially 8 weeks after surgery, demonstrating the scales' responsiveness to change. As hypothesized, the burden of treatment was distinct from that of symptoms: 23% of patients not bothered by GERD symptoms described their GERD treatment as a moderate or serious problem. Indeed, the impact of treatment problems approached that of symptom problems. All pairwise comparisons were significant (P<.02).
The GERD burden, symptoms, and treatment scales were valid, reliable, and responsive instruments for use in patients with GERD. Our analyses highlight the importance of assessing both symptoms and treatment burden in patients with GERD.
Gastroesophageal reflux disease (GERD) is a common chronic condition; an estimated 25 million American adults experience heartburn daily.1 While medical and surgical treatment options exist, the optimal therapy for GERD is unknown, and comparing treatment options requires a valid, reliable, and comprehensive assessment method. Many instruments have been developed to measure the health effects of GERD. Some, such as the Gastroesophageal Reflux Disease Activity Index (GRACI),2 the Gastrointestinal Symptom Rating Scale (GSRS),3 and the Gastroesophageal Reflux Questionnaire (GERQ),4 measure the intensity and frequency of GERD symptoms. Others, including the Quality of Life in Reflux and Dyspepsia (QOLRAD)5 and the GERD Health Related Quality of Life Scale (GERD-HRQL),6 take into account not only the symptoms but also their effects on quality of life.
Although these instruments are well studied and amply validated, none of them adequately measures the health effects of GERD treatment itself. Specifically, they are limited in their ability to measure the adverse effects of treatment (eg, bloating or dysphagia after surgery and diarrhea due to medication) and its bother (eg, dietary restrictions and daily medication). This limitation matters because the requirements of effective GERD treatment are not trivial and may significantly affect quality of life independently of GERD symptoms. If so, comparisons of surgical, endoscopic, and medical treatments based only on their effect on GERD symptoms will be biased toward the treatments most effective at controlling symptoms, even if those treatments require bothersome lifestyle changes or cause unfavorable side effects.
We developed and evaluated an instrument for measuring the impact of GERD on quality of life. We made an a priori decision to develop separate scales assessing (1) overall GERD-related quality of life; (2) the burden of GERD-related symptoms; and (3) the burden of GERD treatment.
We conducted the formal validation study at the Dartmouth Hitchcock Medical Center, a tertiary care center in Lebanon, NH, with 3 sets of respondents: patients with active GERD (ie, who had been referred for GERD to a gastroenterologist or a surgeon) (n = 193); patients who had undergone surgical treatment (laparoscopic fundoplication) for GERD 1 to 5 years previously (n = 197); and general medical outpatients recruited from the primary care clinic (n = 63) (Table 1). The study received approval from the local institutional review board.
To assess overall quality of life with GERD, we adapted a standard "feeling thermometer," a simple visual analog scale drawn as a thermometer and graduated from 0 (labeled "death") to 100 ("perfect health"). Subjects were asked to make 2 marks on the thermometer: one showing how their health had been in the past 2 weeks, and another showing how their health would be without GERD symptoms. The impact of GERD on quality of life was measured as the difference between the 2 marks and ranged from 0 (GERD had no effect on quality of life) to 100 (GERD had maximum effect).
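The burden score is simply the distance between the 2 marks. The minimal Python sketch below assumes each mark is recorded as a number on the 0 to 100 thermometer; the function name `gerd_burden` and the out-of-range check are hypothetical, while the rule that imagined health without GERD may not be worse than current health follows the usability criterion described in the Results.

```python
from typing import Optional


def gerd_burden(current_health: float, health_without_gerd: float) -> Optional[float]:
    """Return the GERD burden score from the 2 thermometer marks.

    Both marks are positions on the 0 ("death") to 100 ("perfect health") scale.
    Returns None for a pair that cannot be scored.
    """
    # Assumed check (hypothetical): a mark outside the printed scale is unusable.
    if not (0 <= current_health <= 100 and 0 <= health_without_gerd <= 100):
        return None
    # Imagined health without GERD must be the same as or better than
    # health over the past 2 weeks; otherwise the response is unusable.
    if health_without_gerd < current_health:
        return None
    # 0 = GERD has no effect on quality of life; 100 = maximum effect.
    return health_without_gerd - current_health
```

A patient who marks 60 for the past 2 weeks and 80 for life without GERD receives a burden score of 20; an asymptomatic patient who places both marks at the same point receives 0.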
Based on a review of the relevant literature, as well as input from local gastroenterologists and surgeons specializing in the treatment of GERD and from focus groups with GERD patients, we identified 7 common symptoms plausibly related to GERD. Prior work has shown that (at least in coronary disease) symptom frequency and bother may represent separate domains. For this reason, we drafted questions assessing both how often GERD symptoms occurred and how bothersome they were. Because we found a very high correlation between the frequency and bother questions (r = 0.88-0.90), we included only the bother questions in the final scale to minimize respondent burden.
The 7 specific symptoms questions all shared the same stem ("In the past 2 weeks, how bothered have you been by...") and the same set of possible responses (a Likert-type scale with the categories not at all, a little bit, a lot, and terribly). We also included an overall symptoms assessment adapted from the NIH prostatitis index:7 "If you were to spend the rest of your life with your reflux symptoms just the way they have been during the past 2 weeks, would you feel delighted, pleased, neutral, unhappy, or terrible?" Scores were generated by summing the points assigned to each response, which ranged from 0 (no symptoms) to 3 or 4 depending on the number of response categories. The summed scores were then transformed to a 100-point scale (higher was worse).
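A minimal sketch of the symptoms scoring in Python, assuming the transformation to 100 points is a linear rescaling by the maximum possible total (7 items × 3 points + 4 points for the overall item = 25) and that the overall item runs from delighted (0) to terrible (4). Both assumptions, and the names below, are illustrative rather than the published scoring code; missing items are not handled.

```python
from typing import Sequence

# Bother items: 0 (not at all) to 3 (terribly).
BOTHER_POINTS = {"not at all": 0, "a little bit": 1, "a lot": 2, "terribly": 3}
# Overall item (assumed ordering): 0 (delighted) to 4 (terrible).
OVERALL_POINTS = {"delighted": 0, "pleased": 1, "neutral": 2,
                  "unhappy": 3, "terrible": 4}


def symptoms_score(bother_answers: Sequence[str], overall_answer: str) -> float:
    """Sum the symptom points and rescale to 0-100 (higher is worse)."""
    if len(bother_answers) != 7:
        raise ValueError("expected answers to the 7 symptom items")
    raw = sum(BOTHER_POINTS[a] for a in bother_answers)
    raw += OVERALL_POINTS[overall_answer]
    # Maximum possible raw total: 7 * 3 + 4 = 25.
    max_raw = 7 * max(BOTHER_POINTS.values()) + max(OVERALL_POINTS.values())
    return 100 * raw / max_raw
```

For example, answering "a lot" to all 7 items and "neutral" overall gives 16 of 25 raw points, or 64 on the 100-point scale.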
The treatment questions were designed to capture what patients did to control their symptoms and organized into 4 domains:
Avoiding foods: Initially we included a list of 10 foods that people with GERD symptoms often avoid (chocolate, coffee/caffeine products, spicy foods, alcohol, fried or fatty foods, carbonated drinks, onions, peppermint, fruit juices, citrus juices, and tomatoes) and a fill-in-the-blank "other" category (responses: dairy [n = 5], vegetables [n = 3], meat [n = 2], and bread [n = 1]). Because discriminant analysis of pilot data showed that items other than coffee and spicy foods added no independent information, we retained, for simplicity, only a single combined item in the scale: "Do you avoid coffee or spicy foods? [yes/no]"
Changing lifestyle: Patients were asked about 6 specific lifestyle changes commonly recommended by physicians treating GERD (eating dinner earlier, stopping or decreasing smoking, elevating the head of the bed, losing weight, eating frequent small meals instead of 3 large ones, and stopping or decreasing exercise), plus a fill-in-the-blank "other" category (responses: avoid overeating [n = 2] and avoid bending down/over [n = 2]). All of these items were included in the scale.
Treatment as a problem: Using a 4-point Likert-type scale (not a problem, a little problem, a moderate problem, or a serious problem), patients were asked how much of a problem it was for them to take their medications and to change their diet or lifestyle because of reflux. Patients were also asked, using a 5-point Likert-type scale (delighted, pleased, neutral, unhappy, terrible), how they would feel if they had to spend the rest of their life managing reflux as they were now doing.
Need for medications: We asked patients how many over-the-counter and prescription medications they needed to take daily to control GERD.
Scoring was done by assigning 1 point for each food avoided, each lifestyle change made, and each medication used daily. The "treatment as a problem" questions were scored from 0 points (not a problem) to 3 points (serious problem). Points were summed and transformed to a 100-point scale (higher was worse).
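The treatment scoring rule can be sketched in Python as follows. This is an illustration under several stated assumptions, not the published algorithm: the open-ended medication count is truncated at a hypothetical cap (`MAX_DAILY_MEDS`) so that the maximum raw total is defined; the transformation to 100 points is a linear rescaling by that maximum; and the 5-point "rest of your life" item is omitted because the text does not give its point values. The number of 4-point problem items is taken from the input rather than fixed.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cap: the article does not say how the open-ended count of daily
# medications is bounded before rescaling, so counts above this are truncated.
MAX_DAILY_MEDS = 3

# "Treatment as a problem" items: 0 (not a problem) to 3 (serious problem).
PROBLEM_POINTS = {
    "not a problem": 0,
    "a little problem": 1,
    "a moderate problem": 2,
    "a serious problem": 3,
}


@dataclass
class TreatmentAnswers:
    avoids_coffee_or_spicy: bool   # the single combined food item
    lifestyle_changes: List[bool]  # the 6 listed changes plus "other"
    problem_ratings: List[str]     # 4-point "treatment as a problem" items
    daily_medications: int         # OTC plus prescription drugs taken daily


def treatment_score(a: TreatmentAnswers) -> float:
    """Sum the treatment points and rescale to 0-100 (higher is worse)."""
    raw = int(a.avoids_coffee_or_spicy)                       # 1 point if yes
    raw += sum(a.lifestyle_changes)                           # 1 point per change
    raw += sum(PROBLEM_POINTS[r] for r in a.problem_ratings)  # 0-3 per item
    raw += min(a.daily_medications, MAX_DAILY_MEDS)           # 1 point per drug
    max_raw = (1 + len(a.lifestyle_changes)
               + 3 * len(a.problem_ratings) + MAX_DAILY_MEDS)
    return 100 * raw / max_raw
```

With 7 lifestyle items and 2 problem items, the maximum raw total is 17, so a patient who avoids coffee, has made 1 lifestyle change, rates 1 item "a little problem," and takes 1 daily medication scores 4/17, or about 24.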
All patients were asked to complete a 31-item self-administered questionnaire that included the 3 scales. General medical patients and patients with active GERD were resurveyed after 2 to 6 weeks.
We evaluated test-retest reliability for all 3 scales by calculating the Pearson correlation coefficient between test and retest scores for "stable" patients, defined as those who responded at the retest that they felt "the same" (n = 40).
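A minimal Python sketch of this reliability check: restrict to patients who answered "the same" at the retest, then compute the Pearson correlation between their test and retest scores. The function names are hypothetical; the statistics were actually computed in STATA.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of 2 equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need 2 equal-length samples with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def retest_reliability(test: Sequence[float], retest: Sequence[float],
                       feel_now: Sequence[str]) -> float:
    """Correlate test and retest scores among 'stable' patients only."""
    stable = [i for i, answer in enumerate(feel_now) if answer == "the same"]
    return pearson_r([test[i] for i in stable], [retest[i] for i in stable])
```

Filtering on the self-reported "the same" answer keeps genuine clinical change (patients who got better or worse) from lowering the reliability estimate.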
Internal consistency reliability was calculated using the Cronbach α. Since this analysis is only relevant for multi-item scales, the calculation was performed only for the symptoms and treatment burden scales. We hoped to find that each scale had items that measured the same domain but were not highly redundant.
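Cronbach α compares the sum of the item variances with the variance of the total score: $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_j \sigma_j^2}{\sigma_{\text{total}}^2}\right)$ for $k$ items. A minimal Python sketch (the function name and data layout are illustrative):

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach alpha; items[j][i] is respondent i's score on item j."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    n = len(items[0])
    # Each respondent's total score across the k items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))
```

Perfectly redundant items give α = 1, and uncorrelated items give α near 0; the reported values of 0.87 and 0.83 sit between these extremes, consistent with the stated goal of items that measure the same domain without being highly redundant.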
Discriminant validity was assessed by comparing median scores on each scale across the 3 groups of respondents using the Wilcoxon rank sum and Kruskal-Wallis tests. We hypothesized that the patients with active GERD would, on average, score higher on all 3 scales than the general medical outpatients and the patients who had undergone surgery 1 to 5 years previously.
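The Kruskal-Wallis test ranks all scores together and asks whether the groups' rank sums differ more than chance would allow. The minimal Python sketch below computes the H statistic with the usual correction for tied scores, which matters for coarse 0 to 100 scales with many repeated values. With exactly 3 groups, H is referred to a χ² distribution with 2 df, whose upper tail has the closed form $e^{-H/2}$. Function names are illustrative; this is not the STATA routine used in the study.

```python
import math
from collections import Counter
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    if ties == n ** 3 - n:
        raise ValueError("all scores are identical")
    return h / (1 - ties / (n ** 3 - n))


def p_value_3_groups(h: float) -> float:
    """Upper tail of chi-square with 2 df (3 groups): exactly exp(-H/2)."""
    return math.exp(-h / 2)
```

With only 2 groups, the same H equals the square of the normal-approximation z statistic from the Wilcoxon rank sum test, which is why the 2 tests are used together for overall and pairwise comparisons.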
In addition, we used logistic regression to predict whether patients belonged to the active-GERD group; specifically, we created a model for each scale and calculated the area under the receiver operating characteristic (ROC) curve.
We assessed responsiveness in 2 ways. First, we compared before and after scores for the subset of patients in the active-GERD group who underwent laparoscopic fundoplication (n = 30). Because surgical treatment is known to be effective, we hypothesized that we would see a substantial improvement in scores obtained about 8 weeks postoperatively. The second measure of responsiveness used the retest question about how the patient felt now vs at the time of the first survey. We used the Wilcoxon rank sum test to see whether scores on each scale were lower (ie, improved) for the patients who said they had gotten better by the time of the retest, and were higher for the patients who said they had gotten worse.
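For the surgical subset, each patient serves as his or her own control, so the natural summary is the median within-patient change. The minimal Python sketch below computes it with the sign chosen so that positive values mean improvement (lower scores after surgery). The text does not name the test used for this paired comparison; a paired method such as the Wilcoxon signed rank test would be one option, but that is an assumption, and the sketch computes only the summary.

```python
from statistics import median
from typing import Sequence


def median_improvement(pre: Sequence[float], post: Sequence[float]) -> float:
    """Median within-patient improvement (pre minus post; positive = better)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must pair the same patients")
    return median([before - after for before, after in zip(pre, post)])
```

Pairing the 2 scores per patient, rather than comparing the medians of the preoperative and postoperative groups, removes between-patient variation from the comparison.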
All statistical tests were done using STATA software, version 7.0 (STATA Corp, College Station, Tex). All P values were 2-sided, and P<.05 was considered significant.
Table 2, which is adapted from Ramos et al,8 summarizes the psychometric properties of the GERD burden, symptoms, and treatment scales.
Both the symptoms and treatment scales demonstrated excellent internal consistency reliability (α = 0.87 and 0.83, respectively). Test-retest reliability was good for the GERD burden scale (r = 0.62) and excellent for the symptoms and treatment scales (r = 0.91 and 0.79, respectively).
All 3 scales demonstrated good discriminant validity. The distributions of responses to individual scale items and of overall scores are presented in Table 3 and Figure 1. As hypothesized, compared with the general medical patients and those who underwent GERD surgery 1 to 5 years previously, patients with active GERD had substantially worse GERD burden scores (median, 20; interquartile range [IQR], 10-40; P<.001) and GERD symptoms scores (median, 48; IQR, 32-56; P<.001). Treatment scores followed the same pattern; in this case, however, the patients with prior GERD surgery also scored higher than the general medical outpatients (median scores, 12 vs 4; IQRs, 8-24 vs 0-12; P<.02), reflecting the fact that a substantial number of postoperative patients still needed to take medications or make diet and lifestyle changes to control symptoms.
We were able to demonstrate discriminant validity in a second way. For each scale, we used logistic regression to predict "membership" in the active-GERD group. In each case, the scales effectively classified patients; the area under the ROC curve for GERD burden, symptoms, and treatment scores was 0.83, 0.95, and 0.95, respectively.
To be useful, a scale should also be responsive, that is, able to detect clinically meaningful change. We demonstrated responsiveness by comparing, for each scale, the scores of 30 patients in the active-GERD group before laparoscopic fundoplication and about 8 weeks after the procedure. Because this surgical procedure is effective in relieving symptoms, we expected scores to improve and to more closely resemble those of patients without GERD (ie, the general medical outpatient group). The expected improvement was observed on all 3 scales (Figure 2).
Responsiveness was also assessed using data from the test-retest analysis. In Figure 3, we display the hypothesized linear relation between median change values for each scale and qualitative responses to the question "How do you feel now compared with when you first took this survey? [much better, a little better, the same, a little worse, much worse]."
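A minimal Python sketch of this analysis: group each patient's change score (retest minus test, so negative values mean improvement) by the answer to the retest question, take the median per category, and check that the medians move in the hypothesized direction from "much better" to "much worse". The ordering check is a weaker, illustrative stand-in for the linear relation shown in the figure, and all names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence

# Response options of the retest question, from best to worst.
CATEGORIES = ["much better", "a little better", "the same",
              "a little worse", "much worse"]


def median_change_by_category(test: Sequence[float], retest: Sequence[float],
                              feel_now: Sequence[str]) -> Dict[str, float]:
    """Median (retest - test) score change per answer; negative = improvement."""
    changes: Dict[str, List[float]] = {c: [] for c in CATEGORIES}
    for first, second, answer in zip(test, retest, feel_now):
        changes[answer].append(second - first)
    return {c: median(v) for c, v in changes.items() if v}


def ordered_with_answers(medians: Dict[str, float]) -> bool:
    """True if median change never decreases from 'much better' to 'much worse'."""
    values = [medians[c] for c in CATEGORIES if c in medians]
    return all(a <= b for a, b in zip(values, values[1:]))
```

Categories with no respondents are simply skipped, so the check still works when, for example, no one reports feeling "much worse."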
We used item nonresponse as our measure of scale usability. Item nonresponse was low on both the symptoms and treatment scales (1%-8% of responses were blank or unusable). The proportion of patients with 1 or more item nonresponses was 8% for the symptoms scale, 14% for the treatment scale, and 18% for the GERD burden scale. A usable response on the GERD burden scale required 2 separate marks on the visual analog image, with the rating for life without GERD the same as or better than the rating for current health.
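The 2 usability figures answer different questions: the share of responses to each item that are blank or unusable, and the share of patients with at least 1 such item. A minimal Python sketch, assuming blank or unusable answers have already been coded as `None` (an illustrative convention, as are the function names):

```python
from typing import List, Optional, Sequence

Answer = Optional[float]  # None marks a blank or unusable response


def item_nonresponse_rates(responses: Sequence[Sequence[Answer]]) -> List[float]:
    """Share of respondents leaving each item blank or unusable."""
    n = len(responses)
    return [sum(r[j] is None for r in responses) / n
            for j in range(len(responses[0]))]


def share_with_any_nonresponse(responses: Sequence[Sequence[Answer]]) -> float:
    """Share of respondents with 1 or more blank or unusable items."""
    return sum(any(a is None for a in r) for r in responses) / len(responses)
```

Because a patient is counted once no matter how many items are missing, the patient-level share is at least as large as any single item's rate and grows with the number of items, which is consistent with low per-item rates alongside higher patient-level proportions.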
Finally, one motivation for creating separate symptoms and treatment scales was the notion that these 2 concepts are distinct. Our findings support this idea (Figure 4). We found that 23% of patients not bothered by GERD symptoms considered their treatment a moderate or serious problem, and 30% of patients with no problems from treatment were still bothered "a lot" or "terribly" by GERD symptoms. The impact of treatment problems approached that of symptoms problems: median GERD burden scores were 14 (IQR, 2.5-25) for patients with only treatment problems vs 20 (IQR, 10-30) for those with only symptoms problems. For context, median GERD burden scores were 0 (IQR, 0-5) for patients bothered by neither symptoms nor treatment vs 27 (IQR, 20-40) for patients bothered by both. All pairwise comparisons were significant (P<.02).
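The comparison rests on a 2 × 2 classification of patients. A minimal Python sketch, assuming each patient contributes one symptom-bother answer and one treatment-problem answer (for example, their most severe answer on each scale; how the study collapsed multiple items is not stated) and using the cut points named in the text ("a lot" or "terribly"; "a moderate" or "a serious" problem). All names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

# Cut points taken from the wording of the Results.
SYMPTOM_BOTHERED = {"a lot", "terribly"}
TREATMENT_PROBLEM = {"a moderate problem", "a serious problem"}


def bother_group(symptom_answer: str, treatment_answer: str) -> str:
    """Place a patient in 1 of the 4 cells of the symptoms x treatment table."""
    bothered = symptom_answer in SYMPTOM_BOTHERED
    problem = treatment_answer in TREATMENT_PROBLEM
    if bothered and problem:
        return "both"
    if bothered:
        return "symptoms only"
    if problem:
        return "treatment only"
    return "neither"


def median_burden_by_group(rows: Sequence[Tuple[float, str, str]]) -> Dict[str, float]:
    """rows: (GERD burden score, symptom answer, treatment answer) per patient."""
    groups: Dict[str, List[float]] = {}
    for burden, symptom_answer, treatment_answer in rows:
        groups.setdefault(bother_group(symptom_answer, treatment_answer), []).append(burden)
    return {g: median(v) for g, v in groups.items()}
```

The "treatment only" cell is the one a symptoms-only instrument cannot see: those patients look well controlled on symptom items yet still report a nonzero GERD burden.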
Our 3 GERD assessment scales proved to be valid and reliable measures of how GERD affects quality of life. The main advantage of our scales over other instruments is their comprehensiveness: our scales measure the overall quality of life and the symptoms burden as well as the treatment burden of GERD.
Previous work on the quality of life of patients with GERD has focused primarily on symptoms assessment.4,5,9 The typical symptoms of heartburn and regurgitation have been well studied; atypical symptoms, less so. We included the atypical symptoms of cough and hoarseness to capture the estimated 16% of patients with GERD who have these symptoms without heartburn.10 We also chose to include a broad array of symptoms that might, strictly speaking, be termed adverse treatment effects. In a recent study, for example, 36% of patients had significant bowel dysfunction (namely, bloating and diarrhea) after laparoscopic antireflux surgery.11 Because it may be difficult for patients to distinguish the source of their symptoms (eg, is this bloating from GERD or from my surgery?), we decided to include all relevant symptoms on the same list.
In addition to regular medication use, patients with GERD must often make substantial dietary and lifestyle changes. We believe that our scale is the first to attempt to account for the burden of these efforts. Our findings demonstrate that these treatments, often assumed to be benign and done "at no cost" to the patient, can in fact be quite bothersome.
Three limitations should be acknowledged with regard to the GERD assessment scales. First, there is no gold standard by which to judge the scales. Nonetheless, we believe that we have been able to establish their content, discriminant, and face validity as well as their responsiveness. A second concern relates to the usability of the GERD burden visual analog scale, which had the highest rate of nonresponses (18%). Most of these item nonresponses occurred among asymptomatic patients who had prior GERD surgery and those in the general medical outpatient group. While we did not interview the patients in the latter group, we believe that they did not realize that indicating no GERD burden meant placing the second mark (quality of life without GERD) next to the first. Making the instructions explicit with regard to this point (ie, "If you do not feel that GERD has any impact on your life now, your 2 marks should be in the same place") may eliminate the problem.
Finally, we have created and validated 3 separate scales rather than a single composite score. Although it is possible to combine the scores mathematically, we think it makes most sense to use them separately. The GERD burden scale produces an overall quality-of-life measure if a single value is needed (one that, at least in theory, could be transformed into a utility measure12). The symptoms and treatment scales are useful in helping patients understand how various treatment options may affect them and, specifically, how difficult it may be to achieve a given level of symptom control.
Gastroesophageal reflux disease is a common chronic problem. Successful treatment of GERD is traditionally defined by the absence of symptoms; our findings highlight the importance of also assessing the "cost" of symptom relief. We found that taking medications and making dietary and lifestyle changes can be important burdens for many patients, and we believe that assessing the quality-of-life impact of GERD, or of any disease, requires measuring the burden not only of symptoms but also of treatment.
Correspondence: Jean Y. Liu, MD, MS, Department of Veterans Affairs Medical Center (112), 215 N Main St, White River Junction, VT 05009.
Accepted for publication November 28, 2003.
This study was supported by a research grant from the Hitchcock Foundation, Veterans Affairs Career Development Awards in Health Services Research and Development (Drs Woloshin, Finlayson, and Schwartz), and the Generalist Physician Faculty Scholars Program of the Robert Wood Johnson Foundation (Drs Woloshin and Schwartz).
The views expressed in this study do not necessarily represent the views of the Department of Veterans Affairs or the US government.
We thank H. Gilbert Welch, MD, MPH, and the members of the Veterans Affairs Outcomes Group for many helpful suggestions throughout the course of this project.
The Gastroesophageal Reflux Assessment Scales are available from Dr Liu.