Flow diagram of meta-analysis.
Summary receiver operating characteristic curves of proton pump inhibitors (PPIs) (A) and placebo (B) as a diagnostic test in all studies. The curves suggest the superiority of PPI compared with placebo in detecting gastroesophageal reflux disease in patients with noncardiac chest pain. Numbers alongside the plots indicate individual studies (1-6 represent Xia et al,20 Fass et al,21 Pandak et al,22 Fass et al,30 Bautista et al,31 and Squillance et al,32 respectively). The X’s mark the pooled estimate of the true-positive and false-positive rates. The shaded regions mark the zone of 95% confidence intervals of the pooled sensitivity and specificity.
Point estimates with confidence intervals of sensitivity, specificity, and diagnostic odds ratios of 6 studies on the validity of the proton pump inhibitor test for the diagnosis of gastroesophageal reflux disease in patients with noncardiac chest pain. No significant outliers were found.
Wang WH, Huang JQ, Zheng GF, Wong WM, Lam SK, Karlberg J, Xia HHX, Fass R, Wong BCY. Is Proton Pump Inhibitor Testing an Effective Approach to Diagnose Gastroesophageal Reflux Disease in Patients With Noncardiac Chest Pain? A Meta-analysis. Arch Intern Med. 2005;165(11):1222-1228. doi:10.1001/archinte.165.11.1222
Gastroesophageal reflux disease (GERD) is common in patients with noncardiac chest pain (NCCP). Results of studies evaluating the accuracy of a proton pump inhibitor (PPI) treatment as a diagnostic test for GERD-related NCCP have varied. We evaluated the overall accuracy of this modality.
We searched the PubMed, MEDLINE, EMBASE, CINAHL, and Cochrane databases to May 2004 and included randomized, placebo-controlled studies evaluating the accuracy of findings from PPI testing in the diagnosis of GERD in patients with NCCP. The GERD diagnosis was confirmed by results of endoscopy and/or 24-hour esophageal pH monitoring. A summary diagnostic odds ratio and summary receiver operating characteristic curve analysis were used to estimate the overall accuracy and to explore any contributing factors.
Six studies met the inclusion criteria. The overall sensitivity and specificity of a PPI test were 80% (95% confidence interval [CI], 71%-87%) and 74% (95% CI, 64%-83%), respectively, compared with 19% (95% CI, 12%-29%) and 77% (95% CI, 62%-87%), respectively, in the placebo group. The PPI test showed a significantly higher discriminative power, with a summary diagnostic odds ratio of 19.35 (95% CI, 8.54-43.84) compared with 0.61 (95% CI, 0.20-1.86) in the placebo group. The impact of the prevalence of GERD and treatment duration on the accuracy of the test could not be determined because of the lack of an adequate number of studies.
The use of PPI treatment as a diagnostic test for detecting GERD in patients with NCCP has an acceptable sensitivity and specificity and could be used as an initial approach by primary care physicians to detect GERD in selected patients with NCCP.
Recurrent episodes of retrosternal pain lacking documented cardiac abnormalities are defined as noncardiac chest pain (NCCP).1 The annual prevalence of NCCP in the general population of the western world ranges from approximately 25% to 35%.2,3 Patients with NCCP have a poor quality of life4 and consume a large proportion of health care resources.5 Although NCCP may have a number of causes, gastroesophageal reflux disease (GERD) is considered to be the most common.1,6-12
Several esophageal tests are used to evaluate NCCP, including endoscopy, 24-hour esophageal pH monitoring, esophageal manometry, and provocative tests.1,8,13 However, the sensitivity of endoscopy is limited because most patients with NCCP have nonerosive GERD.14 Although 24-hour esophageal pH monitoring has been considered the best modality for diagnosing GERD in patients with NCCP,1 it is invasive, costly, and often unavailable in the primary care setting.
The knowledge that GERD is the most common cause of NCCP has led investigators to treat patients empirically with proton pump inhibitors (PPIs).15-18 The successful management of NCCP with PPIs has proven the causal relationship between GERD and NCCP.15-18 Treatment with PPIs is now used by investigators as a diagnostic test for identifying GERD in patients with NCCP.19 Studies assessing the accuracy of the PPI test result demonstrated sensitivity ranging from 78% to 92% and specificity from 67% to 86% for diagnosis of GERD in patients with NCCP.20-22 However, these studies generally have a small number of patients,20-22 and the potential impact of demographics of patients has not been evaluated. Thus, the clinical value of the PPI test in evaluating NCCP remains to be thoroughly assessed.23
We undertook this systematic review and meta-analysis to evaluate the overall sensitivity and specificity of the PPI test for detecting GERD in patients with NCCP. We also examined the impact of the characteristics of the study population or the study design on study findings.
A computerized literature search was performed in the PubMed, MEDLINE, EMBASE, CINAHL, and Cochrane Controlled Trials Register databases for relevant articles published in any language between 1966 and May 2004 with the following medical subject heading terms and/or text words: chest pain, noncardiac, or noncardiac in combination with omeprazole, lansoprazole, pantoprazole, rabeprazole, or esomeprazole. Meeting abstracts were searched from CD-ROMs of major international gastroenterological meetings held from 1995 through 2003 (American Digestive Disease Week, American College of Gastroenterology, World Congress of Gastroenterology, and United European Gastroenterology Week) using the same terms. Finally, we manually searched the reference list of all relevant review articles and original studies that we retrieved.
The title and abstract of all potentially relevant studies were screened for relevance before the retrieval of the full articles. Full articles were also scrutinized for relevance if the title and abstract were ambiguous. The literature search was conducted independently by 3 reviewers (W.H.W., J.Q.H., and G.F.Z.).
The following criteria were used to include studies: (1) adult patients with recurrent episodes of chest pain without documented cardiac abnormalities; (2) GERD diagnosed by results of endoscopy and/or 24-hour esophageal pH monitoring; (3) only randomized, placebo-controlled trials because a symptomatic response to PPI treatment in patients was evaluated as a diagnostic test; and (4) the number of true-positive, false-positive, true-negative, and false-negative findings were described explicitly, or such numbers could be derived from studies.
We excluded (1) therapeutic trials evaluating the efficacy of PPI treatment in patients with GERD-related NCCP, (2) studies without raw data for retrieval, and (3) duplicate publications. When duplications were found, we only included the publication that reported the most extensive information.
Data were extracted independently from each study by 3 researchers (W.H.W., J.Q.H., and G.F.Z.) using a predefined review spreadsheet. Any disagreements between the reviewers were resolved by discussion to reach consensus.
Criteria modified by Irwig et al24 on behalf of the Cochrane Methods Working Group on Systematic Review of Screening and Diagnostic Tests were applied to assess the quality of each study. These criteria include an explicit statement of the spectrum of disease, a clear definition of NCCP, a reference test used, a blinded measurement of the PPI and reference tests, execution of the test, explicit definition of the improvement of symptoms and reporting of the cutoff point of the test, and a description of the demographic information and sampling strategy.
For each study, we calculated sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and their 95% confidence intervals (CIs). We also reported the diagnostic odds ratio (DOR) as a measure for the discriminative power of a diagnostic test for individual studies.
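As an illustration of the per-study calculations described above, the following Python sketch derives the five measures from a single 2×2 table. The function names, the choice of the Wilson score interval, and the example counts are assumptions made for illustration only: the article does not state which interval method was used, and the counts are not taken from any included study.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures for one 2x2 table (test result vs reference standard)."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "dor": (tp * tn) / (fp * fn),   # diagnostic odds ratio
    }


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, z=1.96 for 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts, not data from any included study.
example = diagnostic_metrics(tp=16, fp=5, fn=4, tn=15)
sens_ci = wilson_ci(16, 16 + 4)  # interval for sensitivity = TP / (TP + FN)
```

The DOR is undefined when the false-positive or false-negative count is zero; a common workaround, not shown here, is to add 0.5 to every cell before computing it.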
We used 2 methods to summarize the data. First, statistical pooling of the sensitivities and specificities was performed and summary DORs were calculated under a random-effects model.25 As a complementary method, summary receiver operating characteristic (sROC) curves were plotted with sensitivity (true-positive rate) on the y-axis and 1 − specificity (false-positive rate) on the x-axis according to the method proposed by Moses and Shapiro26 and refined by Littenberg and Moses.27 We used this approach because sensitivity and specificity are measures of diagnostic accuracy that rely on a single threshold for classifying a test result as positive or negative. For the PPI test, the threshold effect may find its origin in the variation of setting, study design, definition of NCCP, medications and the dosages and duration used, washout period, and the definition of symptom improvement. Studies were combined by using the sROC method if the definition of symptom improvement and reference test used were comparable, and the natural logarithms (Ln) of DOR of the included studies were homogeneous. The use of the sROC curve that combines results from different studies allows simultaneous evaluation of sensitivity and specificity and facilitates the comparison of the accuracy of the results from PPI and placebo tests. The regression curves were extended only over the range of the data for the few studies included in the analysis. Areas under the sROC curves were also calculated directly under the curve where data existed. We performed analyses that were unweighted and weighted by the inverse of the variance. The significance of the difference between the PPI and placebo was statistically analyzed by applying a Wilcoxon paired-sample test to the parameter D (D = LnOR, where the OR equals [Sensitivity/(1 − Sensitivity)]/[(1 − Specificity)/Specificity]). The influence of covariates on the accuracy of the test was determined by the ROC regression analysis.26
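The regression behind an sROC curve of this kind can be sketched as follows. In the Moses-Littenberg formulation, each study contributes D = logit(TPR) − logit(FPR), which is the LnOR defined above, and S = logit(TPR) + logit(FPR), a proxy for the positivity threshold; a line D = a + bS is fitted and then transformed back to ROC space. The code below is a minimal unweighted sketch under those assumptions. The function names and the (TPR, FPR) pairs are hypothetical, and the weighted fit, the area under the curve, and the covariate regression used in the article are not reproduced.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_sroc(points):
    """Unweighted least-squares fit of D = a + b*S.

    points: (tpr, fpr) pairs, one per study, with all rates strictly in (0, 1).
    """
    points = list(points)
    d = [logit(t) - logit(f) for t, f in points]  # log diagnostic odds ratio
    s = [logit(t) + logit(f) for t, f in points]  # threshold proxy
    n = len(points)
    d_mean, s_mean = sum(d) / n, sum(s) / n
    sxx = sum((si - s_mean) ** 2 for si in s)
    sxy = sum((si - s_mean) * (di - d_mean) for si, di in zip(s, d))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr, a, b):
    """Expected TPR at a given FPR on the fitted curve (requires b != 1).

    Solving D = a + b*S for logit(TPR) gives
    logit(TPR) = a / (1 - b) + (1 + b) / (1 - b) * logit(FPR).
    """
    v = a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr)
    return expit(v)


# Hypothetical (TPR, FPR) pairs, not the values reported by the included studies.
studies = [(0.78, 0.14), (0.92, 0.33), (0.85, 0.20), (0.80, 0.25)]
a, b = fit_sroc(studies)
curve = [(f / 100, sroc_tpr(f / 100, a, b)) for f in range(5, 100, 5)]
```

When b is close to 0, the DOR does not depend on the threshold and the curve is symmetric about the antidiagonal; the intercept a is then the common LnOR.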
We first used a χ2 test to assess the statistical homogeneity between studies.28 Then we plotted the sensitivity, specificity, and DOR of individual studies and their 95% CIs to evaluate study variations.28 If a visual heterogeneity was identified, we searched for the sources of any possible clinically important heterogeneity.
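The article does not specify the form of the χ2 test. One common choice, assumed here, is Cochran's Q applied to the study-level LnDOR values with inverse-variance weights, referred to a χ2 distribution with k − 1 degrees of freedom. The sketch below uses hypothetical 2×2 counts and hypothetical function names; it assumes every cell is nonzero.

```python
import math


def log_dor(tp, fp, fn, tn):
    """ln(DOR) and its approximate variance for one 2x2 table (all cells > 0)."""
    y = math.log((tp * tn) / (fp * fn))
    var = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return y, var


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def cochran_q(tables):
    """Cochran's Q for homogeneity of ln(DOR) across studies."""
    stats = [log_dor(*t) for t in tables]
    weights = [1 / var for _, var in stats]  # inverse-variance weights
    pooled = sum(w * y for w, (y, _) in zip(weights, stats)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, stats))
    df = len(stats) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical (tp, fp, fn, tn) counts, not data from the included studies.
tables = [(16, 5, 4, 15), (12, 3, 3, 10), (20, 6, 5, 18)]
q, df, p = cochran_q(tables)
```

A large P value from such a test indicates only that heterogeneity was not detected; with few small studies the test has low power.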
Data were reported according to the guidelines for meta-analysis evaluating diagnostic tests.28,29 Analyses were performed using MetaTest software (version 0.6) written by Joseph Lau, MD, and specially designed for meta-analysis of diagnostic tests.
We identified a total of 33 reports, of which only 6 studies met the inclusion criteria20-22,30-32 (Table 1). Fifteen irrelevant articles were deleted after screening the titles and abstracts. Full articles of the remaining potentially relevant articles were further scrutinized. Of these, 12 were excluded for the following reasons: 2 were nonrandomized, placebo-controlled clinical trials17,33; 2 were cost-effectiveness analyses of the PPI test34,35; 2 were comments on a single study36,37; 4 presented duplicated data18,38-40; 1 included only patients with NCCP and GERD15; and 1 did not compare the outcomes with a reference test41 (Figure 1).
Of the 6 studies, 5 were double-blind21,22,30-32 and 1 was single-blind.20 Four were full-text articles,20-22,31 and 2 were abstracts.30,32 A total of 220 study subjects with NCCP were included in these studies. The episodes and the duration of chest pain of patients at baseline were clearly described in 4 studies.21,22,30,31 Diagnosis of NCCP was based on a normal finding on a cardiac angiogram or in other comprehensive cardiac evaluations in 4 studies20,21,31,32 and on a negative result on technetium Tc 99m methoxy isobutyl isonitrile testing in 1 study.22 The diagnosis of NCCP was not clearly described in 1 study.32 Four studies claimed to exclude patients with peptic ulcer, history of gastric surgery, or recent treatment with antireflux medications.20-22,31 The NCCP patients with endoscopic esophagitis were excluded in 1 study.20 Five studies provided a clear description of the demographic information,20-22,30,31 with the mean ± SD age being 54.4 ± 6.1 years and the percentage of men, 60.4%. Five studies were performed in a crossover fashion with a washout period of 5 to 21 days.21,22,30-32
Three studies evaluated omeprazole (60-80 mg/d)21,22,32; 2, lansoprazole (30-90 mg/d)20,31; and 1, rabeprazole (40 mg/d)30 as a diagnostic test for detecting GERD in patients with NCCP. Of the 6 studies, 5 assessed the value of a short course (1-2 weeks) of high-dosage PPI treatment,21,22,30-32 whereas 1 used a standard dosage (30 mg/d) of lansoprazole for 4 weeks.20 A positive test result was defined as an improvement of chest pain by more than 50% after the treatment with a PPI. Three studies stated that the assessment of symptom improvement was independent of the results of the reference test.20-22 The main characteristics of these studies are listed in Table 1.
The main results of the 6 studies are summarized in Table 2 and Table 3. The sensitivities of PPI test results in all 6 studies were significantly higher than those of placebo. However, the specificities were comparable or higher in 2 studies.20,31 In the remaining 4 studies, the specificities in the PPI group were lower than those in the placebo group.21,22,30,32 The PPV and NPV were higher in the PPI-treated group than in the placebo group in 4 studies,20-22,31 except for 2 studies that included a small number of patients.30,32 According to the DORs of the individual studies, 5 studies demonstrated that the PPI treatment had a significantly high discriminative power for diagnosing GERD in patients with NCCP.20-22,30,31 However, this was not observed in the placebo group (Tables 2 and 3).
The overall sensitivity for a PPI diagnostic test was 80% (95% CI, 71%-87%) compared with 19% (95% CI, 12%-29%) in the placebo group. The summary specificity for the PPI test was 74% (95% CI, 64%-83%) compared with 77% (95% CI, 62%-87%) in the placebo group (Table 4). The PPI test had a significantly higher discriminative power for diagnosing GERD in patients with NCCP, with an estimated DOR of 19.35 (95% CI, 8.54-43.84) compared with 0.61 (95% CI, 0.20-1.86) for the placebo group (P = .03).
Figure 2 shows the sROC curves plotted by using the results of accuracy from the individual studies. The shape of the sROC curve for the PPI group is different from that for the placebo group, with the PPI having a sharper increase in the sensitivity for a given increase in the false-positive rate (1 − specificity).
Incorporation of the additional covariates into the ROC regression analysis indicated that variables such as the type and year of publication, study design, prevalence of NCCP, reference test used, and treatment duration had no significant effect on the test accuracy estimated.
No statistical heterogeneity between studies was found (P = .95 for PPI and P = .17 for placebo). This was confirmed using a graphic presentation in which the 95% CIs of sensitivity, specificity, and DOR of individual studies overlap considerably (Figure 3).
In the present meta-analysis, we found that the overall sensitivity was 80% (95% CI, 71%-87%) and the specificity was 74% (95% CI, 64%-83%) for the PPI test. We also found a significantly higher discriminative power associated with the PPI test, with an estimated DOR of 19.35 (95% CI, 8.54-43.84). The PPV and NPV should give the most useful information about a diagnostic test because they mimic the situation in which the test is used. However, the predictive value suffers a disadvantage because the calculation is closely related to the prevalence of the disease in the study population.42 In the present analysis, the prevalence of GERD in patients with NCCP ranges from 33% to 76%.20-22,30-32 The large variability in the prevalence has made the estimate of the overall PPV and NPV unreliable. Therefore, the data should be interpreted with caution.
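The dependence of the predictive values on prevalence follows directly from Bayes' theorem and can be made concrete with a short sketch. The Python code below applies the pooled sensitivity (80%) and specificity (74%) reported above to the two ends of the reported prevalence range (33% and 76%). Treating the pooled estimates as fixed across prevalence levels is a simplifying assumption for illustration, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by sensitivity, specificity, and disease prevalence."""
    tp = sensitivity * prevalence              # diseased, test positive
    fp = (1 - specificity) * (1 - prevalence)  # not diseased, test positive
    fn = (1 - sensitivity) * prevalence        # diseased, test negative
    tn = specificity * (1 - prevalence)        # not diseased, test negative
    return tp / (tp + fp), tn / (tn + fn)


# Pooled estimates and prevalence range as reported in the text.
for prevalence in (0.33, 0.76):
    ppv, npv = predictive_values(0.80, 0.74, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

Under these assumptions the PPV rises from roughly 0.60 to 0.91 and the NPV falls from roughly 0.88 to 0.54 across the prevalence range, which is why a single overall PPV or NPV is hard to interpret.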
Our results show that the sensitivity of the PPI test was significantly higher than that for placebo, whereas the specificity was almost the same in both groups. Treatment with PPIs and placebo produced similar rates of symptom improvement in patients without GERD, indicating a possible placebo effect. A considerable placebo effect is not uncommon in patients with functional bowel disorders and is not surprising in patients with non–GERD-related NCCP. Nevertheless, the study results must be interpreted with caution because of the observed high placebo effect.
The accuracy of a diagnostic test should be evaluated by comparing its results with a gold (reference) standard that has been validated. However, this is not available for the diagnosis of GERD. Endoscopy results are frequently normal in patients with symptoms of GERD and abnormal esophageal acid exposure.43 The sensitivity of symptom evaluation falls short of a gold standard.44 Ambulatory 24-hour esophageal pH monitoring is generally considered to provide the most objective measurement of pathologic reflux. However, the sensitivity is reported to range from 85% to 90%.45 The sensitivity and specificity can be increased if reflux symptoms are also evaluated.46 Nevertheless, 24-hour esophageal pH monitoring alone is insufficient to be considered a gold standard. In most of the studies we have included, a combination of endoscopy and pH testing was used, which is the closest to the accepted reference test for GERD. Thus, findings in some patients with GERD may have been classified as negative for GERD. In addition, physiological acid reflux can also induce chest pain in individuals in whom the esophagus is hypersensitive to gastric acid.47-49 These patients, although considered to have false-positive findings, may respond to the PPI test.
The sensitivity of the PPI test seems to be related to the duration of the treatment. Extending the duration of treatment from 1 to 2 or to 4 weeks increases the sensitivity by approximately 10% (Table 2). However, extending treatment duration beyond 4 weeks was unnecessary because 80% of patients who were likely to respond to PPI would respond within 4 weeks (Table 4). Therefore, we propose that initial treatment with a PPI can be given for up to 4 weeks at least twice a day, based on the patient’s frequency of symptoms. The degree of relief expected would be at least 50%.
Although no statistical heterogeneity between studies was found, we cannot rule out the possibility of between-study heterogeneity. For example, there are differences in the definition of NCCP, type of PPIs and dosages used, washout period, degree of blinding, execution of test, and reference standard for diagnosing GERD. Therefore, certain biases may exist and could threaten the validity of our conclusions. For example, although all studies used endoscopic esophagitis and/or abnormal 24-hour esophageal pH monitoring as the reference test, different variables among studies were used to diagnose GERD. One study excluded patients with esophagitis and used abnormal findings of 24-hour esophageal pH monitoring as the reference standard.20 Another study used endoscopic esophagitis or abnormal findings of 24-hour pH monitoring with symptoms index as the reference standard.32 Therefore, a differential verification bias among studies may exist.50
There are several limitations to our study. First, only 6 randomized, placebo-controlled trials were included in the final analysis, with a total of 220 patients. Therefore, the results are susceptible to possible selection bias.28 The small number of patients may also limit the generalizability of the study findings to other populations. Second, subgroup analysis was not feasible in the present study because the number of studies was too small to obtain reliable estimates. We therefore cannot determine any possible influence of factors such as the type and year of publication, prevalence of NCCP, study design, reference test, and dose and duration of PPI treatment on the results of this analysis. Third, the quality of the meeting abstracts was a concern because the information available from these studies was limited.30,32 As a consequence, a possible verification bias may exist.24,28
Our meta-analysis suggests that testing with high-dosage PPI treatment for up to 4 weeks has an acceptable sensitivity and specificity and could be considered a useful and possibly cost-saving initial strategy for primary care physicians managing patients with NCCP who are suspected of having esophageal disorders and have no alarm symptoms. If more than 50% reduction in symptom scores can be achieved, the chance of having GERD-related NCCP is significantly increased, and the PPI treatment should be continued. Future well-designed, adequately powered studies are needed to confirm the findings of this analysis.
Correspondence: Benjamin C. Y. Wong, MD, Department of Medicine, Faculty of Medicine, University of Hong Kong, Hong Kong (firstname.lastname@example.org).
Accepted for Publication: January 31, 2005.
Financial Disclosure: Dr Huang has received research funding and honoraria from AstraZeneca and Merck.
Funding/Support: The study was supported by the Gastroenterological Research Fund, University of Hong Kong, Hong Kong. Dr Wang is a visiting professor under the Croucher Foundation Chinese Visitorship Scheme, Hong Kong, at the Department of Medicine, University of Hong Kong.