Figure 1. Study flow diagram.
Figure 2. Forest plots showing the negative and positive likelihood ratios (LRs) for enzyme-linked immunosorbent assay (ELISA) and radioimmunosorbent assay (RIA) testing along with the corresponding pooled estimates using a random-effects model. CI indicates confidence interval; numbers following Hobbs et al indicate different subgroups of patients in that study, as defined by setting (Table 1).
Figure 3. Receiver operating characteristics (ROC) plot along with the summary ROC (SROC) curves stratified for studies of the enzyme-linked immunosorbent assay (ELISA) and radioimmunosorbent assay (RIA).
Figure 4. Nomograms of the relation between the pretest and posttest probabilities after a negative finding on enzyme-linked immunosorbent assay (ELISA) or radioimmunosorbent assay (RIA).
Battaglia M, Pewsner D, Jüni P, Egger M, Bucher HC, Bachmann LM. Accuracy of B-Type Natriuretic Peptide Tests to Exclude Congestive Heart Failure: Systematic Review of Test Accuracy Studies. Arch Intern Med. 2006;166(10):1073–1080. doi:10.1001/archinte.166.10.1073
Copyright 2006 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Congestive heart failure (CHF) is a major public health problem. The use of B-type natriuretic peptide (BNP) tests shows promising diagnostic accuracy. Herein, we summarize the evidence on the accuracy of BNP tests in the diagnosis of CHF and compare the performance of rapid enzyme-linked immunosorbent assay (ELISA) and standard radioimmunosorbent assay (RIA) tests.
We searched electronic databases and the reference lists of included studies, and we contacted experts. Data were extracted on the study population, the type of test used, and methods. Receiver operating characteristic (ROC) plots and summary ROC curves were produced, and negative likelihood ratios were pooled. Random-effects meta-analysis and metaregression were used to combine data and explore sources of between-study heterogeneity.
Nineteen studies describing 22 patient populations (9 ELISA and 13 RIA) and 9093 patients were included. The diagnosis of CHF was verified by echocardiography, radionuclide scan, or echocardiography combined with clinical criteria. The pooled negative likelihood ratio overall from random-effects meta-analysis was 0.18 (95% confidence interval [CI], 0.13-0.23). It was lower for the ELISA test (0.12; 95% CI, 0.09-0.16) than for the RIA test (0.23; 95% CI, 0.16-0.32). For a pretest probability of 20%, which is typical for patients with suspected CHF in primary care, a negative result of the ELISA test would produce a posttest probability of 2.9%; a negative RIA test, a posttest probability of 5.4%.
The use of BNP tests to rule out CHF in primary care settings could reduce demand for echocardiography. The advantages of rapid ELISA tests need to be balanced against their higher cost.
Congestive heart failure (CHF) is a major public health problem. Incidence and prevalence of CHF increase steeply with age,1,2 and CHF is a leading cause of hospitalization among people older than 65 years.3,4 With improvements in the prognosis of CHF, improved survival after acute myocardial infarction, and an aging population, the burden of CHF will continue to increase in the years to come.3
Ruling out CHF early in the diagnostic process is important but difficult. Clinical symptoms and signs such as edema, shortness of breath, or persistent coughing or wheezing are nonspecific and can be absent.2,5 Particularly in elderly patients and patients with comorbid disorders that can mimic CHF (eg, obese patients in primary care or patients with acute pulmonary conditions attending emergency departments), diagnostic uncertainty may lead to delayed therapy and unnecessary echocardiograms.
B-type (brain) natriuretic peptide (BNP) is a neurohormone that is secreted in response to volume expansion and pressure overload of cardiac ventricles.6 In recent years, 2 tests measuring BNP in plasma have been developed, an enzyme-linked immunosorbent assay (ELISA) and a radioimmunosorbent assay (RIA). The ELISA test is a bedside test and would be particularly suitable to assist rapid diagnosis on site in primary and emergency care settings. The comparative accuracy of the 2 tests is, however, unclear. We performed a systematic review of the literature and meta-analysis to compare the diagnostic accuracy of ELISA and RIA assays.
We used methods recommended by the Cochrane methods group on systematic reviews of screening and diagnostic tests.7
We searched Medline and EMBASE (January 1990 to March 2004), the Cochrane Library (2004, issue 1) and MEDION (a database of diagnostic test reviews set up by Belgian and Dutch colleagues) (December 1971 to March 2004) to identify diagnostic studies evaluating the accuracy of BNP in the diagnosis of CHF. Search strategies combined free text terms and Medical Subject Headings (MeSH) relating to heart failure and diagnostic accuracy studies. The detailed search strategy is available on request. We considered studies in any language. We supplemented electronic searches by hand searching reference lists of relevant articles and reviews and by contacting experts and manufacturers of BNP tests.
Studies were eligible if they compared any type of BNP assay with echocardiographic findings or findings from radionuclide scans, with or without additional clinical criteria, in asymptomatic patients or patients with suspected acute CHF. In addition, the minimum requirement for inclusion was enough information to fill the 2×2 table. Two reviewers independently examined titles and abstracts of all potentially relevant articles and obtained full articles of all citations meeting the selection criteria. When necessary, we contacted authors to clarify issues, for example, when data from the same patients may have been included in several studies. We excluded studies that did not report the number of true-positive, false-positive, true-negative, and false-negative findings, and studies that were restricted to patients with diastolic dysfunction or examined the use of BNP as a prognostic marker. Final decisions on eligibility were made by consensus.
We abstracted study characteristics, methodologic quality, and results from each selected article. Study characteristics included the setting and type of population examined, publication year, test type and cutoff, reference standards, and the source of funding (main support from industry vs other). For comparison in statistical calculations, we converted BNP levels from picograms per milliliter to picomoles per liter where necessary, using a factor of 0.289 (1/3.463), based on the molecular weight of BNP. Data were abstracted in duplicate using an electronic data entry form developed for this purpose.8 Discrepancies were resolved by consensus.
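The unit conversion described above can be sketched as follows. This is a minimal illustration that uses only the factor quoted in the text (1/3.463); the function and constant names are our own and do not come from the study's data entry form.

```python
# Conversion factor quoted in the text: 1/3.463, about 0.289 pmol/L per pg/mL.
PG_PER_ML_TO_PMOL_PER_L = 1 / 3.463


def bnp_pg_ml_to_pmol_l(pg_per_ml: float) -> float:
    """Convert a BNP concentration from pg/mL to pmol/L (hypothetical helper)."""
    return pg_per_ml * PG_PER_ML_TO_PMOL_PER_L


# Example: a cutoff reported as 100 pg/mL corresponds to roughly 28.9 pmol/L.
cutoff_pmol_l = bnp_pg_ml_to_pmol_l(100)
```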
We assessed all articles that met the selection criteria for study quality. We selected items based on theoretical considerations and empirical data,9 including study design (case-control design or other), type of recruitment (prospective or retrospective, consecutive, or other), use of different reference tests (yes or no), application of reference tests (applied to all or only a fraction of study participants), and blind interpretation of test results (yes or no).
We calculated sensitivities, specificities, likelihood ratios, and their standard errors. We examined individual study results and between-study heterogeneity by plotting sensitivity and specificity in the receiver operating characteristics (ROC) space and used the regression model proposed by Moses et al10 to calculate summary ROC curves for both assays separately. We calculated the I² statistic, which describes the percentage of total variation across studies that is due to heterogeneity rather than chance.11 Mild heterogeneity will account for less than 30% of the variation, and pronounced heterogeneity will account for more than 50%.
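The per-study calculations and the summary ROC regression can be sketched as below. This is an illustration under stated assumptions, not the authors' Stata code: the standard error of the log negative likelihood ratio uses the usual delta-method formula, no continuity correction is applied (all four cells are assumed to be nonzero), and the Moses-type regression is fitted by unweighted least squares, which is one of several variants and may differ from the one used in the review. All function names are hypothetical.

```python
import math


def accuracy_from_2x2(tp, fp, fn, tn):
    """Accuracy measures from one 2x2 table (all four cells assumed > 0)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    # Delta-method standard error of ln(LR-).
    se_log_lr_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (tn + fp))
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "se_log_lr_neg": se_log_lr_neg,
    }


def logit(p):
    return math.log(p / (1 - p))


def moses_sroc_fit(points):
    """Unweighted least-squares fit of D = a + b*S.

    points: list of (sensitivity, specificity) pairs, one per study.
    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio;
    S = logit(TPR) + logit(FPR) is a proxy for the test threshold.
    """
    d = [logit(se) - logit(1 - sp) for se, sp in points]
    s = [logit(se) + logit(1 - sp) for se, sp in points]
    n = len(points)
    s_mean = sum(s) / n
    d_mean = sum(d) / n
    sxx = sum((x - s_mean) ** 2 for x in s)
    b = sum((x - s_mean) * (y - d_mean) for x, y in zip(s, d)) / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_sensitivity(fpr, a, b):
    """Sensitivity on the fitted summary ROC curve at a given false-positive rate."""
    logit_tpr = (a + (1 + b) * logit(fpr)) / (1 - b)
    return 1 / (1 + math.exp(-logit_tpr))
```

For example, `accuracy_from_2x2(90, 30, 10, 70)` (made-up counts, not taken from Table 2) gives a sensitivity of 0.90, a specificity of 0.70, and a negative likelihood ratio of about 0.14.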
Because the BNP test is mainly used to rule out CHF, we were particularly interested in the likelihood ratio of a negative result. The likelihood ratio is a measure of a test result's ability to modify pretest probabilities and is used to convert the estimated probability of the suspected diagnosis before the test result was known (pretest probability) into a posttest probability, which takes the result into account.12
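In odds form, the conversion from pretest to posttest probability can be written as below. This is the standard identity (Bayes theorem in odds form), not a formula quoted from the article, and the symbols are our own notation.

```latex
\[
LR_{-} = \frac{1 - \text{sensitivity}}{\text{specificity}}, \qquad
\text{odds}_{\text{pre}} = \frac{p_{\text{pre}}}{1 - p_{\text{pre}}},
\]
\[
\text{odds}_{\text{post}} = \text{odds}_{\text{pre}} \times LR_{-}, \qquad
p_{\text{post}} = \frac{\text{odds}_{\text{post}}}{1 + \text{odds}_{\text{post}}}.
\]
```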
We used random-effects metaregression13 to investigate sources of variation in the negative likelihood ratios. The following variables were considered: type of test (ELISA or RIA), mean age of study population, proportion of male participants, presence of typical symptoms of heart failure, type of setting, and year of publication. We also examined the following items on study quality: patient recruitment (consecutive or other), blinding of test results (yes or no), study design (prospective or retrospective), and funding source (industry or other). Finally, we explored whether the type of reference test (echocardiography, radionuclide scan, or echocardiography combined with clinical criteria) influenced results. Results are presented as sensitivities, specificities, negative likelihood ratios, and summary ROC curves, with 95% confidence intervals (CIs). All analyses were performed using Stata Statistical software (version 8.2; Stata Corp, College Station, Tex).
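To illustrate how negative likelihood ratios might be pooled under a random-effects model and how I² is obtained, the sketch below uses the DerSimonian-Laird estimator on the log scale. This is an assumption for illustration: the article states only that random-effects methods were applied in Stata, not which estimator was used, and the function name and return keys are hypothetical.

```python
import math


def dersimonian_laird(log_effects, std_errors):
    """Random-effects pooling of log likelihood ratios (DerSimonian-Laird sketch)."""
    w = [1 / se ** 2 for se in std_errors]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_effects)) / sw
    # Cochran Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_effects))
    df = len(log_effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled_log = sum(wi * y for wi, y in zip(w_star, log_effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    # I2: percentage of total variation attributed to heterogeneity rather than chance.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled": math.exp(pooled_log),
        "ci_low": math.exp(pooled_log - 1.96 * se_pooled),
        "ci_high": math.exp(pooled_log + 1.96 * se_pooled),
        "tau2": tau2,
        "i2_percent": i2,
    }
```

The inputs would be the per-study log negative likelihood ratios and their standard errors. A metaregression extends this model by adding study-level covariates, such as type of test, to the mean of the log likelihood ratio; that step is not shown here.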
Figure 1 summarizes the process of identifying and selecting studies. Nineteen studies, which described 22 study populations, met our inclusion criteria. Study characteristics are listed in Table 1. Among the 19 studies, 10 had a prospective design,14,15,18,22,24,25,27-29,32 8 enrolled patients consecutively,17,19,22,24-28 and 11 reported blind interpretation of test results.14,17,19,22-24,26,28-31 There were no diagnostic case-control studies, and reference tests were applied equally to all study participants. Thirteen studies examined the accuracy of the RIA assay14-22,25,27,30,32; 5, the ELISA test23,24,28,29,31; and 1, the ELISA N-terminal pro-BNP (NT-pro-BNP) test.26 Studies used either the ejection fraction determined by echocardiography (cutoff, 30%-50%) or radionuclide scans (cutoff, 35%-40%) as the reference standard; 8 studies used a combination of echocardiography and clinical symptoms. The median sample size of the 19 studies was 139 patients (range, 52-3177); the analysis was based on 9093 patients. Five studies were sponsored by industry.23,25,27,29,31
Study populations and settings were heterogeneous (Table 1). The mean age of participants ranged from 51 to 79 years; the percentage of men, from 35% to 95%. Five studies were conducted in patients with acute dyspnea in tertiary care settings, with the prevalence (pretest probability) of heart failure ranging from 39% to 72%.15,23,28,29,31 Similarly high prevalences were observed in patients examined after a myocardial infarction and in patients with an existing diagnosis of heart failure.14,19,26,30 Among outpatients who were referred by general practitioners, prevalence of heart failure ranged from 10% to 31%.16,17,20,22,27 In screening studies of patients with risk factors for coronary heart disease and CHF, the prevalence was below 10%.18,21,32
Table 2 lists for each study population the number of true-positive, false-positive, false-negative, and true-negative test findings, as well as sensitivity, specificity, and negative likelihood ratio. The median sensitivity was 87% (range, 68%-98%). Specificity tended to be lower and more heterogeneous (median, 72%; range, 19%-98%). The combined negative likelihood ratio overall from random-effects meta-analysis was 0.18 (95% CI, 0.13-0.23). It was lower for the ELISA test (0.12; 95% CI, 0.09-0.16) than for the RIA test (0.23; 95% CI, 0.16-0.32) (Figure 2). This difference was unlikely to be a chance finding (P from test of interaction = .009). The type of test explained a substantial proportion of between-study heterogeneity in the negative likelihood ratio: the I² statistic was reduced from 66% to 45% when including type of test system in the metaregression model. The difference in the performance of the 2 test systems is also evident from the summary ROC curves shown in Figure 3.
The other variables considered in metaregression analyses were not strongly associated with the negative likelihood ratio, with the exception of presence of symptoms and type of test. The presence of symptoms was associated with a lower negative likelihood ratio (P = .07); however, the association was much attenuated and became nonsignificant (P = .62) when the type of test system was entered into the model.
Figure 4 shows the estimated posttest probabilities for different pretest probabilities assuming constancy of the negative likelihood ratio. For example, for a pretest probability of 20% (which is typical for patients with suspected heart failure in primary care), a negative result of the ELISA test would produce a posttest probability of 2.9%, and a negative RIA test would produce a posttest probability of 5.4%. Conversely, the corresponding posttest probabilities among patients with acute symptoms in an emergency setting, assuming a pretest probability of 60%, would be 15% and 26%, respectively.
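The figures quoted above can be reproduced from the pooled negative likelihood ratios with a few lines of code. The sketch below assumes, as the text does, that the likelihood ratio is constant across pretest probabilities; the function name is our own.

```python
def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability into a posttest probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


LR_NEG_ELISA = 0.12  # pooled negative likelihood ratio reported for ELISA
LR_NEG_RIA = 0.23    # pooled negative likelihood ratio reported for RIA

# Primary care, pretest probability 20%
print(round(100 * posttest_probability(0.20, LR_NEG_ELISA), 1))  # 2.9
print(round(100 * posttest_probability(0.20, LR_NEG_RIA), 1))    # 5.4

# Emergency setting, pretest probability 60%
print(round(100 * posttest_probability(0.60, LR_NEG_ELISA)))  # 15
print(round(100 * posttest_probability(0.60, LR_NEG_RIA)))    # 26
```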
We summarized the accuracy of the 2 available BNP screening tests for excluding CHF in different study populations, ranging from asymptomatic patients in the community to patients presenting with acute dyspnea in emergency departments. We found that negative results of both tests accurately rule out the diagnosis if patients are at relatively low risk of CHF. The use of BNP tests in low-risk patients in primary care settings could reduce demand for echocardiography and referrals of patients to specialists. The ELISA tests, which allow bedside testing and provide results in a few minutes, performed somewhat better than the RIA tests, which must be sent to a laboratory. However, the ELISA advantages must be weighed against its higher cost compared with RIA ($55 vs $11).
This systematic review and meta-analysis is based on 9093 participants from 22 separate study populations, 1854 (20%) of whom were diagnosed as having CHF. Studies were generally of good methodologic quality, although reporting was sometimes incomplete. Both tests had been examined in a range of populations, but there was little evidence that test performance depended on patient characteristics or that the difference in performance observed between the ELISA and RIA test was explained by differences in study populations. Studies used different reference tests and criteria for the diagnosis of CHF. Different reference standards could have introduced heterogeneity in test accuracy, but this was not confirmed by our analyses.
We acknowledge that we could only adjust for information that was aggregated at the study level. Individual patient data would have been preferable but were not available. Individual patient data would also have allowed us to examine the effect of the choice of different BNP cutoffs. Despite a comprehensive literature search, we could not identify any studies that directly compared the 2 test systems. We therefore believe that our study makes an important contribution to the best available evidence on the accuracy of BNP tests for the diagnosis of CHF.
Doust and colleagues33 recently conducted a systematic review of the diagnostic accuracy of natriuretic peptides in the diagnosis of heart failure. Our review differs in several respects: we excluded 6 studies that were included by Doust et al but did not meet our inclusion criteria. In 3 of these, the definition of the reference standard was inadequate, and the other 3 were of patients with diastolic heart failure. Conversely, we identified 5 additional studies and obtained unpublished information from some authors.34 Furthermore, in contrast to Doust et al,33 we compared the characteristics of the bedside ELISA test with the RIA assay.
It is widely acknowledged that CHF is a major public health problem. Recent data from the Framingham Heart Study indicate that the lifetime risk of CHF is 1 in 5, for both men and women.1 In the United Kingdom and elsewhere, CHF is a leading cause of hospitalization among people older than 65 years.3,4 Although prognosis has improved in recent decades, mortality remains high: the 5-year mortality from 1990 to 1999 was 59% for men and 45% for women aged 65 to 74 years in the Framingham study.35 With further improvements in the prognosis of CHF, improved survival after acute myocardial infarction, and an aging population, the burden of CHF will continue to increase in the years to come.3 B-type natriuretic peptide tests have the potential to guide clinical decisions, particularly in patients at lower risk in primary care and emergency departments. Applied early in the diagnostic process in patients with suspected cardiac failure, negative BNP test findings can help rule out CHF and thus avoid unnecessary referral to echocardiography. If the test result is positive, confirmation by echocardiography will generally be required. Early diagnosis of left ventricular dysfunction or CHF may be important: treatment with angiotensin-converting enzyme inhibitors can improve the prognosis of patients with left ventricular dysfunction without overt heart failure and patients with symptomatic CHF.36-38
Future diagnostic accuracy studies should model the probability of CHF given the BNP test result and also take into account relevant clinical data (such as the presence of symptoms and age). Such studies should be adequately powered so that the influence of clinical indicators on test performance can be examined. The resulting prediction rules should then be validated in other populations, and their cost-effectiveness should ideally be investigated in randomized trials.39 Indeed, although our review indicates that BNP tests are useful to rule out CHF in patients at lower risk, the effect of the introduction of such tests on clinical outcomes or costs remains unclear.
Correspondence: Matthias Egger, MD, MSc, MFPHM, Department of Social and Preventive Medicine, University of Berne, Finkenhubelweg 11, CH-3012 Berne, Switzerland (firstname.lastname@example.org).
Accepted for Publication: January 10, 2006.
Financial Disclosure: None.
Funding/Support: This study was funded by Gesellschaft für das Gemeinnützige und Gute and the Swiss Academy of Medical Science (Käthe Zingg-Schwichtenberg-Foundation), Basel, Switzerland, and grants 3233B0-103182, 3200B0-103183 (Dr Bachmann), 3233-066377, and 3200-066378 (Dr Jüni) from the Swiss National Science Foundation, Berne, Switzerland.
Acknowledgment: We thank Fritz Grossenbacher, MD, for help with the literature searches.