van der Windt DAWM, Jellema P, Mulder CJ, Kneepkens CMF, van der Horst HE. Diagnostic Testing for Celiac Disease Among Patients With Abdominal Symptoms: A Systematic Review. JAMA. 2010;303(17):1738–1746. doi:10.1001/jama.2010.549
Author Affiliations: Arthritis Research UK National Primary Care Centre, Keele University, Keele, Staffordshire, England (Dr van der Windt); and Department of General Practice, EMGO Institute (Drs van der Windt, Jellema, and van der Horst), and Departments of Gastroenterology (Dr Mulder) and Pediatric Gastroenterology (Dr Kneepkens), VU University Medical Centre, Amsterdam, the Netherlands.
Context The symptoms and consequences of celiac disease usually resolve with a lifelong gluten-free diet. However, clinical presentation is variable, and because most patients presenting with abdominal symptoms in primary care will not have celiac disease, unnecessary diagnostic testing should be avoided.
Objective To summarize evidence on the performance of diagnostic tests for identifying celiac disease in adults presenting with abdominal symptoms in primary care or similar settings.
Data Sources A literature search via MEDLINE (beginning in January 1966) and EMBASE (beginning in January 1947) through December 2009 and a manual search of references for additional relevant studies.
Study Selection Diagnostic studies were selected if they had a cohort or nested case-control design, enrolled adults presenting with nonacute abdominal symptoms, the prevalence of celiac disease was 15% or less, and the tests used included gastrointestinal symptoms or serum antibody tests.
Data Extraction Quality assessment using the Quality Assessment of Diagnostic Accuracy Studies tool and data extraction were performed by 2 reviewers independently. Sensitivities and specificities were calculated for each study and pooled estimates were computed using bivariate analysis if there was clinical and statistical homogeneity.
Data Synthesis Sixteen studies were included in the review (N = 6085 patients). The performance of abdominal symptoms varied widely. The sensitivity of diarrhea, for example, ranged from 0.27 to 0.86 and specificity from 0.21 to 0.86. Pooled estimates for IgA antiendomysial antibodies (8 studies) were 0.90 (95% confidence interval [CI], 0.80-0.95) for sensitivity and 0.99 (95% CI, 0.98-1.00) for specificity (positive likelihood ratio [LR] of 171 and negative LR of 0.11). Pooled estimates for IgA antitissue transglutaminase antibodies (7 studies) were 0.89 (95% CI, 0.82-0.94) and 0.98 (95% CI, 0.95-0.99), respectively (positive LR of 37.7 and negative LR of 0.11). The IgA and IgG antigliadin antibodies showed variable results, especially for sensitivity (range, 0.46-0.87 and range, 0.25-0.93, respectively). One recent study using deamidated gliadin peptides showed good specificity (≥0.94), but evidence is limited in this target population.
Conclusion Among adult patients presenting with abdominal symptoms in primary care or other unselected populations, IgA antitissue transglutaminase antibodies and IgA antiendomysial antibodies have high sensitivity and specificity for diagnosing celiac disease.
Abdominal symptoms are common in primary care, with an annual incidence of 35 to 40 per 1000 individuals.1 Chronic abdominal symptoms can adversely affect daily functioning and quality of life.2,3 For the primary care physician, the diagnostic challenge is to discriminate between patients with functional gastrointestinal problems only and those with organic disease, such as celiac disease.
Celiac disease is a gluten-sensitive systemic disorder that primarily affects the small bowel.4,5 Estimates of the prevalence of celiac disease in the population vary widely, but most large-scale studies in Western Europe and the United States report prevalence rates between 0.5% and 1.0%.6,7 However, many patients have unrecognized celiac disease.8,9 This may partly be explained by a lack of symptoms in those with latent or silent celiac disease, but the identification of symptomatic celiac disease is also complicated by the broad spectrum of presenting symptoms, which often include diarrhea, weight loss, abdominal distension, malaise, and anemia.4,10
Celiac disease can have serious long-term consequences, including fertility impairment, stillbirth and dysmaturity, osteoporosis, and malignancy.5,11,12 Diagnosing celiac disease is important because a gluten-free diet typically resolves symptoms and can prevent long-term consequences.4,5 Confirmation of the diagnosis requires histological assessment of small-bowel biopsy material. However, most patients presenting with abdominal symptoms will not have celiac disease. Therefore, primary care physicians also aim to avoid unnecessary diagnostic testing.
Numerous studies have investigated the diagnostic accuracy of serological tests for celiac disease, but most have been performed in secondary care, and results of diagnostic test performance vary greatly across different settings and populations.13 The objective of this systematic review is to summarize evidence on the performance of diagnostic tests for the identification of celiac disease in primary care and other populations with a similar prevalence or spectrum of disease. This review focuses on tests available in primary care (ie, serum antibodies) and on specific gastrointestinal symptoms in adults because uncertainty regarding celiac disease diagnosis is especially large in this patient group.
MEDLINE (beginning in January 1966) and EMBASE (beginning in January 1947) through December 2009 were searched using Medical Subject Headings, EMTREE terms, and free text words, and included subsearches related to the index test, target condition, study population, design, and publication type (full strategies in eSupplement 1 and eSupplement 2). Reference lists of all included diagnostic studies, reviews, meta-analyses, and guidelines were checked for eligible studies. All citations identified by the search were checked by one author (D.A.W.M.vdW.), while another author (P.J.) independently checked eligibility of all abstracts assessed by the first author as possibly relevant. Disagreements were resolved by discussion. Full publications were retrieved to decide on eligibility, and a manual search was conducted of references for additional relevant studies.
Studies were eligible for inclusion when the study population consisted of adults and the prevalence rate of gastrointestinal symptoms was 50% or greater. Primary care was the setting of interest, but in some countries primary care is not well-defined or specialist services are directly accessible (not requiring physician referral). Therefore, studies performed in open-access outpatient clinics were also included, as long as the prevalence of celiac disease was less than 15%, suggesting a similar spectrum of disease as in primary care.14,15 Studies of hospitalized patients or children were excluded.
Diagnostic studies using a cohort design were included, as well as nested case-control designs in which consecutive cases of celiac disease were compared with controls with functional gastrointestinal problems (eg, chronic diarrhea or irritable bowel syndrome) sampled from the same baseline cohort. These control groups reflect primary care populations presenting with abdominal symptoms. All other case-control designs (using healthy controls or controls with a specific disease), studies for which no diagnostic 2 × 2 table could be reconstructed, and reports in languages other than English, Dutch, German, or French were excluded.
Included studies had to confirm celiac disease using small-bowel biopsy and histology. Studies had to assess gastrointestinal symptoms or serum antibody tests (alone or a combination) because these are accessible in primary care.
Two reviewers (D.A.W.M.vd.W. and P.J.) independently conducted data extraction and quality assessment. Quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool,16 recommended by the Cochrane Collaboration.17 The modified version consists of 11 items on study characteristics with the potential to introduce bias (see eTable 1). Items were scored as positive (no bias), negative (potential bias), or insufficient information. It was decided a priori to explore whether selection (spectrum) bias and verification (workup) bias explained variation in diagnostic performance. These aspects are important in diagnostic research on celiac disease and have been shown to influence the results of diagnostic performance.18,19
Diagnostic 2 × 2 tables and performance measures were calculated per test, using MetaDiSc statistical software version 1.4 (Clinical Biostatistics Unit, Ramón y Cajal Hospital, Madrid, Spain).20 Bivariate analysis was used to calculate pooled estimates of sensitivity, specificity, and likelihood ratios (LRs) along with 95% confidence intervals (CIs) for the summary estimates.21 The bivariate model preserves the 2-dimensional nature of diagnostic data by analyzing the logit-transformed sensitivity and specificity of each study in a single model, and takes into account both within-study and between-study variability, in contrast to the Littenberg and Moses method,22 which is based on a fixed-effects model. Pooled estimates of sensitivity, specificity, and LRs (LR for a positive test result is indicated as a positive LR; and for a negative test result as a negative LR) were computed when 4 or more studies on a specific index test showed sufficient homogeneity (4 studies are needed to estimate parameters). Statistical homogeneity was defined as overlapping 95% CIs of both sensitivity and specificity and differences in point estimates among the studies of less than 20%.14,15 In cases of statistical or clinical heterogeneity (in terms of characteristics of populations and test characteristics), ranges were presented instead of pooled estimates. When pooled estimates could be calculated, predictive probabilities were calculated based on the average prevalence of celiac disease in included cohort studies.
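As an illustration of the per-study calculations described above, the following minimal Python sketch derives sensitivity, specificity, and LRs from a single 2 × 2 table. The function name and the counts are hypothetical and are not taken from any included study. The sketch does not reproduce the MetaDiSc calculations or the bivariate pooling model used in the review.

```python
import math


def diagnostic_measures(tp, fp, fn, tn):
    """Return (sensitivity, specificity, positive LR, negative LR) for one 2 x 2 table.

    tp/fn: diseased patients with a positive/negative index test.
    fp/tn: nondiseased patients with a positive/negative index test.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # With no false positives the positive LR is undefined; this sketch
    # returns infinity rather than applying a continuity correction.
    positive_lr = sensitivity / (1 - specificity) if fp > 0 else math.inf
    # Likewise, with no true negatives the negative LR is undefined.
    negative_lr = (1 - sensitivity) / specificity if tn > 0 else math.inf
    return sensitivity, specificity, positive_lr, negative_lr


# Hypothetical counts, for illustration only.
sens, spec, lr_pos, lr_neg = diagnostic_measures(tp=90, fp=20, fn=10, tn=880)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} LR+={lr_pos:.1f} LR-={lr_neg:.2f}")
```

A positive LR well above 1 and a negative LR close to 0 indicate a test that shifts the probability of disease substantially in either direction, whereas LRs near 1 indicate little diagnostic value.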
Subgroup analyses were conducted to determine whether the following factors explained variation in performance: (1) prevalence of celiac disease (≤5% vs >5%); (2) design (cohort or nested case-control); (3) whether or not all patients or only a proportion (50%-99%) had gastrointestinal symptoms; (4) whether or not IgA deficiency had been assessed because cases may be missed when a limited serological screening is used; and (5) selection and verification bias. When subgroups showed distinct estimates, with more homogeneous results for sensitivity and specificity within each category, analyses were stratified and pooled estimates of diagnostic parameters were calculated for each category. Sensitivity analyses were performed to assess the effect of verification bias by calculating pooled estimates of diagnostic performance, excluding studies with a high risk of verification bias.
The searches identified a total of 1234 unique citations, of which 263 were potentially relevant and were screened by a second reviewer (P.J.). After initial evaluation, 133 full articles were retrieved, of which 17 articles23-39 (two31,32 reporting on the same study) were considered eligible for the review (N = 6085 patients; see the eFigure).
Table 1 provides information on design, setting, population, reference standard, and index tests. The 16 studies included 3 nested case-control studies31,32,36,38 and 13 cohort studies, 3 of which used a retrospective design by collecting information from medical records.27,28,39 Only 3 studies were performed in primary care populations.25,27,39 The prevalence of celiac disease ranged between 2% and 13%.
Celiac disease was confirmed by small-bowel biopsy in all studies; however, histological criteria varied across studies and were not always clearly described. In 8 studies, patients were started on a gluten-free diet following positive histological testing; however, in 4 studies,36-39 it was not clear if a positive response to change in diet was required to confirm the diagnosis. In 1 study,24 the reference standard additionally included a positive gluten challenge test in seronegative patients.
The results of quality assessment are presented in eTable 1. On average, the reviewers disagreed on 3 of 11 items (range, 0-6). All disagreements were resolved by consensus. Diagnostic review bias and period between index and reference test were often poorly described. Four studies performed well, receiving a positive assessment of at least 8 of 11 items.24,29,30,36 A frequent shortcoming concerned valid selection of the study population, leading to potential spectrum bias in all but 3 studies.24,29,36 Four studies referred only patients with positive antibody test results for small-bowel biopsy, resulting in partial verification bias.25,33,35,39 Two additional studies showed a high risk of verification bias because patients received different types of reference tests depending on index test results,27 or only a selection of patients received the reference test.34
The diagnostic performance of individual abdominal symptoms was investigated in 6 studies (Table 2).25,26,28,29,33,35 The sensitivity and specificity of diarrhea for celiac disease varied widely, ranging from 0.27 to 0.86 for sensitivity and from 0.21 to 0.86 for specificity. Results for other symptoms (constipation, weight loss, abdominal pain, nausea, flatulence) also showed substantial heterogeneity. The LRs were close to 1.00, indicating poor performance for abdominal symptoms. A primary care study reported the lowest sensitivity combined with the highest specificity for diarrhea (0.27 and 0.86, respectively) and constipation (0.18 and 0.74, respectively).25
Pooled estimates were not calculated because of wide variation in results. For diarrhea, the number of studies was large enough to conduct 3 subgroup analyses (Table 3), showing that heterogeneity could not be explained by the diagnostic workup (verification bias) or differences in celiac disease prevalence. Sensitivity of diarrhea tended to be higher (range, 0.45-0.86) in studies including solely patients with gastrointestinal symptoms compared with studies in which a proportion of patients presented with such symptoms (range, 0.27-0.50).
Two studies analyzed the performance of a combination of symptoms in the identification of celiac disease.25,30 Catassi et al25 analyzed the value of meeting irritable bowel syndrome criteria in the identification of celiac disease in a primary care population, and reported a sensitivity of 0.32 and specificity of 0.73. In a large population of patients referred by general practitioners for endoscopy, having either diarrhea, weight loss, or anemia was associated with a sensitivity of 0.92 and specificity of 0.65 for diagnosing celiac disease (positive LR, 2.65 and negative LR, 0.12).30
Nearly all studies (n = 14) reported on the diagnostic performance of serum antibodies either alone or in combination: IgA antigliadin antibodies (IgA-AGA, 6 studies), IgG-AGA (5 studies), IgA antiendomysial antibodies (EmA, 8 studies), IgA antitissue transglutaminase antibodies (IgA-tTG, 7 studies), IgG-tTG (3 studies), and the more recent test for deamidated gliadin peptides (DGP, 1 study). Ten studies identified patients with IgA deficiency, resulting in the identification of 0 to 6 IgA-deficient patients with celiac disease per study.
Most studies reporting on the diagnostic value of IgA-AGA23,29,32,36-38 showed high specificity (0.70-0.98; negative LR, 0.14-0.55), but variable sensitivity (0.46-0.87; positive LR, 2.59-41.9), which precluded the statistical pooling of results (Table 4). None of these 6 studies were conducted in a primary care population. Subgroup analyses were possible for 2 factors. Study design or testing for IgA deficiency could not explain variation in diagnostic performance (Table 3).
Specificity of IgG-AGA was high in most studies (range, 0.80-0.99; negative LR, 0.08-0.76), but sensitivity showed wide variation (range, 0.25-0.93; positive LR, 4.38-18.6). One study, conducted in a primary care population of military personnel, reported sensitivity of 0.88 and specificity of 0.84.39 Subgroup analyses (Table 3) indicated little influence of study design on sensitivity and specificity, although positive LRs tended to be higher for cohort studies (range, 5.49-18.63) compared with nested case-control studies (range, 4.38-4.67). Sensitivity was lower in studies that tested for IgA deficiency (range, 0.25-0.42) compared with studies in which IgA testing was not performed (range, 0.62-0.93).
Eight studies, including 3 primary care studies, examined the performance of EmA and showed fairly homogeneous results.24,26,27,29,35-37,39 However, 1 study37 reported a high proportion of false-negatives (Table 4). Pooled estimates were 0.90 (95% CI, 0.80-0.95) for sensitivity and 0.99 (95% CI, 0.98-1.00) for specificity (positive LR, 171 and negative LR, 0.11). Sensitivity was less than 100% in 4 studies with false-negative rates ranging between 11% and 26%.26,29,36,37 Three studies27,35,39 used diagnostic workup and showed verification bias (patients with negative serological findings were assumed to have no celiac disease), although sensitivity analysis showed similar pooled estimates when these studies were excluded (sensitivity, 0.87; specificity, 0.99). Given a mean prevalence of 9% in 7 cohort studies, these results yield predictive values of 0.90 for a positive EmA test result and 0.99 for a negative EmA test result.
Seven studies, including 1 in primary care, showed fairly homogeneous results for the diagnostic performance of IgA-tTG (Table 4).24,25,29-31,36,38 Pooled estimates using bivariate analysis were 0.89 (95% CI, 0.82-0.94) for sensitivity and 0.98 (95% CI, 0.95-0.99) for specificity (positive LR, 37.7 and negative LR, 0.11). Sensitivity analysis excluding studies with a high risk of verification bias showed similar estimates, although the positive LR was lower (29.7). Given a mean prevalence of 5.5% in 4 cohort studies, these results yield predictive values of 0.72 for a positive IgA-tTG test result and 0.99 for a negative IgA-tTG test result.
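The predictive values reported for EmA and IgA-tTG can be checked with a short calculation. The sketch below is a minimal Python example that applies Bayes' theorem to the rounded pooled sensitivity, specificity, and mean prevalence quoted above; the function name is hypothetical, and the review's own predictive probabilities may have been derived from unrounded estimates.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded pooled estimates and mean prevalences as reported in the text.
tests = {
    "EmA": (0.90, 0.99, 0.09),
    "IgA-tTG": (0.89, 0.98, 0.055),
}
for name, (sens, spec, prev) in tests.items():
    ppv, npv = predictive_values(sens, spec, prev)
    print(f"{name}: PPV {ppv:.2f}, NPV {npv:.2f}")
# EmA: PPV 0.90, NPV 0.99
# IgA-tTG: PPV 0.72, NPV 0.99
```

The lower positive predictive value for IgA-tTG reflects both its slightly lower specificity and the lower mean prevalence in the cohort studies that evaluated it.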
Only 3 studies analyzed diagnostic performance of IgG-tTG antibodies.31,38,39 Rashtak et al31 reported poor sensitivity (0.27) of IgG-tTG testing. In contrast, the study by Yagil et al39 in a primary care population reported very good results, but showed verification bias because patients with negative serological results did not undergo further testing and were assumed not to have celiac disease.
The search identified only 1 study in which the performance of IgA-DGP, IgG-DGP, or the combined test (IgA+IgG-DGP) was investigated (Table 4).31,32 Rashtak et al31 reported specificities of 0.94 or higher for each test (negative LR, 0.26-0.35), but these were associated with lower sensitivities (range, 0.65 to 0.75; positive LR, 13.3-40.4).
Diagnostic performance of test combinations is presented in eTable 2. As expected, sensitivity decreased and specificity increased when combinations of positive serum antibody test results were required to diagnose celiac disease. Optimal results were achieved by combining a positive IgA-tTG and a positive EmA test result, with a sensitivity of 0.81, specificity of 0.99, positive LR of 121, and negative LR of 0.19.29 Hopper et al30 investigated combinations of a risk score based on symptoms and anemia with results of IgA-tTG testing. A strategy based on either a high-risk score or a positive IgA-tTG test result in those at low risk was considered by the authors to be the optimal strategy to identify likely cases of celiac disease with a 100% sensitivity and 61% specificity (positive LR, 2.5 and negative LR, 0.01).
Our review demonstrates widely varying results of diagnostic performance of presenting gastrointestinal symptoms in the identification of celiac disease. Evaluation of serological tests showed good performance of IgA-tTG and EmA, but there was wide variation in sensitivity and specificity of IgA-AGA, IgG-tTG, and IgG-AGA.
Serological tests have been extensively studied, and our results confirm those of previous reviews.13,40,41 However, the evidence base is small for patients presenting with abdominal symptoms in primary care, even though primary care is the setting in which diagnostic uncertainty is large and early identification of celiac disease should be facilitated. The drawback of primary care research is that not all participants can receive an invasive reference test, and the 3 primary care studies in this review25,27,39 showed verification bias with only patients having positive test results going on to receive small-bowel biopsy. Sensitivity analyses, however, did not show large effects of verification bias on pooled estimates of diagnostic parameters.
The spectrum of disease and population characteristics are important determinants of diagnostic performance, and the prevalence of disease is a good indicator of this effect.42 This review only included studies with a low prevalence of celiac disease to increase the clinical homogeneity of studies and present evidence relevant to primary care. Rostom et al13 showed that positive predictive values of serological tests decreased in populations with prevalence rates below 20%, and conversely, predictive values of negative test results increased in these populations. Lower positive predictive values imply more false-positive test results, and thus, possibly more unnecessary testing, which is an important concern in primary care.
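The dependence of the positive predictive value on prevalence can be made concrete with the odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the LR. The following minimal Python sketch applies the pooled positive LR for IgA-tTG reported in this review (37.7) to a range of prevalences. The function name and the chosen prevalences are illustrative assumptions, and the sketch treats the LR as constant across settings, which may not hold when the spectrum of disease changes.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


POSITIVE_LR = 37.7  # pooled positive LR for IgA-tTG reported in this review

# Illustrative prevalences (assumed for this example).
for prevalence in (0.02, 0.05, 0.10, 0.20):
    ppv = post_test_probability(prevalence, POSITIVE_LR)
    print(f"prevalence {prevalence:.0%}: probability after a positive test {ppv:.2f}")
```

With the same LR, the probability of disease after a positive result falls as prevalence falls, which is why a test that performs well in referred populations can still generate many false-positive results in primary care.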
Characteristics of index and reference tests may also influence diagnostic performance. Several test characteristics varied across studies, such as the definition of symptoms (eg, for abdominal pain or diarrhea), criteria for a positive serological test, testing for IgA deficiency, tissues used for serological tests (human, monkey, or guinea pig), and diagnostic criteria of celiac disease. Cutoffs used for serological tests varied, but there was no clear association with sensitivity or specificity (Table 4). Characteristics of serological testing, including reliability and observer agreement, may vary across locations and laboratories, possibly leading to differences in performance.
Most studies used similar histological criteria for diagnosing celiac disease (Marsh grade ≥III), but the level of damage may vary across populations. Only 4 studies29,31,35,36 presented the proportion of patients in whom only partial villous atrophy was found (Marsh grade of IIIA), which ranged from 4% to 100%. The presence of positive serum antibodies has been shown to correlate with the degree of villous atrophy, and patients with celiac disease who have less severe histological damage may have seronegative findings.43,44 This could be important, especially in primary care, in which levels of mucosal damage may be lower, and consequently, more patients with celiac disease may be missed.
The prevalence of celiac disease in primary care patients presenting with gastrointestinal symptoms is 2% to 4%.8,25,45 However, gastrointestinal symptoms in primary care are common and screening all patients for celiac disease is neither necessary nor efficient. In addition to patients with a positive family history of celiac disease, patients with longstanding or refractory abdominal symptoms may be more likely candidates for screening, and several case-finding studies appear to confirm this.6,25,46
So which diagnostic strategy should be recommended? Our review shows that gastrointestinal symptoms alone are not sufficiently accurate. Some serological tests (IgA-tTG and EmA) showed good performance, but no single serological test seems to be sufficient to identify all cases of celiac disease. The EmA may have better test performance, but sensitivity was poor in some studies, and the test is more expensive, complex, and operator-dependent, with larger interobserver variation.36 Therefore, the simpler IgA-tTG test, which is automated and reliable, is often recommended as the first step in examination, followed by the EmA to confirm a likely diagnosis of celiac disease and need for referral for biopsy.5,25,47 However, the LRs for the EmA and IgA-tTG tests were calculated with each test used in isolation, and test performance may change, so these LRs may not be valid, when the tests are applied sequentially. Given the strong dependence of diagnostic performance on prevalence and spectrum of disease, the effectiveness of a sequential strategy should be investigated in a primary care setting. Future research may also include a diagnostic randomized trial comparing the costs of different diagnostic strategies and their effects on treatment decisions and subsequent patient outcomes, including symptoms and signs, quality of life, and the consequences of false-negative and false-positive test results.
Strengths of this review include the use of current methods for searching evidence, quality assessment, and meta-analysis. The review was limited to cohort studies and nested case-control designs, discarding information from many case-control studies. This increased the validity and clinical relevance of our findings because cohort studies provide more valid estimates of diagnostic accuracy in clinical settings.18,19 Nested case-control designs may have an increased risk of selection bias and clinical review bias. Although subgroup analyses seemed to indicate that study design did not explain variation in diagnostic performance, the number of studies was small, limiting the power of these subgroup analyses.
Pooled estimates were only calculated for studies showing sufficient clinical and statistical homogeneity. The I² or Q tests (commonly used in meta-analysis) are not recommended for assessing statistical homogeneity in diagnostic reviews48 because they do not take into account the association between sensitivity and specificity. We defined statistical homogeneity as overlapping 95% CIs combined with less than 20% variation between point estimates. However, the cutoff of 20% for variation across point estimates was arbitrary. Using a 15% cutoff would have resulted in a decision not to pool estimates of EmA and IgA-tTG. The sensitivity analyses for verification bias showed consistent estimates, which may indicate that the pooled estimates of sensitivity and specificity for these 2 tests were robust. Further methodological development is needed to provide clear guidelines for assessing statistical homogeneity in diagnostic meta-analysis.
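The homogeneity rule described above can be sketched in a few lines of Python. This is one possible reading of the rule, not the authors' implementation: it assumes that "overlapping 95% CIs" means all intervals share a common region and that the 20% threshold is an absolute difference between the highest and lowest point estimates. The function names and the study values are hypothetical.

```python
def is_homogeneous(estimates, max_spread=0.20):
    """Check one measure across studies.

    estimates: list of (point_estimate, ci_low, ci_high) tuples, one per study.
    """
    points = [point for point, _, _ in estimates]
    lows = [low for _, low, _ in estimates]
    highs = [high for _, _, high in estimates]
    # All intervals overlap if the largest lower bound does not exceed
    # the smallest upper bound.
    cis_overlap = max(lows) <= min(highs)
    spread_ok = (max(points) - min(points)) < max_spread
    return cis_overlap and spread_ok


def poolable(sensitivities, specificities, max_spread=0.20):
    """Pool only if both sensitivity and specificity meet the rule."""
    return (is_homogeneous(sensitivities, max_spread)
            and is_homogeneous(specificities, max_spread))


# Hypothetical study-level estimates, for illustration only.
sens = [(0.95, 0.85, 0.99), (0.88, 0.78, 0.95), (0.78, 0.65, 0.88)]
spec = [(0.99, 0.97, 1.00), (0.98, 0.96, 0.99), (0.97, 0.95, 0.99)]

print(poolable(sens, spec, max_spread=0.20))  # True
print(poolable(sens, spec, max_spread=0.15))  # False
```

In this hypothetical example the same set of studies is pooled under a 20% cutoff but not under a 15% cutoff, which illustrates how sensitive the pooling decision is to an arbitrary threshold.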
The population of interest was adults presenting with abdominal symptoms, but celiac disease may present with a wide range of other atypical symptoms. For optimal identification of all patients with celiac disease, diagnostic testing should be considered in patients presenting with other symptoms or health conditions in which the prevalence of celiac disease may be high, such as iron-deficiency anemia, infertility, type 1 diabetes, Down syndrome, and reduced bone density.5,47,49
In conclusion, in adult patients presenting with chronic abdominal symptoms, symptoms alone are insufficient for diagnosing celiac disease. The IgA-tTG and EmA tests show good performance, but the evidence in primary care populations is limited. Further research should investigate the performance of a diagnostic algorithm, using sequential serological testing in patients with chronic or refractory abdominal symptoms in primary care.
Corresponding Author: Daniëlle A. W. M. van der Windt, PhD, Arthritis Research UK National Primary Care Centre, Primary Care Sciences, Keele University, Keele, Staffordshire ST5 5BG, UK (firstname.lastname@example.org).
Author Contributions: Dr van der Windt had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: van der Windt, Jellema, Mulder, Kneepkens, van der Horst.
Acquisition of data: van der Windt, Jellema, Kneepkens.
Analysis and interpretation of data: van der Windt, Jellema, Kneepkens.
Drafting of the manuscript: van der Windt.
Critical revision of the manuscript for important intellectual content: Jellema, Mulder, Kneepkens, van der Horst.
Statistical analysis: van der Windt.
Obtained funding: van der Windt, Mulder, van der Horst.
Administrative, technical, or material support: Jellema.
Study supervision: van der Windt, van der Horst.
Financial Disclosures: None reported.
Funding/Support: This study was supported by grant 945-06-001 from the Netherlands Organization for Health Research and Development, the Hague, the Netherlands.
Role of the Sponsor: The funding organization played no role in the design and conduct of the review; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.