Catanzaro A, Perry S, Clarridge JE, Dunbar S, Goodnight-White S, LoBue PA, Peter C, Pfyffer GE, Sierra MF, Weber R, Woods G, Mathews G, Jonas V, Smith K, Della-Latta P. The Role of Clinical Suspicion in Evaluating a New Diagnostic Test for Active Tuberculosis: Results of a Multicenter Prospective Trial. JAMA. 2000;283(5):639-645. doi:10.1001/jama.283.5.639
Author Affiliations: University of California, San Diego Medical Center, San Diego (Drs Catanzaro, Perry, and LoBue); VA Medical Center, Houston, Tex (Drs Clarridge, Dunbar, and Goodnight-White); San Diego County Public Health Laboratory, San Diego (Dr Peter); Swiss National Center for Mycobacteria, Department of Medical Microbiology, University of Zurich, Zurich, Switzerland (Dr Pfyffer); State University of New York, Health Science Center at Brooklyn, Brooklyn, NY (Dr Sierra); Division of Infectious Diseases, University Hospital, Zurich, Switzerland (Dr Weber); Laboratory Medicine, Clinical Microbiology, University of Texas Medical Branch at Galveston (Dr Woods); Gen-Probe Inc, San Diego, Calif (Mr Mathews, Ms Jonas, and Dr Smith); and Pathology Department, Columbia-Presbyterian Medical Center, New York, NY (Dr Della-Latta). Dr Dunbar is now with Luminex, Austin, Tex; Dr LoBue is now with the Centers for Disease Control and Prevention, stationed at the San Diego County Department of Health TB Control Office. Dr Sierra is deceased.
Context In laboratory trials, nucleic acid amplification tests for the diagnosis
of tuberculosis (TB) are more accurate than acid-fast bacilli (AFB) smear
microscopy and are faster than culture. The impact of these tests on clinical
diagnosis is not known.
Objective To assess the performance of a nucleic acid amplification test, the
enhanced Mycobacterium tuberculosis Direct (E-MTD)
test, against a uniform clinical standard stratified by level of clinical suspicion.
Design Prospective multicenter trial conducted between February and December
1996, documenting the clinical suspicion of TB at enrollment and using final
comprehensive diagnosis as the criterion standard.
Setting Six urban medical centers and 1 public health TB clinic.
Patients A total of 338 patients with symptoms and signs consistent with active
pulmonary TB and complete clinical diagnosis were stratified by the clinical
investigators to be at low (≤25%), intermediate (26%-75%), or high (>75%)
relative risk of having TB.
Main Outcome Measures Sensitivity, specificity, and positive and negative predictive values
of the E-MTD test in groups with low (n = 224), intermediate (n = 68), and
high (n = 46) clinical suspicion of TB.
Results Based on comprehensive clinical diagnosis, sensitivity of the E-MTD
test was 83%, 75%, and 87% for low, intermediate, and high clinical suspicion
of TB, respectively, and corresponding specificity was 97%, 100%, and 100%
(P = .25). Positive predictive value of the E-MTD
test was 59% (low), 100% (intermediate), and 100% (high) compared with 36%
(low), 30% (intermediate), and 94% (high) for AFB smear. Corresponding negative
predictive values were 99%, 91%, and 91% (E-MTD test) vs 96%, 71%, and 37% (AFB smear).
Conclusions For complex diagnostic problems like TB, clinical risk assessments can
provide important information regarding predictive values more likely to be
experienced in clinical practice. For this series, a clinical suspicion of
TB was helpful in targeting areas of the clinical spectrum in which nucleic
acid amplification tests can make an important contribution.
Robert Koch's discovery in 1882 that tuberculosis (TB) is caused by Mycobacterium tuberculosis was a watershed in the history
of efforts to understand and control this deadly disease.1
Since then, diagnosis of TB has focused on the detection of M tuberculosis using basic techniques of acid-fast bacilli (AFB) smear
microscopy and culture. Detection of AFB in a smear requires more than 10,000
organisms/mL, and the test does not distinguish among mycobacteria. Even with
modern radiometric detection systems, identification of M tuberculosis can require 2 to 5 weeks.2-4
Before its pathogenesis was understood, TB was recognized as a distinct
clinical entity. These characterizations remain an integral part of the medical
assessment. Presently, the most important criteria for establishing a presumptive
diagnosis are AFB smear and a case definition, which may be based on radiographic
signs, physiologic symptoms, risk factors, or a combination of these.5 In many Western countries, the declining incidence
of TB, combined with the human immunodeficiency virus (HIV) epidemic, has
increased the number of mycobacteria other than tuberculosis cases, further
impairing the reliability of the AFB smear to specifically predict TB.6-8 In acute care settings,
as many as 8 to 10 patients are suspected to have TB for every confirmed case.9,10 Accurate laboratory tests that provide
results in a clinically useful time frame have the potential to affect the
performance of TB programs broadly, offering opportunities for more effective
patient management and more efficient allocation of scarce resources.
Nucleic acid amplification (NAA) tests11
represent a major advance in the diagnosis of TB. With the use of amplification
systems, nucleic acid sequences unique to M tuberculosis can be detected directly in clinical specimens, offering better accuracy
than AFB smear and greater speed than culture.12-14
Early studies used in-house procedures based on polymerase chain reaction
with amplification of the IS6110 gene sequence.15,16 However, kits providing standard
formats and reagents are now available, making the amplification technologies
more practical for use in clinical laboratories.17-24
Currently, 2 commercial kits—the Mycobacterium tuberculosis Direct (MTD) test (Gen-Probe Inc, San Diego, Calif) and the AMPLICOR
MTB (Roche Molecular Systems, Branchburg, NJ)—have been approved by
the Food and Drug Administration for use in respiratory specimens of previously
untreated patients with positive AFB smear results.
The majority of studies, to date, have been based on laboratory criteria
for diagnosis of disease, with clinical records used to evaluate discrepant
results. Only a few studies have examined the performance of the NAA tests
against clinical definitions of TB25,26
or test performance under routine testing conditions.27
The American Thoracic Society has stressed the need for additional studies
evaluating these and other emerging diagnostics against clinical reference
standards.28 This report describes the performance
of the Enhanced MTD (E-MTD) in a multicenter prospective trial that documented
the physician's suspicion of TB at the time of the initial examination and
a clinical and laboratory diagnosis of TB. The MTD, a 3-hour assay using transcription-mediated
amplification for M tuberculosis complex-specific
ribosomal RNA,29 was first approved by the
Food and Drug Administration in 1995. The enhanced version, which accepts
a larger volume of sample and has a shorter processing time than the initial
kit, was approved in 1999 for use in smear negative and positive respiratory
samples. The objectives of the present work were to provide a uniform clinical
standard for diagnosis in the population studied, and to describe the E-MTD's
performance against this standard for different levels of clinical suspicion.
Between February and December 1996, 425 individuals suspected of having
active pulmonary TB were enrolled from 7 inpatient and specialized outpatient
facilities. Two sites were located in San Diego, Calif (a university hospital
and a public health TB clinic); 1 in Houston, Tex (a Veterans Affairs Hospital);
1 in Galveston, Tex (a university facility); 2 in New York, NY; and 1 center
in Zurich, Switzerland. All patients were identified for enrollment by hospital
or clinic site physicians based on suspicion of active pulmonary TB, including
but not limited to symptoms, risk factors, tuberculin skin test reaction,
and chest radiograph findings. Enrolling physicians were pulmonary or infectious
disease specialists with experience in the evaluation of patients for TB.
Patients were not eligible for enrollment if they received multidrug treatment
for TB for more than 7 days during the 3 months prior to enrollment. Diagnosis
and treatment decisions were made by site physicians according to local standards
of care. During the trial, physicians were blinded to results of the NAA.
All other laboratory results were available to physicians in accordance with
routine laboratory procedures.
Clinical information, including patient demographic information, medical
history, physical examination, and chest radiograph results, was collected
by clinical research staff at each site either directly from the patient,
or from the patient's chart. Completion of 3 standardized case reporting forms
was required for inclusion in the final study population. These forms were
completed at the first physical examination (initial enrollment form), at
the end of specimen collection or at discharge (first follow-up form), and
at 3 months follow-up (end-of-study form).
The initial enrollment form requested the site physician to estimate
the probability that the patient had TB by using a range from 0% to 100%.
No clinical guidelines were provided to physicians for determining this estimate.
The clinical suspicion of TB (CSTB) was based on the physician's clinical
judgment. Laboratory results available to the physicians for this assessment
may have included some AFB smears, and at some sites, the earlier version
of the MTD may have been available. However, the results of the E-MTD were
withheld from the chart and from the clinician.
To provide a uniform definition of TB across sites, criteria intended
to represent a conservative consensus standard for ruling in or out pulmonary
TB were established by an independent panel consisting of 3 experts in TB
diagnosis and treatment. Under these standards, the combination of high clinical
suspicion (>80%) and at least 2 positive cultures for M
tuberculosis from separate specimens was considered definitive evidence
of active TB. In the absence of these conditions, the cases were reviewed
by the independent expert panel and at least 2 of the 3 panel members had
to consider the patient to have TB for a positive ruling. The combination
of low clinical suspicion (<10%) and respiratory specimens consistently
negative for M tuberculosis was considered to constitute
definitive evidence for absence of active TB. In the absence of these conditions,
at least 2 of the 3 panel members had to consider the patient to be free of
TB for a negative ruling.
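The case definitions above can be read as a small decision rule. The following Python sketch is illustrative only: the function name, argument names, and return labels are assumptions introduced here, not part of the study protocol, and the sketch does not model the panel's option to exclude a case for insufficient information.

```python
def classify_case(cstb_percent, positive_cultures, all_specimens_negative):
    """Apply the conservative case definitions described in the text.

    cstb_percent: physician's clinical suspicion estimate (0 to 100).
    positive_cultures: number of separate specimens culture-positive
        for M tuberculosis.
    all_specimens_negative: True if respiratory specimens were
        consistently negative for M tuberculosis.
    Returns "TB", "not TB", or "panel review" (hypothetical labels).
    """
    # Definitive TB: high suspicion (>80%) and at least 2 positive cultures.
    if cstb_percent > 80 and positive_cultures >= 2:
        return "TB"
    # Definitive absence: low suspicion (<10%) and consistently negative specimens.
    if cstb_percent < 10 and all_specimens_negative:
        return "not TB"
    # Every other case is referred to the 3-member expert panel,
    # where at least 2 of 3 members must agree on a ruling.
    return "panel review"


print(classify_case(90, 2, False))  # meets the definitive TB criteria
print(classify_case(5, 0, True))    # meets the definitive not-TB criteria
print(classify_case(50, 1, False))  # neither definition met
```

Note that the thresholds used for the case definitions (>80% and <10%) differ from the cut points later used to group patients into low, intermediate, and high suspicion categories.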
Without knowing the reason for referral, the panel reviewed all cases
not meeting criteria defined above. Possible actions of the panel were to
exclude a case from analysis due to insufficient information or to provide
a consensus clinical diagnosis by at least 2 of 3 panel members. Information
reviewed by the panel consisted of all case report forms, an assessment by
the laboratory director as to whether contamination occurred in any single
positive culture, and copies of initial and (when available) follow-up chest
radiographs. Copies of case report forms excluded the identity of the patient,
the site of enrollment, the CSTB, and the results of the NAA. As a quality-control
measure to assess interobserver agreement, 6 control cases were drawn randomly
from cases considered to meet the conservative case definitions. Panel determinations
in all 6 cases were unanimous and consistent with the site physician's diagnosis
as reported on the end-of-study form.
Briefly, 1 to 6 sequential respiratory samples, primarily expectorated
or induced sputa, were collected over 7 days following patient enrollment.
For this study, only the first sample of each specimen type was accepted per
day as a study sample. Specimens were processed within 3 days of collection,
stored at 2°C to 8°C, and an aliquot of resuspended sediment
was frozen for inhibition testing. Culture (Lowenstein-Jensen and BACTEC 460,
BD Biosciences Division, Sparks, Md; Middlebrook 7H10/7H11), AFB smear (auramine
O stain), and NAA tests were performed on each specimen. Experienced clinical
laboratory technologists performed the NAA assays according to the manufacturer's instructions.
Data were pooled across all sites for analysis. Final comprehensive
diagnosis as determined by the panel review procedure was defined as the criterion
standard for computing sensitivity, specificity, and predictive values of
culture, E-MTD, and AFB smear. Laboratory results were modeled at the patient
level by defining a positive test result as the occurrence of at least 1 positive
test in a series of up to 6 specimens per patient.
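As a minimal sketch of the patient-level rule just described, a series can be collapsed to one result as follows. The function name and the boolean-list representation of specimen results are assumptions made for illustration.

```python
def patient_level_positive(specimen_results):
    """Return True if at least 1 specimen in the patient's series is positive.

    specimen_results: list of booleans, one per specimen (1 to 6 per patient).
    """
    if not 1 <= len(specimen_results) <= 6:
        raise ValueError("expected 1 to 6 specimens per patient")
    # A single positive specimen makes the whole series positive.
    return any(specimen_results)


print(patient_level_positive([False, True, False]))  # one positive in three
print(patient_level_positive([False, False]))        # all negative
```

Defining positivity this way favors sensitivity, since each additional specimen gives the test another chance to detect disease, and the same rule is applied to culture, E-MTD, and AFB smear.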
Clinical Suspicion. Physicians' clinical suspicion estimates were grouped into 3 relative
risk categories: low (≤25%); intermediate (26%-75%); and high (>75%) probability
of TB. For each category, we report the number and proportion of patients,
the prevalence of prognostic symptoms and signs, and the observed prevalence
of disease based on the panel's clinical diagnosis at the end of the study.
The association of patient characteristics with risk group classification
was assessed in polychotomous logistic regression using high suspicion as
the reference category. Results of the univariate analysis are reported.
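The grouping of suspicion estimates can be sketched as a simple lookup. The function name is an assumption for illustration; the cut points are those stated above.

```python
def cstb_category(estimate_percent):
    """Map a physician's 0-100% suspicion estimate to a relative risk category."""
    if not 0 <= estimate_percent <= 100:
        raise ValueError("estimate must be between 0 and 100")
    if estimate_percent <= 25:
        return "low"           # <=25%
    if estimate_percent <= 75:
        return "intermediate"  # 26%-75%
    return "high"              # >75%


for estimate in (10, 25, 26, 75, 76):
    print(estimate, cstb_category(estimate))
```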
Performance of AFB Smear and E-MTD by Level of Clinical Suspicion.
Sensitivity and specificity of E-MTD and AFB smear are reported for
low, intermediate, and high categories of CSTB. Variations in these test properties
with respect to clinical suspicion level were assessed in dichotomous logistic
regression predicting the probability of a positive test result given final
clinical diagnosis and clinical risk classification. Statistical evidence
of variation by CSTB is reported as the χ² statistic for a
linear model (low, intermediate, high) of clinical suspicion. This analysis
was conducted separately for AFB smear and E-MTD, and each model was validated
by the Hosmer-Lemeshow test.
Estimating Clinical Utility. Positive predictive values (PPVs) and negative predictive values (NPVs)
of E-MTD and AFB smear are reported for low, intermediate, and high categories
using the observed prevalence of TB in each group as a proxy for the prior
(pretest) risk of disease. Because this was a partially blinded observational
study, the objective of this analysis is descriptive, not prescriptive.
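The relationship between pretest risk and predictive values follows from Bayes' theorem. The sketch below is an illustration under stated assumptions, not the authors' computation: the function name is hypothetical, and the inputs are the rounded percentages reported for the low suspicion group (sensitivity 83%, specificity 97%, prevalence 5%), so the output only approximates values computed from patient counts.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given pretest prevalence.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | negative test)
    return ppv, npv


# E-MTD in the low suspicion group, using rounded reported values.
ppv, npv = predictive_values(sensitivity=0.83, specificity=0.97, prevalence=0.05)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # prints "PPV = 0.59, NPV = 0.99"
```

Even with 97% specificity, a 5% prior risk yields a PPV of only about 59%, which illustrates why the same test can behave differently across suspicion groups.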
A total of 425 patients were enrolled at 6 sites in the United States
and 1 European site. Of these, 341 (80%) had complete clinical and laboratory
data. Eighty-four patients (20%) were excluded for the following reasons:
46 did not have at least 1 valid specimen, 8 received more than 7 days of
treatment for TB within the previous 3 months, and 30 had an incomplete set
of clinical data forms. Table 1
summarizes characteristics of the 84 excluded patients and the 341 patients
eligible for analysis. Among the 7 trial sites, the number of evaluable patients
ranged from 13 (4%) in Europe to 122 (36%) at a university hospital in San
Diego. Site exclusions averaged 23%, ranging from 11% (7 patients) to 43%
(10 patients) of a site's enrollments.
Of the 341 patients with complete study forms and valid laboratory results,
303 (89%) had an end-of-study report that satisfied the case definitions.
Thirty-eight cases (11%) were referred to the panel for further review, including
15 patients with a 10% or greater probability of TB and fewer than 2 positive
cultures, and 23 patients with an 80% or less probability of TB but at least
1 positive smear and/or culture. Three of these cases were excluded by the
panel because the patient record provided insufficient information on which
to base a final diagnosis. Of the remaining 35 cases, 11 were classified as
TB and 24 were classified as not TB. These determinations were unanimous in
22 cases (58%; 7 TB, 15 not TB) and rendered by two-thirds consensus in 13
cases (4 TB, 9 not TB). Following panel review, there were a total of 338
patients for analysis, including 72 (21%) considered to have active TB and
266 considered to be free of TB.
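The patient flow reported above can be checked with a few lines of arithmetic. This is only a consistency sketch; the variable names are assumptions, and all counts are taken from the text.

```python
enrolled = 425
excluded = {
    "no valid specimen": 46,
    "more than 7 days of prior TB treatment": 8,
    "incomplete clinical data forms": 30,
}
evaluable = enrolled - sum(excluded.values())         # complete data

met_case_definition = 303
referred_to_panel = evaluable - met_case_definition   # sent for panel review
excluded_by_panel = 3
panel_tb, panel_not_tb = 11, 24

# Final analysis set: cases meeting the definitions plus panel-classified cases.
analyzed = met_case_definition + panel_tb + panel_not_tb
tb_cases, non_tb_cases = 72, 266

print(evaluable, referred_to_panel, analyzed)  # prints "341 38 338"
```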
Among the 72 patients diagnosed as having active TB, 45 (63%) had 2
or more cultures positive for M tuberculosis in a
series of 6, 20 (28%) had 1 positive culture, and all cultures were negative
for M tuberculosis in 7 cases (10%). One to 6 specimens
collected on different consecutive days were accepted for the study. The average
number of specimens cultured per patient was 2.5 (2.6 with TB and 2.5 without
TB). When the culture result was compared with final comprehensive diagnosis,
the sensitivity of a series yielding only 1 positive culture was 90%, but
was 63% when 2 or more positive cultures were required to meet the case definition.
The number of specimens with M tuberculosis isolated
was established by the site physician end-of-study report and panel consensus.
One patient determined to be free of TB had a positive culture, resulting
in a specificity of 99.6%.
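The culture figures in this paragraph follow directly from the reported counts. The sketch below recomputes them; variable names are assumptions introduced for illustration.

```python
tb_total = 72              # patients with a final diagnosis of active TB
two_or_more_positive = 45  # TB patients with 2 or more positive cultures
one_positive = 20          # TB patients with exactly 1 positive culture
non_tb_total = 266         # patients determined to be free of TB
false_positive = 1         # non-TB patients with a positive culture

# Sensitivity if a single positive culture counts as a positive series.
sens_any = (two_or_more_positive + one_positive) / tb_total
# Sensitivity if 2 or more positive cultures are required.
sens_two = two_or_more_positive / tb_total
# Specificity of culture against the final comprehensive diagnosis.
spec = (non_tb_total - false_positive) / non_tb_total

print(f"{sens_any:.1%} {sens_two:.1%} {spec:.1%}")  # prints "90.3% 62.5% 99.6%"
```

The 62.5% value appears in the text rounded to 63%.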
The average CSTB for patients with a final clinical diagnosis of TB
was 69% (median, 80%). For patients with a final clinical diagnosis other
than TB, the average CSTB was 20% (median, 15%). Two hundred twenty-four patients
(66%) had low CSTB, 46 patients (14%) had high CSTB, and 68 patients (20%)
had intermediate CSTB. Based on final clinical diagnosis, prevalence of TB
was 5% (low CSTB), 29% (intermediate CSTB), and 87% (high CSTB) (χ²₂ = 83; P<.001). A total of
103 patients (30%) were prescribed anti-TB medication presumptively, including
11% of those in the low CSTB group, 49% of those in the intermediate CSTB
group, and 98% of those in the high CSTB group (χ²₂ = 64; P<.001).
Patient characteristics most significantly associated with CSTB group
are summarized in Table 2. The
most consistent predictor of CSTB level was a chest radiograph suggestive
of current disease. The proportion of patients with a suggestive chest radiograph
increased steadily from 25% in the low CSTB group, to 57% in the intermediate
CSTB group, to 87% in the high CSTB group. The presence of at least 2 major
symptoms predicted classification at an intermediate or high CSTB, with the
most important symptoms being cough lasting more than 2 weeks and recent weight
loss. Thirty-four percent of patients had a positive tuberculin skin test,
and 19% were known contacts of a TB case. Having at least 1 of these latter
characteristics was moderately associated with higher CSTB.
Overall sensitivity and specificity of the E-MTD test were 83% (95%
confidence interval [CI], 71%-93%) and 97% (95% CI, 95%-99%), respectively.
By level of clinical suspicion, sensitivity was 83% (low CSTB), 75% (intermediate
CSTB), and 87% (high CSTB). Specificity ranged from 97% at low CSTB to 100%
at intermediate and high CSTB. This variation was not statistically significant
(χ²₂ = 2.77; P = .25; Figure 1). By comparison, performance of AFB
smear varied significantly by CSTB group (χ²₂ = 18.2; P<.001). Sensitivity of the AFB smear was
not statistically different for low and intermediate CSTB (42% and 25%, respectively, P = .33), but was significantly lower than sensitivity
at high CSTB (83%; high vs low, P = .008). Specificity
of the AFB smear varied inversely to sensitivity and also significantly by level
of clinical suspicion, ranging from 96% at low CSTB to 77% at intermediate CSTB
and 67% at high CSTB (intermediate vs low, P<.001;
high vs low, P = .009). The numbers for calculating
sensitivity and specificity are provided in Table 3.
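The P values for the group comparisons can be reproduced from the reported statistics if one assumes they are chi-square statistics with 2 degrees of freedom. That reading of the notation is an assumption made here. For 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The function name is hypothetical.

```python
import math


def chi2_upper_tail_2df(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    # With 2 df the chi-square distribution is exponential with mean 2.
    return math.exp(-statistic / 2.0)


print(f"E-MTD:     P = {chi2_upper_tail_2df(2.77):.2f}")   # prints "P = 0.25"
print(f"AFB smear: P = {chi2_upper_tail_2df(18.2):.5f}")   # below .001
```

Under this assumption the computed values agree with the reported P = .25 for the E-MTD and P<.001 for the AFB smear.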
In evaluating a diagnostic test, documentation of a prior or pretest
risk is important for modeling PPVs and NPVs likely to operate in the clinical
setting. If the prevalence of TB in each CSTB group represented a pretest
risk, the predictive values shown in Figure
2 would be estimated. Based on final diagnosis, overall PPVs and
NPVs for the AFB smear were 67% (95% CI, 56%-78%) and 90% (95% CI, 86%-94%),
respectively. Corresponding values for the E-MTD were 88% (95% CI, 80%-96%)
and 95% (95% CI, 93%-98%). For the low CSTB group (5% prior risk), both tests
appeared useful for ruling out disease with NPVs of 96% (AFB smear) to 99%
(E-MTD). While neither test provided convincing evidence for ruling in disease
at this risk level, the E-MTD was potentially more useful with a PPV of 59%
compared with 36% for the AFB smear. Conversely, both tests appeared to be
useful for ruling in disease for the high CSTB group (87% prior risk), with
PPVs of 94% for the AFB smear and 100% for the E-MTD. However, the expected
NPV of the AFB smear was only 37%, compared with 91% for the E-MTD. The numbers
for calculating PPV and NPV are provided in Table 3.
The E-MTD appeared to offer greatest utility overall in the intermediate
CSTB (29% prior risk), demonstrating a PPV of 100% (vs 30% for AFB smear)
and an NPV of 91% (vs 71% for AFB smear). This clinically complex group included
20 TB cases (29%) and 48 cases (18%) free of active TB (Table 4). Mycobacteria other than M tuberculosis were cultured more commonly from patients in the intermediate CSTB
group than the low or high CSTB groups (22/48 [46%] vs 18% for low, 9% for
high). Cases of HIV infection were somewhat more frequent (40% vs 32% for
low, 11% for high). The TB and non-TB cases with CSTB estimates in the intermediate
range were equally likely to have a positive AFB smear (25% TB, 23% not TB).
The prevalence of the constellation of suggestive chest radiograph, cough,
and weight loss was also similar (20% vs 23%). Patients with TB in this CSTB
group were relatively more likely to have risk factors, such as contact exposure
or positive tuberculin skin test, than their counterparts at high or low CSTB.
The majority of studies assessing performance of the NAA tests have
used laboratory performance criteria. While the tests have performed well
under these conditions, appropriate clinical uses have been more difficult
to establish because clinical and laboratory definitions of disease may differ.
Whereas laboratory performance is based on culture growth and is usually presented
at the specimen level of analysis, clinical diagnosis is based on multiple
indications including clinical signs and symptoms and response to therapy
and laboratory results. Laboratory and clinical case definitions can measure
clinical utility differently as shown by Bradley et al,25
who compared the MTD against the classification system of the American Thoracic
Society, and Chin et al,26 who used empiric
case definitions with the Roche AMPLICOR. Because the clinical risk assessment
is more likely to reflect physician decision making, the American Thoracic
Society has recommended that the NAA tests be evaluated at different levels
of clinical suspicion.30
Several aspects of this multicenter trial advance these efforts. Patients
were selected for evaluation by clinicians based on usual workup indications
in symptomatic patients. Laboratory and clinical evidence was documented prospectively,
and clinical impressions of risk were documented both quantitatively and qualitatively.
A final comprehensive diagnosis as determined by an independent panel of experts
was used as the diagnostic reference standard. This imposed a uniform standard
of comprehensive diagnosis for analysis and was an important component of
this multicenter trial. Based on this standard, a single positive culture
had a sensitivity of approximately 90%. However, 38% of clinically determined
TB cases had at least 1 negative culture and 10% of all cases were referred
to the panel. Use of the clinical case definition solely to resolve test discrepancies
with culture, as is commonly done in laboratory trials, may not reveal the
discrepancy between laboratory and clinical reference standards.
In this trial, we asked enrolling physicians to quantify their degree
of suspicion (CSTB) using a scale from 0% to 100%. Regression analysis suggests
that these estimates were consonant with predictive signs and symptoms observed
with other TB scoring systems,10,31-35
including suggestive chest radiograph, cough, and recent weight loss. As with
factor-based scoring systems, the meaning and distribution of these clinical
suspicion estimates necessarily reflect the population under study, both
patients and physicians, and these values are best understood as estimates
of relative, not absolute, risk. Factors likely to affect physicians' estimates
of relative risk include the customary prevalence of disease in the practice
setting, the clinical spectrum of disease, the specialty or experience of
the physician, and the quality of the medical history.36
Further research is needed to appreciate the interior characteristics of summary
risk assessments and to validate these measures in different practice settings.
Use of a CSTB is potentially informative for understanding the clinical
context in which laboratory results are used. Predictive values based on clinical
risk assessments like the CSTB are more likely to reflect pretest probabilities
operating in the clinical setting, and hence to reflect conditions under which
specific test attributes are needed. Although the average prevalence of disease
in this series was 21%, prevalence ranged from 5% in the low to 87% in the
high CSTB group, and 29% in the intermediate group. Characterizing the performance
of the E-MTD for low, intermediate, and high CSTB groups helped to characterize
the usefulness of the test in this population. When stratified for these risk
levels, the sensitivity and specificity of the E-MTD were higher than those of the AFB
smear and stable, while sensitivity and specificity of the AFB smear varied
considerably. For this patient series, the E-MTD appeared to offer an improvement
in PPV (100%) for patients with CSTB estimates in the intermediate- or high-risk
range. The high and consistent specificity of the E-MTD also appeared to be
clinically valuable in excluding disease among patients with intermediate
or high CSTB estimates, offering an NPV of 91%, compared with 71% and 37%
for AFB smear in intermediate and high CSTB.
In this series, the CSTB was useful to document the potential for clinical
spectrum bias37-39
in the performance of the AFB smear. Approximately 40% of individuals with
an intermediate CSTB were HIV-positive and nearly one third were ultimately
diagnosed as having mycobacteria other than tuberculosis infections. Conventional
diagnostic signs (AFB smear, chest radiograph, advanced symptoms) appeared
to offer limited distinction between TB and non-TB groups in this suspicion
range. Although sample size was limited for this analysis, the existence of
a broad and rather complex intermediate risk group (and the poor performance
of the AFB smear in this range) is consistent with clinical observation. Under
pressures of the HIV epidemic and new immigration patterns, clinical spectrum
of disease has changed,8,40 altering
also the relative utility of many conventional diagnostic tools.30
This has important implications for use of this test as a selection criterion
in clinical and cost-utility studies.
The present observational study had design limitations that preclude
it as a basis for setting clinical benchmark standards. The most important
of these was a partial blind (to the NAA test only), which permitted clinicians
access to some initial AFB smear results during formulation of clinical suspicion
estimates. Many patients in inpatient settings were enrolled from an isolation
ward, criteria for which may have included, but were not limited to, prior
workup by AFB smear. Spectrum and prevalence of disease also varied by trial
site, and sample sizes precluded site adjustments to test performance. With
these caveats, important strengths of the present study include use of a uniform
clinical diagnosis, a representative patient set, laboratory performance that
is consistent with that observed in previous laboratory trials involving the
NAA tests, and the clinical plausibility of findings in the experience of
pulmonary physicians. More rigorously controlled, fully blinded, head-to-head
studies with careful serial documentation of pretest risk are under way.
For this patient series, the CSTB helped to characterize important limitations
of a standard reference test, AFB smear, and the contribution a new technology,
NAA, could make. Although the study design was observational, we believe it
has laid important groundwork for continuing assessment of emerging diagnostics
for TB. Interdisciplinary study designs, demonstrating the performance of
laboratory tests in conjunction with clinical risk assessments, are needed.