Bailey WC, Gerald LB, Kimerling ME, et al. Predictive Model to Identify Positive Tuberculosis Skin Test Results
During Contact Investigations. JAMA. 2002;287(8):996–1002. doi:10.1001/jama.287.8.996
Author Affiliations: Divisions of Pulmonary and Critical Care Medicine (Drs Bailey, Gerald, Kimerling, Brooks, and Dunlap, and Messrs Bruce and Duncan), General Internal Medicine (Dr Kimerling), and Biostatistics (Dr Tang), Schools of Medicine (Drs Bailey, Gerald, Kimerling, Tang, Brooks, and Dunlap, and Messrs Bruce and Duncan) and Health-Related Professions (Drs Gerald and Brooks), and Department of Biostatistics, School of Public Health (Dr Redden), University of Alabama at Birmingham; and Alabama Department of Public Health, Division of Tuberculosis Control, Birmingham (Ms Brook).
Context: Budgetary constraints in tuberculosis (TB) control programs require
streamlining contact investigations without sacrificing disease control.
Objective: To develop more efficient methods of TB contact investigation by creating
a model of TB transmission using variables that best predict a positive tuberculin
skin test among contacts of an active TB case.
Design, Setting, and Subjects: After standardizing the interview and documentation process, data were
collected on 292 consecutive TB cases and their 2941 contacts identified by
the Alabama Department of Public Health between January and October 1998.
Generalized estimating equations were used to create a model for predicting
positive skin test results in contacts of active TB cases. The model was then
validated using data from a prospective cohort of 366 new TB cases and their
3162 contacts identified between October 1998 and April 2000.
Main Outcome Measure: Tuberculin skin test result.
Results: Using generalized estimating equations to build a predictive model,
7 variables were found to significantly predict a positive tuberculin skin
test result among contacts of an active TB case. Further testing showed this
model to have a sensitivity, specificity, and positive predictive value of
approximately 89%, 36%, and 26%, respectively. The false-negative rate was
less than 10%, and about 40% of the contact workload could be eliminated using
this model.
Conclusions: Certain characteristics can be used to predict contacts most likely
to have a positive tuberculin skin test result. Use of such models can significantly
reduce the number of contacts that public health officials need to investigate
while still maintaining excellent disease control.
Recent tuberculosis (TB) research has focused on case identification
and treatment with little attention given to contact investigation. The traditional
concentric circle approach of defining contacts as either close or casual
based on risk assessments presents difficulties in defining a close contact
and in determining when to end the search for contacts.1-3
Past research pertaining to contact infection has focused on single variables
as risk factors: patient factors, such as smear and culture status or cavitary
disease; contact factors, such as age,10 immunosuppression
(or human immunodeficiency virus [HIV]),11-13
and poverty status14; and environmental exposure
factors, such as shared household contact,9,15
ventilation of the exposure environment,16-19
and duration of exposure.1,3 No
study of TB transmission has simultaneously evaluated information on case,
contact, and environmental exposure factors.
Recent reports by the Institute of Medicine and the Advisory Council
for the Elimination of Tuberculosis cited the importance of developing more
effective methods of identifying contacts with a high risk of infection.20,21 The University of Alabama at Birmingham
and the Alabama Department of Public Health conducted such a study and developed
a model of TB transmission to show which variables best predict a positive
tuberculin skin test (TST) result among contacts to active TB cases.
Prior to the study, we formulated standardized operational definitions
of all variables associated with contact investigation.22
Investigators recorded and observed contact screening interviews and held
focus group discussions with TB field staff and area managers of the Alabama
Department of Public Health. These activities showed significant disagreement
on definitions of variables related to contact investigation. A behavioral
intervention was developed to train TB staff to gather consistent data on
each of the precisely defined variables collected during patient treatment
and contact investigation.22-24
The behavioral intervention was developed using social cognitive theory in
the context of health education and included instruction, demonstration, and
practice with feedback and assessment. The primary training mechanism of the
behavioral intervention was a task-oriented workshop. Training included a
review of the risk factors for TB infection, an overview of interviewing skills,
the introduction of the standardized contact screening protocols and the computer
module used to collect the data, as well as practice scenarios for contact
investigation and screening. Quality control was ensured by monthly review
of field staff reports by area managers and review of the computer modules
to determine the extent of missing data, the number of errors made by staff
entering the data, and the number of prompts required during the data-entry
process. In addition, monthly discussions were held with area managers and
select TB field staff to determine their adherence to the use of the standardized
definitions. Follow-up educational interventions were performed as necessary.
Staff were trained to enter data on a laptop computer, which was transferred
weekly to area servers and then via modem to the University of Alabama at
Birmingham.22 Cases were defined according
to Centers for Disease Control and Prevention criteria.25
Only confirmed TB cases were used in the model. Case and contact demographics,
characteristics of the exposure environment, all field investigation activities,
as well as laboratory and clinical data were entered into the database.
The state of Alabama has 11 public health areas. Each area has a manager
responsible for all TB-related activities. The sample used to develop the
model included 292 consecutive cases with a total of 2941 contacts identified
from January 1 through October 15, 1998. Data were collected from 366 new
consecutive TB cases and their 3162 contacts identified from October 16, 1998,
through April 2000 for a validation sample. Mass screenings at prisons and
nursing homes and school screenings unrelated to specific contact investigations
were excluded. This study was approved by the University of Alabama at Birmingham
institutional review board and the Alabama Department of Public Health.
The TST result was the primary outcome variable and provided a surrogate
measure for recent transmission of disease. While not a perfect measure of
recent exposure with transmission, it is the primary measure used in all epidemiological
contact investigations.3 The TST results were
considered positive with induration of 5 mm or more, which is standard for
contact investigation. There were 47 readings between 5 and 9 mm and the analysis
was similar using either 5 or 10 mm as a positive reaction. The test is administered
by injecting 0.1 mL of 5 TU (tuberculin) intradermally (Mantoux method) into
the volar aspect of the forearm, reading the millimeters of induration between
48 and 72 hours after injection.26 Contacts
whose initial TST result was negative were given a second test 10 to 12 weeks
later. If either the first or second test result was positive, contacts were
considered positive. Persons known to have a positive skin test result 60
days or more prior to the date the case was reported were considered not to
represent recent infection from the case in question and were eliminated from
the analysis (98 contacts). When contacts were exposed to more than 1 case,
the area manager determined the primary case for that contact.
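The outcome rules above can be sketched in a few lines. This is an illustrative reading of the text, not the study's own code: the function name, argument names, and the choice to return `None` for excluded contacts are assumptions made for the example.

```python
from datetime import date
from typing import Optional

POSITIVE_MM = 5          # induration threshold for contacts, as stated in the text
PRIOR_WINDOW_DAYS = 60   # older prior positives are not attributed to this case


def classify_contact(first_mm: Optional[float],
                     second_mm: Optional[float],
                     prior_positive_date: Optional[date],
                     case_report_date: date) -> Optional[bool]:
    """Return True/False for a positive/negative TST, or None if excluded.

    Hypothetical helper: a contact is excluded when a positive result was
    already known 60 days or more before the case was reported.
    """
    if prior_positive_date is not None:
        days_before = (case_report_date - prior_positive_date).days
        if days_before >= PRIOR_WINDOW_DAYS:
            return None
    # Positive if either the first or the second (10 to 12 week) test is positive.
    # A contact with no recorded reading falls through as negative here.
    readings = [mm for mm in (first_mm, second_mm) if mm is not None]
    return any(mm >= POSITIVE_MM for mm in readings)
```

For example, a contact with readings of 3 mm and then 6 mm would be counted as positive, while a contact known to be positive a year before the case report would be dropped from the analysis.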
Explanatory variables with multiple outcome categories were often collapsed
for analytical purposes. For example, smear and culture status were both defined
as dichotomous variables (negative vs positive) rather than grading the degree
of positivity. Case age was grouped into 3 categories (<15, 15-65, and
>65 years) to determine if transmission differences existed between children
and adolescents, adults, and older adults. Contact age was grouped into 5
categories (≤4, 5-14, 15-24, 25-64, and ≥65 years) defined by the clinicians
prior to analysis according to differences thought to exist regarding infection
rates among age groups. Cases were considered to have cavitation by radiograph
result, which was confirmed by film review. Ventilation of the exposure environment
was rated on an ordinal scale defined as follows: 1 = ventilation situation
of closed windows and doors; 2 = window/fan exhaust; 3 = window air conditioner
unit; 4 = central air conditioner/heat; 5 = completely open to the outside.
Because a contact can be exposed in multiple environments, the lowest ventilation
rating of all environments in which the contact was exposed was used in the
model. The size of the exposure environment was also rated on an ordinal scale
(1 = size of a vehicle or car; 2 = size of a bedroom; 3 = size of a house;
and 4 = size larger than a house). The total number of times per month the
contact was exposed to the case (no matter what the duration of each time)
as well as the number of hours per month (accounting for each separate time
and duration) were collected.
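The variable coding described above might look like the following minimal sketch. The function names and category labels are hypothetical, ages are assumed to be whole years, and the boundary between the second and third contact age groups is assumed to fall at 15 years.

```python
from typing import Iterable


def case_age_group(age_years: int) -> str:
    """Three case age categories: <15, 15-65, >65 years."""
    if age_years < 15:
        return "<15"
    if age_years <= 65:
        return "15-65"
    return ">65"


def contact_age_group(age_years: int) -> str:
    """Five contact age categories (second/third boundary assumed at 15)."""
    if age_years <= 4:
        return "<=4"
    if age_years <= 14:
        return "5-14"
    if age_years <= 24:
        return "15-24"
    if age_years <= 64:
        return "25-64"
    return ">=65"


def lowest_ventilation(ratings: Iterable[int]) -> int:
    """Lowest rating (1 = closed windows and doors ... 5 = open to the outside)
    across all environments in which the contact was exposed."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one exposure environment is required")
    return min(ratings)
```

A contact exposed in a car with closed windows (rating 1) and in a centrally air-conditioned office (rating 4) would enter the model with a ventilation rating of 1.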
Certain variables (including positive HIV status for contact and whether
the case was homeless) were thought to be important in determining the probability
of infection of individual contacts; however, there were too few contacts
with these traits to use these variables in the model. Current smoking status
of the case was significant but not included due to the large amount of missing
data. There were almost no missing data for any of the other variables.
Generalized estimating equations (GEEs)27
were used to obtain a model for predicting a positive TST result in contacts
of active TB cases. The GEE analysis is necessary to analyze these data because
outcomes for the contacts having a TB case in common are not independent of
each other. Thus, outcomes within clusters of contacts are correlated, which
is taken into account by the GEE. The sample used to develop the model included
292 consecutive cases with a total of 2941 contacts identified during January
to October 1998. The selection of predictors began with a univariate analysis
of all variables. Those variables significant at the .10 level were retained
for inclusion in the generalized estimating equation analysis (Table 1a). To obtain our model, we used a backward elimination method
with a significance level of .10. Examination of influential observations
and clusters within the GEE model was performed.28
All analyses were performed using SAS software.29
To calculate the predicted probability of a positive skin test result,
one must first use the GEE to calculate a given contact's log odds of a positive
skin test result (Table 2). This
log odds can then be converted back to a predicted probability using the following
formula: predicted probability = exp(log odds) / [1 + exp(log odds)].
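The conversion from log odds to a predicted probability is the standard inverse-logit transformation, which can be sketched as below. The coefficient names and values in the usage note are hypothetical placeholders, not the fitted values from the study's tables.

```python
import math
from typing import Mapping


def log_odds(intercept: float,
             coefficients: Mapping[str, float],
             values: Mapping[str, float]) -> float:
    """Linear predictor: intercept plus the sum of coefficient * covariate value."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


def probability_from_log_odds(z: float) -> float:
    """Inverse logit, exp(z) / (1 + exp(z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

As a usage example with made-up numbers, `probability_from_log_odds(log_odds(-2.0, {"smear_positive": 0.5}, {"smear_positive": 1}))` converts a log odds of -1.5 into a predicted probability of roughly 0.18.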
To use the model, one must choose a predicted probability level above
which all contacts will be examined. To determine this probability level cut
point, we compared the sensitivity and specificity of different cut points
using the classification table shown in Table 3.
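A classification table row for one cut point can be computed as in the sketch below. Two points are assumptions rather than statements from the text: contacts at or above the cut point are treated as predicted positive, and the false-negative rate is taken as the share of predicted-negative contacts who are actually TST positive, a convention that appears consistent with the figures reported later in the article.

```python
from typing import Dict, Optional, Sequence


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def classification_at_cut_point(probabilities: Sequence[float],
                                outcomes: Sequence[bool],
                                cut_point: float) -> Dict[str, Optional[float]]:
    """Summarize model performance when contacts with p >= cut_point are examined."""
    tp = fp = tn = fn = 0
    for p, positive in zip(probabilities, outcomes):
        predicted_positive = p >= cut_point  # assumed inclusive threshold
        if predicted_positive and positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "positive_predictive_value": _ratio(tp, tp + fp),
        # Assumed convention: false negatives among predicted negatives.
        "false_negative_rate": _ratio(fn, fn + tn),
        # Contacts below the cut point would not be investigated.
        "workload_reduction": _ratio(tn + fn, total),
    }
```

Calling this function over a range of cut points yields the rows of a table from which a program can pick the trade-off that suits its resources.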
Data were collected from 366 new consecutive TB cases and their 3162
contacts from October 16, 1998, through April 2000 for a validation sample.
Since this data set was significantly larger than that used to develop the
model, it was divided into 3 data sets using random sampling without replacement
to compare results for consistency. The data sets were created by randomizing
contacts; therefore, cases could be included in more than 1 of these smaller
data sets. Using several data sets to validate our model more efficiently
examines its generalizability. The 3 data sets included 1030, 1052, and 1080
contacts, respectively (Table 4).
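Splitting the validation contacts into 3 non-overlapping sets by random sampling without replacement can be sketched as a shuffle followed by slicing. The function name and the fixed seed are assumptions for the example; the article does not describe how the randomization was implemented.

```python
import random
from typing import List, Sequence


def split_contacts(contact_ids: Sequence[int],
                   sizes: Sequence[int],
                   seed: int = 0) -> List[List[int]]:
    """Randomly partition contact IDs into disjoint groups of the given sizes."""
    if sum(sizes) != len(contact_ids):
        raise ValueError("group sizes must add up to the number of contacts")
    shuffled = list(contact_ids)
    random.Random(seed).shuffle(shuffled)  # seeded only to make the sketch repeatable
    groups, start = [], 0
    for size in sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups
```

Because the contacts rather than the cases are randomized, contacts of the same case can land in different groups, which matches the note that a case could appear in more than 1 of the smaller data sets.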
The model was tested in these data sets and the sensitivity, specificity,
positive predictive value, false-negative rate, and false-positive rate were
calculated.
During the period in which the data were collected, Alabama had a TB
incidence rate of 8.8 per 100 000, which was the sixth highest rate in
the country. Characteristics of the TB cases and their contacts used to create
the model are shown in Table 5.
The mean number of contacts per case was 10, but this varied widely from
case to case, ranging from 1 to 181. The median number of contacts per case
was 4 with no statistical differences in the number of contacts investigated
by sex, race, or age of the active case. Significant differences did exist
in the number of contacts investigated by clinical characteristics of the
case.30 The overall infection rate among all
contacts was approximately 20%.
Table 1 shows the univariate
analysis and Table 6 shows the
results of the GEE model. Variables are displayed in 3 characteristic domains:
case, contact, and environmental exposure. Three hundred seventy-seven contacts
were eliminated from the GEE analysis due to missing data. If any variable
was missing for a contact or its associated case, the contact's information
was not used in model development. Ten cases and their contacts were considered
highly influential in the fit of the GEE model. These influential cases were
removed from the modeling process. For the final model, collinearity was examined
and no significant concerns existed. Interactions were examined and none improved
the predictive ability of the model. Using the model outlined in Table 6, we evaluated different probability
levels predicting risk of transmission in a classification table showing their
test characteristics such as sensitivity, specificity, false-positive, and
false-negative rates (Table 3).
A cut-point value with a higher sensitivity avoids missing infected individuals
but sacrifices specificity. Cut points with higher specificity miss greater
numbers of infected individuals but require fewer public health resources.
Thus a cut point can be chosen depending on the characteristics of patient
populations and availability of public health resources. Our goal was to improve
the efficiency of our contact investigations with minimal sacrifice of efficacy.
Most importantly, we did not want to miss contacts likely to have been recently
infected. Based on our current knowledge, 2 assumptions seemed reasonable:
1. Not all contacts will become infected; the percentage probably
lies somewhere between 20% and 30%.1,3,20
2. Some people already had TB infection before this particular exposure
occurred; this "background rate" varies with age, socioeconomic status, and
country of origin. A reasonable estimate for Alabama is 5% to 10%.31
Considering these assumptions, we chose a cut point in which the false-negative
rate was close to the presumed background rate, yet allowed for a substantial
reduction in the number of contacts examined. A cut point of 0.10 reduced
the number of contacts to be investigated by 40% ([783 + 54]/2118) while maintaining
a false-negative rate of less than 7% (Table 3). Using this cut point results in a false-positive rate
of 80%, which is consistent with an infection rate of between 20% and 30%.
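The arithmetic behind the 0.10 cut point can be checked directly. In this sketch, 783 and 54 are interpreted as the contacts below the cut point who were TST negative and TST positive, respectively, and 2118 as all contacts in the classification table; that interpretation is an assumption drawn from the bracketed expression in the text.

```python
# Counts taken from the expression ([783 + 54]/2118) in the text.
below_cut_negative = 783   # assumed: contacts below the cut point, TST negative
below_cut_positive = 54    # assumed: contacts below the cut point, TST positive
total_contacts = 2118

not_investigated = below_cut_negative + below_cut_positive

# Share of contacts that would no longer need to be investigated.
workload_reduction = not_investigated / total_contacts

# Share of uninvestigated contacts who were in fact TST positive.
false_negative_share = below_cut_positive / not_investigated

print(f"workload reduction: {workload_reduction:.1%}")
print(f"false negatives among uninvestigated: {false_negative_share:.1%}")
```

Under this reading, about 39.5% of contacts fall below the cut point and about 6.5% of those are TST positive, in line with the stated 40% reduction and a false-negative rate of less than 7%.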
The cut point of 0.10 was used to test our original model in 3 prospective
samples of cases and contacts (Table 4).
Results were consistent among the new data sets with an average sensitivity,
specificity, and positive predictive value of 89%, 36%, and 26%, respectively.
The sensitivity increased by approximately 7% for the new data sets and the
specificity decreased by about 7%. In all 3 data sets, the false-negative
rate remained between 5% and 10%.
To illustrate the sensitivity and specificity of other cut points, we
have included receiver operating characteristic curves for both the model-building
data set and the validation data sets (Figure).
Our results show that specific case, contact, and environmental exposure
characteristics can predict which contacts of TB cases are most likely to
have a positive TST result. In our model, 7 variables were determined to be
statistically significant. The mean sensitivity was 89% and the mean false-negative
rate was 7% when tested prospectively in 3 new populations. This analysis
indicated there are 3 variables that we deem to be particularly clinically
relevant: case has a positive smear, case has cavitary disease, and total
hours per month that the contact was exposed to the case. These variables are almost immediately
available to the field worker on identification of a TB case and indicate
that the case is likely to transmit TB to his/her contacts.
To choose an appropriate cut point for determining which contacts to
investigate, one must examine the model's sensitivity and specificity at each
probability level (Table 3) and
assess characteristics of the local population, local priorities, and available
resources. Sensitivity is the probability that the model correctly
predicts a positive TST result, whereas specificity is the probability
that the model correctly predicts a negative TST result. Altering the cut
point at which you choose to investigate a contact will influence both the
sensitivity and specificity of the model. Lowering the cut point means more
people who actually have a positive TST result will be predicted by the model
to be positive (increased sensitivity); however, you would also spend additional
resources investigating false-positives (persons incorrectly predicted by
the model to be TST positive). For example, a state with large resources to
allocate to contact investigation might choose a cut point of 0.06, which
would allow them to investigate approximately 10% fewer contacts but maintain
a sensitivity of 97% and a false-negative rate of 5% (Table 3). This approach might be particularly appropriate in a state
with a low infection rate. On the other hand, increasing the cut point improves
the specificity, but one would fail to investigate larger numbers of infected
contacts (false-negatives). A state with fewer resources to devote to contact
investigation might choose a cut point of 0.20 allowing them to investigate
78% fewer contacts, but yielding a model with lower sensitivity (42%) and
a higher false-negative rate (11%) (Table).
While trade-offs always exist, sensitivity of a test should be increased
at the expense of specificity when the consequences associated with missing
a positive test result are high.32 The consequences
of missing a positive TST result representing recent infection with TB may
lead to spread of disease. This is particularly important if the contact is
an infant or is HIV-positive. Therefore, although the cut point we chose results
in a low specificity, missing recently infected contacts is less likely.
Another important consideration in determining an appropriate probability
level cut point is the background rate of positive TST reactors. The background
rate (the prevalence of TB or non-TB mycobacterial infection endemic in the
population) is not related to recent TB transmission and will vary with age,
geographic area, socioeconomic status, and country of origin. In the absence
of recent skin testing survey data, the true background rate is not known.
However, we do know atypical mycobacteria infection is relatively high in
Alabama.31 Ideally we would choose a cut point
in which the false-negative rate was equivalent to a precisely known background
rate and unlikely to represent recent transmission. A cut point producing
a false-negative rate of approximately 9% will mean the proportion of false-negative
results due to recent infection is less than 9%—perhaps appreciably
less. Therefore, such a model is unlikely to miss many positive reactors representing
recent transmission.
The clarification and standardization of terminology on contact tracing
and interview skills22 coupled with training
courses minimized interobserver variation. Current work is focusing on using
alternative methods of analysis to create an algorithm for field workers to
use in prioritizing investigation of contacts. In addition, we anticipate
this model to serve as a tool for studying host genetic susceptibility and
resistance, as well as bacterial virulence and infectiousness, since it precisely
characterizes the phenotypic and environmental aspects of recent transmission.
One limitation of this study is the large amount of missing data on
current smoking status for cases. Due to the limited amount of data available,
this variable was not included in the analysis. In addition, Alabama had few
TB cases in which the individual was homeless or had HIV or AIDS. States with
high rates of homelessness or cases of HIV or AIDS among TB cases need to
consider this limitation of our study in their contact investigations.
We believe our TB transmission model is a valuable tool for public health.
The model can be adapted to different disease and population conditions, reducing
the number of contacts public health officials need to investigate while maintaining
excellent disease control. The use of this model should allow public health
workers to substantially reduce the number of contacts investigated and save
valuable resources, which can be devoted to directly observed therapy and
other important disease-control activities. While this article emphasizes
the science of transmission, it is important to remember that contact tracing
is also an art requiring other forms of expertise and intuition. The use of
this method should in no way preclude the concept of extending contact tracing
in individual cases when a high percentage of contacts are found to be positive
reactors. Rather, the use of this model combined with the intuition and experience
of TB field workers can assist in reaching the goal of TB elimination while
ensuring efficient and effective use of public health resources.