Copyright 2003 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
To systematically review the accuracy of modern laboratory tests for the diagnosis of serious bacterial infection in newborns.
The MEDLINE, EMBASE, and Cochrane Library databases were searched using the keywords newborn, infection, sepsis, and diagnosis. We included studies published from 1995 through 2001 that included infants younger than 90 days with proven bacterial growth in a sample from a sterile site. Whenever possible, relevant data were extracted to calculate likelihood ratios (LRs) for whether each test can diagnose a serious bacterial infection. Two independent reviewers selected and reviewed the articles (interobserver agreement, κ = 0.80). All disagreements were resolved by consensus.
Of the 137 citations we retrieved, 37 articles met the inclusion criteria; 17 studies, evaluating 11 different tests, met the highest methodological criteria. The most commonly evaluated test was interleukin 6 (IL-6) level (n = 7 studies). The remaining tests were each evaluated in no more than 3 studies. Positive LRs ranged from 1.5 to ∞. Six individual tests examined in 8 studies had LRs of more than 10 (range, 12.5-∞). Combined tests also had a wide range of LRs (3.4-9.9). All studies were performed in single medical centers and had small sample sizes, making recommendations according to gestational age criteria difficult.
We found few methodologically rigorous studies of the accuracy of laboratory tests for the diagnosis of bacterial infection in newborns; in a significant proportion of studies, the accuracy of the tests could not be independently determined because of a lack of adequate data. There was marked heterogeneity in sample selection and cutoff levels for diagnosis of neonatal sepsis. A few tests showed promising accuracy, but there are insufficient data to support their confident use as clinical tools.
Clinicians are frustrated by the limitations in the diagnosis of neonatal sepsis and would benefit from reliable tests to diagnose sepsis early in its course. Currently, no single test fulfills the criteria of an ideal diagnostic test.1,2 In neonatology, tests using hematological indices or acute-phase reactants, such as C-reactive protein (CRP), remain in widespread use despite continuing concerns about their reliability. These concerns largely stem from the demonstrated marked variations in the predictive accuracy of hematological parameters.1 We wished to assess the validity of several newly available immunological markers, including acute-phase reactants other than CRP, and inflammatory mediators, levels of which have been claimed to assist in the diagnosis of neonatal sepsis (Table 1).3 Therefore, we examined studies of various diagnostic tests with reference to their methodological rigor.
We identified all relevant articles on bacterial infection in newborns using the following MeSH headings: newborn, infection, sepsis, and diagnosis. We searched the National Library of Medicine's PubMed database, EMBASE, and the Cochrane Library's Cochrane Controlled Trials Register for the years 1995 through 2001. Our search was limited to English-language articles with human data; editorials, commentaries, letters, and reviews were excluded. We searched the bibliographies of review articles, and we also attempted to obtain pertinent missing data from authors.
In addition, we used the following criteria and definitions to select relevant articles. Inclusion criteria were that (1) the diagnostic tests being evaluated were considered to be "new" tests (ie, excluding hematological parameters, such as immature-total neutrophil ratio, or the commonly used acute-phase reactant CRP); (2) the postnatal age of the infants studied was younger than 90 days; and (3) studies were focused on serious bacterial infections, and true infections were proven by a criterion standard. Because of the uncertainty surrounding the concept of clinical sepsis in neonates, we focused only on studies in which the criterion standard for diagnostic tests was unequivocal proof of bacterial infection (ie, bacterial growth) in cultures of blood or cerebrospinal fluid (CSF) samples from sterile body sites. (We did not include urinary tract infections because they are an uncommon cause of sepsis in newborns unless they are accompanied by general sepsis and because for validity, these samples must be obtained by invasive measures that many neonatologists are reluctant to employ.) We defined clinical sepsis as sepsis not meeting the definition of true infection. If studies reported positive bacterial growth from endotracheal tube aspirates, with or without changes on chest radiographs, in the absence of blood or CSF cultures positive for bacterial growth, we considered this to be clinical sepsis.
We excluded studies that examined antenatal tests, including amniotic fluid tests, and studies in which data for antenatal and neonatal infection could not be separated.
Two of us (A.M. and C.P.S.H.) independently assessed each article for methodological quality to establish that all selected studies allowed us to distinguish true bacterial infections from clinical sepsis. These 2 authors also attempted to extract data for an independent calculation of sensitivity, specificity, and likelihood ratio (LR). Adjudication by another author (H.K.) and subsequent consensus resolved all disagreements regarding the inclusion of studies and the extracted data.
If studies provided adequate data, 2 × 2 tables were created to calculate sensitivity, specificity, positive predictive value, negative predictive value, and LRs. Following the recommendations of Sackett et al,2 an a priori rule was defined for this study whereby an LR of less than 10 was considered unlikely to affect clinical diagnosis. Confidence intervals were recorded for the LRs in studies that presented adequate data to perform an independent calculation of pretest probabilities and LRs (StatsDirect statistical software, version 1.9.8; StatsDirect Ltd, Sale, England; Confidence Interval Analysis, version 2.0, Wilson Method, T. Bryant, 2000). If we were not able to independently extract raw data, LRs were calculated from sensitivity and specificity values provided by the authors.
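The 2 × 2 calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the StatsDirect procedure the review used: the function names and the counts in the usage note are hypothetical, and confidence intervals are not computed. A positive LR of ∞ arises whenever specificity is 100%, because the false-positive rate in the denominator is zero.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity.

    Both inputs are proportions between 0 and 1. The positive LR is
    reported as infinity when there are no false-positive results.
    """
    false_positive_rate = 1.0 - specificity
    if false_positive_rate == 0:
        lr_pos = math.inf
    else:
        lr_pos = sensitivity / false_positive_rate
    # Negative LR: false-negative rate divided by true-negative rate.
    lr_neg = math.inf if specificity == 0 else (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def diagnostic_accuracy(tp, fp, fn, tn):
    """Summarise a 2 x 2 table of test result against culture-proven infection.

    tp, fp, fn, tn are counts of true-positive, false-positive,
    false-negative, and true-negative infants. Assumes every row and
    column total of the table is greater than zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }
```

For example, `diagnostic_accuracy(tp=9, fp=5, fn=1, tn=85)` uses made-up counts for 10 infected and 90 uninfected infants. When raw counts are unavailable, `likelihood_ratios` can be called directly with the sensitivity and specificity reported by a study's authors.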
The literature search generated 137 citations. Of these, 49 were potentially relevant for inclusion based on a review of the abstract. More detailed review resulted in good interobserver agreement between the 2 reviewers regarding eligible articles; full agreement was reached for 42 articles, and consensus discussion was required for only 7 articles (Cohen κ = 0.80). We excluded one study that reported on 548 blood samples but did not specify the number of infants from whom they were drawn,4 making it impossible for us to determine whether multiple samples were from single subjects. Another article5 also included multiple samples from single subjects; however, this article provided the sample size from which the specimens were drawn. Although this limits our ability to fully interpret the data, we chose to include this article because the denominator was provided. In total, 37 articles met the inclusion criteria and were assessed for methodological quality.
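For readers unfamiliar with the Cohen κ statistic quoted above, the following sketch shows how it is computed for 2 reviewers making include/exclude decisions. The function name and the counts in the usage note are hypothetical. The article does not report the reviewers' full cross-tabulation, so this does not attempt to reproduce the published value of 0.80.

```python
def cohen_kappa(both_include, only_first, only_second, both_exclude):
    """Cohen kappa for 2 reviewers making include/exclude decisions.

    Arguments are the 4 cell counts of the reviewers' agreement table.
    Assumes chance agreement is below 1 (both categories were used).
    """
    n = both_include + only_first + only_second + both_exclude
    observed = (both_include + both_exclude) / n
    # Marginal proportion of "include" decisions for each reviewer.
    first_include = (both_include + only_first) / n
    second_include = (both_include + only_second) / n
    # Agreement expected by chance if the reviewers decided independently.
    expected = (first_include * second_include
                + (1.0 - first_include) * (1.0 - second_include))
    return (observed - expected) / (1.0 - expected)
```

With illustrative counts of 20, 5, 10, and 15, the observed agreement is 0.70, the chance-expected agreement is 0.50, and `cohen_kappa(20, 5, 10, 15)` is about 0.40.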
Of the 37 studies that were included, 17 (46%) clearly distinguished clinically septic infants from those who had true bacterial growth according to our criteria (Table 2).5-21 The cumulative sample sizes of these studies were small, even for the most frequently applied diagnostic tests. Because we deliberately chose bacterial growth as a criterion standard, the remaining 20 studies could not be further analyzed. A list of the retrieved references that did not meet the inclusion criteria can be obtained from the corresponding author (H.K.).
The 17 studies that met our methodological criteria assessed 11 new tests and a total of 299 septic infants. Of these studies, 7 (41%), enrolling a total of 68 septic infants, provided adequate raw data to allow independent calculation of sensitivities, specificities, and LRs along with confidence intervals (Table 3). In 10 studies, we could only use the values as originally calculated by the authors. From the study by Messer et al,7 we could extract data for interleukin (IL) 6 levels but not for levels of the tumor necrosis factor (TNF) receptors p55 and p75. Similarly, the study by Silveira and Procianoy10 examined 3 diagnostic tests (IL-6, TNF-α, and IL-1β levels), but no data are provided for IL-1β levels. The authors concluded that "IL-1β is not a good marker of neonatal sepsis."10(p650) In the absence of numeric data, we have omitted it from Table 3. Three studies reviewed more than 1 diagnostic test used in combination in an attempt to enhance diagnostic accuracy (Table 4). In the 17 included studies, the most common new reported test was IL-6 level, which was examined by 7 separate studies that enrolled a total of 92 septic and 524 nonseptic infants. The remaining tests were assessed by no more than 3 studies each (cumulative range, 4-72 septic infants).
The cutoff laboratory values that were chosen to distinguish between the presence and absence of infection appear to be unique to each study. Franz et al13 employed 2 separate cutoff values for IL-8 in 2 separate study periods, which allowed a comparison of these values (Table 3). In a later study, Franz et al14 provided more data using the second cutoff value. Franz et al13,14,22 also reported on similar data sets employing IL-8 and procalcitonin levels. We included 2 of these studies13,14 because we could not determine how much they overlapped. We annotated data from all 3 subsets of infants described by these researchers (Table 3 and Table 4). We used the authors' own cutoff values whenever available. Gendrel et al,20 who examined 13 septic infants, did not specify any cutoff level to demarcate between infected and noninfected newborns. However, the authors did provide a scattergram that allowed us to make a distinction. In order to use the study by Gendrel et al,20 we employed the cutoff value for procalcitonin used by Franz et al14 in their similar study.
The range of test sensitivity and specificity, both calculated and reported, was large (Table 3). Sensitivities ranged from 57% to 100%, and specificities ranged from 43% to 100%. Similarly, positive LRs ranged from 1.5 to ∞ (Table 3). Six tests in 8 studies had positive LRs of more than 10 (references 5, 8, 9, 11, 15, 18, 20, and 21). Of these studies, we were able to perform an independent verification that the positive LR was more than 10 in the study of procalcitonin by Gendrel et al20; the studies of neutrophil CD11b by Weirich et al18 and Nupponen et al15; the study of IL-6 by Bhartiya et al11; and the study of IL-8 by Nupponen et al.15 In total, 2 of 3 studies that evaluated procalcitonin levels20,21 had a positive LR of more than 10.
We also assessed the accuracy of combinations of tests evaluated in 3 studies (Table 4), although these studies did not provide adequate raw data to allow calculation of LRs. All were small studies; the largest, by Franz et al,13 enrolled only 26 septic infants. None of these test combinations had a positive LR of more than 10.
The rapidly evolving understanding of the molecular physiological processes underlying sepsis and technical advances in biochemical testing hold promise for rapid, accurate diagnosis, although it should be remembered that bacterial growth requires at least 12 hours by commercial automated testing. Table 1 addresses the clinical utility of these putative new tests. However, in this systematic review of the accuracy of these newer diagnostic tests, we were unable to provide a summary in a simple statistical form that clinicians could use. This reflects several methodological issues. Perhaps the most striking is that the predominance of small studies, all using differing approaches, makes formal meta-analysis unproductive. Had the studies been of sufficient size and power, it would have been possible to report on test characteristics by birth weight and gestational age, because the risk of neonatal sepsis is likely to vary with both.
The marked heterogeneity in studies of tests for neonatal sepsis has also been noted by authors reviewing older diagnostic tests.1,23 It is worth emphasizing that, among studies, enrolled subjects were heterogeneous, varying by postnatal age, gestational age, and risk factors. Each study consisted of small numbers of patients from single medical centers, which differed with regard to types of patients (eg, surgical, cardiac, medical, and inborn and outborn) and their demographic characteristics. Often, we were uncertain about the exact nature of the neonatal population in which the diagnostic test was studied because most studies reported only gestational age, sex, and birth weight. Even the prevalence of sepsis in the nurseries studied was recorded in only a small minority of articles. In addition, a wide range of cutoff values was employed, and no study used previously reported values; instead, authors chose to use unique cutoff values, making comparison of these tests difficult. Most articles (Table 3) did not report whether blood culture and lumbar puncture were performed before infants received antibiotics. We were left to assume that these tests were performed first if there was clinical suspicion of sepsis and that antibiotic treatment was instituted afterward. Attempts to contact the authors of several studies to obtain further raw data were by and large unhelpful. These methodological problems prevented us from providing a concise statistical summary in the form of a meta-analysis.
Because some clinicians are more familiar with sensitivity than LRs, we provide these values. Where possible, however, we emphasize LRs because this statistic offers potential advantages compared with measures such as sensitivity and specificity.24,25 Likelihood ratios may be more useful than sensitivity and specificity, largely because of their independence from prevalence.26,27 This is relevant to our review because the prevalence of neonatal sepsis (whether nosocomial or peripartum) appears to vary considerably, from 5% to 42% in a recent study.28 We could not find the true prevalence in most studies included in this review. The prevalence of sepsis varies according to birth weight, gestational age, and the characteristics of the neonatal nursery, including its proportion of infants undergoing medical vs surgical treatment.
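The practical effect of prevalence can be illustrated with a short sketch. The sensitivity and specificity of 90% are assumed values chosen for illustration, not results from any reviewed study, and the function name is hypothetical. Only the 5% and 42% prevalence figures come from the text above.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of infection given a positive test (Bayes' theorem).

    All arguments are proportions between 0 and 1.
    """
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Assumed test characteristics (hypothetical): 90% sensitivity and specificity.
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# The positive LR is fixed by the test characteristics alone.
positive_lr = SENSITIVITY / (1.0 - SPECIFICITY)

for prevalence in (0.05, 0.42):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: positive LR {positive_lr:.1f}, PPV {ppv:.1%}")
```

Under these assumptions the positive LR is the same in both nurseries, whereas the positive predictive value rises from roughly one third at 5% prevalence to well over 80% at 42% prevalence.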
Sackett et al2 suggest that positive LRs of less than 10 are unlikely to greatly enhance posttest probabilities. We chose to follow this rule in assessing how useful these studies are to clinical practice. Only 8 individual studies examining 6 tests had positive LRs of more than 10. We have not emphasized negative LRs, although we also report these (Table 3 and Table 4). Taking the top prevalence range of Brodie et al28 at 42%, even an extremely low negative LR would likely reduce the posttest probability of the patient having sepsis only to about 5%, as calculated using the LR nomogram in Sackett et al.2 We remain unconvinced that a clinician would accept a 5% risk in choosing not to treat.
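The nomogram calculation referred to above is equivalent to converting the pretest probability to odds, multiplying by the LR, and converting back. The sketch below assumes a negative LR of 0.07 purely for illustration, because the article does not state which value was read off the nomogram. The function name is also hypothetical.

```python
import math


def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability and an LR into a posttest probability.

    Assumes 0 < pretest_probability < 1. An infinite LR (a test with no
    false-positive results) is treated as giving a posttest probability of 1.
    """
    if math.isinf(likelihood_ratio):
        return 1.0
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Pretest probability of 42% (from the text) with an assumed negative LR of 0.07.
after_negative_test = posttest_probability(0.42, 0.07)

# The same pretest probability with a positive LR at the threshold of 10.
after_positive_test = posttest_probability(0.42, 10.0)
```

With these inputs, `after_negative_test` is just under 5% and `after_positive_test` is close to 88%, which is consistent with the residual risk of about 5% discussed above.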
A common diagnosis in neonatology is clinical sepsis, without confirmation from blood or CSF cultures. Combining clinical sepsis and documented infections dilutes the true sepsis rate, which should be the denominator for rates. We have avoided these possible problems by specifically examining only studies that enrolled infants with clinical signs leading to identification of an unequivocally true bacterial infection (ie, bacterial growth). Fewer than half of the initial studies met this criterion, and fewer than half of the remaining studies provided adequate data to enable an independent confirmation of the statistics, including the LRs.
In contrast to our insistence on an unequivocal criterion standard, Mehr and Doyle,29 in reviewing selected cytokine levels as markers of bacterial sepsis in newborn infants, chose to examine data from all infants, including those with clinical sepsis. They reviewed articles examining TNF-α, IL-6, and IL-8 levels that were published from 1966 to 1999 in English-language journals listed in MEDLINE. They argued that including "probable/suspected" sepsis was more relevant to "real life" clinical circumstances. However, a strict and uniform definition of culture-negative sepsis was not adhered to in the reviewed studies. In addition, the clinical terms were often not defined, nor were the cutoff levels for various tests consistent across studies. Mehr and Doyle also raised concerns about false-positive results due to skin contaminants and the lack of stringent definitions to distinguish contaminants from true infections, leading to a falsely high positive predictive value. Although we did not attempt to address this in our review, a previous study, for which age-matched and birth weight–matched controls were specifically recruited, showed a lower rate of positive blood cultures in nonsymptomatic infants.30 Mehr and Doyle's overall concerns about the methodological rigor of this field of study were similar to ours.
We recognize that there are other tests for infection, such as heart rate analysis,31 sepsis scores,32 and urinary tests.33 However, we did not include these, because they were not in the scope of our review. In addition, we did not include articles examining CRP because they had previously been reviewed1 and CRP tests appear to be as inaccurate as more standard hematological tests.
We conclude that serious methodological flaws plague current studies that aim to improve the diagnosis of neonatal bacterial sepsis with modern tests. In particular, the predominance of studies at single medical centers with small sample sizes makes it difficult to apply the tests in clinical decision making. Furthermore, diagnostic tests were not applied to differing populations with a mix of inborn and outborn infants, gestational ages, birth weights, and levels of acuity (level 1, 2, and 3 neonatal intensive care units), which made generalizability a problem. A few diagnostic tests remain promising, of which IL-6 level is the most intensively studied, probably because of its acknowledged importance as an alarm cytokine. In addition, procalcitonin levels appear to show considerable promise as a diagnostic test for neonatal sepsis. To give clinicians a firmer recommendation, studies of adequate size and using rigorous methods are now needed to enable estimates of the diagnostic accuracy of these new tests.
Corresponding author and reprints: Haresh Kirpalani, BM, MSc, FRCP, Department of Pediatrics, McMaster University Medical School, Room 3N27A, 1200 Main St W, Hamilton, Ontario L8N 1C5, Canada (e-mail: email@example.com).
Accepted for publication February 5, 2003.
Dr Hui received salary support in 2002 and 2003 from a GlaxoSmithKline, Canadian Institutes of Health Research, and Canadian Infectious Disease Society Infectious Disease Research Fellowship.
We reviewed all the literature on diagnostic tests for neonatal sepsis published since the last substantive review in 1995. Although many newer tests are being evaluated, this study highlights the fact that few of these tests have been evaluated with methodological rigor. The problems of significant heterogeneity in sample selection and cutoff levels for diagnosis of sepsis persist. A few tests look promising, but until larger multicenter trials are performed, they should not be employed in routine clinical practice.
Malik A, Hui CPS, Pennie RA, Kirpalani H. Beyond the Complete Blood Cell Count and C-Reactive Protein: A Systematic Review of Modern Diagnostic Tests for Neonatal Sepsis. Arch Pediatr Adolesc Med. 2003;157(6):511-516. doi:10.1001/archpedi.157.6.511