Assessment of the Accuracy of Using ICD-9 Diagnosis Codes to Identify Pneumonia Etiology in Patients Hospitalized With Pneumonia

Key Points
Question: Are organism-specific International Classification of Diseases, Ninth Revision (ICD-9) administrative codes for pneumonia valid measures in identifying pneumonia etiology?
Findings: In this cross-sectional study of data from 161 529 patients hospitalized with pneumonia between 2010 and 2015, ICD-9 codes had generally low sensitivity but high specificity for pneumonia etiology identified by laboratory testing.
Meaning: In this study, ICD-9 codes appeared to underestimate prevalence of specific organisms.


Introduction
Although detailed clinical data represent the criterion standard for studying epidemiology, outcomes, and temporal trends in health care delivery, such data are cumbersome and expensive to collect. It is difficult to create research data sets large enough to represent the patient mix and the variety of health care settings; medical record abstraction requires intensive review by trained professionals and is subject to interobserver variability and observer bias. The Centers for Disease Control and Prevention directs surveillance of specific health care-associated infections captured by the National Healthcare Safety Network and engages a small number of academic centers to collect data through the Centers for Disease Control and Prevention Epicenters Program, but these data are limited in scope. 1,2 In contrast, administrative data collected during routine clinical encounters for the purpose of reimbursement are copious, widely available, and generalizable. For these reasons, administrative data offer a potential alternative for some types of research. Administrative data have been used, for example, to evaluate temporal trends in pneumonia hospitalization and mortality, but few efforts have been made to validate administrative data against corresponding clinical information. 3 Administrative data can be imprecise, with claims-based algorithms for some conditions demonstrating lower mortality, length of stay, and costs than independent clinical review. 4 Validation studies testing the accuracy of pathogen-specific coding have been rare in hospitalizations for infectious diseases in general and for pneumonia in particular. To assess the validity of administrative data regarding pneumonia, we examined the performance of pathogen-specific administrative coding in comparison with corresponding microbiological data in the setting of community-onset pneumonia in a large multicenter US database.

Methods
In this cross-sectional diagnostic accuracy study, we studied patients hospitalized with pneumonia between July 1, 2010, and June 30, 2015, using data from 178 US hospitals in the Premier Healthcare Database. Data were analyzed from February 14, 2017, to June 27, 2019. Using microbiological evidence of a pathogen as the criterion standard (test results for blood or respiratory culture, urinary antigen, or polymerase chain reaction), we derived the performance characteristics (sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) of the corresponding ICD-9 organism codes as indicators of diagnosis. Because the data source was completely deidentified, the institutional review board of the Cleveland Clinic determined that this study was exempt from review and did not require informed patient consent. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guideline for diagnostic accuracy studies.
The Premier Healthcare Database is widely used for research and has been well described elsewhere. 5 Between July 1, 2010, and June 30, 2015, the number of participating hospitals increased from 461 to 592. In 2015, 75% of participating hospitals were in urban settings (census block groups or blocks with a population density of at least 1000 people per square mile) and 25% were rural by the US Census Bureau definition (any territory outside an urban setting). 6 We included all patients aged 18 years or older who were discharged between July 1, 2010, and June 30, 2015, with either a principal diagnosis of pneumonia or a principal diagnosis of respiratory failure, acute respiratory distress syndrome, respiratory arrest, sepsis, or influenza and a secondary diagnosis of pneumonia (details of the algorithm have been published previously). 7 In addition, a blood culture, respiratory culture, pneumococcal urinary antigen, Legionella urinary antigen, or antibody tests or polymerase chain reaction targeting atypical bacterial pathogens (Bordetella pertussis, Chlamydophila pneumoniae, Mycoplasma pneumoniae) or viruses was required for inclusion. Included tests are listed in the eTable in the Supplement. Patients with a secondary diagnosis of cellulitis, cholecystitis, appendicitis, diverticulitis, perforated diverticulum, peritonitis, postoperative anastomotic leaks, or abdominal surgical site infections were excluded (Figure).

JAMA Network Open | Infectious Diseases

Statistical Analysis
Baseline characteristics of patients with pneumonia were summarized by frequency distributions for categorical variables and mean (SD) or quartiles for continuous variables. We cross-classified the presence vs absence of an ICD-9 code for each of the 12 common organisms identified as causing pneumonia in hospitalized patients with the presence or absence of a laboratory sample identifying that organism. From these cross-classifications, we calculated 4 measures of ICD-9 code performance in designating a laboratory-confirmed organism: (1) sensitivity, the fraction of patients with a positive laboratory finding of an organism for whom the ICD-9 code of that organism was present; (2) specificity, the fraction of patients without a positive laboratory finding for an organism for whom the ICD-9 code of that organism was also not present; (3) PPV, the fraction of patients with an ICD-9 code of an organism for whom a corresponding laboratory finding was present; and (4) NPV, the fraction of patients without an ICD-9 code for an organism who were also without a laboratory finding for the organism. For the 4 most common organisms, sensitivity, specificity, PPVs, and NPVs were also tracked across year and by whether the diagnosis was primary or secondary. For any particular laboratory test, patients without a result were assumed not to have had that test. Data management and analysis were performed with SAS statistical software version 9.4 (SAS Institute).
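The 4 measures defined above follow directly from the 2 × 2 cross-classification of ICD-9 code status against laboratory status for a single organism. The following is a minimal Python sketch of that calculation, not the SAS code used in the study; the function name, the input format (one pair of booleans per patient), and the toy records are illustrative assumptions.

```python
from typing import Iterable, NamedTuple, Tuple


class Performance(NamedTuple):
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def code_performance(records: Iterable[Tuple[bool, bool]]) -> Performance:
    """Compute code performance for one organism.

    records: (has_icd9_code, has_positive_lab) pairs, one per patient.
    The laboratory result is treated as the criterion standard.
    """
    tp = fp = fn = tn = 0
    for has_code, has_lab in records:
        if has_code and has_lab:
            tp += 1  # code present, laboratory finding present
        elif has_code:
            fp += 1  # code present, no laboratory finding
        elif has_lab:
            fn += 1  # laboratory finding present, code absent
        else:
            tn += 1  # neither present

    def ratio(num: int, den: int) -> float:
        # Undefined when the denominator is empty (e.g., no coded patients).
        return num / den if den else float("nan")

    return Performance(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )


# Toy data (hypothetical, not study data): 2 TP, 1 FP, 3 FN, 94 TN.
toy = (
    [(True, True)] * 2
    + [(True, False)] * 1
    + [(False, True)] * 3
    + [(False, False)] * 94
)
result = code_performance(toy)
```

In this toy example the code captures 2 of 5 laboratory-confirmed cases (sensitivity 0.40) while almost never appearing without laboratory confirmation (specificity 94/95), the same qualitative pattern of low sensitivity and high specificity that the study reports.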

Results
The database included 515 684 patients before exclusions (Figure); for Pseudomonas species.
Among 34 263 (21.2%) patients with either an organism-specific code or laboratory evidence, concordance between the 2 existed in fewer than half (eFigure in the Supplement). Table 3 shows the characteristics of ICD-9 coding against the microbiology criterion standard for certain common organisms. Temporal trends for 5 selected organisms are given in Table 4. Despite year-to-year variance, there do not appear to be consistent trends in sensitivity, specificity, or PPVs.

Discussion
In this cross-sectional diagnostic study of more than 160 000 patients undergoing culture or antigen limited to pneumonia), and found sensitivity of 24% with a PPV of 31%, lower than our findings but using a different coding approach. In a study focused on multidrug-resistant organisms, Burnham and colleagues 11 found that a higher rate of coding for MRSA was associated with infectious disease consultation, and counseled against using that ICD-9 code to estimate rates of multidrug-resistant organism infection in hospitals. The present study expands on this body of validation work by increasing the pool of common pneumonia pathogens beyond S pneumoniae and MRSA.
Understanding the epidemiology of pneumonia is important for resource allocation and risk prediction based on demographic characteristics and geospatial location. 12 The low sensitivity of administrative data with regard to specific microorganisms has implications for interpreting epidemiological studies. Smith and colleagues, 13 for example, used the Nationwide Inpatient Sample to explore the association between introduction of pneumococcal vaccine and distribution of pathogens among admissions of patients with pneumonia. They reported a reduction in S pneumoniae among pneumonia codes following the year 2000, suggesting that the pneumococcal vaccination was conferring its desired benefits. Our finding that the S pneumoniae ICD-9 code identifies only 54% of culture-confirmed infections might call such an assertion into question.

However, if coding practices remained constant throughout the study time frame, relative reductions in longitudinal trends would be unaffected by such discrepancies. We found that over a 5-year period sensitivity of the S pneumoniae code declined slightly, while specificity remained constant.
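The argument about constant coding practices can be checked with simple arithmetic. The sketch below assumes that coded cases are approximately the true cases multiplied by the code's sensitivity, ignoring false-positive codes (an assumption that leans on the high specificity reported here). The case counts and the size of the sensitivity drift are hypothetical; only the 54% sensitivity figure comes from the text.

```python
def coded_cases(true_cases: float, sensitivity: float) -> float:
    """Expected number of true cases that also carry the organism code."""
    return true_cases * sensitivity


def relative_change(before: float, after: float) -> float:
    """Fractional change from the earlier period to the later period."""
    return (after - before) / before


# Hypothetical counts of laboratory-confirmed cases in two periods.
true_before, true_after = 1000, 800
true_trend = relative_change(true_before, true_after)  # -0.20

# Constant sensitivity: the coded trend matches the true trend.
stable_trend = relative_change(
    coded_cases(true_before, 0.54),
    coded_cases(true_after, 0.54),
)

# Sensitivity drifting from 0.54 to 0.50 (hypothetical drift):
# the coded trend overstates the true decline.
drift_trend = relative_change(
    coded_cases(true_before, 0.54),
    coded_cases(true_after, 0.50),
)
```

With constant sensitivity the coded series shows the same 20% decline as the true series. When sensitivity falls over the same interval, the coded decline is about 26%, so even a slight drop in sensitivity can exaggerate an apparent reduction.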
The high specificities of administrative codes for individual uncommon organisms make administrative data well suited for deriving predictive models, because specificity is a primary component of PPV when prevalence is low. PPV exceeded 70% for influenza and all bacterial species except Mycoplasma, suggesting that characteristics of patients who have codes for specific organisms may be representative of patients who actually have infection with those organisms. At least 9 models have been created to predict drug-resistant organisms, such as MRSA and Pseudomonas species, in pneumonia. Nearly all models perform better than the health care-associated pneumonia criteria, but no single model is yet accurate enough to guide antibiotic stewardship. 14,15 More sophisticated models would be particularly useful as decision aids to help clinicians optimize empirical treatment while avoiding overuse of broad-spectrum agents. High specificity is critical to ensure the accuracy of such predictive instruments. Although missing cases may have minimal implications for the model's discrimination, low sensitivity of ICD-9 data could result in miscalibration, consistently underpredicting risk.
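The dependence of PPV on specificity at low prevalence follows from Bayes' theorem: PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)). The short Python sketch below illustrates this relationship; the function name and all numeric inputs are hypothetical and are not taken from the study's tables.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    total_pos = true_pos + false_pos
    # Undefined if no patient would ever carry the code.
    return true_pos / total_pos if total_pos else float("nan")


# Hypothetical uncommon organism: 2% prevalence, 50% sensitivity.
baseline = ppv(sensitivity=0.50, specificity=0.99, prevalence=0.02)          # about 0.51
better_sensitivity = ppv(sensitivity=0.90, specificity=0.99, prevalence=0.02)   # about 0.65
better_specificity = ppv(sensitivity=0.50, specificity=0.999, prevalence=0.02)  # about 0.91
```

In this illustration, raising specificity from 99% to 99.9% improves PPV far more than raising sensitivity from 50% to 90%, because at low prevalence the false-positive term dominates the denominator.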
Similarly, changing resistance patterns over time are of concern to clinicians prescribing empirical antibiotic therapy. The Centers for Disease Control and Prevention National Healthcare Safety Network provides robust data describing trends in resistance, for example, changing prevalence of MRSA and the emergence of multidrug-resistant gram-negative bacteria. 16 Our temporal analysis shows that the association between administrative and laboratory data for MSSA and MRSA has been stable between 2010 and 2014, with a slight uptick in sensitivity in 2015 for both organisms. This finding suggests utility for efforts at more generalizable infection surveillance using administrative data.

Limitations
This study has limitations. The case selection algorithm may have insufficiently discriminated pneumonia from other infection diagnoses, and identified pathogens may represent colonization rather than infection. For example, many patients had culture growth but no corresponding ICD-9 code. International Classification of Diseases, Tenth Revision (ICD-10) coding was implemented in the US on October 1, 2015, and is far more detailed than ICD-9, making it difficult to study a population across the transition. Thus, our analysis is limited to 2010-2015. Nonetheless, the process by which codes are chosen has not changed, and it is possible to crosswalk from ICD-10 to ICD-9. In addition, it is not known how our results may be applicable to other large administrative data sets. Because this was a national sample, coding is likely to be similar, but it is possible that hospitals outside of the Premier Healthcare Database have different coding patterns. Validation in additional data sets would be welcome.