Each geometric shape represents a different type of patient (eg, patient with asthma admitted with different physiologic severity), and the varying sizes of each shape represent different characteristics of that patient type (eg, different ages). A template is formed from the national sample and each hospital is matched to the template using a 1:1, 2:1, or 3:1 ratio, depending on the available sample size at that hospital.
The Spearman correlation between median cost (A) and trimmed LOS (B) was highly significant (r = 0.57; P < .001), the correlation between ICU utilization (C) and trimmed LOS was significant (r = 0.41; P = .01), and the correlation between median cost and ICU utilization was not significant (r = 0.03; P = .87). The straight solid lines within the scatterplots were constructed using robust linear model M-estimation.37,38,40
The x-axis of each graph represents the risk, estimated by predicted length of stay, for each template patient stratum. The y-axis represents the difference in cost (focal minus control) within each matched pair. A point falling on the horizontal line at 0 represents no difference in cost between the 2 patients in the matched pair, and a point falling below the line suggests a lower cost for the focal vs control patient. The solid lines represent the locally weighted scatterplot smoothing (LOWESS) line.41 LOWESS 95% CI bands (shaded areas) for the central tendency line were produced using the bootstrap method. A box plot at the bottom of each graph denotes the 5%, 25%, 50%, 75%, and 95% values of predicted risk over all strata. Each graph illustrates an individual hospital.
eTable 1. ICD-9 Codes for Asthma and Chronic Conditions
eTable 2. Summary of Cohort and Template
eTable 3. Model Results for Predicted Length of Stay
eTable 4. Resource Utilization Distribution by Hospital ($)
eTable 5. Length of Stay Distribution by Hospital
eTable 6. ICU Days Distribution by Hospital
eTable 7. Risk Synergy
eMethods. Details on Matching Methods and Cost Calculation Methodology
eFigure 1. Hospital Matching and Testing for Quality
eFigure 2. Median Cost by Length of Stay and ICU Utilization
eFigure 3. 90th Percentile Cost by Length of Stay and ICU Utilization
eFigure 4. Hospital Resource Utilization (Median Cost) vs ICU Utilization
eFigure 5. Hospital Resource Utilization (Median) vs Hospital Resource Utilization (90th Percentile)
eFigure 6. Hospital Resource Utilization (Median) vs Length of Stay (Trimmed Mean Days)
Silber JH, Rosenbaum PR, Wang W, et al. Auditing Practice Style Variation in Pediatric Inpatient Asthma Care. JAMA Pediatr. 2016;170(9):878–886. doi:10.1001/jamapediatrics.2016.0911
Asthma is the most prevalent chronic illness among children, remaining a leading cause of pediatric hospitalizations and representing a major financial burden to many health care systems.
To implement a new auditing process examining whether differences in hospital practice style may be associated with potential resource savings or inefficiencies in treating pediatric asthma admissions.
Design, Setting, and Participants
A retrospective matched-cohort design study, matched for asthma severity, compared practice patterns for patients admitted to Children’s Hospital Association hospitals contributing data to the Pediatric Hospital Information System (PHIS) database. With 3 years of PHIS data on 48 887 children, an asthma template was constructed consisting of representative children hospitalized for asthma between April 1, 2011, and March 31, 2014. The template was matched with either a 1:1, 2:1, or 3:1 ratio at each of 37 tertiary care children’s hospitals, depending on available sample size.
Treatment at each PHIS hospital.
Main Outcomes and Measures
Cost, length of stay, and intensive care unit (ICU) utilization.
After matching patients (n = 9100; mean [SD] age, 7.1 [3.6] years; 3418 [37.6%] females) to the template (n = 100, mean [SD] age, 7.2 [3.7] years; 37 [37.0%] females), there was no significant difference in observable patient characteristics at the 37 hospitals meeting the matching criteria. Despite similar characteristics of the patients, we observed large and significant variation in use of the ICUs as well as in length of stay and cost. For the same template-matched populations, comparing utilization between the 12.5th percentile (lower eighth) and 87.5th percentile (upper eighth) of hospitals, median cost varied by 87% ($3157 vs $5912 per patient; P < .001); total hospital length of stay varied by 47% (1.5 vs 2.2 days; P < .001); and ICU utilization was 254% higher (6.5% vs 23.0%; P < .001). Furthermore, the patterns of resource utilization by patient risk differed significantly across hospitals. For example, as patient risk increased one hospital displayed significantly increasing costs compared with their matched controls (comparative cost difference: lowest risk, −34.21%; highest risk, 53.27%; P < .001). In contrast, another hospital displayed significantly decreasing costs relative to their matched controls as patient risk increased (comparative cost difference: lowest risk, −10.12%; highest risk, −16.85%; P = .01).
Conclusions and Relevance
For children with asthma who had similar characteristics, we observed different hospital resource utilization; some values differed greatly, with important differences by initial patient risk. Through the template matching audit, hospitals and stakeholders can better understand where this excess variation occurs and can help to pinpoint practice styles that should be emulated or avoided.
Asthma is the most prevalent chronic childhood illness and remains a leading cause of hospitalizations in children aged 1 to 15 years in the United States.1 Inpatient and emergency department treatment accounts for approximately one-third of all pediatric asthma-related health care costs.2 Since admissions for asthma are common and there are well-established clinical treatment pathways,3 one would expect similar best practices leading to equivalent use of resources, especially among hospitals that treat only children. However, considerable variation in resource utilization has been observed.4-6 In part, this variation may be owing to many potential choices in the management of inpatient asthma that reflect a myriad of clinical decisions, such as the use of diagnostic tests, bronchodilators, rescue inhalers, and inhaled and systemic corticosteroids, and in part it may be owing to differences in patient characteristics.7,8
We questioned whether resource utilization varies among hospitals caring for similar patients. To accomplish this, we used a new auditing methodology termed template matching9,10 to select a group of closely matched patients across hospitals. Template matching compares hospitals by selecting a reference template of patients and “stamping out copies” of template patients at each hospital through the use of multivariate matching.9-13 Determining whether resource utilization differs, even when examining similar patients, should help inform individual hospitals as to whether they may need to change their approach to care, and using a matching framework rather than regression modeling allows hospitals to identify specific groups of patients for whom care may be in need of change.
Question How much does practice style vary in the management of asthma admissions across children’s hospitals?
Findings A template-matched audit identified large, significant variation across hospitals. For the same template-matched patients, comparing practice style between the lower and upper eighth of hospitals, median cost varied by 87%, length of stay by 47%, and intensive care unit utilization by 254%.
Meaning Through the audit, hospitals can better understand which patient types are associated with increased or decreased resource utilization compared with other hospitals matched to the template, therefore helping to identify practice styles that should be emulated or avoided.
Data for this study were obtained from the Pediatric Hospital Information System (PHIS), an administrative database that contains inpatient, emergency department, ambulatory surgery, and observational data from 41 not-for-profit, tertiary care pediatric hospitals in the United States. These hospitals, all members of the Children’s Hospital Association (Overland Park, Kansas), represent some of the most technologically sophisticated facilities in the country. This study was approved by the institutional review board of The Children’s Hospital of Philadelphia; the need for informed consent was waived.
All nontransfer inpatient and observational unit nonresearch discharges for asthma occurring between April 1, 2011, and March 31, 2014, were considered if the patient was between ages 3 and 18 years. Asthma was defined using specific International Classification of Diseases, Ninth Revision, codes as reported in eTable 1 in the Supplement. Variables we matched on included age in days; sex; Medicaid status; common chronic conditions; asthma-affecting diagnoses; National Heart, Lung, and Blood Institute diagnoses of concern14; predicted length of stay (a risk score); a propensity score to be in the template; and asthma severity at admission. We included only the first asthma admission in the data set for each patient.
We constructed 500 templates, each consisting of 100 patients with asthma randomly sampled from the entire PHIS database. Of these, we selected 1 template with the smallest Mahalanobis distance15,16 to the mean of the entire database. Figure 1 describes the process of template creation and matching with the template. A description of the template is found in eTable 2 and the distance algorithm is described in the eMethods in the Supplement.
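The template selection step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (which is described in the eMethods): it assumes only 2 covariates per patient so that the covariance matrix can be inverted in closed form, and it assumes the distance is taken between each candidate template's mean vector and the database mean. The function names are hypothetical.

```python
import random

def mean_vector(rows):
    """Column means of a list of equal-length numeric tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance_2x2(rows):
    """Sample variances and covariance for 2-covariate data: (sxx, sxy, syy)."""
    mx, my = mean_vector(rows)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance d' S^-1 d using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = u[0] - v[0], u[1] - v[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def pick_template(population, n_templates=500, template_size=100, seed=0):
    """Draw random candidate templates; keep the one whose mean is closest to the population mean."""
    rng = random.Random(seed)
    pop_mean = mean_vector(population)
    cov = covariance_2x2(population)
    best, best_dist = None, float("inf")
    for _ in range(n_templates):
        candidate = rng.sample(population, template_size)
        dist = mahalanobis_sq(mean_vector(candidate), pop_mean, cov)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist
```

Drawing many candidates and keeping the closest one yields a template that is both random and representative of the database on the chosen covariates.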
We desired a minimum of a 1.5:1 pool of patients at each hospital to be used to select matches to the template. This ratio would help to produce good quality matches. As shown in Figure 1, whenever possible we sought 300 patients matched to the template (using a 3:1 matching ratio). When this was not possible, we used a 2:1 ratio selecting 200 matched controls. When this was not possible, we used a 1:1 matching ratio producing a matched sample of 100 controls. All reported statistics account for these different matching ratios. The shared template of 100 patients permits hospitals of varied size to contribute as many patients as they can to the comparison of different hospitals.
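The choice of matching ratio can be illustrated as follows. This sketch assumes that the 1.5:1 pool requirement applies to the number of matched controls sought at each ratio (so more than 150, 300, and 450 eligible patients for the 1:1, 2:1, and 3:1 ratios, respectively); only the 150-patient minimum is stated explicitly in the text, and the function name is hypothetical.

```python
def matching_ratio(pool_size, template_size=100, pool_factor=1.5):
    """Return the largest ratio in {3, 2, 1} whose assumed pool requirement is met, else None."""
    for ratio in (3, 2, 1):
        # Assumed rule: the pool must exceed pool_factor times the matched sample sought.
        if pool_size > pool_factor * ratio * template_size:
            return ratio
    return None  # hospital too small to be matched to the template
```

Under these assumed thresholds, a hospital with 500 eligible patients would be matched 3:1, and a hospital with 200 would be matched 1:1.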
We performed our matches using the R package MIPMatch.17-19 We required exact matches on asthma severity status (moderate, severe, or critical). For other variables we chose a balanced match that minimized the medical distance9,10 to template patients, defined using the Mahalanobis distance.15,16 Medical distance indicates the level of difference between 2 patients in terms of medical covariates such as age, chronic illnesses, and presentation severity (eMethods in the Supplement).9,10,13,20
To improve the quality of the matches between the template and the specific hospital, we used “near fine balance”21-25; this method ensured that if the template had, for example, a 15% rate of upper respiratory tract infections for its 100 cases, each hospital provided a 15% rate of upper respiratory tract infections for its matched controls whenever possible, without requiring exact matches on that diagnosis for all patients in the hospital with respect to the template patient. A mean constraint was introduced on age at admission and a propensity score for being in the template. We also added a penalty value to the Mahalanobis distance for differences in predicted length of stay.
We examined the degree of similarity among the hospital-matched samples and tested the overall differences in patient characteristics and practice style variables across hospitals using the Kruskal-Wallis test, a nonparametric version of the 1-way analysis of variance test26 for each continuous variable of interest, and the Pearson χ2 test for each binary variable in a 2 × k table (with k indicating number of hospitals). Values of χ2/df below 1 indicate better balance than expected by random assignment, while values of χ2/df above 1 indicate worse balance than expected. These tests compare the balance achieved by matching with the balance that would have been expected had patients been randomly assigned to hospitals. When ranks of hospitals were tied, the mean rank is presented.
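The χ2/df balance measure for a binary covariate can be sketched as below. This is an illustrative calculation of the Pearson χ2 statistic for a 2 × k table divided by its k − 1 degrees of freedom; the function name and the example counts are hypothetical and are not taken from the study data.

```python
def chi2_per_df(counts_with, totals):
    """Pearson chi-square divided by df for a 2 x k table.

    counts_with[i] is the number of matched patients at hospital i with the
    characteristic; totals[i] is the number of matched patients at hospital i.
    """
    k = len(totals)
    overall_rate = sum(counts_with) / sum(totals)
    chi2 = 0.0
    for with_trait, total in zip(counts_with, totals):
        cells = (
            (with_trait, total * overall_rate),
            (total - with_trait, total * (1 - overall_rate)),
        )
        for observed, expected in cells:
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / (k - 1)
```

If every hospital reproduces the template rate exactly, the statistic is 0 (better balance than random assignment would give); markedly different rates across hospitals push it well above 1.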
All matching was completed without knowledge of outcomes (in this study, the practice style variables), as suggested by Rubin.27,28 Unlike modeling, matching without knowledge of outcomes prevents researchers from selecting the most attractive of multiple analyses.
Ideally, for a fair comparison, every hospital would have treated the same 300 patients. Of course this is not possible; however, we can evaluate the fairness of each hospital comparison by determining whether the group of matched patients in each facility can be distinguished from the patients in the template or, alternatively, whether these 2 groups of patients appear to be a random split of 1 group of patients. We excluded hospitals that could not be matched closely, which is illustrated in eFigure 1 in the Supplement. We compared 16 baseline attributes by applying the Fisher exact test 16 times using the Simes29 method in an effort to control the false discovery rate in the 16 tests at a significance level of P = .05, as suggested by Benjamini and Hochberg.30
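The multiplicity adjustment can be sketched with the Benjamini-Hochberg step-up procedure, which uses the Simes critical values i × α/m for the ordered P values. This is a generic illustration; how the authors combined the 16 Fisher exact tests into a pass or fail decision for each hospital is an assumption here, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses the step-up procedure rejects."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Simes critical value for the rank-th smallest P value.
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

def simes_global_reject(p_values, alpha=0.05):
    """Simes global test: reject if any ordered P value falls at or below its critical value."""
    return any(benjamini_hochberg(p_values, alpha))
```

With 16 tests, a single P value of .004 would not trigger rejection because the smallest critical value is .05/16 = .003125; under this reading, a hospital's matched sample would be flagged as distinguishable from the template only when `simes_global_reject` returns true.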
Once hospital matches were complete and of good quality, the facilities were compared on the following primary variables: total cost, length of stay, and the percentage of patients admitted to the intensive care unit (ICU), which was determined using PHIS service line codes. For this study, we defined ICU care using the PHIS service line codes for the pediatric ICU; ICU, unspecified; pulmonary care unit; and pediatric pulmonary care unit. The unit costs for each billing code were determined using a method similar to that of Keren et al,31 with modifications as described in detail in the eMethods in the Supplement, and were year specific. Each hospital’s costs were based on its resource use. To compare resource use not influenced by local charges, each billing code (the basis of counting the resources) was assigned a dollar value by applying a uniform formula across all hospitals. Costs were adjusted to 2014 prices using Bureau of Labor Statistics Consumer Price Index values for medical care items.32
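The price adjustment amounts to rescaling each cost by a ratio of price index values. The sketch below assumes a simple ratio adjustment; the index values shown are hypothetical placeholders, not actual Bureau of Labor Statistics figures, and the function name is illustrative.

```python
def adjust_to_2014(cost, cpi_year, cpi_2014):
    """Rescale a cost from its billing year to 2014 prices via a price-index ratio."""
    return cost * cpi_2014 / cpi_year

# Hypothetical medical care index values for illustration only.
HYPOTHETICAL_CPI = {2011: 400.0, 2012: 415.0, 2013: 425.0, 2014: 435.0}

# A $1000 cost billed in 2011 expressed in 2014 prices under the hypothetical index.
example_cost_2014 = adjust_to_2014(1000.0, HYPOTHETICAL_CPI[2011], HYPOTHETICAL_CPI[2014])
```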
We first tested whether each matched sample at a hospital differed significantly from the matched controls at the other hospitals. For continuous outcomes, we used quantile tests33,34 that determined whether each patient exceeded his or her own hospital’s median or 90th percentile value and then, in effect, used the Mantel-Haenszel statistic34,35 to test the equality of each hospital with the others in exceeding this value. To adjust for multiple testing, we determined whether the P value met the criteria for the Bonferroni correction using the cutoff of P < .05/k, with k indicating the number of tests (in this case, 37).
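The Bonferroni cutoff described above is simple to compute. The following sketch uses the stated values (α = .05 and k = 37 tests); the function names are hypothetical.

```python
def bonferroni_cutoff(alpha=0.05, n_tests=37):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

def significant_after_bonferroni(p_value, alpha=0.05, n_tests=37):
    """True when a P value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_cutoff(alpha, n_tests)
```

With 37 hospitals, the threshold is .05/37, or approximately .00135, so a hospital-level P value of .01 would not be counted as significant.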
Specific advantage was defined as observing better patient outcomes in a specific or focal hospital rather than those of matched control patients from other facilities within the same matched set.13 Risk synergy describes a situation during which, as admission risk increases or decreases, the specific advantage changes in a systematic way.13,20 For example, as admission risk increases, the focal hospital’s patients may have increasingly better outcomes than matched controls from other hospitals. When studying the specific advantage and risk synergy of the focal hospital compared with the matched control population, we tested for an interaction between patient-predicted risk strata and the main effect of the focal hospital. For discrete variables, such as use of the ICU, we tested synergy using a Mantel test for trend,36 and continuous variables were tested with robust regression,37,38 evaluating the interaction between an indicator for admission to the focal hospital and a linear term for average patient risk in the matched set (while also adjusting for the hospital main effect). Each of the patients within each stratum was assigned a risk defined by the mean predicted length of stay in the stratum as defined by the external predicted length of stay model used for matching (eTable 3 in the Supplement). If the Mantel test found a significant trend in a discrete variable or robust regression identified a significant interaction between escalating patient-predicted risk and the hospital’s performance on a continuous variable compared with matched controls despite controlling for the hospital’s main effect, the hospital was considered to demonstrate risk synergy for that variable.
Of 41 hospitals in the PHIS, 40 institutions were available with complete patient records for the full study period, with 859 997 patients in the PHIS data set meeting reporting requirements and therefore available for analysis; of these, there were 64 466 patients (7.5%) admitted with an asthma diagnosis. After excluding patients transferred from other hospitals, there were 48 903 patients. Further exclusions for illogical departmental billing costs yielded a final sample of 48 887 patients eligible for study.
All 40 hospitals met the minimum volume requirement for matching (>150 patients), and exact matching on asthma severity category was possible for 37 facilities, all of which also passed the match quality criteria. There were 24 hospitals with a 3:1 match, 6 hospitals with a 2:1 match, and 7 hospitals with a 1:1 match. Table 1 describes the 37 hospitals in the data set that were successfully matched to the template and available for analysis. After matching (n = 9100 patients; mean [SD] age, 7.1 [3.6] years; 3418 [37.6%] females) to the template (n = 100 patients; mean [SD] age, 7.2 [3.7] years; 37 [37.0%] females), there was no significant difference in observable patient characteristics at the 37 hospitals meeting our matching criteria.
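The matched sample size follows directly from the number of hospitals at each matching ratio and the 100-patient template, as this short check shows. The variable names are illustrative.

```python
TEMPLATE_SIZE = 100

# Number of hospitals matched at each ratio (3:1, 2:1, 1:1), as reported above.
hospitals_by_ratio = {3: 24, 2: 6, 1: 7}

n_hospitals = sum(hospitals_by_ratio.values())
n_matched = sum(
    ratio * TEMPLATE_SIZE * count for ratio, count in hospitals_by_ratio.items()
)
# 24 x 300 + 6 x 200 + 7 x 100 = 9100 matched patients across 37 hospitals
```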
Patients appeared to be different before matching but similar after matching in variables controlled by matching, as noted by the significant variation in patient characteristics across hospitals observed in the random sample before matching.
Table 1 also lists significant differences in practice style variables across hospitals. For the template-matched population, comparing utilization, median cost was 87% higher ($3157 vs $5912 per patient; P < .001) between the lower eighth39 (12.5th percentile) and upper eighth (87.5th percentile) of hospitals. Total hospital length of stay was 47% higher (1.5 vs 2.2 days; P < .001) between the 12.5th and 87.5th percentiles, and ICU utilization was 254% higher (6.5% vs 23.0%; P < .001).
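The percentage differences reported here can be reproduced from the 12.5th and 87.5th percentile values, taking the lower eighth as the reference. The function name is hypothetical.

```python
def percent_higher(low, high):
    """Relative difference of the upper value over the lower value, in percent."""
    return 100.0 * (high - low) / low

cost_diff = percent_higher(3157, 5912)  # median cost per patient, $
los_diff = percent_higher(1.5, 2.2)     # total length of stay, days
icu_diff = percent_higher(6.5, 23.0)    # ICU utilization, % of patients

# Rounded to whole percentages: 87, 47, and 254
rounded = [round(cost_diff), round(los_diff), round(icu_diff)]
```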
Table 2 presents the variation in practice style across hospitals, ranking on median and 90th percentile cost, 90th percentile length of stay, and ICU utilization rate (eTables 4-6 in the Supplement include details). There was minimal variation in each hospital’s median length of stay (all were either 1 or 2 days). However, there was significant variation in each hospital’s 90th percentile for length of stay and ICU utilization rate. Although some hospitals, such as hospital JJ, performed relatively poorly across all practice style variables and some did consistently well, such as hospitals C and N, others displayed inconsistent results. Hospital CC was expensive with very long lengths of stay but was below average with respect to the percentage of patients using the ICU. Similarly, the most expensive hospital (KK) also was below average in ICU utilization.
Across all 37 hospitals, there was a significant correlation between median cost and hospital length of stay (Spearman correlation coefficient, 0.57; P < .001), with a similar partial correlation between median cost and length of stay controlling for ICU utilization (controlled r = 0.62; P < .001). The Spearman correlation coefficient between ICU utilization and median cost was nonsignificant (r = 0.03; P = .87) and the partial correlation controlling for hospital length of stay was also nonsignificant (controlled r = −0.28; P = .09). In other words, median cost was more related to hospital length of stay than ICU utilization. The correlation between ICU utilization and trimmed length of stay was significant (r = 0.41; P = .01). Figure 2 illustrates the associations between all 3 practice style variables as well as the distribution of each practice style variable across all hospitals (eFigures 2-6 in the Supplement).
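The relationship between the pairwise and partial correlations can be illustrated with the standard first-order partial correlation formula. The article does not state how the partial correlations were computed, so applying this formula to the rounded Spearman coefficients is an assumption; the function name is hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Reported pairwise Spearman coefficients (rounded to 2 decimals in the text).
R_COST_LOS = 0.57
R_COST_ICU = 0.03
R_LOS_ICU = 0.41

# Cost vs length of stay, controlling for ICU utilization: approximately 0.61
cost_los_given_icu = partial_corr(R_COST_LOS, R_COST_ICU, R_LOS_ICU)

# Cost vs ICU utilization, controlling for length of stay: approximately -0.27
cost_icu_given_los = partial_corr(R_COST_ICU, R_COST_LOS, R_LOS_ICU)
```

These values are close to the reported controlled coefficients of 0.62 and −0.28; the small gaps are plausibly due to rounding of the inputs.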
Figure 3 illustrates how hospitals differ in their ability to care for patients of varied levels of risk. We defined risk in this example as predicted length of stay from a model fit to PHIS patients not included in this study (eTable 3 in the Supplement). The longer the predicted stay, the greater the risk. Each patient received a predicted length of stay based on the model that we developed. We used the model to order the template patients (eTable 7 in the Supplement provides results for all 37 hospitals).
Cost results for the smoothed plots of 4 hospitals using locally weighted scatterplot smoothing41 are displayed in Figure 3. Hospital C had lower cost and no risk synergy—it uniformly showed a specific advantage across all levels of risk. Hospital CC had a specific disadvantage, with uniformly higher costs across all levels of risk and no risk synergy. Hospital N displayed typical costs for the patients at the lowest risk, but compared with matched controls, costs declined significantly as risk increased (comparative cost difference: lowest risk, −10.12%; highest risk, −16.85%; P = .01). Finally, compared with matched controls, hospital L showed relatively low costs for low-risk patients but high costs for high-risk patients (comparative cost difference: lowest risk, −34.21%; highest risk, 53.27%; P < .001).
Despite asthma being a disease with established protocols for treatment3 and our focus on a group of children’s hospitals, we found substantial variation among PHIS hospitals in ICU utilization, length of stay, and treatment costs in similar template-matched patients with asthma. Cost was associated more with length of stay than with ICU utilization. The lower association with ICU utilization may indicate differences in care substitutes for ICU utilization that may occur on the pediatric floor or variation in the definition of ICU care.
Template matching has not been used previously to control for asthma patient characteristics in this manner; thus, the results of this study are new. Moreover, we found that practice style variables among hospitals often differ systematically by risk of the patient. Some hospitals appear to consistently spend less on the strata comprising patients with less severe asthma than their matched controls but spend more than controls within more difficult patient strata (eg, hospital L). Furthermore, hospital L displayed overall costs that were no different from those of control facilities, so a hospital may incorrectly stop auditing there and assume they are doing an adequate job. However, hospital L performed very differently compared with the controls depending on the patient level of risk, highlighting the need to examine facilities across different levels of patient risk when studying variation in practice style and allowing for identification of potential types of patients for whom a hospital’s care can be improved. What could hospital L do next? Because template matching is based on patients rather than regression coefficients, hospital L’s quality improvement officer can identify and closely examine patients whose care seems to be worse than that of matched controls. In attempting to understand the hospital’s consumption of resources, they could investigate, at the patient level, actual practice differences that may be occurring at their institution vs the 36 other hospitals matched to the template. Did their hospital utilize ICUs as often as the matched controls did? Were there great differences in the choice of medications prescribed for similar patients? This deeper examination of the data is dependent on which data are collected and what pattern of risk synergy and specific advantage is detected.
One limitation of template matching rests in sample size requirements needed to produce good matches. Beyond expanding the time frame of the template analysis or the types of patients in the template, some hospitals may still be too small to be able to match the template or see only patients who are different from the template. For these hospitals, we suggest 2 alternative methods: hospital-specific template matching10 and indirect standardization matching.13 In these approaches, all or a sample of a hospital’s own patients compose an initial “boutique” template that is used to match patients from other hospitals to benchmark the specific institution’s results.
The template can be created to present different questions, as desired by policymakers or stakeholders. Templates can be representative, as was developed in this analysis, or targeted to specific types of patients who may be of interest. Developing the template provides unique opportunities for auditors to concentrate on challenging groups of patients.
Template matching allows hospitals to audit their utilization patterns in a new manner that uses closely matched patients from other hospitals. We observed large variation among children’s hospitals in resource utilization costs, length of stay, and ICU utilization relative to matched controls. The reasons for these differences varied across hospitals. Cost differences were better explained by length of stay than a hospital’s relative use of the ICU. Finally, reporting only whether a hospital is more or less costly in aggregate may overlook important comparative differences in resource use across patients’ predicted risk and therefore may miss opportunities to improve practice.
Accepted for Publication: March 30, 2016.
Corresponding Author: Jeffrey H. Silber, MD, PhD, Center for Outcomes Research, The Children’s Hospital of Philadelphia, 3535 Market St, Ste 1029, Philadelphia, PA 19104 (firstname.lastname@example.org).
Published Online: July 11, 2016. doi:10.1001/jamapediatrics.2016.0911.
Author Contributions: Dr Silber had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Silber, Rosenbaum, Wang.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Silber, Rosenbaum, Wang, Calhoun, Zeigler.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Silber, Rosenbaum, Wang, Ludwig, Calhoun, Guevara.
Obtained funding: Silber, Even-Shoshan.
Administrative, technical, or material support: Silber, Calhoun, Guevara, Zorc, Zeigler, Even-Shoshan.
Study supervision: Silber, Rosenbaum.
Conflict of Interest Disclosures: None reported.
Funding/Support: This research was funded by grant U18-HS020508 from the Agency for Healthcare Research and Quality (AHRQ).
Role of the Funder/Sponsor: The AHRQ had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The findings and conclusions of this report are those of the authors and do not necessarily represent the official position of AHRQ.
Additional Contributions: Data for this project were supplied by the Children’s Hospital Association Pediatric Hospital Information System (PHIS). The PHIS hospitals are some of the largest and most advanced children’s hospitals in the United States and have the most demanding standards of pediatric service in America. Traci Frank, AA (Center for Outcomes Research, The Children’s Hospital of Philadelphia), assisted with this research. There was no financial compensation