Kernisan LP, Lee SJ, Boscardin WJ, Landefeld CS, Dudley RA. Association Between Hospital-Reported Leapfrog Safe Practices Scores and Inpatient Mortality. JAMA. 2009;301(13):1341-1348. doi:10.1001/jama.2009.422
Author Affiliations: Divisions of Geriatrics (Drs Kernisan, Lee, Boscardin, and Landefeld) and Biostatistics (Dr Boscardin) and Philip R. Lee Institute for Health Policy Studies (Dr Dudley), University of California, San Francisco; National VA Quality Scholars Program (Drs Kernisan, Lee, and Landefeld) and Health Services Research Enhancement Award Program (Drs Lee, Boscardin, and Landefeld), San Francisco VA Medical Center, San Francisco, California; and Center for Advanced Study in the Behavioral Sciences (Dr Landefeld), Stanford University, Palo Alto, California.
Context The Leapfrog Hospital Survey allows hospitals to self-report the steps they have taken toward implementing the Safe Practices for Better Healthcare endorsed by the National Quality Forum. The Leapfrog Group currently ranks hospital performance on the safe practices initiative by quartiles and presents this information to the public on its Web site. It is unknown how well a hospital's resulting Safe Practices Score (SPS) correlates with outcomes such as inpatient mortality.
Objective To determine the relationship between hospitals' SPSs and risk-adjusted inpatient mortality rates.
Design, Setting, and Participants Observational analysis of discharge data for all urban US hospitals completing the 2006 safe practices initiative and identifiable in the Nationwide Inpatient Sample. Leapfrog provided an SPS for each hospital as well as 3 alternative scores based on shorter versions of the original survey. Hierarchical logistic regression was used to determine the relationship between quartiles of SPS and risk-adjusted inpatient mortality, after adjusting for hospital discharge volume and teaching status. Subgroup analyses were performed using data from patients older than 65 years and patients with 5% or greater expected mortality risk.
Main Outcome Measures Inpatient risk-adjusted mortality by quartiles of survey score.
Results Of 1075 hospitals completing the 2006 Safe Practices Survey, 155 (14%) were identifiable in the Nationwide Inpatient Sample (1 772 064 discharges). Raw observed mortality in the primary sample was 2.09%. Fully adjusted mortality rates by quartile of SPS, from lowest to highest, were 1.97% (95% confidence interval [CI], 1.78%-2.18%), 2.04% (95% CI, 1.84%-2.25%), 1.96% (95% CI, 1.77%-2.16%), and 2.00% (95% CI, 1.80%-2.22%) (P = .99 for linear trend). Results were similar in the subgroup analyses. None of the 3 alternative survey scores was associated with risk-adjusted inpatient mortality, although P values for linear trends were lower (.80, .20, and .11).
Conclusion In this sample of hospitals that completed the 2006 Safe Practices Survey, survey scores were not significantly associated with risk-adjusted inpatient mortality.
The Leapfrog Group is a well-known nonprofit business coalition that provides information regarding hospital safety and quality to its members (large companies that purchase health care) and to consumers.1-3 Its primary method of evaluating hospitals is via voluntary participation in the Leapfrog Hospital Survey. Initially, these annual surveys assessed hospitals' adoption of 3 initiatives that the organization believed would improve patient safety: computerized physician order entry, staffing of intensive care units by trained intensivist physicians, and evidence-based referrals for high-mortality surgeries. In 2004 a fourth initiative, the Safe Practices Survey, was added to the hospital survey to allow hospitals to report efforts toward implementing the National Quality Forum's Safe Practices for Better Healthcare.4
Approximately 1100 urban hospitals have completed the Safe Practices Survey in recent years, and these results are reported to the public on the Internet (available at http://www.leapfroggroup.org). Nevertheless, it remains unclear how well quality as assessed by the Safe Practices Survey (consisting of hospitals' self-report of structural and process measures) correlates with outcomes of interest to patients and policy makers, such as mortality. To date, each of the first 3 initiatives has been the subject of multiple peer-reviewed studies,5-11 and a recent study examining the first 3 initiatives collectively did find some positive associations between survey performance and reduced mortality.12
However, little has been published regarding the Safe Practices Survey, which is the most time-consuming component to complete. In particular, to our knowledge it is not yet confirmed that higher scores on the survey correlate with actual outcomes. This issue is pertinent, because survey scores reported on the Internet are ranked by quartiles, which likely suggests to consumers that hospitals in the highest quartile provide safer care than those in lower quartiles.13
In this article we present an analysis examining the relationship between scores achieved by urban hospitals on the 2006 Safe Practices Survey and risk-adjusted inpatient mortality. To address questions of generalizability and response bias, we also present comparisons between hospitals that participated in the survey and those that did not.
Mortality data were obtained from the most recent version (2005, available in 2007) of the Nationwide Inpatient Sample (NIS).14 This database includes inpatient discharge data collected via federal-state partnerships as part of the Agency for Healthcare Research and Quality's Healthcare Cost and Utilization Project. The 2005 NIS contains administrative data on all discharges from 1054 hospitals located in 37 states, approximating a 20% stratified sample of US hospitals (7 995 048 discharges total).15 Of these 1054 hospitals, 633 are classified as urban. However, only 24 states allow the release of hospital-identifying information, leaving 400 identifiable urban hospitals in the 2005 NIS.
The NIS data include classification and severity measures for each patient record, including an All Patient Refined Diagnosis Related Group (APR-DRG) and an APR-DRG mortality-risk subclass.16 The APR-DRG system is a well-known proprietary classification system developed by 3M Health Information Systems (Salt Lake City, Utah) that uses diagnosis codes, procedure codes, and other administrative data to classify patients into base disease categories as well as assign them to 1 of 4 levels of mortality risk (minor, moderate, major, and extreme) within each base disease category.17,18
Leapfrog provided survey data on 1075 urban hospitals that had participated in the 2006 Safe Practices Survey. Of these, 679 were located within the 24 states providing hospital-identified data to the NIS. Because the survey is completed by hospitals in the spring, the 2006 survey captures safety practices in place during the 2005 calendar year.
We discussed our analysis with the survey's authors before starting. They were interested in streamlining their survey and asked us to consider an “Action Safe Practices Score” (ASPS), obtained through a different scoring methodology. This “action-focused” methodology assigned points for a given safe practice solely on the basis of a hospital's answer to the most “actionable” item in the action portion of each Safe Practices Survey section. The original scoring methodology gave credit for establishing systems of awareness, accountability, ability (capacity-building investments), and action. (The online survey is available at http://leapfrog.medstat.com.)
In 2008 the survey was reduced from 27 to 13 safe practices: (1) creating a safety culture, (2) ensuring an adequate nursing workforce, (3) ensuring that a pharmacist is active in medication use, (4) not providing patient care summaries from memory, (5) providing patient care information and orders to all clinicians, (6) requiring patient readback of informed consent, (7) documenting resuscitation or end-of-life directives, (8) preventing mislabeled radiographs, (9) providing risk assessment and prevention for deep vein thrombosis/venous thromboembolism, (10) providing anticoagulation services, (11) preventing aspiration, (12) preventing central venous line sepsis, and (13) requiring hand washing. (A list of all National Quality Forum safe practices is available at http://www.ahrq.gov/qual/30safe.htm.) We accordingly repeated our analyses using “SPS-13” and “ASPS-13” scores based on the 13 retained practices.
The sample for the primary analysis evaluating the association between Safe Practices Score (SPS) and mortality consisted of discharges from those urban US hospitals that had completed the Safe Practices Survey and that were identifiable in the 2005 NIS (155 hospitals). We also planned subgroup analyses for 2 populations in which we hypothesized that inpatient mortality might be more sensitive to adherence to safe practices: patients older than 65 years and patients with greater than 5% expected mortality.
Discharges excluded from the analysis included patients younger than 18 years, oncology patients, recipients of solid organ transplants, and patients transferred to or from another acute-care facility. Application of the exclusion criteria to discharges from urban hospitals in the 24 hospital-identifying NIS states reduced the number of eligible discharges from 4 873 959 to 3 672 146. The exclusions were standard per the recommendations of 3M Health Information Systems for use of the APR-DRG system in risk adjustment.18
The primary predictor of interest was each hospital's 2006 SPS. The maximum total SPS possible was 1000. An additional predictor of interest was each hospital's 2006 ASPS. We also subsequently tested SPSs and ASPSs based on the 13 safe practices retained in the 2008 survey (SPS-13 and ASPS-13, respectively).
To examine the relationship between survey scores and inpatient mortality, we built hierarchical logistic regression models, known more broadly as generalized linear mixed models,19 with each discharged patient as the unit of analysis and a random intercept for each hospital to capture the correlation of patients within a hospital. Because the distribution of survey scores was heavily skewed, we categorized the scores into quartiles prior to using them as a predictor in our models. Categorizing the score into quartiles also seemed appropriate because the Leapfrog Group rates hospitals on its Web site based on which quartile of survey score they fall into.
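As a rough illustration of the quartile categorization described above, the following sketch assigns each hospital score to a quartile. The cut-point method, the handling of scores that fall exactly on a cut point, the function name `assign_quartiles`, and the example scores are all assumptions made for illustration; the text does not state how ties or boundaries were handled.

```python
from bisect import bisect_left
from statistics import quantiles


def assign_quartiles(scores):
    """Return a quartile label (1 = lowest, 4 = highest) for each score."""
    # Three cut points that split the scores into four groups.
    # method="inclusive" treats the scores as the whole population of interest
    # (an assumption; the source does not describe its cut-point rule).
    cuts = quantiles(scores, n=4, method="inclusive")
    # bisect_left counts the cut points strictly below a score, so a score
    # equal to a cut point is placed in the lower group.
    return [bisect_left(cuts, s) + 1 for s in scores]


# Hypothetical scores out of 1000, clustered toward the top (not study data).
example_scores = [520, 700, 780, 810, 850, 900, 930, 960]
example_quartiles = assign_quartiles(example_scores)
print(example_quartiles)
```

Because the categorization depends only on rank, a heavily skewed score distribution still yields four groups of roughly equal size, which is one reason quartiles are convenient as a model predictor.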
To adjust for mortality risk, we used the population of discharges from urban hospitals in the 24 hospital-identifying NIS states (3 672 146 discharges) to calculate the observed mortality rate for each combination of APR-DRG and APR-DRG mortality risk score. We then assigned to each patient the mortality risk associated with his or her APR-DRG and APR-DRG mortality risk category and used this expected mortality risk as an adjustor in our logistic regression models.
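The risk-adjustment step above can be sketched as a two-pass calculation: first compute the observed death rate in each APR-DRG by risk-subclass cell of the reference population, then attach that rate to each discharge. The record layout (`apr_drg`, `risk_subclass`, `died`), the function names, and the toy records are hypothetical and are not drawn from the NIS files.

```python
from collections import defaultdict


def expected_mortality_by_cell(discharges):
    """Observed death rate for each (APR-DRG, mortality-risk subclass) cell."""
    deaths = defaultdict(int)
    totals = defaultdict(int)
    for d in discharges:
        cell = (d["apr_drg"], d["risk_subclass"])
        totals[cell] += 1
        deaths[cell] += d["died"]  # died is 1 for an inpatient death, else 0
    return {cell: deaths[cell] / totals[cell] for cell in totals}


def attach_expected_risk(discharges, rates):
    """Copy each discharge and add the death rate of its own cell."""
    return [
        {**d, "expected_risk": rates[(d["apr_drg"], d["risk_subclass"])]}
        for d in discharges
    ]


# Toy reference population; the APR-DRG code is an arbitrary placeholder.
reference = [
    {"apr_drg": 194, "risk_subclass": 1, "died": 0},
    {"apr_drg": 194, "risk_subclass": 1, "died": 0},
    {"apr_drg": 194, "risk_subclass": 4, "died": 1},
    {"apr_drg": 194, "risk_subclass": 4, "died": 0},
]
rates = expected_mortality_by_cell(reference)
scored = attach_expected_risk(reference, rates)
```

In this sketch the expected risk is simply a lookup value; how it enters the regression (for example, untransformed or on the logit scale) is not specified in the text and is left open here.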
Additional hospital characteristics included in the model were volume of discharges and whether the hospital was a teaching hospital. These were treated as control variables to account for the possibility that a hospital's SPS may indicate better quality of care through mechanisms not mediated by hospital size and teaching status. Rural hospitals were excluded from our analysis because the Leapfrog Group does not target these hospitals and because the collinearity of rural location and hospital discharge volume would have rendered our models unstable. (Rural hospitals participating in the survey collectively accounted for only 71 169 discharges in the NIS.)
In our models, survey quartile categorizations were entered as 3 dummy variables, with the highest quartile serving as the reference group to produce odds ratios for mortality vs the highest SPS quartile as well as adjusted mortality rates. A test for linear trend of the log-odds of mortality was conducted using a likelihood ratio test with the linear orthogonal polynomial contrast.
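For readers who prefer notation, one way to write the fully adjusted model described above is given below. All symbols are our own shorthand, not the authors': Y_ij indicates death of patient i in hospital j, Q_j^(k) indicates that hospital j is in score quartile k (quartile 4 is the reference), r_ij is the expected-mortality adjustor (its functional form is not stated in the text), V_j is discharge volume, T_j is teaching status, and u_j is the hospital random intercept. The normal distribution for u_j is the usual mixed-model assumption rather than something the text specifies.

```latex
\operatorname{logit}\Pr(Y_{ij}=1 \mid u_j)
  = \beta_0 + \beta_1 Q^{(1)}_j + \beta_2 Q^{(2)}_j + \beta_3 Q^{(3)}_j
  + \gamma\, r_{ij} + \delta_1 V_j + \delta_2 T_j + u_j,
\qquad u_j \sim N(0, \sigma_u^2)

% Linear orthogonal polynomial contrast across the four quartile log-odds
% \eta_1,\dots,\eta_4, with \eta_k = \beta_0 + \beta_k for k = 1,2,3 and
% \eta_4 = \beta_0 at fixed covariate values:
L = \frac{-3\eta_1 - \eta_2 + \eta_3 + 3\eta_4}{\sqrt{20}}
  = \frac{-3\beta_1 - \beta_2 + \beta_3}{\sqrt{20}},
\qquad H_0 : L = 0
```

The weights (-3, -1, 1, 3) are the standard linear orthogonal polynomial coefficients for 4 equally spaced levels; because they sum to zero, the intercept cancels and the trend test involves only the 3 dummy-variable coefficients.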
A postanalysis power calculation was performed to assess the difference in mortality that our data would permit us to detect. Data were available on 1 772 064 admissions clustered in 155 hospitals of interest. A conservative estimate of 0.025 for the intrahospital correlation of risk-adjusted mortality (others20 have estimated this correlation to be below 0.01) implies an effective sample size of approximately 6000. Assuming an overall mortality rate of 2%, we calculated that we had 80% power to detect a 1 percentage point linear increase in mortality from the first to the fourth quartiles of the SPS, assuming 2-tailed α = .05. We also calculated that 80% power to detect a mortality increase from 1.9% (quartile 1) to 2.1% (quartile 4) would require data from 500 hospitals.
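The effective sample size quoted above can be approximated with the standard design-effect formula for clustered data, 1 + (m - 1) × ICC, where m is the mean cluster size. This is a sketch under the assumption of equal-sized clusters; the authors' exact calculation in nQuery Advisor is not described, and the function names are ours.

```python
def design_effect(mean_cluster_size, icc):
    """Variance inflation due to clustering, assuming equal cluster sizes."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_total, n_clusters, icc):
    """Number of independent observations carrying the same information."""
    mean_size = n_total / n_clusters
    return n_total / design_effect(mean_size, icc)


# Values taken from the text: 1 772 064 admissions, 155 hospitals, ICC 0.025.
n_eff = effective_sample_size(1_772_064, 155, 0.025)
print(round(n_eff))
```

Under these assumptions the result is a little over 6000, consistent with the figure reported in the text. With more than 11 000 admissions per hospital, the effective sample size is governed almost entirely by the number of hospitals and the intrahospital correlation, not by the number of discharges.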
Because the human participants data used for this study were completely deidentified and not collected for the purpose of this research, this study was considered exempt from institutional review board approval, per the policies of the University of California San Francisco's Committee on Human Research. All analyses were performed using StataMP version 10.0 (StataCorp, College Station, Texas), except for the power calculations, which were performed using nQuery Advisor version 6.02 (Statistical Solutions, Saugus, Massachusetts). A 2-sided significance level of .05 was used for all hypothesis tests.
Of the 155 urban hospitals in the NIS that had completed the Safe Practices Survey, 34% were teaching hospitals and 66% were nonteaching hospitals (Table 1). Within the NIS, hospitals participating in the survey tended to have higher volumes of discharges than those that had not participated. Survey scores for participating hospitals identifiable in the NIS were similar to those for participating hospitals not identifiable (Table 1). The distribution of survey scores was heavily skewed, with most hospitals scoring above 770 (of a possible 1000), regardless of the scoring methodology used.
In 2005 there were 1 772 064 discharges from the 155 hospitals of interest, of which 37 033 resulted in an inpatient death (2.09%). Quartiles of SPS were not a significant predictor of mortality, regardless of whether we adjusted for expected mortality rate or added volume of discharges and teaching status to our models (Table 2). From lowest to highest quartile of SPS, inpatient mortality rates adjusted for patient and hospital characteristics were 1.97% (95% confidence interval [CI], 1.78%-2.18%), 2.04% (95% CI, 1.84%-2.25%), 1.96% (95% CI, 1.77%-2.16%), and 2.00% (95% CI, 1.80%-2.22%) (P = .99 for trend across all quartiles). Similarly, quartiles of performance using the SPS-13 (which assesses hospitals regarding the 13 safe practices retained in the 2008 survey) did not predict inpatient mortality: corresponding rates from lowest to highest quartile were 1.99% (95% CI, 1.80%-2.21%), 2.00% (95% CI, 1.81%-2.21%), 2.02% (95% CI, 1.82%-2.24%), and 1.95% (95% CI, 1.75%-2.16%) (P = .80 for trend across all quartiles).
Quartiles of survey performance using the action-based scoring methodology also did not significantly predict inpatient mortality. Fully adjusted inpatient mortality rates from lowest to highest quartile of ASPS were 2.08% (95% CI, 1.87%-2.30%), 2.02% (95% CI, 1.83%-2.23%), 2.02% (95% CI, 1.83%-2.24%), and 1.86% (95% CI, 1.68%-2.06%) (P = .20 for trend across all quartiles). Using ASPS-13, corresponding rates from lowest to highest quartile were 2.12% (95% CI, 1.91%-2.35%), 2.01% (95% CI, 1.82%-2.22%), 2.01% (95% CI, 1.82%-2.22%), and 1.85% (95% CI, 1.66%-2.05%) (P = .11 for trend across all quartiles).
To determine if survey scores might be more predictive of mortality in patients who might be at higher risk of death, we repeated our analyses with 2 subgroups of patients, those 65 years and older (n = 721 497) and those with 5% or greater expected mortality risk (n = 163 138). Overall in-hospital mortality rates in these populations were 3.9% and 18.3%, respectively. Quartiles of survey score were not predictive in these subgroups (Table 3 and Table 4).
We also examined the data to see if hospitals that had chosen to participate in the Safe Practices Survey had fewer inpatient deaths than other hospitals in the NIS (within the 24 states providing hospital-identifying information). Fully adjusted mortality among hospitals participating in the survey (1 772 064 discharges within 155 hospitals) was 1.96% (95% CI, 1.83%-2.09%), compared with 2.06% (95% CI, 1.94%-2.19%) among hospitals not participating (1 892 725 discharges within 243 hospitals) (P = .45).
On the Internet, hospital performance on the Safe Practices Survey is ranked by quartiles, which may suggest to consumers that hospitals in higher quartiles are safer than hospitals in lower quartiles. In this first study of the relationship between survey scores and hospital outcomes, we studied a national sample of hospitals and found no relationship between quartiles of score and in-hospital mortality, regardless of whether we adjusted for expected mortality risk and certain hospital characteristics (Table 2).
A hospital's performance on each safe practice is evaluated in the survey via several questions assessing institutional systems to promote awareness, accountability, ability (capacity-building investments), and action. Part of the rationale for designing the survey this way originally may have been to provide “training wheels” and give hospitals credit for creating systems that could eventually support full implementation of a given safe practice. However, awarding survey points for hospital administrative structures raises the possibility that the survey is capturing excessive noise, which may be overwhelming an important signal.
It seemed plausible that inpatient mortality could relate to whether actions were being taken to implement the safe practice. For this reason the survey's creators proposed the alternative “action-based” scoring method, which assigned all points for a safe practice based on whether the hospital indicated that it had implemented the key action for the practice. In our analysis, using this “action-based” scoring method slightly improved the ability of the survey ranking to predict in-hospital mortality, although the association was not statistically significant. Focusing on actions in the future may improve the survey's ability to discriminate between high-quality and low-quality hospitals.
The Safe Practices Survey has recently been shortened from 27 to 13 safe practices, largely in response to feedback regarding the considerable time required to complete the survey. Our findings indicate that scores based on the 13 retained practices are unlikely to be significantly associated with inpatient mortality, even if scoring is limited to actions taken, as was done in the ASPS-13. The findings do not rule out a modest association between the ASPS-13 and hospital mortality. As presented in our results (Table 2), the difference in risk-adjusted mortality between the best and worst quartiles determined by ASPS-13 was 0.27 percentage points (P = .11 for trend). This difference in absolute mortality risk corresponds to a number needed to treat of approximately 370, which some would find clinically significant. Nonetheless, the likelihood that the observed difference arose by chance is increased by the facts that multiple statistical tests were performed and that associations were not observed in the high-risk groups in which they were most strongly hypothesized.
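The number needed to treat cited above is the reciprocal of the absolute risk difference. A minimal check, using the lowest- and highest-quartile ASPS-13 adjusted mortality rates reported in the results (2.12% and 1.85%); the function name is ours.

```python
def number_needed_to_treat(risk_higher, risk_lower):
    """Reciprocal of the absolute risk difference between two groups."""
    absolute_risk_difference = risk_higher - risk_lower
    if absolute_risk_difference <= 0:
        raise ValueError("risk_higher must exceed risk_lower")
    return 1 / absolute_risk_difference


# Adjusted mortality in the lowest vs highest ASPS-13 quartile.
nnt = number_needed_to_treat(0.0212, 0.0185)
print(round(nnt))
```

Read in this setting, the figure means that roughly 370 admissions to a highest-quartile hospital in place of a lowest-quartile hospital would correspond to 1 fewer inpatient death, if the observed difference were real and causal, which the analysis does not establish.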
Given the voluntary nature of this self-reported survey, it is also plausible that a lack of correlation with mortality might be due to confounding associated with a “healthy volunteer” effect. Hospitals that have already engaged in improving quality of care may be more likely to want to participate in the Safe Practices Survey. However, in our study we found that participation in the survey was not predictive of lower risk-adjusted mortality. This would suggest that our negative findings are unlikely to be owing to safer hospitals being more likely to participate in the survey.
To our knowledge, this is the first peer-reviewed analysis that has sought to assess whether better performance on the Safe Practices Survey correlates with outcomes indicative of improved patient safety. Given that 2 recently published analyses21,22 in the business/medical literature used the Safe Practices Survey as the metric of hospital quality, it seems clear that additional efforts to explore the validity and value of this well-publicized1,23,24 quality measure are needed. As the patient-safety movement increases in importance, hospitals face increasingly complex choices regarding improvement and reporting to the public. Likewise, consumers are faced with multiple sources of information on hospital quality and are encouraged to choose a facility based on this information. Despite a lack of evidence demonstrating the validity of the Safe Practices Survey, the survey is well known and influential. For this reason, validating the survey rankings as a measure and as a source of accurate information for consumers and researchers is important.
Our findings suggest, however, that the survey as currently designed does not discriminate between hospitals with higher and lower inpatient mortality. Some will question our choice of overall inpatient mortality as the outcome of interest. We acknowledge that valid concerns about the use of mortality rates as a measure of hospital quality of care have been raised.25-27 Despite this, risk-adjusted mortality rates remain among the most commonly reported outcomes both in the published literature and in public reports of hospital quality. Furthermore, the Institute of Medicine has cited prevention of inpatient deaths as an important reason to focus on patient safety.28 Until consumer-oriented hospital quality reports become explicit as to what benefits consumers can expect from a “safer” hospital, consumers will almost certainly assume a safer hospital is one in which a patient is less likely to die.29 Thus, our analyses are consistent with the most likely consumer interpretation of the data presented on the Internet.
Our study results suggest several points regarding the Safe Practices Survey that would be valuable for the greater patient safety community to consider. An important issue is whether the SPS is measuring what needs to be measured. Many of the safe practices are processes to improve care, yet in its current form the survey is measuring the “processes around the process.” This often gives hospitals credit for what essentially may be good intentions. This also gives points for having structures that may support implementation of a safe practice, rather than awarding credit only when the safe practice is being consistently followed. Such a scoring system likely is vulnerable to inflation of scores. Of note, most hospitals score quite well on the survey (Table 1). It may be that all hospitals truly are doing well on the safe practices; however, it seems more likely that the survey as currently designed is unable to discriminate between truly high and low adherence to the safe practices.
It also is possible that too much is being measured. Steps have already been taken to address this by reducing the survey to 13 practices, but our results suggest that this alone is unlikely to improve the survey's ability to correlate with inpatient mortality.
It is unknown how well a hospital executive's report of actions being taken to support and implement safety practices correlates with actual activity within that hospital. It may not be reasonable to assume that hospitals are doing what their executives say they are. Our study results call into question the use of a lengthy unaudited survey as a tool for measuring adherence to safe practices. Further research to examine how well self-reported activities regarding hospital safety practices correlate with actual activities within the hospital will be helpful in determining the value of self-reported safety data.
Our study has important limitations. The most significant is that our main analysis had enough power to detect only differences in mortality of 1 percentage point or greater. Although at a policy and epidemiology level the observed 0.2 percentage point difference in mortality rate between the first and fourth quartiles of performance using ASPS scoring is potentially important, we calculated that we would have needed on the order of 11 million admissions grouped within 500 hospitals to demonstrate statistical significance for a difference of this magnitude. Such an analysis was not feasible given the data available from the largest hospital data set in the nation. Furthermore, even if a statistically significant association were to be found with a larger sample, our results indicate that the overall magnitude of the relationship between survey score and mortality would almost certainly remain quite small and would be of unclear usefulness to individual consumers.
A second limitation is that we did not study other outcomes that might be responsive to adoption of the safe practices, such as complications. It is possible that high performance on the survey does correlate with decreased complication rates or with other outcomes of interest to purchasers and policy makers. However, complication rates are difficult to accurately measure using administrative data. Finally, our primary analysis examines 155 of the 1075 urban hospitals that participated in the Safe Practices Survey, raising the question of whether our findings can be generalized to the many survey participants not in the NIS. However, the NIS is designed to approximate a stratified random sample of US hospitals, and individual hospitals cannot choose whether to be included, so there is no reason to believe that survey participants included in the NIS differed in any systematic fashion from those not included. Furthermore, we found that survey participants had similar survey scores, regardless of whether they were in the NIS. Hence, although we cannot exclude the possibility that our findings are not generalizable to other hospitals participating in the survey, we have not identified any specific reason to question the generalizability of our findings.
In summary, although a recent study has found that hospitals performing well on the first 3 initiatives of the Leapfrog Hospital Survey do have lower risk-adjusted mortality,12 our analysis was unable to find a correlation between better performance on the fourth initiative (the Safe Practices Survey) and lower risk-adjusted mortality. It is possible that inviting hospitals to self-report on their patient safety practices and then assigning them to quartiles of score is not an effective way to assess hospital quality and safety. Our findings should not be interpreted, however, as indicating that the safe practices are not important or that they cannot be measured in an informative and valid way. Rather, future work should seek to establish valid methods for assessing adherence to the safe practices. Further research is needed to determine how performance on the Safe Practices Survey or other instruments designed to measure safe practices performance may correlate with other outcomes of interest to patients and policy makers.
Corresponding Author: R. Adams Dudley, MD, MBA, Philip R. Lee Institute for Health Policy Studies, University of California, San Francisco, 3333 California St, Ste 265, San Francisco, CA 94118 (email@example.com).
Author Contributions: Dr Kernisan had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Kernisan, Dudley.
Acquisition of data: Kernisan, Dudley.
Analysis and interpretation of data: Kernisan, Lee, Boscardin, Landefeld, Dudley.
Drafting of the manuscript: Kernisan, Dudley.
Critical revision of the manuscript for important intellectual content: Kernisan, Lee, Boscardin, Landefeld, Dudley.
Statistical analysis: Kernisan, Boscardin, Dudley.
Obtained funding: Dudley.
Administrative, technical, or material support: Kernisan, Dudley.
Study supervision: Lee, Landefeld, Dudley.
Financial Disclosures: Dr Dudley reported serving as an occasional consultant for the Leapfrog Group (but that he is not paid for this work) and serving on the Leapfrog Group's Hospital Rewards Program Steering Committee. Work on this article was not financially supported by the Leapfrog Group. Drs Kernisan, Lee, Boscardin, and Landefeld reported no financial or other conflicts of interest.
Funding/Support: This study was supported by the UCSF Division of Geriatrics, the UCSF Institute for Health Policy Studies, and the San Francisco VA Medical Center. Dr Kernisan was supported by T32 research fellowship grant 2 T32 AG000212-16 from the National Institute on Aging, National Institutes of Health, followed by a VA Quality Scholars fellowship. Dr Dudley's work was supported by a Robert Wood Johnson Foundation Investigator Award in Health Policy. This study was also supported by the Leapfrog Group, which provided its data at no cost.
Role of the Sponsor: The Leapfrog Group was involved in the design of the study and provided data but had no role in the analysis or interpretation of the data; the group had no role in the preparation of the manuscript and was not asked to approve the final manuscript but did review the manuscript and provided minor corrections of factual statements about its survey program. The Robert Wood Johnson Foundation had no role in the design of the study, the analysis of data, or the preparation of the manuscript.
Additional Contributions: We thank the Leapfrog Group for sharing its data with us.