Mehta RH, Liang L, Karve AM, Hernandez AF, Rumsfeld JS, Fonarow GC, Peterson ED. Association of Patient Case-Mix Adjustment, Hospital Process Performance Rankings, and Eligibility for Financial Incentives. JAMA. 2008;300(16):1897-1903. doi:10.1001/jama.300.16.1897
Author Affiliations: Duke Clinical Research Institute and Duke University Medical Center, Durham, North Carolina (Drs Mehta, Liang, Hernandez, and Peterson, and Ms Karve); Denver Veterans Affairs Medical Center, Denver, Colorado (Dr Rumsfeld); and University of California Los Angeles Medical Center, Los Angeles (Dr Fonarow).
Context: While most comparisons of hospital outcomes adjust for patient characteristics, process performance comparisons typically do not.
Objective: To evaluate the degree to which hospital process performance ratings and eligibility for financial incentives are altered after accounting for hospitals' patient demographics, clinical characteristics, and mix of treatment opportunities.
Design, Setting, and Patients: Using data from the American Heart Association’s Get With the Guidelines program between January 2, 2000, and March 28, 2008, we analyzed hospital process performance based on the Centers for Medicare & Medicaid Services' defined core measures for acute myocardial infarction. Hospitals were initially ranked based on crude composite process performance and then ranked again after accounting for hospitals' patient demographics, clinical characteristics, and eligibility for measures using a hierarchical model. We then compared differences in hospital performance rankings and pay-for-performance financial incentive categories (top 20%, middle 60%, and bottom 20% institutions).
Main Outcome Measures: Hospital process performance ranking and pay-for-performance financial incentive categories.
Results: A total of 148 472 acute myocardial infarction patients met the study criteria from 449 centers. Hospitals for which crude composite acute myocardial infarction performance was in the bottom quintile (n = 89) were smaller nonacademic institutions that treated a higher percentage of patients from racial or ethnic minority groups and also patients with greater comorbidities than hospitals ranked in the top quintile (n = 90). Although there was overall agreement on hospital rankings based on observed vs adjusted composite scores (weighted κ, 0.74), individual hospital ranking changed with adjustment (median, 22 ranks; range, 0-214; interquartile range, 9-40). Additionally, 16.5% of institutions (n = 74) changed pay-for-performance financial status categories after accounting for patient and treatment opportunity mix.
Conclusion: Our findings suggest that accounting for hospital differences in patient characteristics and treatment opportunities is associated with modest changes in hospital performance rankings and eligibility for financial benefits in pay-for-performance programs for treatment of myocardial infarction.
Prior investigations have demonstrated wide variation among hospitals in the adherence to national guidelines of care among patients with coronary artery disease, and many recommended therapies continue to be underused in patients without any documented contraindications.1-3 To address these quality gaps and to increase transparency in medicine, the Centers for Medicare & Medicaid Services (CMS) now releases public information on hospital acute myocardial infarction (AMI) and heart failure process performance.4 CMS has also launched a pay-for-performance pilot program that links hospital process performance with financial incentives.5
Although the goal of these programs is to accurately assess hospital performance, not all hospitals may be starting on a level playing field. Specifically, while outcomes comparisons traditionally adjust for differences in patient case mix, process performance metrics generally have not accounted for these factors. Yet data suggest that patient features such as age, race or ethnicity, or disease severity are associated with patients' likelihood of receiving treatments and can vary among centers.6-10 As such, hospitals serving large groups of patients who are elderly, female, poor, uninsured, or African American might have challenges competing with institutions that care for patients who are younger, male, wealthy, insured, or white.
We compared hospital ranking between observed and adjusted AMI process measures composite adherence after accounting for variation in the hospitals' patient demographics, clinical characteristics, and their patients' mix of treatment opportunities.11-13 We sought to determine whether patient case mix varied among centers as a function of process performance; whether adjusting process measures for patients' case mix and treatment opportunity mix changes overall performance rating relative to other hospitals; and whether such case-mix adjustments would potentially alter hospitals' eligibility status for financial incentives (or disincentives) in a pay-for-performance program.
We used data from the American Heart Association’s (AHA’s) Get With the Guidelines program, the details of which have been previously published.11-13 In brief, the AHA launched the initiative focused on the redesign of hospital systems of care to improve the quality of care of patients with coronary artery disease. This program uses a Web-based patient management tool (Outcome Sciences Inc, Cambridge, Massachusetts) to collect clinical data, provide decision support, and provide real-time online reporting features.11-13 Data collected include patient demographics, medical history, symptoms on arrival, in-hospital treatment and events, discharge treatment and counseling, and patient disposition. Outcome Sciences serves as the data collection and coordination center for this registry. Participating institutions were instructed to submit histories for consecutive eligible patients with AMI diagnoses to the Get With the Guidelines coronary artery disease database. Institutions with large AMI volumes (>75 cases/y) were permitted to submit a sample of cases through random selection on a quarterly basis.
Because collected data were primarily used for institutional quality improvement and deidentified patient information was collected, sites were granted a waiver of informed consent under the Common Rule. The Duke Clinical Research Institute served as the data analysis center, and institutional review board approval was granted to analyze these aggregate deidentified data for research purposes.
This study includes patient cases entered between January 2, 2000, and March 28, 2008, from the 574 hospitals (n = 350 221) participating in the Get With the Guidelines program, which includes teaching and nonteaching, rural and urban, and large and small hospitals from all census regions of the United States. We excluded patients without AMI (n = 137 559) and included only those with International Classification of Diseases, Ninth Revision (ICD-9) diagnosis code 410. Patients were also excluded if they were not eligible for any of the 8 CMS process measures (n = 12 334) or were from hospitals that did not reliably report past medical history variables (n = 51 856; number of hospitals = 106). The remaining patients constituted the analysis sample with 148 472 patients from 449 hospitals. Data on patients were collected by participating hospitals without financial compensation. Race and ethnicity were self-reported by patients, recorded in the patient's medical record, and collected on the case report form in the following categories: white, black or African American, North American Indian or Native Alaskan, Hispanic or Latino, Asian, Native Hawaiian or other Pacific Islander, or other.
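The patient exclusions above can be tallied step by step. The following is a minimal sketch that only re-adds the counts reported in the text; the variable names are illustrative and not taken from the study.

```python
# Patient counts as reported in the paragraph above.
total_entered = 350_221

# Exclusions in the order they are described; labels are paraphrases.
exclusions = {
    "no AMI diagnosis": 137_559,
    "not eligible for any of the 8 CMS measures": 12_334,
    "hospital did not reliably report medical history": 51_856,
}

remaining = total_entered
for reason, count in exclusions.items():
    remaining -= count
    print(f"after excluding '{reason}': {remaining}")

analysis_sample = remaining
print(analysis_sample)  # 148472, the reported analysis sample
```

The same subtraction cannot be repeated for hospitals from the text alone, because only one hospital-level exclusion count is reported.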
We focused on 8 performance measures included in the CMS core AMI measures for rating hospitals in the pay-for-performance program.5 These measures include aspirin at admission and on discharge, β-blockers at admission and on discharge, angiotensin-converting enzyme inhibitors for left ventricular systolic dysfunction, smoking cessation counseling, thrombolytics within 30 minutes of arrival, and primary percutaneous coronary interventions within 90 minutes of arrival. Indicator-specific inclusion and exclusion criteria were applied so that only eligible patients without contraindications or documented intolerance for that specific indicator were counted for each respective measure. A hospital's composite adherence score was calculated as the sum of correct care given, divided by the total number of eligible opportunities (based on the 8 measures) across all patients in a hospital. This is analogous to the methods used by CMS for public reporting and pay-for-performance programs.5
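The composite adherence score described above is an opportunity-weighted proportion. A minimal sketch follows, assuming each eligible opportunity is available as a (hospital, fulfilled) pair; the function name and the toy records are hypothetical.

```python
from collections import defaultdict


def composite_adherence(opportunities):
    """Return {hospital: correct care given / eligible opportunities}.

    `opportunities` is an iterable of (hospital_id, fulfilled) pairs,
    one pair per measure for which a patient was eligible.
    """
    given = defaultdict(int)
    eligible = defaultdict(int)
    for hospital_id, fulfilled in opportunities:
        eligible[hospital_id] += 1
        given[hospital_id] += int(fulfilled)
    return {h: given[h] / eligible[h] for h in eligible}


# Hypothetical toy data: hospital A has 4 opportunities, hospital B has 2.
toy_opportunities = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False),
]
composite_scores = composite_adherence(toy_opportunities)
print(composite_scores)  # {'A': 0.75, 'B': 0.5}
```

Because the denominator counts opportunities rather than patients, a patient eligible for many measures contributes more to the score than a patient eligible for few.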
CMS rewards hospitals performing in the top 20% in the pay-for-performance program based on their composite adherence score, such that the top 10% are eligible to receive a 2% bonus payment from CMS and the next 10% are eligible to receive a 1% bonus payment.5 In contrast, hospitals performing in the bottom 20% are likely to receive reductions in their payments in the future. In keeping with this CMS policy, we divided hospitals into 3 categories based on their composite score: top 20% hospitals (likely to receive incentive payment for their better quality of care), middle 60% hospitals (would neither receive financial rewards nor be penalized), and the bottom 20% hospitals (will likely receive reductions in their payments).
Hospitals were first ranked based on their observed CMS composite score and then divided into the 3 financial incentive categories: top 20% composite performers, middle 60% performers, and bottom 20% performers. Patient and hospital characteristics in these 3 categories of hospitals were compared using Cochran-Mantel-Haenszel row mean scores tests for categorical variables, and Cochran-Mantel-Haenszel nonzero correlation test for continuous variables. Percentages were used to describe categorical variables and median and interquartile ranges (IQR) were reported for continuous variables.
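The rank-based grouping can be sketched as below. The text does not state how the 20% cut points were rounded or how ties were broken, so the rounding rule here (round up for the top group, round down for the bottom group) is an assumption chosen because it reproduces the reported group sizes of 90, 270, and 89 for 449 hospitals; the scores are synthetic.

```python
import math
from collections import Counter


def incentive_categories(scores):
    """Map each hospital to a pay-for-performance category by rank."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best first
    n = len(ranked)
    n_top = math.ceil(n / 5)  # assumed rounding rule, not from the study
    n_bottom = n // 5         # assumed rounding rule, not from the study
    categories = {}
    for position, hospital in enumerate(ranked):
        if position < n_top:
            categories[hospital] = "top 20%"
        elif position >= n - n_bottom:
            categories[hospital] = "bottom 20%"
        else:
            categories[hospital] = "middle 60%"
    return categories


# Synthetic, strictly increasing scores for 449 hypothetical hospitals.
synthetic_scores = {f"H{i:03d}": i / 449 for i in range(449)}
category_counts = Counter(incentive_categories(synthetic_scores).values())
print(category_counts)
```

Ties in this sketch fall in whatever order the sort leaves them, which may differ from the handling used in the actual analysis.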
A hierarchical multivariate logistic regression analysis was performed to adjust hospital performance for patient case mix and treatment opportunity mix. This analysis used opportunity-based data. Each measure for which a patient was eligible contributed an observation, and the outcome was a dichotomous variable with value 1 (positive) or 0 (negative) indicating whether the opportunity was fulfilled. For example, if a patient was eligible for 6 CMS measures and received 5, the patient would have 6 observations in the analysis data set, of which 5 would be positive events. The hospital's performance (ie, adherence rate in terms of the CMS core measures) was adjusted for patient case mix by having the baseline patient characteristics in the model. These characteristics included age, race or ethnicity, sex, body mass index (BMI [calculated as weight in kilograms divided by height in meters squared]), insurance status, past medical history (chronic obstructive pulmonary disease, hypertension, diabetes, heart failure, smoking status, dyslipidemia, prior myocardial infarction, prior stroke, peripheral vascular disease, dialysis, chronic depression, atrial fibrillation/flutter, and renal insufficiency), and systolic blood pressure.
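The opportunity-based data layout described above can be illustrated with the example from the text, a patient eligible for 6 measures who received 5. This is a sketch only; the measure labels and field names are hypothetical shorthand, not the registry's variable names.

```python
def expand_opportunities(patient_id, eligible_measures, received_measures):
    """Return one observation per measure for which the patient was eligible."""
    return [
        {
            "patient": patient_id,
            "measure": measure,
            # 1 (positive) if the opportunity was fulfilled, else 0 (negative)
            "fulfilled": int(measure in received_measures),
        }
        for measure in eligible_measures
    ]


# Hypothetical patient: eligible for 6 measures, received 5 of them.
eligible = [
    "aspirin_admission", "aspirin_discharge",
    "beta_blocker_admission", "beta_blocker_discharge",
    "smoking_cessation", "pci_within_90_min",
]
received = set(eligible) - {"pci_within_90_min"}

rows = expand_opportunities("patient-001", eligible, received)
positive_events = sum(row["fulfilled"] for row in rows)
print(len(rows), positive_events)  # 6 5
```

Patient characteristics would be repeated on every row for that patient, which is one reason a model for such data needs to account for correlated observations.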
In addition, overall adherence rates on the individual process metrics varied considerably among the 8 CMS core measures. For example, the overall adherence rate for aspirin at discharge was 96%, while time from emergency department (ED) admission to administration of fibrinolytic therapy of less than 30 minutes was 40.4%, and time from ED admission to balloon angioplasty performance of 90 minutes or less was 51.9%. Thus, the mix of CMS treatment opportunities faced by a hospital could influence its rankings. Composite scores from hospitals without percutaneous coronary interventions would not include ED admission to balloon angioplasty performance time, which may bias results favorably relative to a center providing primary angioplasty. Therefore, a variable with 8 levels indicating the 8 performance measures was included as a covariate in the model to take into account treatment opportunity mix.
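To see why treatment opportunity mix matters, consider two hypothetical hospitals that both perform exactly at the overall rates quoted above for aspirin at discharge (96%) and primary angioplasty within 90 minutes (51.9%). The opportunity counts below are invented for illustration.

```python
# Overall adherence rates quoted in the text.
overall_rates = {"aspirin_discharge": 0.96, "pci_90_min": 0.519}


def expected_composite(opportunity_counts, rates):
    """Composite score if the hospital performs at `rates` on each measure."""
    total = sum(opportunity_counts.values())
    fulfilled = sum(n * rates[m] for m, n in opportunity_counts.items())
    return fulfilled / total


# Hypothetical opportunity counts for two hospitals.
hospital_without_pci = {"aspirin_discharge": 100}
hospital_with_pci = {"aspirin_discharge": 100, "pci_90_min": 50}

score_without_pci = expected_composite(hospital_without_pci, overall_rates)
score_with_pci = expected_composite(hospital_with_pci, overall_rates)
print(round(score_without_pci, 3), round(score_with_pci, 3))  # 0.96 0.813
```

Both hospitals deliver identical measure-level care, yet the one offering primary angioplasty has a lower unadjusted composite, which is the bias the 8-level measure covariate is meant to absorb.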
In this analysis, missing values of covariates were less than 6% for all variables with the exception of BMI (9.7%) and systolic blood pressure (13.3%). However, it was observed that the missing percentage and the distribution of the available data for BMI and systolic blood pressure were similar across the different hospitals with different financial status; therefore, they were unlikely to influence the results.
The hierarchical approach treats hospitals as random effects, allows adjustment for within-hospital correlation in outcomes, and models hospital effects to calculate hospital-specific outcomes. Adjusted scores for composite process measures were then determined from the hierarchical model, and hospitals were ranked again using the adjusted scores. To calculate adjusted scores, hospital-specific estimates of observed adherence rates were first calculated as the mean of the predicted probability of adherence from the hierarchical model across all patients and opportunities for a given site. This estimate is sometimes called a “shrinkage estimate” because it accounts for the differential amounts of information across hospitals measured by the number of observations per hospital. This score was then multiplied by the national overall observed adherence rate and divided by the hospital's estimated expected adherence rate to calculate the adjusted score. The expected adherence score in the given hospital was calculated as the mean of the predicted probabilities of adherence based on the patient baseline characteristics and indicator of measures, but not incorporating the hospital random effect.14,15
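The final step of the adjusted score calculation can be written compactly. The sketch below assumes the two sets of predicted probabilities (with and without the hospital random effect) have already been produced by a fitted hierarchical model; it does not fit that model, and the function name and numeric inputs are hypothetical.

```python
from statistics import mean


def adjusted_score(pred_with_hospital_effect, pred_without_hospital_effect,
                   national_rate):
    """Adjusted score = (shrinkage estimate / expected rate) * national rate.

    Both prediction lists hold one probability per opportunity at the
    hospital; `national_rate` is the overall observed adherence rate.
    """
    shrinkage_estimate = mean(pred_with_hospital_effect)
    expected_rate = mean(pred_without_hospital_effect)
    return shrinkage_estimate / expected_rate * national_rate


# Hypothetical hospital with three opportunities.
with_effect = [0.80, 0.90, 0.70]     # mean 0.80
without_effect = [0.85, 0.95, 0.75]  # mean 0.85
example_adjusted = adjusted_score(with_effect, without_effect,
                                  national_rate=0.90)
print(round(example_adjusted, 3))  # 0.847
```

A ratio below 1 means the hospital delivered less adherent care than its patient and opportunity mix would predict, so its adjusted score falls below the national rate.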
Hospitals were then categorized, based on their rank relative to peer hospitals, into 3 pay-for-performance categories according to the patient case mix and treatment opportunity mix-adjusted adherence score as described previously. The changes of each hospital's rank and financial status based on observed vs adjusted scores were then evaluated. Finally, we performed a sensitivity analysis after excluding hospitals that had fewer than 30 patients and compared the unadjusted hospital rankings and financial ratings with those after adjustment (previously described).
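The comparison of observed and adjusted rankings reduces to an absolute rank difference per hospital. A minimal sketch with five hypothetical hospitals follows; the scores are invented and the helper names are illustrative.

```python
from statistics import median


def ranks(scores):
    """Rank hospitals from 1 (highest score) to n (lowest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {hospital: i + 1 for i, hospital in enumerate(ordered)}


def rank_changes(observed, adjusted):
    """Absolute change in rank for each hospital after adjustment."""
    r_obs, r_adj = ranks(observed), ranks(adjusted)
    return {h: abs(r_obs[h] - r_adj[h]) for h in observed}


# Hypothetical observed and adjusted composite scores.
observed_scores = {"A": 0.95, "B": 0.90, "C": 0.85, "D": 0.80, "E": 0.70}
adjusted_scores = {"A": 0.93, "B": 0.94, "C": 0.80, "D": 0.86, "E": 0.75}

changes = rank_changes(observed_scores, adjusted_scores)
median_change = median(changes.values())
print(changes, median_change)
```

The same per-hospital differences would feed the range and interquartile range reported for the full sample.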
All P values are 2-sided, with P < .05 considered statistically significant. All analyses were performed using SAS statistical software version 9.1 (SAS Institute Inc, Cary, North Carolina).
Of the 148 472 patients who met the study inclusion and exclusion criteria, 40 399 received care at top 20% hospitals (n = 90; median composite process performance, 97.6%; IQR, 96.6%-100%); 98 663 patients received care at middle 60% hospitals (n = 270; median composite process performance, 90.7%; IQR, 87.1%-93.3%); and 9410 patients received care at bottom 20% hospitals (n = 89; median composite process performance, 70.8%; IQR, 64.4%-78.4%).
The baseline demographics and patient characteristics in the 3 performance-based groups are shown in Table 1. Compared with hospitals in the top performing group, the hospitals with poor performance rankings were more likely to be nonacademic and smaller (fewer licensed beds), with a patient population that was more likely to be from racial or ethnic minority groups and to have comorbid conditions such as diabetes, heart failure, chronic atrial fibrillation, and renal insufficiency, as well as lower systolic blood pressure at presentation and lower left ventricular ejection fraction.
Table 2 shows the rates of the 8 performance measures in the 3 pay-for-performance groups. The lowest rates overall were seen for administration of fibrinolytic therapy within 30 minutes (40.4%), followed by primary percutaneous coronary interventions within 90 minutes of arrival in the ED (51.9%).
There was an overall agreement between the reimbursement categories based on observed and adjusted composite adherence scores for the various institutions (weighted κ, 0.74; 95% confidence interval [CI], 0.69-0.80). Despite this agreement, there was a median change in hospital ranking with the adjustment of 22 ranks (range, 0-214; IQR, 9-40). Table 3 shows the degree of agreement or disagreement between hospital rankings based on the observed composite scores as implemented in the pay-for-performance program vs hospital rankings based on adjusted scores.
Overall, 16.5% of hospitals changed their financial groups after adjustment for case mix and the distribution of eligibility for process performance. This change in rankings was distributed in such a way that 37 institutions (8.24%) would have benefited from patient case mix and opportunity mix adjustments (ie, would either have received rewards [29 hospitals, 6.46%] or would have avoided cutbacks [8 hospitals, 1.78%]). In contrast, another 37 sites (8.24%) were likely to be disadvantaged by such an adjustment, either losing their financial incentives (6.46%) or changing to a financial penalty category (1.78%).
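The counts in this paragraph are enough to sketch a 3 x 3 cross-classification of observed vs adjusted categories and to check the reported percentages. The table below is a reconstruction, not Table 3 itself: it assumes the 29 hospitals gaining rewards moved from the middle to the top group, the 8 avoiding cutbacks moved from the bottom to the middle group, the disadvantaged hospitals mirror those counts, and no hospital moved 2 categories. The linear weighting scheme for κ is also an assumption, since the text does not state which weights were used.

```python
# Rows: observed category; columns: adjusted category (top, middle, bottom).
# Reconstructed under the assumptions stated above, not copied from Table 3.
transition_table = [
    [61, 29, 0],    # observed top 20% (n = 90)
    [29, 233, 8],   # observed middle 60% (n = 270)
    [0, 8, 81],     # observed bottom 20% (n = 89)
]


def weighted_kappa(table):
    """Cohen's weighted kappa with linear agreement weights (assumed)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1 - abs(i - j) / (k - 1)
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / n ** 2
    return (observed - expected) / (1 - expected)


n_hospitals = sum(sum(row) for row in transition_table)
n_changed = n_hospitals - sum(transition_table[i][i] for i in range(3))
pct_changed = round(100 * n_changed / n_hospitals, 1)
pct_benefited = round(100 * 37 / n_hospitals, 2)
kappa = weighted_kappa(transition_table)
print(n_hospitals, n_changed, pct_changed, pct_benefited, round(kappa, 2))
```

Under these assumptions the reconstructed table gives a weighted κ of about 0.74, in line with the reported value, though agreement with the published table cannot be confirmed from the text alone.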
Sensitivity analysis, excluding hospitals with small sample size (<30 patients, 132 hospitals), showed a similar modest change in hospital rankings with 15.2% of hospitals changing their financial group after the adjustments, with half benefiting and half losing their financial incentive or changing to the financial penalty category.
The increasing evidence of gaps in health care quality has resulted in the development and growth of public reporting of performance measures and pay-for-performance programs as a means of stimulating external accountability and practice change.5,16-18 These programs align payment for care with the quality of care delivered, offering financial incentives for better performance and potential penalties for poor performance. Despite the enthusiasm for these programs, such systems are only as fair as the metrics used to assess hospital performance.
Our results demonstrate that one of the reasons certain centers may perform poorly on process performance assessments has to do with their patient case mix. Hospitals with the worst performance also tend to care for a higher proportion of patients who are from racial or ethnic minority groups and who have a higher incidence of comorbidities, features that have been previously linked to lower use of evidence-based therapies.19-21 Additionally, these hospitals tend to be smaller institutions that possibly have lower revenue, less ability to generate revenue, or both. Thus, pay-for-performance systems may penalize institutions that care for patients who are at a socioeconomic disadvantage.
Our data also suggest that if process performance measures were adjusted for case mix and the mix of process measures for which patients were eligible, then hospital relative ranks would moderately change. Although there was a general correlation between observed and adjusted rankings, the median change in a hospital's rank relative to other hospitals was 22 ranking slots (range, 0-214; IQR, 9-40), and the rank of certain centers could increase or decrease by more than 200 ranking slots after accounting for case mix and treatment opportunity mix.
Additionally, nearly 1 in 6 institutions changed their initial pay-for-performance financial performance rankings after accounting for patient case mix and treatment opportunity mix. Thus, our data suggest that the current method of ranking hospitals based on an unadjusted performance measures composite score, as done for the pay-for-performance system, may be less than optimal. The lack of adjustments in the pay-for-performance program may deprive some otherwise deserving institutions in the poor-performing and no-financial-incentive groups of financial benefits, while rewarding hospitals taking care of patients who are younger, healthier, and of higher socioeconomic status.
These data should not be interpreted as supporting routine adjustments of process performance measures for public reporting. Rather, these data indicate the degree to which case mix and patient clustering may affect performance ratings. It is important to recognize that there are merits and limitations to incorporating such case-mix and opportunity-mix adjustments in process performance assessment. Arguments supporting adjustment include that it levels the playing field among hospitals by recognizing differences in those for whom they care. Such adjustment has become a standard among outcomes comparisons.22-24
Prior studies have shown that patients from some racial and ethnic minority groups and also uninsured patients cluster in inner city areas and seek treatment at hospitals that are generally underresourced (including physician, nurse, and other staffing shortages), have inadequate budgets, lack technical support such as health information systems, and lack capital and revenue.25,26 These hospitals typically depend on Medicare's dwindling disproportionate share payments and other state and federal subsidies,27,28 and other important priorities may prevent them from devoting adequate resources to target quality improvement. Case-mix adjustment would make these hospitals caring for underserved patients more likely to qualify for incentive payments, less likely to be penalized, or both. Additionally, adjustment for number and mix of process performance measures faced by a center seems reasonable because there is significant variability in overall performance on given measures. For example, composite performance metrics from hospitals performing primary percutaneous coronary interventions would include measures for time from ED admission to procedure performance, while those from a hospital that does not perform percutaneous coronary interventions would not. As the average performance on ED admission time to percutaneous coronary intervention start time is generally much lower than on other measures, such inclusion would tend to bias results in favor of hospitals that do not perform percutaneous coronary interventions. Thus, case-mix and opportunity-mix adjustment would not only decrease the disparity in resources related to hospital revenue from differences in payer mix (and potentially fund-raising capability), but also minimize the disadvantage from variation in the denominator of eligible patients.
There are also arguments to be made against routine adjustments for case mix and opportunity mix for process performance measures. First, all patients who are eligible for a given evidence-based treatment without contraindications or intolerance should receive this therapy. Thus, such adjustment appears to give centers with higher case-mix severity a partial justification for worse performance. Similarly, adjustment tends to codify existing disparities in care among underserved populations and could encourage complacency in efforts to overcome such disparities.
As an alternative to these 2 extremes, some have proposed that hospital quality reports be stratified according to race or ethnicity, sex, and socioeconomic status.29 Thus, hospitals would receive both an overall process performance ranking and rankings for their varied patient subgroups. Stratified hospital performance comparisons could thereby highlight any underserved populations. Additionally, overall performance assessment for ranking or financial reward could then be based on comparison of not only a hospital's overall care patterns, but also its treatment of traditionally underserved populations. This strata performance evaluation scheme could thereby be similar to that implemented currently in public education programs under No Child Left Behind policies (ie, for health care, an analogous No Patient Left Behind).29 Others have proposed equitable pay-for-performance models that would give institutions in disadvantaged areas incentives to improve quality metrics. Emphasizing pay for improvement rather than just pay for performance may prove effective for hospitals, irrespective of case mix.30
Our current analysis must be considered in the context of certain limitations. First, although our database had common demographic and clinical characteristics, we lacked detailed measures of patient socioeconomic status (income, education levels, etc). Since our analysis lacked these potentially important case-mix factors, which may be associated with process performance, our current estimation of this association is, if anything, conservative. Second, we examined only care processes for AMI; these findings should be explored in other conditions and disease states. Third, although the patient management tool contained editing capabilities to ensure data entered were consistent with plausible ranges, Get With the Guidelines has not, to date, performed a national audit of its database. Finally, our sample was limited to those hospitals participating in the program. Although a large contingent of US centers is represented, these centers participate on a volunteer basis and tend to have slightly better process performance than the average US center.
Our data indicate that the hospitals ranked lowest in the pay-for-performance program care for a group of patients shown to be most vulnerable to poor adherence to performance measures. Adjusting for patient case mix and treatment opportunity mix in process comparisons would have a moderate but important association with the change in hospital performance rankings and eligibility for pay-for-performance financial benefits compared with unadjusted rankings based on observed measures. Future health care policy makers should consider these data when constructing performance rating systems as well as pay-for-performance programs.
Corresponding Author: Rajendra H. Mehta, MD, MS, Box 17969, Duke Clinical Research Institute, Durham, NC 27715 (email@example.com).
Author Contributions: Dr Peterson had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Mehta, Karve, Peterson.
Acquisition of data: Karve, Fonarow, Peterson.
Analysis and interpretation of data: Mehta, Liang, Hernandez, Rumsfeld, Fonarow, Peterson.
Drafting of the manuscript: Mehta, Karve.
Critical revision of the manuscript for important intellectual content: Mehta, Liang, Karve, Hernandez, Rumsfeld, Fonarow, Peterson.
Statistical analysis: Liang, Hernandez.
Obtained funding: Hernandez, Fonarow, Peterson.
Administrative, technical, or material support: Karve, Hernandez.
Study supervision: Hernandez, Fonarow.
Financial Disclosures: Dr Hernandez reports receiving research support from GlaxoSmithKline, Johnson & Johnson (Scios Inc), Medtronic, Novartis, and Roche Diagnostics; and honoraria from AstraZeneca, Novartis, Sanofi-Aventis, and Thoratec Corporation. Dr Peterson reports receiving research support from Schering Plough, BMS/Sanofi; and serving as the principal investigator for the American Heart Association's (AHA’s) Get With the Guidelines Analytical Center. Drs Hernandez and Peterson report detailed listings of financial disclosures at http://www.dcri.duke.edu/research/coi.jsp. Dr Rumsfeld reports receiving an honorarium for participating on the scientific research advisory board of United Healthcare. Dr Fonarow reports receiving research grants from GlaxoSmithKline, Medtronic, Merck, Pfizer, and the National Institutes of Health; consulting for AstraZeneca, Bristol-Myers Squibb, GlaxoSmithKline, Medtronic, Merck, Novartis, Pfizer, Sanofi-Aventis, and Schering Plough; receiving honoraria from AstraZeneca, Abbott, Bristol-Myers Squibb, GlaxoSmithKline, Medtronic, Merck, Novartis, Pfizer, Sanofi-Aventis, and Schering Plough; and serving as chair of the AHA's Get With the Guidelines Steering Committee. Drs Mehta and Liang and Ms Karve report no disclosures.
Funding/Support: The Get With the Guidelines program is supported by the AHA in part through an unrestricted education grant from Merck-Schering Plough. Dr Hernandez reports receiving support from an American Heart Association Pharmaceutical Roundtable grant 0675060N. Dr Fonarow reports receiving support from the Elliot Corday (Los Angeles, California) and Ahmanson (Los Angeles, California) Foundations.
Role of the Sponsor: Merck and Schering Plough had no role in the design and conduct of the study; in the collection, management, analysis, and interpretation of the data; or in the preparation, review, or approval of the manuscript. The AHA provides Get With the Guidelines program management with a volunteer steering committee and AHA staff. The manuscript was submitted to the AHA for review and approval prior to submission. The Corday and Ahmanson Foundations had no role in the conduct or reporting of this study.
Disclaimer: Dr Peterson, a contributing editor for JAMA, was not involved in the editorial evaluation or editorial decision making regarding publication of this article.
Additional Contributions: Elizabeth E. S. Cook, BA, from the Duke Clinical Research Institute provided editorial assistance for this article. Ms Cook did not receive additional compensation for her work in association with this article.