Stukel TA, Fisher ES, Wennberg DE, Alter DA, Gottlieb DJ, Vermeulen MJ. Analysis of Observational Studies in the Presence of Treatment Selection Bias: Effects of Invasive Cardiac Management on AMI Survival Using Propensity Score and Instrumental Variable Methods. JAMA. 2007;297(3):278–285. doi:10.1001/jama.297.3.278
Author Affiliations: Institute for Clinical Evaluative Sciences, Toronto, Ontario (Drs Stukel and Alter, and Ms Vermeulen); Center for the Evaluative Clinical Sciences, Dartmouth Medical School, Hanover, NH (Drs Stukel and Fisher, and Mr Gottlieb); Department of Health Policy, Management, and Evaluation, University of Toronto, and Clinical Epidemiology and Health Care Research Program, Sunnybrook Health Sciences Centre, Toronto, Ontario (Drs Stukel and Alter); Veterans Administration Outcomes Group, White River Junction, Vt (Dr Fisher); Center for Outcomes Research and Evaluation, Maine Medical Center, Portland (Dr Wennberg); and Division of Cardiology and the Li Ka Shing Knowledge Institute of St Michael's Hospital, Toronto Rehabilitation Institute, and Department of Medicine, University of Toronto, Toronto, Ontario (Dr Alter).
Context: Comparisons of outcomes between patients treated and untreated in observational studies may be biased due to differences in patient prognosis between groups, often because of unobserved treatment selection biases.
Objective: To compare 4 analytic methods for removing the effects of selection bias in observational studies: multivariable model risk adjustment, propensity score risk adjustment, propensity-based matching, and instrumental variable analysis.
Design, Setting, and Patients: A national cohort of 122 124 patients who were elderly (aged 65-84 years), receiving Medicare, and hospitalized with acute myocardial infarction (AMI) in 1994-1995, and who were eligible for cardiac catheterization. Baseline chart reviews were taken from the Cooperative Cardiovascular Project and linked to Medicare health administrative data to provide a rich set of prognostic variables. Patients were followed up for 7 years through December 31, 2001, to assess the association between long-term survival and cardiac catheterization within 30 days of hospital admission.
Main Outcome Measure: Risk-adjusted relative mortality rate using each of the analytic methods.
Results: Patients who received cardiac catheterization (n = 73 238) were younger and had lower AMI severity than those who did not. After adjustment for prognostic factors by using standard statistical risk-adjustment methods, cardiac catheterization was associated with a 50% relative decrease in mortality (for multivariable model risk adjustment: adjusted relative risk [RR], 0.51; 95% confidence interval [CI], 0.50-0.52; for propensity score risk adjustment: adjusted RR, 0.54; 95% CI, 0.53-0.55; and for propensity-based matching: adjusted RR, 0.54; 95% CI, 0.52-0.56). Using regional catheterization rate as an instrument, instrumental variable analysis showed a 16% relative decrease in mortality (adjusted RR, 0.84; 95% CI, 0.79-0.90). The survival benefits of routine invasive care from randomized clinical trials are between 8% and 21%.
Conclusions: Estimates of the observational association of cardiac catheterization with long-term AMI mortality are highly sensitive to analytic method. All standard risk-adjustment methods have the same limitations regarding removal of unmeasured treatment selection biases. Compared with standard modeling, instrumental variable analysis may produce less biased estimates of treatment effects, but is more suited to answering policy questions than specific clinical questions.
In the face of the financial, practical, and ethical challenges inherent in undertaking randomized clinical trials (RCTs), investigators often use observational data to compare the outcomes of different therapies. These comparisons may be biased due to prognostically important baseline differences among patients, often as a result of unobserved treatment selection biases. Unmeasurable clinical and social interactions in the diagnostic-treatment pathway, and physicians' knowledge of unmeasured prognostic variables, may affect treatment decisions and outcomes. Physicians are frequently risk averse in case selection, performing interventions on lower-risk patients despite greater clinical benefit to higher-risk patients.1-3
In some cases, especially when data are collected on detailed clinical risk factors, these differences can be controlled using standard statistical methods. In other cases, when unmeasured patient characteristics affect both the decision to treat and the outcome, these differences cannot be removed using standard techniques.
More than 280 000 US Medicare enrollees are admitted to the hospital with acute myocardial infarction (AMI) annually. Much of the effort to reduce high mortality rates has focused on invasive diagnostic and therapeutic interventions, such as cardiac catheterization followed by revascularization. Recent systematic reviews of RCTs assessing routine invasive vs conservative therapies found between 8% and 21% improved relative survival in the more invasively treated group.4,5 Due to the complexity and cost of performing RCTs, there is interest in using observational studies to guide policy statements and clinical protocols, and in generalizing results to the community.
A recent population-based observational study found little benefit to invasive therapy in US regions in which medical management was of higher quality.6 We reanalyzed these data to demonstrate how the estimated benefit from invasive therapy depends on the statistical method used to adjust for overt (measured) and hidden (unmeasured) bias. Methods included multivariable model risk adjustment, propensity score risk adjustment, and propensity-based matching, which control for overt bias, and instrumental variable analysis, which is a method designed to control for hidden bias as well.
We derived the study cohort from the Cooperative Cardiovascular Project, a US national sample of Medicare enrollees hospitalized with first admission for AMI in nonfederal acute care hospitals in 1994-1995.7 The Cooperative Cardiovascular Project comprised clinical data abstracted from medical records during admission, including presentation characteristics, comorbidities, and inpatient treatments. The Cooperative Cardiovascular Project records were linked to Medicare health administrative files to follow up patients for 7 years for vital status and postadmission procedures, and to exclude those patients with AMI in the prior year. We included patients aged 65 to 84 years who were eligible for Medicare Parts A and B and not enrolled in a health maintenance organization at the time of admission. We restricted analyses to patients eligible for cardiac catheterization with American College of Cardiology/American Heart Association class I (ideal) or class II (uncertain) indications.6,8 Race, coded as black or nonblack, was obtained from the Medicare Denominator file. We controlled for race since it was associated with both the treatment (cardiac catheterization) and the outcome (mortality). The Committee for the Protection of Human Subjects at Dartmouth College approved the study and waived the requirement for written informed consent.
We examined whether invasive cardiac treatment predicted long-term mortality. Patient-level treatment was defined as receipt of cardiac catheterization within 30 days of index admission date, because cardiac revascularization, through percutaneous coronary intervention or coronary artery bypass graft surgery, is always preceded by coronary angiography and is a marker of intent to treat invasively. Patients who receive invasive cardiac treatment are generally younger, healthier, have lower AMI severity, and may differ in unobserved ways from those who do not.6,9 In contrast, mean AMI admission severity tends to be similar across areas.10,11 Regional treatment intensity was defined as the percentage of eligible patients receiving cardiac catheterization within 30 days of admission for 566 coronary angiography service areas.6,10 Age-, sex-, and race-adjusted regional rates were categorized into quintiles. Patients were assigned to the cardiac catheterization rate of their region of residence.
Patients were followed up from date of AMI admission (index event) through December 31, 2001. The main outcome measure was long-term mortality over 7 years of follow-up. Date of death was obtained from the Medicare Denominator file.
All models used the patient as the unit of analysis. We developed an AMI severity index using Cox proportional hazards regression models to predict 1-year mortality using all baseline patient characteristics of age, sex, race, socioeconomic status, comorbidities, and clinical presentation (c statistic = 0.77).6,12
Cox proportional hazards regression models were used to compare mortality rates between treatment groups, adjusting for 65 patient, hospital, and ZIP code characteristics associated with post-AMI mortality.6
Patient characteristics included age, sex, race, and their interactions; AMI location; presentation characteristics (atrial fibrillation, heart block, congestive heart failure, hypotension, shock, peak creatine kinase >1000 U/L, cardiopulmonary resuscitation); comorbidities (history of congestive heart failure, dementia, diabetes mellitus, hypertension, metastatic cancer, nonmetastatic cancer, low ejection fraction, peripheral vascular disease, angina, smoking); preadmission ambulatory status; and admission from nursing home.
Hospital characteristics included annual AMI volume and teaching status, and ZIP code–socioeconomic characteristics included median Social Security income and percentage Medicare health maintenance organization. Because patients admitted to the same hospital may have correlated outcomes, survival models incorporated clustering by hospital to adjust the SEs.13 Model fit and proportionality of hazards were assessed using residual analyses.14,15 Analyses were performed by using the STATA procedure STCOX.16
Multivariable Model Risk Adjustment. The multivariable model risk adjustment model is the conventional modeling approach that incorporates all known confounders, including interactions, into the model. Controlling for these covariates produces a risk-adjusted treatment effect and removes overt bias due to these factors. Cox proportional hazards regression models were used to compare mortality rates between those patients who did or did not receive cardiac catheterization, adjusted for all 65 covariates.
Propensity Score Risk Adjustment. The propensity score is the probability of receiving treatment for a patient with specific prognostic factors.17-19 It is a scalar summary of all observed confounders. Within propensity score strata, covariates in treated and control groups are similarly distributed, so that stratifying on propensity score strata removes more than 90% of the overt bias due to the covariates used to estimate the score.20 Propensity scores cannot remove hidden biases except to the extent that unmeasured prognostic variables are correlated with the measured covariates used to compute the score.19-21
We computed the propensity score by using logistic regression with the dependent variable being receipt of cardiac catheterization, and the independent variables (covariates) being the 65 patient, hospital, and ZIP code variables. To provide optimal control for confounding, we computed a second propensity score based on the above covariates and all 3-way interactions of age, sex, race, and these variables (750 variables).20 Propensity scores were categorized into deciles. Cox proportional hazards regression models were used to compare mortality rates between those patients who did or did not receive cardiac catheterization, adjusting for propensity decile.17
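The propensity score step described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: it assumes plain gradient ascent in place of a statistical package, expects covariates on a roughly standardized scale, and uses hypothetical function names (`fit_logistic`, `propensity_scores`, `decile_labels`). The subsequent Cox model adjusting for propensity decile is not shown.

```python
import math
import random


def fit_logistic(X, t, lr=0.5, n_iter=300):
    """Fit a logistic regression of treatment t on covariates X by
    batch gradient ascent. Returns [intercept, coef_1, ..., coef_k]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = ti - 1.0 / (1.0 + math.exp(-eta))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for each patient."""
    scores = []
    for xi in X:
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
        scores.append(1.0 / (1.0 + math.exp(-eta)))
    return scores


def decile_labels(scores):
    """Label patients 1..10 by rank of propensity score, so that each
    decile holds about one tenth of the cohort."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 10 // len(scores) + 1
    return labels


if __name__ == "__main__":
    # Simulated data (an assumption): treatment is less likely as the
    # first covariate, a stand-in for severity, increases.
    rng = random.Random(0)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
    t = [1 if rng.random() < 1 / (1 + math.exp(-(0.3 - x[0]))) else 0 for x in X]
    beta = fit_logistic(X, t)
    deciles = decile_labels(propensity_scores(X, beta))
    print("coefficients:", [round(b, 2) for b in beta])
    print("patients in lowest decile:", deciles.count(1))
```

Ranking before cutting into deciles keeps the strata equal in size, which is one common convention; the article does not say how ties or boundaries were handled.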
Propensity-Based Matching. Propensity-based matching is used to select control patients who are similar to patients receiving treatment with respect to propensity score and other covariates, discarding unmatched individuals, thereby matching on many confounders simultaneously.17,22 Although matched analyses may analyze a nonrepresentative sample of patients receiving treatment, they may provide a more valid estimate of treatment effect because they compare patients with similar observed characteristics, all of whom are potential candidates for the treatment. Patients receiving cardiac catheterization were matched to the closest control whose propensity score differed by less than 0.10 among those patients within 5 years of age.22,23 Cox proportional hazard regression models were used to compare adjusted mortality rates between those patients who did or did not receive cardiac catheterization, conditional on matched pair.24
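The matching rule described above (closest control within a propensity difference of 0.10, among those within 5 years of age) might look like the following sketch. Two things here are assumptions, not details from the article: matching is greedy in the order treated patients are listed, and each control is used at most once. The function name `match_pairs` and the tuple layout are hypothetical.

```python
def match_pairs(treated, controls, caliper=0.10, max_age_gap=5):
    """Greedy 1:1 matching without replacement.

    treated, controls: lists of (patient_id, propensity_score, age).
    Returns a list of (treated_id, control_id); unmatched treated
    patients are discarded, as in the text.
    """
    available = {c[0]: c for c in controls}
    pairs = []
    for tid, tps, tage in treated:
        best_id, best_gap = None, None
        for cid, cps, cage in available.values():
            gap = abs(tps - cps)
            if gap < caliper and abs(tage - cage) <= max_age_gap:
                if best_gap is None or gap < best_gap:
                    best_id, best_gap = cid, gap
        if best_id is not None:
            pairs.append((tid, best_id))
            del available[best_id]  # each control is matched at most once
    return pairs


if __name__ == "__main__":
    treated = [("t1", 0.80, 70), ("t2", 0.50, 66), ("t3", 0.95, 67)]
    controls = [("c1", 0.78, 72), ("c2", 0.55, 80), ("c3", 0.45, 68)]
    # t3 has no control within the caliper and is dropped.
    print(match_pairs(treated, controls))
```

In this toy example the treated patient with the highest propensity score goes unmatched, which mirrors the pattern reported in the article: treated patients with profiles rarely seen among controls drop out of the matched sample.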
Instrumental Variable Analysis. Instrumental variable analysis is an econometric method used to remove the effects of hidden bias in observational studies.9,25 An instrumental variable has 2 key characteristics: it is highly correlated with treatment and does not independently affect the outcome, so that it is not associated with measured or unmeasured patient health status. We demonstrate that regional cardiac catheterization rate can serve as an effective instrumental variable because prognostic factors related to mortality, such as mean AMI severity, are similar across regions that have dramatically different cardiac catheterization rates.
The instrumental variable behaves like a natural randomization of patients to regional “treatment groups” that differ in likelihood of receiving cardiac catheterization. Unlike randomization, the difference in likelihood of treatment is not 100%, and one can explore but not prove that the groups are similar in unmeasured patient characteristics. Rather than compare patients with respect to the actual treatment received since this might be biased, instrumental variable analysis compares groups of patients that differ in likelihood of receiving cardiac catheterization. It thus estimates the treatment effect on the “marginal” population, defined as patients who would receive cardiac catheterization in regions with higher but not lower catheterization rates.26 Excellent nontechnical expositions of use of geographical instrumental variables exist in the literature.9,25,27
Instrumental variable models produce adjusted estimates of treatment effect on mortality at one time point, on an absolute rather than a relative scale.28 We first estimated adjusted absolute mortality differences 1 and 4 years after index admission between patients receiving vs not receiving cardiac catheterization, using multiple linear regression with the dependent variable being mortality considered as a binary variable. We then estimated instrumental variable–adjusted mortality differences, with the instrumental variable being the regional cardiac catheterization rate, using the STATA procedure IVREG.16 All models controlled for all 65 covariates. Technical details of instrumental variable model estimation are fully described in other articles.25,27,28
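To make the contrast between a naive comparison and an instrumental variable estimate concrete, the sketch below simulates a cohort in which sicker patients are less likely to be treated and severity is unmeasured. It is a simplified illustration under stated assumptions, not the study's model: the data are simulated, there are no covariates, and the estimate is the ratio cov(Z, Y)/cov(Z, D), which coincides with two-stage least squares only in this single-instrument, no-covariate case. All names and parameter values are hypothetical.

```python
import random


def simulate(n, true_effect=-0.10, seed=0):
    """Return rows of (z, d, y): regional rate, treated (0/1), died (0/1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.uniform(0.3, 0.8)   # instrument: regional catheterization rate
        u = rng.gauss(0.0, 1.0)     # unmeasured severity (higher = sicker)
        # Sicker patients are less likely to be treated (selection bias).
        p_treat = min(1.0, max(0.0, z - 0.2 * u))
        d = 1 if rng.random() < p_treat else 0
        # Mortality depends on treatment and on unmeasured severity,
        # but not directly on the instrument.
        p_death = min(1.0, max(0.0, 0.5 + true_effect * d + 0.15 * u))
        y = 1 if rng.random() < p_death else 0
        rows.append((z, d, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def naive_difference(rows):
    """Unadjusted absolute mortality difference, treated minus untreated."""
    treated = [y for _, d, y in rows if d == 1]
    untreated = [y for _, d, y in rows if d == 0]
    return mean(treated) - mean(untreated)


def iv_estimate(rows):
    """Instrumental variable estimate of the absolute mortality difference."""
    z = [r[0] for r in rows]
    d = [r[1] for r in rows]
    y = [r[2] for r in rows]
    return cov(z, y) / cov(z, d)


if __name__ == "__main__":
    rows = simulate(50000)
    print("naive difference:", round(naive_difference(rows), 3))
    print("IV estimate:     ", round(iv_estimate(rows), 3))
```

In this simulation the naive difference overstates the benefit because treated patients are healthier in ways the analyst cannot see, whereas the instrument-based estimate should land near the simulated effect of -0.10, up to sampling error.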
For comparison with the Cox proportional hazards regression model estimates, we approximated the corresponding relative mortality rates as $1 + \Delta/m_{\text{noCATH}}$, where $\Delta$ was the instrumental variable–adjusted absolute mortality difference, and $m_{\text{noCATH}}$ was the Kaplan-Meier mortality rate among those patients without cardiac catheterization. These approximate relative rates are comparable but not identical with those from Cox proportional hazards regression models, because analyzing at a fixed point in time does not take into account the time to death and ignores censoring. Finally, Cox proportional hazards regression models were used to estimate relative mortality rates across quintiles of regional cardiac catheterization rate, demonstrating an implicit use of the instrumental variable technique.28
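As a worked illustration of this conversion from an absolute difference to an approximate relative rate, the sketch below applies the formula to the 4-year instrumental variable result reported in the article (an absolute decrease of 9.7 percentage points, relative rate 0.84). The baseline mortality of 0.60 is a hypothetical value chosen for illustration; the Kaplan-Meier rate itself is not stated in this text. The helper names are also hypothetical.

```python
def approx_relative_rate(delta, m_no_cath):
    """Approximate relative mortality rate: 1 + delta / m_no_cath.

    delta: adjusted absolute mortality difference (negative = fewer deaths).
    m_no_cath: mortality rate among patients without catheterization.
    """
    return 1.0 + delta / m_no_cath


def implied_baseline(delta, relative_rate):
    """Back out the baseline mortality implied by a reported pair of
    absolute difference and relative rate (subject to rounding)."""
    return delta / (relative_rate - 1.0)


if __name__ == "__main__":
    # Hypothetical baseline of 0.60 with the reported delta of -0.097.
    print(round(approx_relative_rate(-0.097, 0.60), 3))
    # Baseline implied by the reported delta and relative rate of 0.84.
    print(round(implied_baseline(-0.097, 0.84), 3))
```

Because both reported figures are rounded, the implied baseline is only indicative.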
The study cohort consisted of 122 124 patients, 73 238 (60%) of whom received cardiac catheterization within 30 days (Table 1). Patients who received cardiac catheterization were younger, were more often men, had lower AMI severity, and were more likely to be admitted to high-volume hospitals.
Mean cardiac catheterization propensity scores ranged from 0.16 to 0.90 across propensity deciles, with excellent discrimination between treatment groups (c statistic = 0.76). The distribution of key confounders, such as predicted 1-year mortality, age, and history of congestive heart failure, was similar within propensity deciles for those patients with and without cardiac catheterization, except possibly in the lowest decile (Table 2).
Propensity-based matching produced 31 193 matched pairs with standardized differences in patient characteristics of less than 10%, indicating a high degree of similarity in the distributions of prognostic variables (Table 1).17 No match was found for 42 045 patients receiving cardiac catheterization who were younger, had much lower AMI severity, and were more likely to be admitted to a high-volume teaching hospital, because there were insufficient control patients with this prognostic profile.
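The balance check mentioned above relies on standardized differences. The article does not give the formula, so the sketch below assumes one widely used definition for a continuous covariate: the difference in group means divided by the square root of the average of the two group variances, expressed as a percentage. The function name is hypothetical.

```python
import math
import statistics


def standardized_difference(treated, control):
    """Difference in means as a percentage of the pooled standard deviation.

    Assumed definition: 100 * (mean_t - mean_c) / sqrt((var_t + var_c) / 2),
    using sample variances.
    """
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return 100.0 * (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


if __name__ == "__main__":
    # Toy ages for matched treated and control patients (made-up values).
    print(round(standardized_difference([70, 72, 74, 76], [71, 72, 74, 77]), 1))
```

Unlike a significance test, this measure does not shrink or grow with sample size, which is why it is often preferred for judging balance in large matched cohorts.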
Cardiac catheterization was associated with an approximate 50% relative decrease in mortality rate, using multivariable model risk adjustment, propensity score risk adjustment, or propensity-based matching (Table 3). Adding covariates, using complex propensity models, or finer matching did not alter these findings.
Mean cardiac catheterization rate within 30 days ranged from 29% to 82% across regions and 43% to 65% across cardiac catheterization quintiles. Table 4 reports selected baseline characteristics of study patients, according to quintiles of regional cardiac catheterization rate. Although there were small differences in specific risk factors, mean predicted 1-year mortality, our summary measure of AMI severity, was remarkably similar across regions (quintile 1 [lowest], 26.1%; quintile 2, 26.0%; quintile 3, 25.5%; quintile 4, 25.3%; and quintile 5 [highest], 24.6%). The balance in the distribution of all measured risk factors across regions provides reasonable evidence to infer that the distribution of unmeasured risk factors is likely balanced across regions as well. The wide range of cardiac catheterization rates and the similarity in average patient characteristics lend support to regional cardiac catheterization rates being a strong, valid instrumental variable.
Unadjusted 4-year mortality was 33.9 percentage points lower in patients receiving cardiac catheterization vs patients not receiving cardiac catheterization (Table 5). Adjusted differences were attenuated, and instrumental variable estimates were further attenuated, producing an instrumental variable–adjusted absolute mortality decrease of 9.7 percentage points. This corresponds with an approximate instrumental variable–adjusted relative mortality rate of 0.84 (95% confidence interval [CI], 0.79-0.90). Similar patterns were found at 1 year. The relative mortality rate in regions with the highest (>60.2%) compared with the lowest (<48.2%) cardiac catheterization rates was 0.95 (95% CI, 0.92-0.97), demonstrating an implicit use of instrumental variable techniques (Table 4).
Within a large observational data set, the estimated association of invasive cardiac treatment with long-term mortality is sensitive to the analytic method used. Cardiac catheterization predicted a 50% relative decrease in mortality using standard risk-adjustment methods, including a rigorous propensity-based matching analysis, even after accounting for a clinically rich set of prognostic variables. Using instrumental variable methods, the associated relative decrease in mortality was approximately 16%. When estimated treatment associations vary 3-fold depending on the method used, several questions should come to mind.
Do the results have face validity? The survival benefits of routine invasive care from RCTs are between 8% and 21%.4,5 Results in RCTs are optimized and tend to overestimate the relative benefits achievable in routine clinical practice, given the technological expertise and rapid onset of therapy required to produce optimal results. The overestimate of benefit using standard modeling is likely due to residual confounding related to the selection of lower-risk patients for cardiac catheterization.1,2,6 The magnitude of bias may be greater than usual because receiving catheterization required surviving from admission until this treatment. Even controlling for complete information on patients' admission severity could not eliminate this important survival bias. Such situations are not unusual in observational studies of surgical procedures.
The instrumental variable estimate of a 16% relative survival benefit was closer to RCT results because we used a strong, valid instrumental variable. Although there may be residual unmeasured regional illness differences, this is unlikely since predicted mortality was estimated using strongly prognostic risk factors and was similar for measured covariates across regions. Our instrumental variable predicted a wide range of cardiac catheterization rates (29%-82%). By contrast, McClellan et al9 reported smaller nonsignificant cardiac catheterization effects and larger SEs using an instrumental variable with a smaller range of regional cardiac catheterization rates (15%-27%). Instruments that are more predictive of treatment produce less biased estimates and smaller SEs, and provide closer approximations to the average population effects from RCTs.29,30
When are standard statistical methods likely to produce unbiased findings? The distribution of unmeasured prognostic factors is more likely to be similar when considering therapies with similar clinical indications and risk, such as typical vs atypical neuroleptics for schizophrenia,31,32 or rofecoxib vs celecoxib cyclooxygenase 2 (COX-2) inhibitors for arthritis.33 Randomized clinical trials and observational studies show the greatest similarities under such conditions.34,35 Observational studies of invasive procedures are more prone to bias because patients who are candidates for surgery often differ in unmeasurable ways from patients who are not. A study using propensity-based matching assessed the effects of in-hospital cardiac catheterization using Cooperative Cardiovascular Project data and found smaller long-term relative mortality rates (0.66-0.75)36; however, classifying patients who received cardiac catheterization after discharge and before 30 days as untreated likely attenuated the effects of cardiac catheterization compared with our study.
Which unmeasured factors might account for selection bias reflective of patient prognosis and physician decision-making behaviors? High-risk cardiac markers, such as dynamic or evolving ST- and T-wave changes, may appear during the hospital stay and require serial electrocardiographic interpretations that are rarely captured in observational studies. Relative contraindications, such as renal insufficiency or previous stroke, rarely conform to dichotomous decisions. Severity of comorbidities is difficult to capture. Referral selection may depend on interactions between comorbidities; for example, patients with concomitant aortic valve disease are more likely to be referred for cardiac catheterization but less so as renal function progressively declines. Some prognostic factors, such as functional status or transient ischemic attack from previous cardiac catheterization, are not available in usual observational data sets. Social factors, such as employment, language barriers, and patient preferences, are rarely measured in these data. The factors comprising angiography decision making are thus complex, prognostically important, and often unmeasurable.
Is the similarity between multivariable and propensity model estimates expected? Mathematically, controlling for propensity score should produce similar results to model-based risk adjustment, because both control for the same measured covariates.37,38
The utility of instrumental variable analyses depends on finding a strong, valid instrumental variable and careful interpretation.25,26 The instrumental variable estimate measures the treatment effect on the “marginal” population. This excludes those patients who would “always” or “never” receive cardiac catheterization, focusing on patients with uncertain indications whose likelihood of being treated depends on local clinical judgment and catheterization laboratory supply.6,26 The treatment effect must be interpreted as potentially due to the instrument itself, as well as characteristics of care systems associated with the instrument. Along with providing more revascularization and less evidence-based medical treatment, high cardiac catheterization rate regions had more high-volume hospitals with specialized staff and equipment, and coronary care units.6,9 Finally, low cardiac catheterization rate regions did not preferentially select high-risk patients who were more likely to benefit from revascularization, ruling out better clinical decision making as an explanation of the smaller marginal survival effects from instrumental variable analyses.6,39,40
When are nontraditional approaches useful? Instrumental variable analyses are most suited to inform policy decisions.26 Because region or physician is often the level at which policy and resource allocation decisions are made, such studies assess the effects of health system factors on patient outcomes. These studies answer policy-relevant questions, such as “What are the benefits of increasing the regional cardiac catheterization laboratory capacity?”, because this would increase the routine provision of invasive services to the AMI population. Other studies have used such designs to evaluate the effects of health care spending,11,41 cardiac management strategies,6 and physician supply42 on patient outcomes. They do not necessarily address questions of clinical effectiveness, such as “What is the effect of providing invasive cardiac treatment to a specific patient?”
Randomized clinical trials cannot be undertaken in all situations in which evidence is needed to guide care. Well-designed observational studies are still needed to assess population effectiveness and to extend results to a general population setting. Our study serves as a cautionary note regarding their analysis and interpretation. First, propensity scores and propensity-based matching have the same limitations as multivariable risk adjustment model methods, and are no more likely to remove bias due to unmeasured confounding when strong selection bias exists. Second, instrumental variable analyses may remove both overt and hidden biases but are more suited to answer policy questions than to provide insight into a specific clinical question for a specific patient. Caution is advised regarding clinical protocols and policy statements for invasive care based on expected mortality benefits derived from traditional multivariable modeling and propensity score risk adjustment of observational studies.
Corresponding Author: Thérèse A. Stukel, PhD, Institute for Clinical Evaluative Sciences, G106, 2075 Bayview Ave, Toronto, Ontario M4N 3M5, Canada (email@example.com).
Author Contributions: As principal investigator, Dr Stukel had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Stukel, Fisher, Wennberg.
Acquisition of data: Fisher.
Analysis and interpretation of data: Stukel, Fisher, Wennberg, Alter, Gottlieb, Vermeulen.
Drafting of the manuscript: Stukel.
Critical revision of the manuscript for important intellectual content: Stukel, Fisher, Wennberg, Alter, Gottlieb, Vermeulen.
Statistical analysis: Stukel, Gottlieb, Vermeulen.
Obtained funding: Fisher.
Administrative, technical, or material support: Wennberg.
Study supervision: Stukel.
Financial Disclosures: None reported.
Funding/Support: This study was supported by grants from the Robert Wood Johnson Foundation, the US National Institute on Aging (1PO1-AG19783-01), and Canadian Institutes of Health Research (CTP79847) Team Grant in Cardiovascular Outcomes Research.
Role of the Sponsors: The funding agencies did not participate in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review, or approval of the manuscript.
Disclaimer: The content of this article reflects the views of the authors alone and does not necessarily reflect the opinions of the Centers for Medicare & Medicaid Services or the funding agencies.
Acknowledgment: We thank Douglas O. Staiger, PhD, Department of Economics, Dartmouth College, Hanover, NH, for his insightful contributions, Kelvin Lam, MSc, Institute for Clinical Evaluative Sciences, Toronto, Ontario, for his assistance with data analysis, and Nancy MacCallum, MLIS, Institute for Clinical Evaluative Sciences, Toronto, Ontario, for her assistance with manuscript preparation. None received compensation for their work.