Figure 1.  Flow Diagram of Physician, Patient, and Site Progress Through the Trial

LBS indicates larger bonus size; PCP, primary care physician; and PHO, physician-hospital organization.

a Patients were not uniquely attributed to 1 physician at this stage. The total number of unique patients was 16 815.

Figure 2.  Adjusted Analysis of Evidence-Based Quality Measure Achievement in the Randomized Clinical Trial

Comparison groups include larger bonus size (LBS) plus increasing social pressure (ISP), LBS plus loss aversion (LA), and LBS only for 2015 through 2016. Data are expressed as adjusted odds ratios with 95% CIs (error bars) for pairwise comparisons. The adjusted model includes the covariates consisting of patient demographics (age, sex, race, and the number of chronic disease registries in which a patient is included) and physician demographics (age, sex, tenure, and specialty). Pairwise difference-in-differences comparisons indicate no significant difference.

Figure 3.  Analysis of Evidence-Based Quality Measure Achievement in Cohort Study

The cohort study evaluated larger bonus size (LBS) vs non-LBS groups from 2015 through 2016. A, Observed (unadjusted) changes in the primary outcome for the LBS group compared with the non-LBS group. B, Estimated risk-adjusted changes in the primary outcome for the LBS group compared with the non-LBS group. Error bars indicate 95% CIs.

Table 1.  Physician and Patient Characteristics
Table 2.  Unadjusted Evidence-Based Quality Measure Achievement
1. Rosenthal MB. Beyond pay for performance: emerging models of provider-payment reform. N Engl J Med. 2008;359(12):1197-1200. doi:10.1056/NEJMp0804658
2. Rosenthal MB, Frank RG, Li Z, Epstein AM. Early experience with pay-for-performance: from concept to practice. JAMA. 2005;294(14):1788-1793. doi:10.1001/jama.294.14.1788
3. Centers for Medicare & Medicaid Services. Quality payment program: what’s the quality payment program? https://www.cms.gov/Medicare/Quality-Payment-Program/Quality-Payment-Program.html. Modified December 21, 2018. Accessed June 5, 2018.
4. Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of premier pay for performance on patient outcomes. N Engl J Med. 2012;366(17):1606-1615. doi:10.1056/NEJMsa1112351
5. Song Z, Safran DG, Landon BE, et al. The “Alternative Quality Contract,” based on a global budget, lowered medical spending and improved quality. Health Aff (Millwood). 2012;31(8):1885-1894. doi:10.1377/hlthaff.2012.0327
6. Van Herck P, De Smedt D, Annemans L, Remmen R, Rosenthal MB, Sermeus W. Systematic review: effects, design choices, and context of pay-for-performance in health care. BMC Health Serv Res. 2010;10:247. doi:10.1186/1472-6963-10-247
7. Eijkenaar F, Emmert M, Scheppach M, Schöffski O. Effects of pay for performance in health care: a systematic review of systematic reviews. Health Policy. 2013;110(2-3):115-130. doi:10.1016/j.healthpol.2013.01.008
8. de Bruin SR, Baan CA, Struijs JN. Pay-for-performance in disease management: a systematic review of the literature. BMC Health Serv Res. 2011;11:272. doi:10.1186/1472-6963-11-272
9. Petersen LA, Simpson K, Pietz K, et al. Effects of individual physician-level and practice-level financial incentives on hypertension care: a randomized trial. JAMA. 2013;310(10):1042-1050. doi:10.1001/jama.2013.276303
10. Dudley RA, Frolich A, Robinowitz DL, Talavera JA, Broadhead P, Luft HS. Strategies to support quality-based purchasing: a review of the evidence, technical review 10. In: Strategies to Support Quality-Based Purchasing: A Review of the Evidence. Rockville, MD: Agency for Healthcare Research and Quality; 2004.
11. Khullar D, Chokshi DA, Kocher R, et al. Behavioral economics and physician compensation: promise and challenges. N Engl J Med. 2015;372(24):2281-2283. doi:10.1056/NEJMp1502312
12. Navathe AS, Sen AP, Rosenthal MB, et al. New strategies for aligning physicians with health system incentives. Am J Manag Care. 2016;22(9):610-612.
13. Emanuel EJ, Ubel PA, Kessler JB, et al. Using behavioral economics to design physician incentives that deliver high-value care. Ann Intern Med. 2016;164(2):114-119. doi:10.7326/M15-1330
14. Asch DA, Troxel AB, Stewart WF, et al. Effect of financial incentives to physicians, patients, or both on lipid levels: a randomized clinical trial. JAMA. 2015;314(18):1926-1935. doi:10.1001/jama.2015.14850
15. Volpp KG, John LK, Troxel AB, Norton L, Fassbender J, Loewenstein G. Financial incentive-based approaches for weight loss: a randomized trial. JAMA. 2008;300(22):2631-2637. doi:10.1001/jama.2008.804
16. Song Z, Safran DG, Landon BE, et al. Health care spending and quality in year 1 of the alternative quality contract. N Engl J Med. 2011;365(10):909-918. doi:10.1056/NEJMsa1101416
17. Angrist JD, Pischke JS. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press; 2009.
18. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Hoboken, NJ: Wiley & Sons Inc; 2004.
19. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6(2):65-70.
20. Meyer BD. Natural and quasi-experiments in economics. J Bus Econ Stat. 1995;13(2):151-161.
21. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41-55. doi:10.1093/biomet/70.1.41
22. Muller CJ, MacLehose RF. Estimating predicted probabilities from logistic regression: different methods correspond to different target populations. Int J Epidemiol. 2014;43(3):962-970. doi:10.1093/ije/dyu029
23. McWilliams JM, Hatfield LA, Chernew ME, Landon BE, Schwartz AL. Early performance of accountable care organizations in Medicare. N Engl J Med. 2016;374(24):2357-2366. doi:10.1056/NEJMsa1600142
24. Hanmer MJ, Kalkan OK. Behind the curve: clarifying the best approach to calculating predicted probabilities and marginal effects from limited dependent variable models. Am J Pol Sci. 2013;57(1):263-277. doi:10.1111/j.1540-5907.2012.00602.x
25. Centers for Medicare & Medicaid Services. Comprehensive Primary Care Plus. https://innovation.cms.gov/initiatives/comprehensive-primary-care-plus. Accessed August 28, 2018.
26. Centers for Medicare & Medicaid Services. Oncology care model. https://innovation.cms.gov/initiatives/oncology-care/. Updated December 27, 2018. Accessed December 10, 2018.
2 Comments for this article
Difficulties of changing physician behavior
Frederick Rivara, MD, MPH | University of Washington
Interesting RCT that found physicians responded to bonus size but not social pressure or loss aversion in improving their performance. Seems like behavioral economics does not always work as expected with docs.
CONFLICT OF INTEREST: Editor in Chief, JAMA Network Open
Pay for performance commoditizes patients and physicians
Edward Volpintesta, MD | Bethel Medical Group
The problem with pay-for-performance is that it further commoditizes the practice of medicine.

It cannot measure if physicians have connected well with their patients and formed a trusting relationship; or if patients were satisfied with the care they received; or if they were seen in a timely manner; or if they believed that their doctor gave them sufficient listening time.

Pay-for-performance seems more applicable to assembly line production of automobiles, televisions, and toasters.

It is not surprising that physicians respond more to economic incentives than to "behavioral" ones.

Most doctors feel that they are grossly underpaid for their labors and it is natural for them to seize the chance to improve their predicament.
CONFLICT OF INTEREST: None Reported
Original Investigation
Health Policy
February 8, 2019

Effect of Financial Bonus Size, Loss Aversion, and Increased Social Pressure on Physician Pay-for-Performance: A Randomized Clinical Trial and Cohort Study

Author Affiliations
  • 1. Center for Health Equity Research and Promotion, Corporal Michael J. Crescenz Veterans Affairs Medical Center, Philadelphia, Pennsylvania
  • 2. Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia
  • 3. Center for Health Incentives and Behavioral Economics, University of Pennsylvania, Philadelphia
  • 4. Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York
  • 5. Department of Health Care Management, Wharton School of Business, University of Pennsylvania, Philadelphia
  • 6. Department of Population Health, School of Medicine, New York University, New York, New York
  • 7. Advocate Physician Partners, Downers Grove, Illinois
  • 8. Division of General Internal Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
JAMA Netw Open. 2019;2(2):e187950. doi:10.1001/jamanetworkopen.2018.7950
Key Points

Question  Does increasing bonus size or adding the behavioral economic principles of social pressure or loss aversion improve pay-for-performance effectiveness among physicians?

Findings  In this randomized clinical trial of 54 physicians and cohort study including 66 physicians and 8118 patients, increased bonus size was associated with improved quality relative to a comparison group, although adding increased social pressure and opportunities for loss aversion did not improve quality.

Meaning  Increasing pay-for-performance bonus sizes may be associated with improved effectiveness, whereas adding the behavioral economic principles of social pressure and loss aversion may not be.

Abstract

Importance  Despite limited effectiveness of pay-for-performance (P4P), payers continue to expand P4P nationally.

Objective  To test whether increasing bonus size or adding the behavioral economic principles of increased social pressure (ISP) or loss aversion (LA) improves the effectiveness of P4P.

Design, Setting, and Participants  Parallel studies conducted from January 1 to December 31, 2016, consisted of (1) a randomized clinical trial with patients cluster-randomized by practice site to an active control group (larger bonus size [LBS] only) or to groups with 1 of 2 behavioral economic interventions added and (2) a cohort study comparing changes in outcomes among patients of physicians receiving an LBS with outcomes among patients of propensity-matched physicians not receiving an LBS. A total of 8118 patients with at least 1 of 5 chronic conditions, attributed to 66 physicians, were treated at Advocate Health Care, an integrated health system in Illinois. Data were analyzed using intention to treat and multiple imputation from February 1, 2017, through May 31, 2018.

Interventions  Physician participants received an LBS, with bonuses increased by a mean of $3355 per physician (LBS-only group); an LBS plus prefunded incentives to elicit LA; or an LBS plus an increase from 30% to 50% in the proportion of the P4P bonus determined by group performance (ISP).

Main Outcomes and Measures  The proportion of 20 evidence-based quality measures achieved at the patient level.

Results  A total of 86 physicians were eligible for the randomized trial. Of these, 32 were excluded because they did not have unique attributed patients. Fifty-four physicians were randomly assigned to 1 of 3 groups, and 33 physicians (54.5% male; mean [SD] age, 57 [10] years) and 3747 patients (63.6% female; mean [SD] age, 64 [18] years) were included in the final analysis. Nine physicians and 864 patients were randomized to the LBS-only group, 13 physicians and 1496 patients to the LBS plus ISP group, and 11 physicians and 1387 patients to the LBS plus LA group. Physician characteristics did not differ significantly by arm, such as mean (SD) physician age ranging from 56 (9) to 59 (9) years, and sex (6 [46.2%] to 6 [66.7%] male). No differences were found between the LBS-only and the intervention groups (adjusted odds ratio [aOR] for LBS plus LA vs LBS-only, 0.86 [95% CI, 0.65-1.15; P = .31]; aOR for LBS plus ISP vs LBS-only, 0.95 [95% CI, 0.64-1.42; P = .81]; and aOR for LBS plus ISP vs LBS plus LA, 1.10 [95% CI, 0.75-1.61; P = .62]). Increased bonus size was associated with a greater increase in evidence-based care relative to the comparison group (risk-standardized absolute difference-in-differences, 3.2 percentage points; 95% CI, 1.9-4.5 percentage points; P < .001).

Conclusions and Relevance  Increased bonus size was associated with significantly improved quality of care relative to a comparison group. Adding ISP and opportunities for LA did not improve quality.

Trial Registration  ClinicalTrials.gov Identifier: NCT02634879

Introduction

Pay-for-performance (P4P) is being increasingly used by health insurers and health care systems to incentivize physicians to practice higher-value medicine. The introduction of the Merit-based Incentive Payment System as part of the Medicare Access and CHIP (Children's Health Insurance Program) Reauthorization Act has made P4P a centerpiece of the US shift from volume to value.1-3

However, P4P has not produced consistently positive results.2,4-8 Several explanations have been proposed, including that incentive sizes are too small and baseline performance on quality metrics is too high, reflecting little opportunity to improve; extrinsic incentives may crowd out intrinsic motivation; and the design of programs (eg, high-performance targets) only incentivizes physicians with performance near the thresholds.2,4-8 Further, few randomized clinical trials (RCTs) have evaluated traditional financial P4P incentives for physicians in pragmatic settings, that is, across payers and with incentives tied to comprehensive sets of quality metrics.9,10 Systematic reviews have suggested that programs with larger bonuses achieve greater effects, but they have not accounted for potential confounding factors.6-8 Whether increasing P4P bonus sizes is effective remains an open question.

Another promising strategy is to apply principles from behavioral economics to the design of P4P incentives to make them more effective within the same budget.11-13 Although behavioral economics principles have been applied extensively to financial incentives for patients, few have been rigorously applied to financial incentives for physicians.13-15

To address these knowledge gaps, we conducted a pragmatic RCT of P4P and a simultaneous prospective quasi-experimental comparison evaluating the following 2 interventions: (1) the addition of the behavioral economic principles of loss aversion (LA) and increased social pressure (ISP) to larger bonus sizes (LBS) and (2) LBS alone.

Methods
Study Design

This study follows the Consolidated Standards of Reporting Trials (CONSORT) reporting guideline and the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline. Details of the study design, randomization scheme, and interventions are summarized below and presented in detail in the original protocol (Supplement 1). The study protocol was approved by the institutional review boards at the University of Pennsylvania (Philadelphia) and Advocate Health Care (Downers Grove, Illinois), including a waiver of informed consent for patients and physicians.

Setting

The study was conducted in a clinically integrated physician network led by Advocate Physician Partners (Advocate), Downers Grove, Illinois. The Trinity physician-hospital organization was the setting for the RCT, with other Advocate practices (non-Trinity) serving as the comparison group for the cohort study. Compared with non-Trinity practices, Trinity serves a patient population that includes a higher proportion of minority patients as well as lower socioeconomic status communities. Trinity was selected for inclusion in the RCT by Advocate leadership because its physicians’ quality scores had been consistently lower than those of non-Trinity physicians, all of whom participate in the same P4P program.

This pragmatic study was thus part of a quality improvement effort led by Advocate leadership. Budgetary limitations prevented testing LBS across the network of more than 4000 physicians, and it was not possible to randomize to LBS vs behavioral economic designs because it was not culturally acceptable to give different sized bonuses to physicians achieving identical quality scores. Hence, we prospectively designed the study as an observational analysis of LBS, applying the larger bonuses to all Trinity physicians and then randomizing those physicians to evaluate the effect of adding LA and ISP. However, patient-level data that became available after randomization showed that fewer physicians had uniquely attributed patients (ie, in our design a patient could only be attributed to 1 physician), which led to a smaller analytic sample than the sample at randomization.

The interventions were tested in a P4P program that started in 2005 with large incentives, for which the design has been reported previously.16 Briefly, the P4P program used a composite quality score (the clinical integration score), which represents a weighted percentage of evidence-based quality measures achieved. The P4P bonus paid is in proportion to the clinical integration score percentage with the maximum possible bonus determined by panel size and other factors such as payer mix.

Advocate leadership and the study investigators conducted in-person sessions with all eligible physicians to describe the study procedures and conduct a baseline survey. All physicians in the Trinity leadership voted to participate in the study. Physicians across Advocate were aware of the study because of networkwide communications and were routinely monitored per the existing P4P program.

Study Patient Population

The study patient population included patients with at least 1 of 5 chronic diseases (asthma, chronic obstructive pulmonary disease, type 2 diabetes, coronary artery disease or ischemic vascular disease, and congestive heart failure). Participating patients were attributed (using plurality of visits) for more than 12 months to participating physicians who were active in Advocate’s existing P4P program.
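As a rough illustration of plurality-of-visits attribution, the sketch below assigns each patient to the physician with the most recorded visits. The visit records, the identifiers, and the tie rule (tied patients are left unattributed) are assumptions made for this example and are not details of the Advocate program.

```python
from collections import Counter, defaultdict

def attribute_by_plurality(visits):
    """Map each patient to the physician with the most visits.

    `visits` is an iterable of (patient_id, physician_id) pairs.
    Patients whose top visit count is tied map to None
    (an assumed rule for this sketch).
    """
    counts = defaultdict(Counter)
    for patient_id, physician_id in visits:
        counts[patient_id][physician_id] += 1

    attribution = {}
    for patient_id, per_physician in counts.items():
        ranked = per_physician.most_common(2)
        tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        attribution[patient_id] = None if tied else ranked[0][0]
    return attribution

# Hypothetical visit log
visits = [("p1", "drA"), ("p1", "drA"), ("p1", "drB"),
          ("p2", "drA"), ("p2", "drB")]
result = attribute_by_plurality(visits)
```

Under a rule of this kind, a physician whose patients all see another physician at least as often ends up with no uniquely attributed patients, which is the situation that reduced the analytic sample in this trial.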

Interventions

The active phase of the trial was from January 1 through December 31, 2016. The LBS intervention provided maximum P4P bonuses larger than previous years by a mean of $3355 per physician, representing an approximately 32% increase in bonus size and an increase of $16 per patient (from $52 to $68 per patient). Quality metrics and scoring methods were left unchanged. This intervention represented an active control condition in which the physicians received larger bonuses than physicians not participating in the RCT (and larger bonuses than they received the prior year).

The LBS plus LA intervention included the larger maximum bonus plus prefunded incentives in a virtual health system bank account in the physician’s name. The prefunded incentives, which were 50% of the expected incentives based on the prior year’s performance, were placed into the virtual account on January 1, 2016, and were accessible by email request to the Advocate chief financial officer. If at the end of 2016 physicians had earned fewer P4P dollars than were placed in the virtual accounts, they were required to pay back funds from the virtual account and/or any dollars overdrawn from it. Physicians in this intervention group received 4 additional pro formas in February, July, September, and November of 2016 that indicated dollar amounts for the prefunded incentive, the amounts accessed year-to-date, the projected 2016 incentive bonus size, and the projected residual unearned incentive (eFigure 1 in Supplement 2). This intervention group was exposed to the behavioral economic principles of an endowment effect, in which people work harder not to give up something they already have, and LA, testing whether physicians would respond more strongly to the same incentive amount framed in terms of losses rather than gains. However, the risk of overdrawing was quite small, because 94% of physicians earned at least 50% of their bonuses in the prior year, meaning the intervention may better be interpreted as the opportunity for LA.

The LBS plus ISP intervention included the increased maximum bonus but also changed the composite quality score from 70% based on individual score and 30% based on the physician-hospital organization score to 50% individual and 50% group (herein defined as all physicians in the same intervention group). Physicians in this intervention group also received 4 additional pro formas on the same dates as above with the additional P4P bonus dollars that would be earned by the 20–percentage point increase in the weighting given to group score as well as an unblinded list of physicians with performance scores on 2 of the quality measures (eFigures 1 and 2 in Supplement 2). Physicians with scores below the performance threshold were identified. This intervention used the behavioral economic principle of ISP, through which individuals will try to improve their performance because others in the group can identify poor performers.
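The reweighting can be made concrete with a small sketch. The function name and the example scores below are hypothetical, and the actual clinical integration score involves weighting of individual measures that is not reproduced here. Note also that the group component changes meaning between the two schemes: the physician-hospital organization score under the existing design, and the score of physicians in the same intervention group under ISP.

```python
def composite_score(individual, group, individual_weight):
    """Weighted composite of individual and group quality scores (0-1 scale)."""
    if not 0.0 <= individual_weight <= 1.0:
        raise ValueError("individual_weight must be between 0 and 1")
    return individual_weight * individual + (1.0 - individual_weight) * group

# Hypothetical scores: a physician at 0.90 whose group scores 0.80
standard = composite_score(0.90, 0.80, individual_weight=0.70)  # 70/30 split
isp = composite_score(0.90, 0.80, individual_weight=0.50)       # 50/50 split
```

With these illustrative inputs the composite moves from about 0.87 to about 0.85, so a physician who outperforms the group has more of the bonus riding on colleagues' performance under the 50/50 split.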

Outcome Measures

The primary study outcome was the 2015-2016 change in proportion of applicable chronic disease and preventive evidence-based measures within the P4P program meeting or exceeding benchmarks based on national Healthcare Effectiveness Data and Information Set standards at the patient level, representing a patient’s view of the proportion of evidence-based care received. Secondary outcomes included changes in the probabilities of meeting the individual measures (eTable 1 in Supplement 2).

Randomization

Eligible affiliated physicians in the RCT were randomized by practice site to active control or 1 of 2 intervention groups in a 1:1:1 ratio, stratified by primary care vs specialist. Because of the need to randomize before the start of 2016 (because quality metrics were on a calendar-year cycle) and owing to data transfer delays, randomization occurred before data for attribution of patients became available. Study participants and operational staff did not have any influence on randomization but were not blinded to group assignment because knowledge of the incentives is essential to their mechanism. Study investigators and data analysts remained blinded until all follow-up data were obtained and primary analyses were finalized.
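A minimal sketch of stratified cluster randomization in a 1:1:1 ratio follows. The site identifiers, stratum labels, seed, and the shuffle-then-deal allocation are assumptions for illustration; the trial's actual randomization scheme is described in its protocol and may differ.

```python
import random

def randomize_sites(sites, arms, seed):
    """Assign practice sites to arms in equal ratio within each stratum.

    `sites` maps site_id -> stratum label (e.g. "primary_care").
    Sites are shuffled within each stratum and dealt to arms in turn,
    so arm sizes within a stratum differ by at most one.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for site_id, stratum in sorted(sites.items()):
        by_stratum.setdefault(stratum, []).append(site_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        for i, site_id in enumerate(members):
            assignment[site_id] = arms[i % len(arms)]
    return assignment

# Hypothetical sites: 6 primary care, 3 specialist
sites = {f"site{i}": ("primary_care" if i < 6 else "specialist") for i in range(9)}
arms = ["LBS_only", "LBS_plus_LA", "LBS_plus_ISP"]
assignment = randomize_sites(sites, arms, seed=2016)
```

Because the site is the unit of randomization, every physician and patient at a site inherits that site's arm.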

Statistical Analysis

Data were analyzed using intention to treat and multiple imputation. Data analysis was performed from February 1, 2017, through May 31, 2018.

RCT Testing Addition of ISP and LA to LBS

Although randomization occurred at the physician-site level, the patient was the unit of analysis for the primary outcome. The patient-level primary outcome was measured using the number of measures achieved (events) of all applicable chronic disease measures (trials). The primary analysis used a generalized linear model with binomial distribution and logit link function to estimate the odds of achieving evidence-based chronic disease measures for each patient (ie, events out of trials) clustered at the physician level to adjust for multiple patients nested within physicians.16-18 The model included adjustments for the overall proportion of measures achieved in 2016, each treatment group, and the interaction term for treatment group by 2016, which gave the primary effect of interest representing the change in the outcome for each treatment group compared with the change in the active control group. Examining the change was intended to address potential differences in baseline measures between groups. Additional adjustments included patient demographics (including race) and chronic condition and physician demographics, training and specialty, certification, years of experience, and practice characteristics. We conducted pairwise comparisons of the primary outcome for each treatment group against the control group and for the treatment groups compared with each other, exponentiating the coefficient and 95% CI estimates from the logistic regressions to compute adjusted odds ratios (aORs).
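The final step, turning a log-odds coefficient into an aOR with a 95% CI, can be sketched as follows. The coefficient and standard error are hypothetical values chosen only to give an odds ratio of similar magnitude to those reported, and the Wald interval and normal-approximation P value are assumptions about the form of the calculation, not output from the study's SAS models.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a log-odds coefficient and its Wald 95% CI."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))

def wald_p_value(beta, se):
    """Two-sided P value from a standard normal Wald test."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

# Hypothetical interaction coefficient and standard error on the log-odds scale
beta, se = -0.15, 0.145
or_, lower, upper = odds_ratio_with_ci(beta, se)
p = wald_p_value(beta, se)
```

Clustering at the physician level enters through the standard error; the exponentiation itself is unchanged. An interval that spans 1, as here, corresponds to no detectable difference between groups.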

Approximately 11% of patients were missing follow-up quality measures included in the primary outcome. Multiple imputation with 20 imputations was used, achieving at least 98% relative efficiency and ensuring in-range values. All analyses were conducted on each imputed data set; results were combined using the standard rules from Rubin.18 We also conducted sensitivity analyses without imputing missing data, with clustering by site (although most sites only had 1 physician), and with models using physician random effects.
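Rubin's combining rules and the relative-efficiency formula can be sketched in a few lines. The per-imputation estimates and variances below are hypothetical, and the fraction of missing information used in the last line is an assumed value, not one reported by the study.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and squared SEs with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)              # pooled point estimate
    within = mean(variances)             # mean within-imputation variance
    between = variance(estimates)        # between-imputation variance (n - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

def relative_efficiency(fraction_missing_info, m):
    """Efficiency of m imputations relative to infinitely many."""
    return 1.0 / (1.0 + fraction_missing_info / m)

# Hypothetical log-odds estimates from 4 imputed data sets
estimates = [-0.16, -0.14, -0.15, -0.15]
variances = [0.021, 0.020, 0.022, 0.021]
pooled, total_var = pool_rubin(estimates, variances)
re_20 = relative_efficiency(0.4, m=20)
```

With 20 imputations, relative efficiency stays above 98% for any fraction of missing information up to roughly 0.4, which is consistent with the threshold quoted above.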

Power calculations were derived assuming the comparison of each incentive group with the control group using a Bonferroni-corrected type I error of 0.017, followed by comparison of any incentive groups that showed significant differences from control using a sequential Holm-Bonferroni approach.19 The study was designed to have at least 80% power to detect differences in the change in proportion of evidence-based measures received between any incentive group and control of 5%. We used a conservative assumption of intraclass correlation coefficient of 0.25 within physicians. Simulation studies incorporating these variables indicated the need for approximately 3420 participants (1140 per group).
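For readers unfamiliar with the Holm step-down procedure, a minimal sketch follows. Applying it to the 3 pairwise P values reported in the Results is purely illustrative; the study describes Holm-Bonferroni as a follow-up step for groups that first differed from control, so this is not a reproduction of the authors' testing sequence.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank+1)-th smallest P value with alpha / (m - rank)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger P values fail too
    return reject

bonferroni_alpha = 0.05 / 3          # about 0.017 for 3 comparisons
decisions = holm_reject([0.31, 0.81, 0.62])
```

The first threshold in Holm's procedure equals the Bonferroni-corrected level of about 0.017; later thresholds relax to 0.025 and then 0.05, which is why the procedure is never less powerful than plain Bonferroni.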

Cohort Study Testing LBS

To examine the association between the mean effect across all 3 groups receiving the LBS intervention and the proportion of evidence-based care received by patients, we used a difference-in-differences method to compute the change in the primary outcome for patients attributed to physicians in the LBS group vs those attributed to a propensity-matched set of physicians who did not receive an LBS.20 Physicians were matched using demographics, the preintervention (2015) level, and the preintervention time trend in the primary outcome (eMethods 1, eFigure 3, and eFigure 4 in Supplement 2).21 The time trend was measured as the difference between 2014 and 2015 risk-adjusted composite scores. As in the RCT analysis and other literature,22 we used the same primary outcome and specification, with the addition of physician fixed effects to account for unobserved time-invariant differences among physicians because of a lack of randomization. Model variables included the main intervention group effect, year effect, and interaction of group and year for the effect of interest. A test of trends between the groups from 2011 to 2015 did not indicate divergent trends (eMethods 2 and eTable 5 in Supplement 2). Standard errors were clustered to account for repeated measures at the patient level following previous studies.16,23 We also conducted a sensitivity analysis without physician fixed effects.
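The core of the difference-in-differences comparison is a simple contrast of changes. The sketch below applies it to the unadjusted group means reported in the Results; it omits the propensity matching, covariates, and physician fixed effects of the actual model, in which the same contrast is carried by the group-by-year interaction term.

```python
def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Unadjusted percentages reported in the Results for the cohort study
did = difference_in_differences(85.0, 89.2, 86.2, 88.2)
```

From these rounded endpoints the unadjusted contrast is about 2.2 percentage points, compared with the risk-adjusted estimate of 3.2 percentage points from the full model.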

Estimated Risk-Standardized Primary Outcome

In the RCT and cohort study analyses, we estimated the risk-standardized proportion of evidence-based measures achieved using bootstrapping.22,24 All P values were 2-sided with P < .05 indicating significance. Analyses were conducted using SAS software (version 9.4; SAS Institute, Inc).
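A percentile bootstrap of an averaged predicted probability might look like the sketch below. The simulated predicted probabilities, the number of replicates, and the choice to resample individual patients are all assumptions for illustration; the source does not state the resampling unit, and a cluster bootstrap that resamples physicians would be the natural alternative given the nested data.

```python
import random

def bootstrap_ci(values, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `statistic` computed on `values`."""
    rng = random.Random(seed)
    n = len(values)
    replicates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def mean_probability(xs):
    """Average predicted probability across patients."""
    return sum(xs) / len(xs)

# Hypothetical patient-level predicted probabilities of meeting a measure
rng = random.Random(1)
predicted = [min(1.0, max(0.0, rng.gauss(0.88, 0.08))) for _ in range(500)]
estimate = mean_probability(predicted)
ci_low, ci_high = bootstrap_ci(predicted, mean_probability)
```

Averaging model-predicted probabilities over the observed patients gives a standardized proportion on the percentage scale, and the bootstrap supplies an interval without relying on a closed-form variance.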

Physician Survey Methods

We also conducted online pretrial and posttrial physician surveys to assess the influence and acceptability of the interventions in several domains using paired t tests to compare mean Likert scale responses by group. The domains included baseline attitudes, teamwork, financial salience, practice environment, awareness and/or understanding, influence on clinical behavior, and unintended consequences.

Results
Sample Characteristics

A total of 86 physicians were randomized, although 32 received the interventions but were excluded from the analysis because they did not have unique attributed patients (Figure 1). Seven physicians (with 465 attributed patients) electively terminated their contracts with Advocate for reasons outside of the study, but patients were analyzed according to the assigned study group in an intention-to-treat approach. A total of 33 physicians (18 male [54.5%] and 15 female [45.5%]), 27 practice sites, and 3747 attributed patients (1358 male [36.2%] and 2384 female [63.6%] among those with available data; mean [SD] age, 64 [18] years) were included in the final RCT analysis. Nine physicians and 864 patients were randomized to the LBS-only group, 13 physicians and 1496 patients to the LBS plus ISP group, and 11 physicians and 1387 patients to the LBS plus LA group. Physician characteristics did not differ significantly by arm, such as mean (SD) physician age ranging from 56 (9) to 59 (9) years, and sex (6 [46.2%] to 6 [66.7%] male). Mean (SD) physician age was 57 (10) years, with a mean (SD) tenure of 12 (8) years with Advocate and predominantly in a primary care specialty (27 of 33 [81.8%]) (Table 1). Demographic and professional characteristics of enrolled physicians and attributed patients were not significantly different across intervention groups. There were small differences in patient race and age. Characteristics of the 33 matched physicians did not exhibit differences relative to the RCT physicians (Table 2 and eTables 2 and 4 in Supplement 2), whereas their 4371 attributed patients were less likely to be black (831 of 4371 [19.0%] vs 2667 of 3747 [71.2%]; P < .001) and were older (median age, 67 years [interquartile range, 57-75 years] vs 64 years [interquartile range, 55-73 years]; P < .001) compared with the RCT patients.

RCT Testing Addition of ISP and LA to LBS

Patients in all groups experienced an increase in the mean rate of receiving evidence-based care. Patients in the LBS-only group had an absolute increase of 4.2 percentage points (87.6% in 2015 to 91.8% in 2016) (Table 2 and eTable 2 in Supplement 2). Patients in the LBS plus LA group had an increase in the mean rate of 3.8 percentage points (83.9% in 2015 to 87.7% in 2016) and patients in the LBS plus ISP group had an increase of 4.4 percentage points (84.6% in 2015 to 89.0% in 2016). However, adjusted pairwise testing revealed no differences between groups, with intervention group point estimates less than those of the control group (LBS plus LA group vs the LBS-only: aOR, 0.86 [95% CI, 0.65-1.15; P = .31]; LBS plus ISP vs LBS-only: aOR, 0.95 [95% CI, 0.64-1.42; P = .81]; and LBS plus ISP vs LBS plus LA aOR, 1.10 [95% CI, 0.75-1.61; P = .62]) (Figure 2). Analysis of individual measures did not reveal any systematic pattern of changes between the groups (Table 2). Sensitivity analyses gave similar results (eFigures 5-8 in Supplement 2).

Cohort Study Testing LBS

Patients in the LBS cohort experienced an increase in the mean rate of receiving evidence-based care of 4.1 percentage points (85.0% in 2015 to 89.2% in 2016) compared with an increase of 2.0 percentage points (86.2% to 88.2%) (Table 2 and eTable 3 in Supplement 2) in patients in the matched non-LBS group (Figure 3). Adjusted analysis demonstrated a significant association between the LBS cohort and increased evidence-based care (aOR, 1.25; 95% CI, 1.16-1.35; P < .001) or an estimated adjusted absolute increase of 3.2 percentage points (95% CI, 1.9-4.5 percentage points; P < .001). Sensitivity analyses provided similar results (eFigures 8 and 9 in Supplement 2).

Analysis of individual measures showed that the associated increases in evidence-based care in the LBS vs non-LBS groups were significant for 3 measures: blood pressure control (1.6–percentage point increase vs 4.3–percentage point decrease; P < .001), receiving a foot examination with a diabetes diagnosis (7.5– vs 0.4–percentage point increase; P < .001), and cessation of tobacco use (6.5–percentage point increase vs 1.1–percentage point decrease; P = .04) (Table 2). Changes in all other measures were not significant.

Physician Survey Results

Twenty-seven physicians (81.8%) responded to the preintervention survey and 32 (97.0%) to the postintervention survey. Physicians in the LBS plus LA group indicated an increase in the financial salience of the incentives (increase of 0.67 point on the Likert scale [P = .04] vs −0.25 point for the LBS-only group [P = .33] and 0.01 point for the LBS plus ISP group [P = .41]), but their concerns about negative unintended effects also increased (change of 0.48 point [P = .01] vs 0.27 point for the LBS-only group [P = .14] and 0.11 point for the LBS plus ISP group [P = .25]) (eTable 3 in Supplement 2). The LBS plus ISP group indicated a decrease in teamwork (change of −0.37 point [P = .02] vs 0.03 point for the LBS-only group [P = .48] and 0.18 point for the LBS plus LA group [P = .30]). No harms to physicians or patients were detected.

Discussion

In this study testing behavioral economic principles in the P4P design through an RCT and evaluating increased bonus sizes, we found an increase in bonus size was associated with significantly improved quality for patients receiving care for chronic disease relative to a comparison group during a single year. Adding LA and ISP did not lead to further quality improvements, although attrition and a small sample size limited statistical power. We made 3 important findings.

First, an increase in the maximum bonus size of approximately $3355 (a mean of $16 per patient) or 32% per physician was associated with a small but significant improvement in evidence-based care received by patients. This improvement is particularly notable because the Advocate P4P program already had relatively large bonuses to start, with a mean of approximately $10 000 per physician per year for panel sizes of approximately 200 patients. A critique of P4P programs has been inadequate bonus sizes, with prior studies limited to specific settings that are less representative of general practice.6-8 Our results suggest that in a general primary care physician program with substantial bonus sizes and a large budget, further increases in bonuses were associated with gains in quality. However, they should also be interpreted with caution given a unique, single-institution setting in which the group exposed to LBS was lower performing and may have had a greater opportunity to improve. Further, because comparison group physicians may not have been as aware of their inclusion in the study, results may have been confounded by the Hawthorne effect.
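
The bonus figures in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes that the 32% is measured against the prior maximum bonus and that the per-patient figure is the increase divided by panel size; neither assumption is stated explicitly in the text, and the variable names are illustrative.

```python
# Figures quoted in the text (US dollars).
bonus_increase_per_physician = 3355
relative_increase = 0.32  # "32% per physician"
increase_per_patient = 16

# Assumption: the 32% is relative to the prior maximum bonus.
implied_prior_maximum = bonus_increase_per_physician / relative_increase
# Assumption: the per-patient figure is the increase divided by panel size.
implied_panel_size = bonus_increase_per_physician / increase_per_patient

print(f"Implied prior maximum bonus: ${implied_prior_maximum:,.0f}")
print(f"Implied panel size: {implied_panel_size:.0f} patients")
```

Under these assumptions the implied prior maximum is about $10 500 and the implied panel is about 210 patients, close to the approximately $10 000 and approximately 200 patients described above.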

Second, the addition of the behavioral economic principles of LA and ISP did not increase the incentives’ effectiveness. However, the final sample size of 33 physicians was small, and thus the study was underpowered to detect clinically meaningful effects (as reflected in the wide 95% CIs), although the point estimates did not indicate directional improvements for either intervention group relative to the control group. Further, it is important to understand why the LA arm was not effective: only 2 physicians withdrew money from their virtual accounts. This occurred despite a low risk of having to pay back overdrawn dollars, because 31 physicians (93.9%) earned larger bonuses than the amount placed into virtual accounts in the previous year. These design features are important given that national policies are now using LA. The virtual bank accounts used herein were quite different from placing reimbursement at risk (such as the Merit-based Incentive Payment System) or prepaying dollars (such as Medicare’s Comprehensive Primary Care Plus for 2965 practices nationwide).25 Although stronger forms of LA have been successful,26 the virtual account approach used herein was softer and, as it turns out, less effective.

Third, the results on group performance–based incentives are also important, given increasing interest in their use.25,26 The group incentives in this study were shared across a group of physicians in the same organization who did not necessarily share resources or work on the same care team, unlike in a prior study.9 The intention was to test ISP alone, rather than ISP mixed with incentives for teamwork directly. One other RCT that evaluated smaller incentives at the group level9 tested practice-level incentives (including nonphysicians) for improving adherence to hypertension guidelines within the Veterans Affairs system and found no benefit to group vs individual incentives.

Limitations

This study has several limitations. First, this trial was conducted at a single health system network with a small sample size, dropout, and potential for confounding from the Hawthorne effect. However, these limitations reflect the pragmatic nature of the study, because many physician networks and health plans face these challenges, and we used intention-to-treat and multiple imputation methods to mitigate bias. Second, the interventions may not have been strong enough or active long enough to drive behavior changes. Third, the evaluation of increased bonus size was an observational analysis subject to confounding.

Conclusions

In a P4P program for physicians caring for patients with chronic disease, increasing bonus sizes was associated with improvements in quality, whereas adding an opportunity for LA and increasing the proportion of incentive based on group performance did not lead to additional benefit. Further refinement of applications of behavioral economic principles in P4P design should be tested with larger sample sizes.

Article Information

Accepted for Publication: December 11, 2018.

Published: February 8, 2019. doi:10.1001/jamanetworkopen.2018.7950

Correction: This article was corrected on March 14, 2022.

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2019 Navathe AS et al. JAMA Network Open.

Corresponding Author: Amol S. Navathe, MD, PhD, Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, 1108 Blockley Hall, Philadelphia, PA 19104 (amol@wharton.upenn.edu).

Author Contributions: Dr Navathe had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Navathe, Volpp, Bond, Sacks, Nelson, Patel, Shea, Sokol, Crawford, Emanuel.

Acquisition, analysis, or interpretation of data: Navathe, Volpp, Caldarella, Bond, Troxel, Zhu, Matloubieh, Lyon, Mishra Meza, Patel, Calcagno, Vittore, Sokol, Weng, McDowald, Crawford, Small, Emanuel.

Drafting of the manuscript: Navathe, Caldarella, Matloubieh, Lyon, Mishra Meza, Patel, Shea, Calcagno, Vittore, Small.

Critical revision of the manuscript for important intellectual content: Navathe, Volpp, Bond, Troxel, Zhu, Lyon, Sacks, Nelson, Patel, Sokol, Weng, McDowald, Crawford, Emanuel.

Statistical analysis: Navathe, Bond, Troxel, Zhu, Mishra Meza, Small.

Obtained funding: Navathe.

Administrative, technical, or material support: Navathe, Volpp, Caldarella, Matloubieh, Lyon, Sacks, Patel, Shea, Calcagno, Vittore, Sokol, McDowald, Crawford.

Supervision: Navathe, Volpp, Bond, Nelson, Patel, Weng, Crawford, Emanuel.

Conflict of Interest Disclosures: Dr Navathe reported receiving grants from the Commonwealth Fund and Robert Wood Johnson Foundation during the conduct of the study; grants from Hawaii Medical Services Association, Anthem Public Policy Institute, Oscar Health, and CIGNA Corporation; personal fees from Navvis and Company, Navigant, Inc, Lynx Medical, Indegene, Inc, Sutherland Global Services, Elsevier Press, Navahealth, Cleveland Clinic, and Agathos, Inc; and serving as an uncompensated board member from Integrated Services, Inc, outside the submitted work. Dr Volpp reported receiving grants from the Commonwealth Fund and Robert Wood Johnson Foundation during the conduct of the study; personal fees from CVS Health and VAL Health; equity from VAL Health; and grants from Hawaii Medical Services Association, Vitality/Discovery, Humana, Merck, and Weight Watchers outside the submitted work. Dr Patel reported receiving grants from Commonwealth Fund during the conduct of the study. Dr Shea reported receiving grants from the National Institutes of Health, National Heart, Lung, and Blood Institute, and Accreditation Council for Graduate Medical Education during the conduct of the study and outside the submitted work. Dr Crawford reported ownership interest in Associates in Nephrology, SC, and Research by Design, LLC. Dr Emanuel reported speaking fees from Leigh Bureau Bill Leigh/Jennifer Bowen; serving as a CNN consultant and contributor; investments in Oak HC/FT Venture Fund II, Applecart Project, Maniv, Silver Lake, United Health Group, Gilead, Allergan, Amgen, Baxter, and Medtronics; membership on the Council on Foreign Relations; serving as senior fellow for the Center for American Progress; contributing to the New York Times; serving as a board member to the Yale Open Data Access Program and the JAMA editorial board; compensated speaking engagements for J. P. Morgan Chase, University of Michigan, Ann Arbor, CVS Caremark, National Council for Behavioral Health, Sound Physicians, Merrill Lynch & Co, Inc, Marcus Evans, Inc, Klick Health, Entrée Health, American Health Lawyers Association, Athenahealth, AmeriHealth Caritas Family of Companies, McKesson Corporation, Valence Health, North Texas Specialty Physicians, Gerontological Society of America, Federacao Nacional Das Empresas de Seguros Privados, de Capitalizacao e de Previdencia Complementar Aberta-Fenaseg, Advocate Health Care, OrthoCarolina Annual Physician Retreat, Tanner Healthcare System, Mid-Atlantic Permanente Group, American College of Radiology, Marcus Evans Long-Term Care & Senior Living Summit, Loyola University Chicago, Oncology Society of New Jersey, Good Shepherd Community Care, Remedy Partners, Medzel, Kaiser Permanente Virtual Medicine, Wallace H. Coulter Foundation, Lake Nona Institute, Allocation, Partners Chicago, Pepperdine University, Huron 2017 CEO Forum, American Case Management Association, Chamber of Commerce, Blue Cross Blue Shield Minneapolis, United Health Group, Futures Without Violence, Children’s Hospital of Philadelphia (CHOP) Drug Pricing, Washington State Hospital Association, Association of Academic Health Centers, State Administration of Foreign Affairs, Blue Cross/Blue Shield of Massachusetts, Inc, Lumeris, CHOP, Roivant Sciences, Inc, Transformational Institute, Medical Specialties Distributors, LLC, Vizient University Health System Consortium, Center for Neurodegenerative Research Alzheimer’s Disease Center, United Health, Genentech Oncology, Inc, Council of Insurance Agents and Brokers, America’s Health Insurance Plans, Montefiore Physician Leadership Academy Launch, Medical Home Network, Healthcare Financial Management Association, Ecumenical Center–UT Health, American Academy of Optometry, Associação Nacional de Hospitais Privados, National Alliance of Healthcare Purchaser Coalitions, Optum Labs, Massachusetts Association of Health Plans, and District of Columbia Hospital Association; and uncompensated speaking engagements for the Board of Women Visitors Meeting (Mission of Healthcare), University of California, San Francisco, Philadelphia Aspen Challenge, National Business Group on Health, Consortium of Universities for Global Health, Berjen University, Delaware Healthcare Spending Benchmark Summit, American Academy of Ophthalmology (Geisinger: From Crisis to Cure), National Institute for Health Care Management, Berjen University, MCW Commencement, and RAND Corporation. No other disclosures were reported.

Funding/Support: This study was supported by grants from the Commonwealth Fund and the Robert Wood Johnson Foundation.

Role of the Funder/Sponsor: The funders/sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data Sharing Statement: See Supplement 3.

References
1. Rosenthal MB. Beyond pay for performance: emerging models of provider-payment reform. N Engl J Med. 2008;359(12):1197-1200. doi:10.1056/NEJMp0804658
2. Rosenthal MB, Frank RG, Li Z, Epstein AM. Early experience with pay-for-performance: from concept to practice. JAMA. 2005;294(14):1788-1793. doi:10.1001/jama.294.14.1788
3. Centers for Medicare & Medicaid Services. Quality payment program: what’s the quality payment program? https://www.cms.gov/Medicare/Quality-Payment-Program/Quality-Payment-Program.html. Modified December 21, 2018. Accessed June 5, 2018.
4. Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of premier pay for performance on patient outcomes. N Engl J Med. 2012;366(17):1606-1615. doi:10.1056/NEJMsa1112351
5. Song Z, Safran DG, Landon BE, et al. The “Alternative Quality Contract,” based on a global budget, lowered medical spending and improved quality. Health Aff (Millwood). 2012;31(8):1885-1894. doi:10.1377/hlthaff.2012.0327
6. Van Herck P, De Smedt D, Annemans L, Remmen R, Rosenthal MB, Sermeus W. Systematic review: effects, design choices, and context of pay-for-performance in health care. BMC Health Serv Res. 2010;10:247. doi:10.1186/1472-6963-10-247
7. Eijkenaar F, Emmert M, Scheppach M, Schöffski O. Effects of pay for performance in health care: a systematic review of systematic reviews. Health Policy. 2013;110(2-3):115-130. doi:10.1016/j.healthpol.2013.01.008
8. de Bruin SR, Baan CA, Struijs JN. Pay-for-performance in disease management: a systematic review of the literature. BMC Health Serv Res. 2011;11:272. doi:10.1186/1472-6963-11-272
9. Petersen LA, Simpson K, Pietz K, et al. Effects of individual physician-level and practice-level financial incentives on hypertension care: a randomized trial. JAMA. 2013;310(10):1042-1050. doi:10.1001/jama.2013.276303
10. Dudley RA, Frolich A, Robinowitz DL, Talavera JA, Broadhead P, Luft HS. Strategies to support quality-based purchasing: a review of the evidence, technical review 10. In: Strategies to Support Quality-Based Purchasing: A Review of the Evidence. Rockville, MD: Agency for Healthcare Research and Quality; 2004.
11. Khullar D, Chokshi DA, Kocher R, et al. Behavioral economics and physician compensation: promise and challenges. N Engl J Med. 2015;372(24):2281-2283. doi:10.1056/NEJMp1502312
12. Navathe AS, Sen AP, Rosenthal MB, et al. New strategies for aligning physicians with health system incentives. Am J Manag Care. 2016;22(9):610-612.
13. Emanuel EJ, Ubel PA, Kessler JB, et al. Using behavioral economics to design physician incentives that deliver high-value care. Ann Intern Med. 2016;164(2):114-119. doi:10.7326/M15-1330
14. Asch DA, Troxel AB, Stewart WF, et al. Effect of financial incentives to physicians, patients, or both on lipid levels: a randomized clinical trial. JAMA. 2015;314(18):1926-1935. doi:10.1001/jama.2015.14850
15. Volpp KG, John LK, Troxel AB, Norton L, Fassbender J, Loewenstein G. Financial incentive-based approaches for weight loss: a randomized trial. JAMA. 2008;300(22):2631-2637. doi:10.1001/jama.2008.804
16. Song Z, Safran DG, Landon BE, et al. Health care spending and quality in year 1 of the alternative quality contract. N Engl J Med. 2011;365(10):909-918. doi:10.1056/NEJMsa1101416
17. Angrist JD, Pischke S Jr. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press; 2009.
18. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Hoboken, NJ: Wiley & Sons Inc; 2004.
19. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6(2):65-70.
20. Meyer BD. Natural and quasi-experiments in economics. J Bus Econ Stat. 1995;13(2):151-161.
21. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41-55. doi:10.1093/biomet/70.1.41
22. Muller CJ, MacLehose RF. Estimating predicted probabilities from logistic regression: different methods correspond to different target populations. Int J Epidemiol. 2014;43(3):962-970. doi:10.1093/ije/dyu029
23. McWilliams JM, Hatfield LA, Chernew ME, Landon BE, Schwartz AL. Early performance of accountable care organizations in Medicare. N Engl J Med. 2016;374(24):2357-2366. doi:10.1056/NEJMsa1600142
24. Hanmer MJ, Kalkan OK. Behind the curve: clarifying the best approach to calculating predicted probabilities and marginal effects from limited dependent variable models. Am J Pol Sci. 2013;57(1):263-277. doi:10.1111/j.1540-5907.2012.00602.x
25. Centers for Medicare & Medicaid Services. Comprehensive Primary Care Plus. https://innovation.cms.gov/initiatives/comprehensive-primary-care-plus. Accessed August 28, 2018.
26. Centers for Medicare & Medicaid Services. Oncology care model. https://innovation.cms.gov/initiatives/oncology-care/. Updated December 27, 2018. Accessed December 10, 2018.