Changes in Patient Experiences and Assessment of Gaming Among Large Clinician Practices in Precursors of the Merit-Based Incentive Payment System

Key Points

Question: Do clinician practices game pay-for-performance programs by selectively reporting measures on which they already perform well, and does mandating public reporting on patient experience measures improve care?

Findings: In this cross-sectional analysis of patient experience data from Consumer Assessment of Healthcare Providers and Systems (CAHPS) surveys, practices were more likely to voluntarily include CAHPS measures in a Medicare pay-for-performance program when they previously scored higher on these measures. However, mandatory public reporting of CAHPS measures was not associated with improved patient experiences with care.

Meaning: These findings support calls to end voluntary measure selection in public reporting and pay-for-performance programs, including Medicare's Merit-Based Incentive Payment System, but also suggest that requiring practices to report on patient experiences may not produce gains.


I. Policy context
We studied two precursors of Medicare's Merit-Based Incentive Payment System: the Value-Based Payment Modifier (VM), a pay-for-performance program, and the Physician Quality Reporting System (PQRS), a public reporting program. As detailed in eTable 1, the VM and PQRS were phased in over time, with different program components becoming mandatory based on the year and size of clinician practices.1-9 We exploited the 2014 phase-in of a PQRS policy, which required large practices with ≥100 clinicians to publicly report patient experience measures from the Consumer Assessment of Healthcare Providers and Systems (CAHPS) survey, to study the association between mandatory public reporting and changes in performance on CAHPS measures. We also examined practices' decision to include CAHPS measures as an optional component of quality scores for the VM, whose pay-for-performance incentives were also phased in by year and practice size (eTable 1).

eTable 1: Phase-in of the PQRS and VM, by performance year and practice size

2014 b

Practices with <10 clinicians:
• PQRS: Mandatory program with penalties for non-participation, though reporting patient experiences was optional
• VM: Did not apply

Practices with 10-99 clinicians:
• PQRS: Mandatory program with penalties for non-participation, though reporting patient experiences was optional
• VM: Bonuses up to +2.0% × budget neutrality factor a based on overall quality and cost scores (practices exempt from penalties if they met PQRS reporting requirements)

Practices with ≥100 clinicians:
• PQRS: Mandatory program with penalties for non-participation; reporting patient experiences with care was mandatory
• VM: Penalties or bonuses (-2.0% to +2.0% × budget neutrality factor a) based on overall quality and cost scores. Patient experiences were optional components of the overall quality score

2015

Practices with <10 clinicians:
• PQRS: Mandatory program with penalties for non-participation, though reporting patient experiences was optional
• VM: Bonuses up to +2.0% × budget neutrality factor a based on overall quality and cost scores (practices exempt from penalties if they met PQRS reporting requirements)

Practices with 10-99 clinicians:
• PQRS: Mandatory program with penalties for non-participation, though reporting patient experiences was optional
• VM: Penalties or bonuses (-4.0% to +4.0% × budget neutrality factor a) based on overall quality and cost scores

Practices with ≥100 clinicians:
• PQRS: Mandatory program with penalties for non-participation; reporting patient experiences with care was mandatory
• VM: Penalties or bonuses (-4.0% to +4.0% × budget neutrality factor a) based on overall quality and cost scores. Patient experiences were optional components of the overall quality score

2016

Practices with <10 clinicians:
• PQRS: Mandatory program with penalties for non-participation, though reporting patient experiences was optional
• VM: Bonuses up to +2.0% × budget neutrality factor a based on overall quality and cost scores (practices exempt from penalties if they met PQRS reporting requirements)

Practices with 10-99 clinicians:
• PQRS: Mandatory program with penalties for non-participation, though reporting patient experiences was optional
• VM: Bonuses up to +2.0% × budget neutrality factor a based on overall quality and cost scores (practices exempt from penalties if they met PQRS reporting requirements)

Practices with ≥100 clinicians:
• PQRS: Mandatory program with penalties for non-participation; reporting patient experiences with care was mandatory
• VM: Bonuses up to +2.0% × budget neutrality factor a based on overall quality and cost scores (practices exempt from penalties if they met PQRS reporting requirements). Patient experiences were optional components of the overall quality score

The table shows the phase-in of the programs by performance year; practices received payment adjustments related to these programs 2 years later.
a Budget neutrality factor was calculated by CMS and scaled percentage-point payment adjustments so that VM bonuses and penalties were budget neutral in aggregate. Practices subject to the VM could receive additional upward payment adjustments for performance if they disproportionately served high-risk patients (mean Medicare HCC risk score of attributed patients in the upper quartile of Medicare beneficiaries nationally).
b Transitional year omitted from public reporting analyses.

II. Fee-for-service Medicare CAHPS and CAHPS for PQRS surveys
We analyzed data from two main sources. First, we analyzed practice-level data from VM Practice Files. These files included annual patient experience scores for practices that reported these measures for the PQRS. The Centers for Medicare and Medicaid Services (CMS) calculated these scores based on responses to the CAHPS for PQRS survey, which was administered annually to a random sample of fee-for-service Medicare beneficiaries in practices where beneficiaries received most of their primary care. 10 Only practices that reported patient experiences with care for the PQRS were included in the survey. CMS reported practice-level scores for 11 patient experience domains assessed in the CAHPS for PQRS survey (eTable 2).
Practice-level scores were publicly reported by CMS,11,12 though practices could voluntarily include them in an overall quality score for the VM.2,13 (Practices could include all or none of the 11 CAHPS domain scores in their overall VM quality score.14) The overall quality score was one of two factors that determined whether practices received performance-based payment adjustments (i.e., bonuses or penalties) through the VM; the other factor was per-patient spending.1-6 We analyzed annual scores for large practices (≥100 clinicians) from 2014-2016, when these scores were publicly reported and practices could voluntarily include them in the VM.
Second, we analyzed patient-level data from the Fee-for-service Medicare CAHPS survey, which is separate from, but closely related to, the CAHPS for PQRS survey.13 The Fee-for-service Medicare CAHPS survey was administered annually to a representative sample of community-dwelling individuals enrolled in traditional (i.e., fee-for-service) Medicare.15 We analyzed this survey to assess changes in patient experiences with care.
Finally, we examined the proportion of survey items with missing data in the 5 patient experience domains we analyzed from the Fee-for-service Medicare CAHPS survey. Missing data can result from item non-response, skip patterns, or the exclusion of items in certain survey years. We found that rates of missing responses were comparable in large vs. smaller practices and, except for 1 item (seeing a physician within 15 minutes of the appointment time), did not change differentially between these two groups of practices between the 2012-2014 and 2016-2017 surveys (eTable 4). These findings support our use of these data in a difference-in-differences design, since the proportions of items with non-missing responses contributing to the analyses did not change differentially between large and smaller practices over time.

Notes to eTable 2:
b Numeric scores range from 0 (worst) to 10 (best); from 1 (never) to 4 (always); or have values of 1 (no), 2 (yes, somewhat), or 3 (yes, definitely). Prior to analysis, numeric scores were converted to a consistent 0-100 scale. For items that were part of domain scores, we first subtracted the overall mean for the item and calculated a domain score at the patient level as an equally weighted average of items with non-missing responses.
c The access to specialists domain in the CAHPS for PQRS survey included an item about the number of specialists a patient saw. Because this item does not directly assess quality of care, we did not include it when constructing the "access to specialists" domain score from the Fee-for-service Medicare CAHPS survey.
d We included this item to construct a care coordination domain score in analyses using the Fee-for-service Medicare CAHPS survey, although this item was not included in the CAHPS for PQRS survey.
e Domain not included in our analyses of patient-level survey data because items were not assessed in the Fee-for-service Medicare CAHPS survey in all study years.
f Shown here is the item from the Fee-for-service Medicare CAHPS survey, which corresponded to two items in the CAHPS for PQRS survey: (1) physician gave easy-to-understand instructions about how to take prescription medicines; (2) physician gave information in writing about how to take prescription medicines that was easy to understand.

Notes to eTable 4: Differential changes in missingness were estimated from models that included an indicator for large practices, an indicator for the survey period, and an interaction between practice size and survey period. P-values calculated from standard errors clustered at the practice (taxpayer identification number) level. We did not adjust for survey weights or respondent-level characteristics.

III. Analysis of patient experience scores in the Fee-for-service Medicare CAHPS survey and concordance with practice scores from the CAHPS for PQRS survey
This section describes how we constructed domain-level and composite patient experience scores in patient-level analyses of the Fee-for-service Medicare CAHPS survey and how these compare to scores that were publicly reported for practices based on the CAHPS for PQRS survey.

Construction of domain and composite patient experience scores in the Fee-for-service Medicare CAHPS survey
We aggregated items from the Fee-for-service Medicare CAHPS survey into patient-level scores for 5 patient experience domains that closely corresponded to domains in the CAHPS for PQRS survey (eTable 2). To construct these domain scores, we first subtracted the overall mean for each item, used a linear transformation to convert all items to a consistent 0-100 scale, and calculated a patient-level domain score as an equally weighted average of the domain's constituent items. (Items with missing responses were excluded from domain scores.) We also constructed a composite patient-level score, which we defined as an equally weighted average of the 5 domain scores. Since item-level missingness was comparable over time between large and smaller practices in our difference-in-differences analysis (eTable 4), our approach to constructing domain scores should not assign systematically different weights to items for large vs. smaller practices across different periods.
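To illustrate the scoring procedure, here is a minimal Python sketch; the item names, response ranges, and data layout are illustrative assumptions, not the study's actual survey variables.

```python
import numpy as np
import pandas as pd

def rescale_0_100(item: pd.Series, lo: float, hi: float) -> pd.Series:
    """Linearly convert a raw item (e.g., a 1-4 never/always scale) to 0-100."""
    return (item - lo) / (hi - lo) * 100.0

def domain_score(df: pd.DataFrame, items: list) -> pd.Series:
    """Equally weighted average of a domain's mean-centered items; items with
    missing responses are excluded from each patient's average."""
    centered = df[items] - df[items].mean()  # subtract each item's overall mean
    return centered.mean(axis=1)             # row means skip NaNs by default

# Hypothetical two-item domain measured on a 1-4 response scale.
df = pd.DataFrame({"q1": [1, 3, 4, np.nan], "q2": [2, 2, np.nan, 4]})
for col in ["q1", "q2"]:
    df[col] = rescale_0_100(df[col], lo=1, hi=4)
df["domain"] = domain_score(df, ["q1", "q2"])
# A composite score would then be an equally weighted mean of the 5 domain scores.
```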

Concordance of patient experiences in the Fee-for-service Medicare CAHPS survey with practice-level scores from the CAHPS for PQRS survey
The CAHPS for PQRS survey is separate from, but closely related to, the Fee-for-service Medicare CAHPS survey. Given the similarity of the surveys (eTable 2), we expect that practice-level scores based on the CAHPS for PQRS survey will be positively correlated with the mean scores of patients in those practices who were sampled in the Fee-for-service Medicare CAHPS survey. Because each survey samples a limited number of patients per practice, sampling error attenuates the observed correlations;19 the true correlations may be higher.
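As a sketch of this concordance check, with synthetic data and assumed column names (not the study's files), one can average patient-level Fee-for-service Medicare CAHPS scores to the practice level and correlate them with publicly reported CAHPS for PQRS practice scores:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Publicly reported CAHPS for PQRS practice scores (synthetic).
pqrs = pd.DataFrame({"tin": np.arange(50), "pqrs_score": rng.normal(75, 5, 50)})

# Patient-level FFS Medicare CAHPS scores, noisy around each practice's mean.
ffs = pd.DataFrame({"tin": rng.integers(0, 50, 1000)})
practice_mean = pqrs.set_index("tin")["pqrs_score"]
ffs["score"] = practice_mean.loc[ffs["tin"]].to_numpy() + rng.normal(0, 10, 1000)

# Practice-level means of sampled patients vs. publicly reported scores.
ffs_means = ffs.groupby("tin")["score"].mean().rename("ffs_mean")
merged = pqrs.join(ffs_means, on="tin")
print(merged["pqrs_score"].corr(merged["ffs_mean"]))  # Pearson correlation
# Sampling noise in the practice means attenuates this observed correlation
# toward zero, which is why the true correlations may be higher.
```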

V. Difference-in-differences analysis
Next, we used a difference-in-differences design to evaluate whether the phase-in of a PQRS policy that made public reporting of CAHPS measures mandatory for large practices (≥100 clinicians) was associated with differential improvements in patient experiences with care.
This section describes the study sample used in this analysis; provides additional details about the difference-in-differences design, interpretation of our estimates, and tests of this study design's assumptions; and presents the results of sensitivity analyses.

Respondent sample for difference-in-differences analyses
We conducted difference-in-differences analyses using patient-level data from the Fee-for-service Medicare CAHPS surveys administered from 2012-2014 and 2016-2017, attributing respondents to practices based on primary care claims from the year before each survey, the year for which we assessed patient experiences with care. We excluded respondents who did not have any primary care claims needed for attribution to practices. We attributed respondents to practices based on primary care visits because key items in the CAHPS survey focus on primary care (e.g., ratings of primary physicians). We conducted sensitivity analyses (below) among patients attributed to practices based on outpatient visits with primary care clinicians or specialists.
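The snippet below sketches one common attribution rule, assigning each respondent to the practice that accounts for the plurality of their primary care visits; the rule and the claims layout are assumptions for illustration, not necessarily the study's exact procedure.

```python
import pandas as pd

# Toy primary care claims: one row per visit (illustrative layout).
claims = pd.DataFrame({
    "bene_id": [1, 1, 1, 2, 2],
    "tin":     ["A", "A", "B", "B", "B"],
})

# Count visits per beneficiary-practice pair, then keep the top practice.
visits = claims.groupby(["bene_id", "tin"]).size().rename("n_visits").reset_index()
attribution = (visits.sort_values("n_visits", ascending=False)
                     .drop_duplicates("bene_id")
                     .set_index("bene_id")["tin"])
print(attribution)  # beneficiary 1 -> practice A; beneficiary 2 -> practice B
```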

Practice size
In analyses using the Fee-for-service Medicare CAHPS survey (conducted among large and smaller practices), we measured practice size annually as the number of unique clinicians that billed under a practice taxpayer identification number (TIN), which we assessed from clinician-TIN billing relationships captured in Medicare Data on Provider Practice and Specialty (MD-PPAS) files.21 We measured practice size in the year prior to the CAHPS survey year to align this variable with the period for which we assessed patient experiences with care.
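As a sketch, assuming an MD-PPAS-style file with one row per clinician-TIN billing relationship (column names are illustrative), practice size is the count of unique clinicians billing under each TIN in a year:

```python
import pandas as pd

mdppas = pd.DataFrame({
    "year": [2013, 2013, 2013, 2013],
    "npi":  ["n1", "n2", "n2", "n3"],  # clinician identifiers
    "tin":  ["A",  "A",  "A",  "B"],   # practice taxpayer identification numbers
})

# Number of unique clinicians billing under each TIN, by year.
practice_size = mdppas.groupby(["tin", "year"])["npi"].nunique()
print(practice_size)  # TIN A in 2013 -> 2 clinicians; TIN B -> 1
```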

Respondent characteristics
We controlled for respondent characteristics, including the following:
• Respondent-reported difficulty with 1 or more activities of daily living: bathing, dressing, eating, using chairs, walking, and using the toilet.
• Self-reported general health and mental health scores: respondent-rated health on a scale of 1 to 5, where 1 indicates poor and 5 indicates excellent self-rated general or mental health.
• Enrollment in Medicaid or one of the Medicare Savings Programs, which serves as a proxy for socioeconomic status.24
Baseline means of these covariates for large practices (111-150 clinicians) and smaller practices (50-89 clinicians) are shown in Table 1 of the main manuscript.

Other covariates
In addition to respondent-level characteristics, we adjusted for annual county-level Medicare Advantage (MA) penetration rates and Hospital Referral Region (HRR) fixed effects.
We measured MA penetration using the Area Health Resources File.25 We then estimated patient-level linear difference-in-differences models of the form:

Y_{ikt} = β_0 + β_1·Large_k + β_2·Post_t + β_3·(Large_k × Post_t) + X_{ikt}′γ + λ·MA_{ct} + η_h + ε_{ikt},

where Y_{ikt} is a composite or domain-level patient experience score for respondent i of practice k in year t; Large_k indicates large practices (111-150 clinicians); Post_t indicates the post-intervention period; X_{ikt} denotes the respondent characteristics described above; MA_{ct} is the annual MA penetration rate in county c; and η_h denotes HRR fixed effects. Thus, our estimate of β_3 represents the adjusted within-HRR differential change in patient experiences associated with mandatory public reporting for large practices (pooled across HRRs), through 2-3 years after the mandate's introduction. We adjusted all models for survey weights and clustered standard errors at the practice taxpayer identification number (TIN) level.
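The following Python sketch shows how a model of this form could be fit; the synthetic data, column names, and covariates are assumptions for exposition, not the study's actual dataset or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "large": rng.integers(0, 2, n),            # practice has 111-150 clinicians
    "post": rng.integers(0, 2, n),             # post-intervention period
    "age": rng.normal(75, 7, n),               # stand-in respondent characteristic
    "ma_penetration": rng.uniform(0, 0.5, n),  # county-level MA penetration rate
    "hrr": rng.integers(0, 10, n),             # hospital referral region
    "tin": rng.integers(0, 200, n),            # practice taxpayer ID (cluster)
    "survey_weight": rng.uniform(0.5, 2.0, n),
})
df["score"] = 70 + 2 * df["large"] + df["post"] + rng.normal(0, 10, n)

# Weighted least squares with survey weights and HRR fixed effects; the
# coefficient on large:post is the difference-in-differences estimate (beta_3).
fit = smf.wls(
    "score ~ large * post + age + ma_penetration + C(hrr)",
    data=df, weights=df["survey_weight"],
).fit(cov_type="cluster", cov_kwds={"groups": df["tin"]})
print(fit.params["large:post"], fit.conf_int().loc["large:post"].tolist())
```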
To facilitate interpretation, we scaled difference-in-differences estimates (β̂_3) by the practice-level standard deviation of the corresponding patient experience score in the pre-intervention period. We estimated the practice-level standard deviation of each score (σ̂) by fitting a multilevel linear regression model with practice (TIN) random effects to pre-intervention period survey data, adjusting for the respondent characteristics described above. The resulting scaled estimates and 95% confidence intervals (CIs) are given by:

Scaled estimate = β̂_3 / σ̂; 95% CI: (β̂_3 ± 1.96·SE(β̂_3)) / σ̂.

These scaled estimates can be interpreted as effect sizes relative to the distribution of practice scores in the baseline period. For example, an effect size of -0.16 SDs is equivalent to the difference between the median practice (50th percentile) and a practice at the 44th percentile of performance on a patient experience measure, assuming practice-level scores are normally distributed. Results are in Table 2 of the main manuscript.
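Continuing the sketch above (reusing df and fit), the practice-level SD could be estimated with a random-intercept model on pre-intervention data, and an effect size translated into percentiles with the normal CDF; variable names remain assumptions.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.formula.api as smf

# Random-intercept model on pre-intervention data; the SD of the practice
# random effects estimates the practice-level SD of the score.
df_pre = df[df["post"] == 0]
mlm = smf.mixedlm("score ~ age + ma_penetration", data=df_pre,
                  groups=df_pre["tin"]).fit()
sigma_hat = float(np.sqrt(mlm.cov_re.iloc[0, 0]))

effect_size = fit.params["large:post"] / sigma_hat  # scaled estimate
# The percentile interpretation in the text: a -0.16 SD effect moves the
# median practice to roughly the 44th percentile.
print(norm.cdf(-0.16))  # ~0.436
```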

Policy context and interpretation of difference-in-differences estimates
The interpretation of our difference-in-differences estimates reflects the structure and phase-in of the PQRS and VM. As described in eTable 1, VM pay-for-performance incentives were fully phased in for both large and smaller practices by the post-intervention period, whereas mandatory public reporting of CAHPS measures applied only to large practices. Because of this program structure, our difference-in-differences estimates capture the association between mandatory public reporting and patient experiences with care in the context of pay-for-performance incentives. These estimates may capture both responses to public reporting incentives (independent of pay-for-performance) and potential interactions between public reporting and pay-for-performance incentives (e.g., if incentives attributable to public reporting are amplified when practices are also exposed to pay-for-performance incentives).
To formalize this idea, suppose the causal model describing the relationship between these programs and patient care is a function of practice exposure to public reporting requirements, pay-for-performance incentives, and an interaction between these programs. This model can be written as:

Y_{ikt} = θ_0 + θ_1·Report_{kt} + θ_2·P4P_{kt} + θ_3·(Report_{kt} × P4P_{kt}) + ε_{ikt},   (4)

where Y_{ikt} is a score for patient i of practice k in year t; Report_{kt} denotes practice k's exposure to the reporting mandate in year t; and P4P_{kt} denotes practice k's exposure to pay-for-performance incentives. Patient- and market-level covariates are held constant.

Our empirical difference-in-differences model (model 3) compares the change in patient experiences from the pre-intervention to the post-intervention periods in large practices that became subject to the reporting mandate (first difference) with contemporaneous changes among smaller, unaffected practices (second difference). When there is an interaction between public reporting and pay-for-performance incentives, as written in model (4), our empirical difference-in-differences model gives the following estimates:

Change among large practices that became subject to the reporting mandate:
Post-intervention period: E[Y_{ikt} | Report_{kt} = 1, P4P_{kt} = 1] = θ_0 + θ_1 + θ_2 + θ_3
Pre-intervention period: E[Y_{ikt} | Report_{kt} = 0, P4P_{kt} = 0] = θ_0
First difference: θ_1 + θ_2 + θ_3

Change among smaller practices that were unaffected by the reporting mandate:
Post-intervention period: E[Y_{ikt} | Report_{kt} = 0, P4P_{kt} = 1] = θ_0 + θ_2
Pre-intervention period: E[Y_{ikt} | Report_{kt} = 0, P4P_{kt} = 0] = θ_0
Second difference: θ_2

Empirical difference-in-differences estimate: θ_1 + θ_3

Thus, only pay-for-performance incentives that were identical for large and smaller practices in the post-intervention period (θ_2 in model (4)) are differenced out; the empirical estimate, θ_1 + θ_3, reflects the association with mandatory public reporting plus any interaction between public reporting and pay-for-performance incentives.

Tests of difference-in-differences assumptions
The difference-in-differences design isolates changes in patient experiences with care associated with public reporting, in the context of a pay-for-performance program, under the assumption that differences in patient experiences between large and smaller practices would have remained constant had the reporting mandate among large practices not been implemented. We tested this assumption in two ways.
First, we examined whether estimates could have been biased by differential changes in the composition of patients in large vs. smaller practices over our study period. Such changes could limit our ability to isolate changes in patient experiences with care associated with the reporting mandate from patient-level confounders. To assess this source of bias, we compared changes in the patient characteristics from Table 1 of the main manuscript among patients of large vs. smaller practices from the pre-intervention to post-intervention periods.
Specifically, we estimated a patient-level linear difference-in-differences model for each characteristic, which was analogous to equation (2) but did not adjust for other patient-level covariates. The coefficient on the interaction between the large-practice and post-period terms gives the differential change in that characteristic between patients of large vs. smaller practices from the pre- to post-intervention periods. Estimates are in Table 1 of the main manuscript. We conducted an analogous analysis of practice-level characteristics, including the proportion of clinicians in each practice who were primary care clinicians and the proportion of clinicians whose practices were vertically integrated with a hospital (based on billing patterns in Medicare claims).28 These variables capture important practice attributes related to the mix of clinicians and organizational structure that could have affected performance on CAHPS measures. We did not find statistically significant differential changes between large vs. smaller practices across the pre- and post-intervention study periods (eTable 8). These findings support the assumption that the difference-in-differences design isolates changes in patient experiences associated with mandatory public reporting.
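A sketch of these balance tests, reusing the synthetic df from the earlier sketch; the looped variables stand in for the study's actual patient- and practice-level characteristics.

```python
import statsmodels.formula.api as smf

# For each characteristic, regress it on large, post, and their interaction;
# the interaction coefficient is the differential change.
for cov in ["age", "ma_penetration"]:
    bal = smf.ols(f"{cov} ~ large * post", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["tin"]})
    print(cov, round(bal.params["large:post"], 3),
          round(bal.pvalues["large:post"], 3))
```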
Second, we tested the difference-in-differences assumption that differences in patient experience scores of large vs. smaller practices would have remained constant in the absence of the public reporting mandate for large practices. We evaluated the plausibility of this assumption by comparing pre-intervention trends in patient experience scores among large vs.
smaller practices from 2011-2013. Finding "parallel" pre-intervention trends between large and smaller groups of practices (equivalent to constant pre-intervention differences) suggests that between-group differences likely would have remained constant in the absence of the reporting mandate.29 To compare pre-intervention trends between large and smaller practices, we estimated patient-level linear event-study models of the form:

Y_{ikt} = α_t + κ·Large_k + Σ_{τ∈{2011, 2012, 2015, 2016}} δ_τ·(Large_k × 1[t = τ]) + X_{ikt}′γ + λ·MA_{ct} + η_h + ε_{ikt},   (6)

where α_t denotes year fixed effects and all other terms are defined as above. We estimated separate models for each composite and domain-level patient experience score. In each model, we omitted patient experiences in 2013 as the reference period (outcomes in the 2014 transition year are excluded). We plotted δ̂_τ, which are estimates of annual differential changes between large vs. smaller practices (relative to differences between large vs. smaller practices in 2013), and associated 95% confidence intervals, in eFigure 5. As shown in the plots, estimates of δ̂_τ are close to and statistically indistinguishable from 0 for τ = 2011 and τ = 2012, implying constant pre-intervention differences (parallel pre-intervention trends).

eFigure 5 also traces out time-varying differential changes past 2013 (captured by δ̂_2015 and δ̂_2016), which we used to examine whether changes in care associated with mandatory public reporting for large practices emerged over time. We found no evidence of differential improvements in patient experiences emerging by 2016.
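Continuing the same synthetic example (reusing df and rng), an event-study model in the spirit of Equation 6 could be fit as follows; the year assignment and formula are illustrative.

```python
import statsmodels.formula.api as smf

# Add a synthetic survey year; 2014 is absent, mirroring its exclusion.
df["year"] = rng.choice([2011, 2012, 2013, 2015, 2016], size=len(df))

# Year fixed effects plus large-by-year interactions with 2013 as reference.
es = smf.wls(
    "score ~ C(year) + large + large:C(year, Treatment(reference=2013))"
    " + age + ma_penetration + C(hrr)",
    data=df, weights=df["survey_weight"],
).fit(cov_type="cluster", cov_kwds={"groups": df["tin"]})

# Coefficients on the large-by-year interactions are the annual differential
# changes (delta-hat) relative to 2013, as plotted in eFigure 5.
print([p for p in es.params.index if p.startswith("large:C(year")])
```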

Sensitivity analyses
We conducted two sensitivity analyses. First, we analyzed changes in patient experiences with care from 2011-2013 to 2015-2016 using data from concurrent Fee-for-service Medicare CAHPS surveys (i.e., surveys administered from 2011-2013 and 2015-2016). We attributed patients to practices based on concurrent-year primary care claims to be consistent with the period in which we measured patient experiences. Our results, shown in eTable 9, were not substantively different from our main analyses in Table 2 of the main manuscript.
Second, we re-ran our difference-in-differences models on a broader sample of 26,380 Fee-for-service Medicare CAHPS respondents whom we attributed to practices based on outpatient claims with primary care clinicians or specialists (eTable 10).

eFigure 5: Event-study plots of annual differential changes in composite and domain-level patient experience scores between large and smaller practices (relative to 2013)

[eFigure 5 panels: Composite score; Rating of primary physician; Access to specialists; Care coordination]
Event-study plots show the differential change in composite or domain-specific patient experience scores between large practices (111-150 clinicians) and smaller practices (50-89 clinicians) by year relative to 2013. We omitted 2014 as a transitional year. Patient experience scores are reported on a 0-100 scale (to which we standardized responses from all survey items). The shaded blue circles represent the estimates δ̂_τ from Equation 6, and the error bars represent 95% confidence intervals for these estimates. All estimates adjusted for respondent characteristics from Table 1 of the main manuscript, annual county-level MA penetration rates, HRR fixed effects, year fixed effects, and survey weights. The 95% confidence intervals were calculated using robust standard errors clustered at the practice (taxpayer identification number) level.

Notes to eTable 8:
c That is, changes in the characteristic shown in the table row between large and smaller practices from the pre-intervention to the post-intervention periods. We estimated these differential changes by fitting a practice-level linear difference-in-differences model for each characteristic as a function of a post-intervention period indicator, an indicator that a patient's practice had 111-150 clinicians, and an interaction between these indicators. Differential changes are given by the regression coefficient on the interaction term.
d Primary care clinicians defined as clinicians with Medicare specialty codes for general practice, family practice, internal medicine, geriatric medicine, nurse practitioner, or physician assistant (specialty codes: 01, 08, 11, 38, 50, 97).
e Proportion of clinicians billing >75% of outpatient claims in a hospital outpatient department (place of service code 22) vs. a physician office. We measured the proportion of clinicians whose billing in a hospital outpatient department exceeded this 75% threshold, aggregated to the level of the practice taxpayer identification number and year.

Notes to eTable 9:
a This analysis used Fee-for-service Medicare CAHPS surveys administered from 2011-2013 and 2015-2016 to assess patient experiences with care in the same years. We omitted 2014 as a transitional year.
b Mean scores among practices with 111-150 clinicians in the pre-intervention period, adjusted for respondent characteristics in Table 1 of the main manuscript, annual county-level MA penetration rates, HRR fixed effects, year fixed effects, and survey weights. Scores are standardized to a 0-100 scale, with higher scores representing better patient experiences with care.
c Standard deviation (SD) of the practice-level distribution of patient experience scores, estimated among all practices in the pre-intervention period. For this analysis, we estimated the practice-level standard deviation of scores from CAHPS surveys administered from 2011-2013.
d Difference-in-differences estimates represent the differential change in composite or domain-specific patient experience scores between large practices (111-150 clinicians) and smaller practices (50-89 clinicians) from the pre-intervention period (2011-2013) to the post-intervention period (2015-2016), adjusted for respondent characteristics in Table 1 of the main manuscript and survey weights.
e Effect sizes are difference-in-differences estimates scaled by the practice-level standard deviation (SD) of each score. The corresponding 95% confidence intervals are also scaled by the practice-level SD of each score.
f 95% confidence intervals and P-values were calculated using robust standard errors clustered at the practice (taxpayer identification number) level.
eTable 10: Difference-in-differences estimates among patients attributed to practices based on outpatient claims with primary care clinicians or specialists

Notes to eTable 10:
b Mean scores among practices with 111-150 clinicians in the pre-intervention period, adjusted for respondent characteristics in Table 1 of the main manuscript, annual county-level MA penetration rates, HRR fixed effects, year fixed effects, and survey weights. Scores are standardized to a 0-100 scale, with higher scores representing better patient experiences with care.
c Standard deviation (SD) of the practice-level distribution of patient experience scores, estimated among all practices in the pre-intervention period. For this analysis, we estimated the practice-level standard deviation of scores from CAHPS surveys administered from 2011-2013.
d Difference-in-differences estimates represent the differential change in composite or domain-specific patient experience scores between large practices (111-150 clinicians) and smaller practices (50-89 clinicians) from the pre-intervention period (2011-2013) to the post-intervention period (2015-2016), adjusted for respondent characteristics in Table 1 of the main manuscript and survey weights.
e Effect sizes are difference-in-differences estimates scaled by the practice-level standard deviation (SD) of each score. The corresponding 95% confidence intervals are also scaled by the practice-level SD of each score.
f 95% confidence intervals and P-values were calculated using robust standard errors clustered at the practice (taxpayer identification number) level.