Friedberg MW, Schneider EC, Rosenthal MB, Volpp KG, Werner RM. Association Between Participation in a Multipayer Medical Home Intervention and Changes in Quality, Utilization, and Costs of Care. JAMA. 2014;311(8):815-825. doi:10.1001/jama.2014.353
Importance
Interventions to transform primary care practices into medical homes are increasingly common, but their effectiveness in improving quality and containing costs is unclear.
Objective
To measure associations between participation in the Southeastern Pennsylvania Chronic Care Initiative, one of the earliest and largest multipayer medical home pilots conducted in the United States, and changes in the quality, utilization, and costs of care.
Design, Setting, and Participants
Thirty-two volunteering primary care practices participated in the pilot (conducted from June 1, 2008, to May 31, 2011). We surveyed pilot practices to compare their structural capabilities at the pilot’s beginning and end. Using claims data from 4 participating health plans, we compared changes (in each year, relative to before the intervention) in the quality, utilization, and costs of care delivered to 64 243 patients attributed to pilot practices and 55 959 patients attributed to 29 comparison practices (selected for similarity to pilot practices in size, specialty, and location), using a difference-in-differences design.
Interventions
Pilot practices received disease registries and technical assistance and could earn bonus payments for achieving patient-centered medical home recognition by the National Committee for Quality Assurance (NCQA).
Main Outcomes and Measures
Practice structural capabilities; performance on 11 quality measures for diabetes, asthma, and preventive care; utilization of hospital, emergency department, and ambulatory care; standardized costs of care.
Results
Pilot practices successfully achieved NCQA recognition and adopted new structural capabilities such as registries to identify patients overdue for chronic disease services. Pilot participation was associated with statistically significantly greater performance improvement, relative to comparison practices, on 1 of 11 investigated quality measures: nephropathy screening in diabetes (adjusted performance of 82.7% vs 71.7% by year 3, P < .001). Pilot participation was not associated with statistically significant changes in utilization or costs of care. Pilot practices accumulated average bonuses of $92 000 per primary care physician during the 3-year intervention.
Conclusions and Relevance
A multipayer medical home pilot, in which participating practices adopted new structural capabilities and received NCQA certification, was associated with limited improvements in quality and was not associated with reductions in utilization of hospital, emergency department, or ambulatory care services or total costs over 3 years. These findings suggest that medical home interventions may need further refinement.
Professional associations, payers, policy makers, and other stakeholders have advocated the patient-centered medical home, a team-based model of primary care practice intended to improve the quality, efficiency, and patient experience of care.1,2 In general, medical home initiatives have encouraged primary care practices to invest in patient registries, enhanced access options, and other structural capabilities in exchange for enhanced payments—often operationalized as per-patient per-month fees for comprehensive care services.3,4 Dozens of privately and publicly financed medical home pilots are under way, and most use recognition by the National Committee for Quality Assurance (NCQA) to assess practice structural capabilities.4,5
Recent evidence reviews suggest that early “medical home” interventions have yielded modest improvements at best in quality and patient experience, with little evidence of effects on costs of care.6-8 However, these reviews included studies that preceded development of NCQA medical home recognition criteria and lacked significant financial support from payers, potentially limiting their applicability to current medical home efforts.9,10 More recent evaluations have assessed medical home pilots including only 1 payer (potentially a small fraction of some practices’ patient panels) occurring over a 1- or 2-year time frame (possibly insufficient to observe effects requiring longer time frames),11-13 or within large, integrated delivery systems atypical of most primary care practices.14,15 We hypothesized that a multipayer medical home initiative involving a longer intervention period and substantial financial support would be more likely to be associated with measurable improvements in quality and efficiency.
We evaluated associations between implementation of the Pennsylvania Chronic Care Initiative (PACCI), a statewide multipayer medical home pilot, and changes in the quality, utilization, and cost of care. This study was approved by the RAND human subjects protection committee and the University of Pennsylvania institutional review board, with implied informed consent for practice structural surveys and waived consent for analyses of deidentified claims data.
The pilot was initiated in Pennsylvania’s southeast region among volunteering small- and medium-sized primary care practices from June 1, 2008, to May 31, 2011. As detailed elsewhere,16,17 the PACCI was designed by a coalition of payers, clinicians, and delivery systems led by the Governor’s Office of Health Care Reform and implemented as a series of regional medical home pilots, beginning with the southeast region.
Practices in southeast Pennsylvania were invited to participate by the 6 health plans (3 commercial and 3 Medicaid managed care plans) and 3 professional organizations (the Pennsylvania Academy of Family Physicians, American College of Physicians, and American Academy of Pediatrics) participating in the PACCI, with the goal of enrolling a mix of practice sizes and specialties (family practice, internal medicine, pediatric, and nurse-managed health centers).
The intervention consisted of technical assistance including a Breakthrough Series Learning Collaborative,18 web-based Improving Performance in Practice (IPIP) disease registries to create monthly quality indicator reports,19 and assistance from IPIP practice coaches to facilitate practice transformation and achievement of NCQA Physician Practice Connections–Patient-Centered Medical Home (PPC-PCMH) recognition (with level 1 recognition required by the second pilot year).20 Performance improvement efforts targeted asthma for pediatric practices and diabetes for practices serving adults.
To support and motivate these improvements, each participating practice was eligible to receive a $20 000 “practice support” payment in year 1 and annual bonus payments per full-time equivalent clinician (ie, per physician or nurse practitioner) that varied based on NCQA medical home recognition level and practice size, ranging from $28 000 per clinician in NCQA level 1 practices with 10 to 20 clinicians to $95 000 per clinician in solo NCQA level 3 practices.
Within 3 months of recruiting pilot practices, the state selected comparison practices. To do this, a state contractor obtained lists of candidate practices from participating health plans and, without performing a strict 1-to-1 match, identified a group of comparison practices that had the same approximate composition as the pilot practices in practice size, specialty (pediatrics, family practice, internal medicine), location (urban, suburban), and affiliation with local health systems. Data on quality, utilization, and costs were unavailable for comparison practice selection.
We obtained, from Pennsylvania’s demonstration conveners, annual data on each pilot practice’s level of NCQA PPC-PCMH recognition (none or level 1, 2, or 3) and amounts of each pilot practice’s bonus payments.
Drawing from a survey instrument designed to assess practice readiness for the medical home,21 we devised a new survey to measure practices’ structural capabilities, including presence of performance feedback, disease management, registries, reminder and outreach systems for patients with chronic disease, and electronic health records (EHRs) (instrument available from authors on request).
We fielded the practice survey twice to each pilot and comparison practice: a “baseline” survey in September 2010, querying capabilities just prior to June 2008 (the pilot’s beginning), and a final survey in June 2011, querying capabilities at the pilot’s end. We addressed each survey to 1 leader per practice, identified by telephone call, who could report accurately on the practice’s structural capabilities.
We requested from each of the 6 participating health plans all medical and prescription drug claims and enrollment data spanning June 1, 2006, to May 30, 2011 (2 years prior to and 3 years after the pilot inception date of June 1, 2008), for their members who, at any time during this 5-year period, had 1 or more medical claims (for any service) with a pilot or comparison practice.
In each of 4 time periods (the preintervention period and intervention years 1, 2, and 3), we attributed patients to the primary care clinicians (specialty designations family practice, general practice, internal medicine, pediatrics, adolescent medicine, geriatric medicine, and nurse practitioner) who provided the plurality of qualifying services (Current Procedural Terminology codes 9920x, 9921x, 9924x, 99381-99387, 99391-99397, 99401-99404, 99411-99412, 99420-99429, 99339-99340, 99341-99345, 99347-99350, G0402, G0438, G0439), with the most recent service breaking ties. In sensitivity analyses, we reattributed patients based on the majority (>50%) of qualifying services.
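The plurality attribution rule described above can be sketched as follows. This is a minimal illustration, not the study's actual SAS/SQL implementation; the claim records, patient identifiers, and clinician identifiers are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical qualifying-service claims: (patient_id, clinician_id, service_date).
claims = [
    ("pt1", "drA", date(2007, 3, 1)),
    ("pt1", "drA", date(2007, 9, 15)),
    ("pt1", "drB", date(2008, 1, 10)),
    ("pt2", "drB", date(2007, 5, 2)),
    ("pt2", "drC", date(2007, 11, 20)),  # tie between drB and drC; drC is more recent
]

def attribute(claims):
    """Assign each patient to the clinician providing the plurality of
    qualifying services; the most recent service breaks ties."""
    by_patient = {}
    for pt, doc, dt in claims:
        by_patient.setdefault(pt, []).append((doc, dt))
    result = {}
    for pt, visits in by_patient.items():
        counts = Counter(doc for doc, _ in visits)
        top = max(counts.values())
        tied = {doc for doc, n in counts.items() if n == top}
        # Among tied clinicians, pick the one with the most recent service.
        result[pt] = max((v for v in visits if v[0] in tied),
                         key=lambda v: v[1])[0]
    return result
```

Switching the plurality threshold to a strict majority (>50% of qualifying services) yields the sensitivity-analysis variant mentioned above.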
We calculated claims-based process measures of quality using NCQA Healthcare Effectiveness Data and Information Set (HEDIS) specifications, following published recommendations22 and making adjustments as necessary to account for the limited available look-back period. Using laboratory data available from one commercial health plan, we also calculated 2 measures of diabetes control, also based on HEDIS specifications.
To measure utilization, we calculated rates of hospitalization (all-cause), emergency department (ED) visits (all-cause), and ambulatory visits following published recommendations.23 We also calculated rates of ambulatory care–sensitive hospitalizations and ED visits, because these may be more likely to represent avoidable (and potentially wasteful) resource utilization, relative to all-cause hospitalizations and ED visits. Because health care prices can be sensitive to local market factors, we calculated standardized prices for all claims using Optum normalized pricing software.24 The eAppendix in the Supplement presents measure specifications.
We compared pilot and comparison practices and their preintervention patient populations using Wilcoxon rank sum and Fisher exact tests. To characterize pilot practices’ adoption of structural capabilities, we compared baseline and postintervention rates of capability possession (eg, rates at which practices had EHRs) using Liddell exact tests to account for repeated measurements, excluding 3 practices that did not respond to both surveys.
To mitigate the potential influence of patient selection (which could in theory be induced by pilot participation), in our main analyses we assigned patients to practices based on preintervention attribution only. Under this intent-to-treat framework, any patients switching their primary care physicians (ie, becoming attributed to another practice) during the intervention would still be attributed to their preintervention practices for analysis. Two potential drawbacks of preintervention attribution are nondetectability of interventions targeting patients new to a practice (eg, enhanced patient intake procedures) and potential confounding by time-varying patterns of health care consumption.25 Therefore, we performed sensitivity analyses using sequential cross-sectional attribution (ie, reattributing patients in each year based on their visits in that year) to assign patients.
To evaluate associations between the PACCI intervention and changes in quality measure performance, we fit generalized logistic models using propensity weights to balance pilot and comparison practices’ shares of patients from each health plan and baseline performance on each measure. By giving more weight to observations from comparison practices whose baseline performance resembled that of pilot practices, this weighting method produced “average treatment effect on the treated” (ATT) estimates, answering the question: “Among practices closely resembling the pilot practices, what changes are associated with the intervention?”26 The dependent variable in each model was patient-level receipt of the indicated service, and independent variables were indicators for time period (preintervention and each intervention year), interactions between time period and pilot/comparison status, indicators for the health plan contributing each observation and patient enrollment in a health maintenance organization (HMO), and fixed effects (dummy variables) for each practice.
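At its core, the contrast these models estimate is a difference-in-differences: the pilot practices' change over time minus the comparison practices' change. The unadjusted sketch below uses hypothetical binary outcomes; the actual analysis additionally used propensity weighting, plan and HMO indicators, practice fixed effects, and generalized estimating equations.

```python
def rate(patients):
    """Share of patients receiving the indicated service (1 = received)."""
    return sum(patients) / len(patients)

# Hypothetical patient-level outcomes, preintervention vs intervention year 3.
pilot_pre,      pilot_year3      = [1, 0, 1, 0, 1], [1, 1, 1, 1, 1]
comparison_pre, comparison_year3 = [1, 0, 1, 0, 0], [1, 0, 1, 0, 1]

# Difference-in-differences: pilot change minus comparison change.
did = (rate(pilot_year3) - rate(pilot_pre)) - (
    rate(comparison_year3) - rate(comparison_pre)
)
# Positive values indicate greater improvement among pilot practices.
```

In the regression framework above, this contrast corresponds to the coefficients on the interactions between time period and pilot/comparison status.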
For measures of utilization, we fit negative binomial models, using ATT propensity weighting to balance health plans and baseline utilization rates. The dependent variable in each model was the utilization count in the time period of interest. Independent variables were indicators for time period (preintervention and each intervention year); interaction between time period and pilot/comparison status; indicators for the health plan contributing each observation and patient enrollment in an HMO; patient age, sex, and preintervention Charlson comorbidity score27; and practice fixed effects. In sensitivity analyses, we refit the utilization models using 2-part models (logistic and negative binomial).
For measures of cost, we fit propensity-weighted models with the same independent variables but used a linear functional form, following the methods of recent similar evaluations.28 In sensitivity analyses, we refit the cost models using log-transformed costs.29
We also identified patients who had multiple all-cause hospitalizations or ED visits (2 visits or ≥3 visits) within a single year. We then fit logistic models to identify pilot-associated changes in the prevalence of these “multiple use” patients, using the same independent variables as the other utilization models.
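Flagging these "multiple use" patients amounts to counting each patient's all-cause hospitalizations or ED visits within each year. A minimal sketch with hypothetical event records:

```python
from collections import Counter

# Hypothetical events: one (patient_id, year) tuple per hospitalization or ED visit.
events = [
    ("pt1", 1), ("pt1", 1),
    ("pt2", 1), ("pt2", 2),
    ("pt3", 2), ("pt3", 2), ("pt3", 2),
]

counts = Counter(events)  # visits per patient per year

# Patients with exactly 2, or with 3 or more, visits in a single year.
two_visits = {pt for (pt, yr), n in counts.items() if n == 2}
three_plus = {pt for (pt, yr), n in counts.items() if n >= 3}
```

These indicator flags then serve as the dependent variables in the logistic models described above.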
In all models, we used generalized estimating equations with robust standard errors to account for heteroscedasticity, autocorrelation, and clustering of patients within practices.30,31 Because these methods can be sensitive to missing data, which are created when patients change health plans, we included only health plan members who were continuously enrolled during the study period in our main models. However, because of small numbers of patients contributing observations for hemoglobin A1c and low-density lipoprotein cholesterol control and pediatric asthma controller medication use, models for these measures included patients lacking continuous enrollment.
To display adjusted performance data, we used recycled prediction methods, which allow estimates of associations on the original scale of the data, accounting for differences in covariate patterns between pilot and comparison practices.32,33
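Recycled prediction works by predicting each patient's outcome twice from the fitted model, once with the pilot indicator set to 1 and once with it set to 0, holding all other covariates at their observed values, and then averaging over the sample. The sketch below uses hypothetical fitted coefficients and covariates rather than the study's actual estimates.

```python
import math

def predict(pilot, age, b0=-1.0, b_pilot=0.5, b_age=0.02):
    """Logistic prediction from hypothetical fitted coefficients."""
    z = b0 + b_pilot * pilot + b_age * age
    return 1 / (1 + math.exp(-z))

ages = [34, 51, 68, 45]  # observed covariate values for the whole sample

# Average predicted probability with everyone treated as pilot vs comparison.
p_pilot = sum(predict(1, a) for a in ages) / len(ages)
p_comp  = sum(predict(0, a) for a in ages) / len(ages)

# Adjusted difference on the original (probability) scale.
adjusted_difference = p_pilot - p_comp
```

The resulting quantities are probabilities on the original scale of the data, which is what makes the adjusted performance percentages reported in the tables directly interpretable.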
We conducted sensitivity analyses in addition to those described herein. We repeated our analyses on a plan-by-plan basis, excluding nurse-managed health centers (because no comparison practices were in this category), and including health plan members who were not continuously enrolled during the study period. Because reductions in utilization and costs of care might be more achievable and detectable among patients with chronic illness, and because diabetes was targeted by the PACCI, we repeated our utilization and cost models among only patients with diabetes.34
We considered 2-tailed P values <.05 significant. We performed data management and analyses using SAS version 9.2 (SAS Institute), SQL Server 2008 (Microsoft), and R version 3.0.0 (R Foundation).
All 34 volunteering practices were admitted to the pilot, but 2 withdrew prior to its initiation (ie, before receiving any assistance or supplemental funding) for reasons unrelated to the pilot (office manager illness in 1 practice and key staff member departure in the other). All 32 remaining practices completed the pilot; none dropped out. The 6 participating health plans together accounted for the majority of total revenues (median, 65%) among pilot practices.
The 32 pilot and 29 comparison practices were similar in baseline size, specialty, and patient case mix (Table 1). However, 6 pilot practices were nurse-managed health centers, whereas no comparison practices were. In total, 64 243 patients were attributed to pilot practices and 55 959 to comparison practices in the preintervention period.
Twenty-nine pilot practices (91%) completed both the baseline and year 3 structural surveys. A low survey response rate (24%) among comparison practices precluded meaningful analysis of comparison practices’ structural capabilities.
All of the pilot practices achieved NCQA PPC-PCMH recognition by the third intervention year, with half achieving level 3 status (Table 2). Pilot practices accumulated average bonuses of $92 000 per primary care physician and reported structural transformation on a wide range of capabilities. For example, use of registries to identify patients overdue for chronic disease services increased from 30% to 85% of pilot practices (P < .001), and electronic medication prescribing increased from 38% to 86% (P = .001).
Four of the 6 health plans (the largest 2 commercial and largest 2 Medicaid plans) supplied claims data. One of these 4 plans was unable to supply claims dated prior to January 1, 2007, limiting the effective study window to January 1, 2007, to May 30, 2011, for analyses of pooled claims data.
Of the 11 quality measures evaluated, pilot participation was significantly associated with greater performance improvement on 1 measure: nephropathy monitoring in diabetes (Table 3). Point estimates suggested improved performance among pilot practices relative to comparison practices for other diabetes measures and for colorectal cancer screening, but these differences were not statistically significant.
Pilot participation was associated with a greater increase in the rate of ambulatory care–sensitive hospitalization, relative to comparison practices, in intervention year 2 (Table 4). There were no other statistically significant differences in measures of utilization, costs of care, or rates of multiple same-year hospitalizations or ED visits (Table 5).
The results of sensitivity analyses were consistent with the primary results with 3 exceptions. First, in models including patients who were not continuously enrolled, pilot participation was statistically significantly associated with better performance for colorectal cancer screening. Second, in models with cross-sectional attribution, pilot participation was statistically significantly associated with worse performance on pediatric asthma appropriate medication use. Third, pilot participation was not statistically significantly associated, in any year, with ambulatory care–sensitive hospitalization rates in models with cross-sectional attribution and in models including patients lacking continuous enrollment.
Despite widespread enthusiasm for the medical home concept, few peer-reviewed publications have found that transforming primary care practices into medical homes (as defined by common recognition tools and in typical practice settings) produces measurable improvements in the quality and efficiency of care.6-8 The southeast region of the PACCI, which featured relatively generous financial support from 6 commercial and Medicaid health plans, is to our knowledge the first multipayer pilot in the nation to report results over a 3-year period of transformation. We found that practices participating in the PACCI pilot adopted new structural capabilities and received NCQA certification as medical homes. Our evaluation also suggests that the quality of diabetes care improved, but we found few statistically significant results and no robust associations with utilization or costs.
Our findings differ from prior evaluations of demonstrations in large, integrated delivery systems14,15,35 and from cross-sectional studies that lacked an intervention.36,37 However, the southeast PACCI contained ingredients common to many current pilots, including an emphasis on NCQA recognition.4 Our findings are congruent with those of medical home interventions occurring among small primary care practices, even though the PACCI intervention included more participating practices, occurred over a longer time frame, and featured greater financial support.11-13,38,39
Why was the PACCI southeast regional pilot not associated with broad quality improvements, lower utilization, and cost savings? This pilot—the first of the PACCI regions—was focused on quality improvement for chronic conditions and featured early financial rewards for NCQA recognition, possibly distracting from other activities intended to improve the quality and efficiency of care. In subsequent regions, PACCI organizers placed less emphasis on early NCQA recognition so that practices could focus more fully on learning collaborative participation (oral communication with Michael Bailit, MM, Bailit Health Purchasing, November 26, 2010).
In addition, southeast PACCI pilot practices had neither direct incentives to contain costs nor feedback on their patients’ utilization of care. Possibly as a consequence of these features of pilot design, we found that few pilot practices increased their night and weekend access capabilities, which could, in theory, have produced short-term savings by offering patients an alternative to more expensive sites of care (such as hospital emergency departments).
Several reasons may explain the limited (or absent) changes in quality, utilization, and costs of care that we observed, despite structural transformation among pilot practices. First, pilot practices were volunteers and may have been more quality-conscious than other practices, performing closer to their optimal levels at baseline. If so, “ceiling effects” resulting from near-optimal performance would imply that an average practice developing a medical home structure might show greater improvement than what we observed. Therefore, our findings may not generalize to other medical home interventions. Second, the structural survey response rate among comparison practices was low, so we cannot tell whether comparison practices were also transforming during the pilot (eg, due to federal incentives to adopt EHRs). Simultaneous transformation would weaken the ability of pilot practices to differentiate themselves from comparison practices. Third, prior cross-sectional research suggests that relationships between structural capabilities, NCQA recognition, and performance on measures of quality and utilization are modest.40-42 The elements of practice transformation necessary to produce desired changes in patient care may be different from the capabilities commonly assessed by research surveys and certification tools.
Our study has limitations. First, pilot and comparison practices were not matched perfectly. Although propensity weighting and multiple regression can account for observed differences, unobserved differences could have introduced confounding. Second, failure to find statistically significant results for most diabetes measures may have resulted from insufficient power to detect performance improvement differences smaller than approximately 8 percentage points. Third, the baseline structural survey was retrospective, with potential respondent recall error. Fourth, we were unable to examine changes in patient experience.
One of the first, largest, and longest-running multipayer medical home pilots in the United States, in which participating practices adopted new structural capabilities and received NCQA certification, was associated with limited improvements in quality and was not associated with reductions in utilization of hospital, emergency department, or ambulatory care services or total costs of care over 3 years. These findings suggest that medical home interventions may need further refinement.
Corresponding Author: Mark W. Friedberg, MD, MPP, 20 Park Plaza, Ste 920, Boston, MA 02116 (firstname.lastname@example.org).
Author Contributions: Drs Friedberg and Werner had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Friedberg, Schneider, Volpp, Werner.
Acquisition of data: Friedberg, Schneider, Werner.
Analysis and interpretation of data: Friedberg, Schneider, Rosenthal, Werner.
Drafting of the manuscript: Friedberg, Schneider.
Critical revision of the manuscript for important intellectual content: Schneider, Rosenthal, Volpp, Werner.
Statistical analysis: Friedberg, Werner.
Obtained funding: Friedberg, Schneider, Volpp, Werner.
Administrative, technical, and material support: Werner.
Study supervision: Friedberg, Schneider, Volpp, Werner.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Friedberg reported having received compensation from the US Department of Veterans Affairs for consultation related to the medical home model and research support from the Patient-Centered Outcomes Research Institute via subcontract to the National Committee for Quality Assurance. Dr Volpp reported having received compensation as a consultant from CVS Caremark and VALHealth and having grants pending with CVS Caremark, Humana, Horizon Blue Cross Blue Shield, Weight Watchers, and Discovery (South Africa). Dr Werner reported having received grants or funding from the Veterans Health Administration, Horizon, and Healthcare Innovation. No other disclosures were reported.
Funding/Support: This study was sponsored by the Commonwealth Fund and Aetna.
Role of the Sponsors: The sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The views presented here are those of the authors and not necessarily those of Aetna, its directors, officers, or staff.
Additional Contributions: We gratefully acknowledge Aaron Kofner, MA, MS (RAND), Scott Ashwood, PhD (RAND), Scot Hickey, MA (RAND), Katharine Lauderdale, BA (RAND), Wei Chen, MS (University of Pennsylvania), and Jingsan Zhu, MS, MBA (University of Pennsylvania), for assistance with programming and data management; Claude Setodji, PhD (RAND), for statistical consultation; and Marcela Myers, MD (Commonwealth of Pennsylvania), and Michael Bailit, MM (Bailit Health Purchasing), for providing data on pilot bonus payments and NCQA recognition levels and facilitating other data collection. Mr Kofner, Dr Ashwood, Mr Hickey, Ms Lauderdale, Mr Chen, Mr Zhu, and Dr Setodji received compensation for their roles in the study.
Correction: This article was corrected online February 25, 2014, for errors in Table 2.