Using Consistently Low Performance to Identify Low-Quality Physician Groups
Figure 1. Consistency of Low Adjusted Performance Across Multiple Measures

The expected bar is the proportion of physician groups expected to fall into the bottom quartile if performance on each measure in a given year was independent. For example, falling into the bottom quartile for 3 measures was computed as the probability of 3 success outcomes in 6 Bernoulli trials with a success probability of 0.25.

Figure 2. Consistency of Low Adjusted Performance Across Multiple Years

The expected bar is the proportion of physician groups expected to fall into the bottom quartile if performance in each year for a given measure was independent. For example, falling into the bottom quartile for 3 years was computed as the probability of 3 success outcomes in 4 Bernoulli trials with a success probability of 0.25. HbA1c indicates hemoglobin A1c; LDL, low-density lipoprotein.
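
Both expected bars reduce to the binomial probability mass function described in the legends. As a concreteness check, here is a minimal sketch in plain Python (illustrative only; the study's analyses were run in Stata):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float = 0.25) -> float:
    """Probability of exactly k bottom-quartile results in n independent
    comparisons, each with probability p of landing in the bottom quartile."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"3 of 6 measures: {binom_pmf(3, 6):.4f}")  # Figure 1 example, ~0.1318
print(f"3 of 4 years:    {binom_pmf(3, 4):.4f}")  # Figure 2 example, ~0.0469
print(f"4 of 4 years:    {binom_pmf(4, 4):.4f}")  # the ~0.4% benchmark cited in the Results
```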

Table 1. Distribution of Adjusted Physician Group-Year Performance on Diabetes and Cardiovascular Disease Measures (N = 2349)
Table 2. Year-to-Year Correlations Between Adjusted Quality Performance on Diabetes and Cardiovascular Disease Measures Across Physician Groups
    1 Comment for this article
    The Problem With Quality Measures
    Thomas Friedrich, MS, Health Sciences | Physician Practice
    A great deal of caution is in order when measuring the quality of delivered health care. While the vast majority of my career has been spent as a non-clinical manager in the delivery of health care, the early part of my training and career was spent in the area of performance measurement and incentive management in the corporate setting. My enthusiasm for that endeavor waned as I noticed factors that are too often overlooked in studies like this; such efforts, it turns out, can often best be classified as fool's errands. They are typically rooted in a central-planning milieu in which the health care researchers are presumed to know better what is appropriate for a given patient than the health care providers actually delivering that care. That presumption is both precarious and arrogant.

    First, "performance" is challenging to measure. What we care about in this case, presumably, is morbidity and mortality.

    But a great many things beyond the control of the practitioner are likely to swamp the modest effect of the delivery of care: genetics, diet, motivation, exercise, smoking, drug use, living arrangements, adherence to care recommendations, and so on. Some effort has been made here, and elsewhere, to build controls into the models and thereby to isolate the effects of delivered care. But those controls are inadequate for the task and necessarily miss detail that is obvious on the ground. For example, a provider might know from prior experience that a given patient will skip ordered LDL testing, or that the patient will refuse to take a statin or follow dietary or other medical guidance, so why bother, except to satisfy the bureaucracy/researchers? And yet, at the margin, providers ordering the test will be deemed "better" than those who don't; the only problem is that morbidity and mortality will be entirely unaffected, and the cost will be higher. At ground level, this is pejoratively called "box-checking".

    In addition, it is a long way from a proxy measure like "ordered an LDL test," or even "completed an LDL test," to measures that matter, like morbidity and mortality. Here, a shortcut is often taken in the prescriptive performance research literature: if "B" (appropriate statin use) generally leads to "C" (reduced morbidity and/or mortality), then surely "A" (the ordering of LDL measurement and/or a statin prescription) will lead to "C." Maybe. Maybe not. That is a question that is highly dependent on the patient, and one that must be answered directly and empirically, not presumed. In the quest for evidence-based medicine, evidence-based research is too often sacrificed in the interest of expediency.

    Second, the inevitable follow-up of performance measurements is the development of carrots and sticks to be applied against the mere mortals delivering actual care, to get them in line. Here, lessons from industrial application of incentives and central planning are instructive.

    "Pay us by the nail, and we make lots of little nails. Pay us by the pound, and we make one big one," is a quote that was repeated frequently when I was learning about the best ways to gain compliance from the masses via performance management. The gist is that what is rewarded is exactly what is motivated. In health care, we might, if sufficiently motivated, find a way to obtain an LDL test, even if we know full well that the test will not benefit the patient. At that point, the patient becomes beside the point to the incentivized provider; the point is to accumulate quality points and to cash those in for a better paycheck.
    CONFLICT OF INTEREST: None Reported
    Original Investigation
    Health Policy
    July 28, 2021

    Using Consistently Low Performance to Identify Low-Quality Physician Groups

    Author Affiliations
    • 1Massachusetts Institute of Technology, Cambridge
    • 2Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts
    • 3Heart and Vascular Center, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire
    • 4Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire
    • 5Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
    • 6Division of General Internal Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
    JAMA Netw Open. 2021;4(7):e2117954. doi:10.1001/jamanetworkopen.2021.17954
    Key Points

    Question  Can quality measures identify low-quality physician groups when performance is correlated across multiple measures or multiple years?

    Findings  In this cross-sectional study of a commercially insured population with diabetes or cardiovascular disease, we found weak consistency of low performance scores across multiple measures but moderate to strong consistency of scores over multiple years. One percent or fewer of physician groups performed in the bottom quartile for all diabetes measures or all cardiovascular disease measures in any given year, while 4% to 11% were in the bottom quartile in all 4 years for most measures.

    Meaning  These results suggest that consistency in poor performance depends on the statistical properties of the measures.

    Abstract

    Importance  There has been a growth in the use of performance-based payment models in the past decade, but inherently noisy and stochastic quality measures complicate the assessment of the quality of physician groups. Examining consistently low performance across multiple measures or multiple years could potentially identify a subset of low-quality physician groups.

    Objective  To identify low-performing physician groups based on consistently low performance after adjusting for patient characteristics across multiple measures or multiple years for 10 commonly used quality measures for diabetes and cardiovascular disease (CVD).

    Design, Setting, and Participants  This cross-sectional study used medical and pharmacy claims and laboratory data for enrollees ages 18 to 65 years with diabetes or CVD in an Aetna health insurance plan between 2016 and 2019. Each physician group’s risk-adjusted performance for a given year was estimated using mixed-effects linear probability regression models. Performance was correlated across measures and time, and the proportion of physician groups that performed in the bottom quartile was examined across multiple measures or multiple years. Data analysis was conducted between September 2020 and May 2021.

    Exposures  Primary care physician groups.

    Main Outcomes and Measures  Performance scores of 6 quality measures for diabetes and 4 for CVD, including hemoglobin A1c (HbA1c) testing, low-density lipoprotein (LDL) testing, statin use, HbA1c control, LDL control, and hospital-based utilization.

    Results  A total of 786 641 unique enrollees treated by 890 physician groups were included; 414 655 of the enrollees (52.7%) were men, and the mean (SD) age was 53 (9.5) years. After adjusting for age, sex, and clinical and social risk variables, correlations among individual measures were weak (eg, correlation between adjusted performance on any statin use and LDL testing for patients with diabetes, r = −0.10) to moderate (correlation between LDL testing for diabetes and LDL testing for CVD, r = 0.43), but year-to-year correlations for all measures were moderate to strong. One percent or fewer of physician groups performed in the bottom quartile for all 6 diabetes measures or all 4 CVD measures in any given year, while 14 groups (4.0%) to 39 groups (11.1%) were in the bottom quartile in all 4 years for any given measure other than hospital-based utilization for CVD (1.1%).

    Conclusions and Relevance  A subset of physician groups that was consistently low performing could be identified by considering performance measures across multiple years. Considering the consistency of group performance could contribute a novel method to identify physician groups most likely to benefit from limited resources.

    Introduction

    In the past decade, there has been a shift away from fee-for-service and toward population-based payment models that reward physician groups based on performance on quality measures. However, the multidimensionality and stochastic nature of quality measures may complicate assessment and, more specifically, the identification of inadequately performing physician groups. In particular, groups may be incorrectly identified as poor performing purely by chance in common forms of cross-sectional analyses.

    In addition, the burden of quality measurement on physicians and the substantial investments made in measurement development have been a continuous concern.1-4 In the US, physician practices spend more than $15.4 billion annually on reporting quality measures alone.4 With health care expenditures rising, it is increasingly critical that resources for tracking quality are used effectively.

    Previous work has assessed consistency in quality measurement in hospitals and health plans. Various national organizations and payers have examined performance across multiple measures to rate hospitals and also explored whether hospitals are ranked differently from 1 year to another.5-12 Other analyses have investigated the quality of health plans using the Healthcare Effectiveness Data and Information Set (HEDIS), finding variation and lack of correlation among performance measures.13,14

    Previous research has also focused on hospital or health plan quality performance and has mostly studied variation and changes in ranking (eg, different ranking methods, ranking in different years), but there has been less work on identification of low performance. It is important to measure quality at the physician group level because, increasingly, groups are responsible for population-based outcomes. In addition, rankings are often not useful because of the inherent noisiness of measures.15 We propose the use of consistently low performance across multiple quality measures or multiple years as a method to identify low-performing physician groups.

    In this study, we used medical and pharmacy claims and laboratory data from an Aetna health insurance plan to examine whether consistently low performance across multiple quality measures or multiple years can identify low-performing physician groups. We measured performance using commonly used measures for 2 common chronic conditions that have been the focus of many quality measurement and improvement efforts: diabetes and cardiovascular disease (CVD).

    Methods
    Study Data

    We used medical claims, pharmacy claims, and laboratory results from an Aetna health insurance plan between 2016 and 2019. Our sample included adults between ages 18 and 65 years with either diabetes or CVD (based on 2013 HEDIS eligibility criteria: 1 or more inpatient visits or 2 or more outpatient visits with an International Statistical Classification of Diseases and Related Health Problems, Tenth Revision [ICD-10] code for diabetes or CVD) who were continuously enrolled for at least 1 calendar year between 2016 and 2019.

    We attributed each enrollee to the physician group (defined by tax identification number [TIN]) accounting for the plurality of the enrollee’s office visits during a given year (see eAppendix 1 in the Supplement). We assigned enrollees with the same number of visits to multiple TINs to the one with the greatest total payments.16
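
    The attribution rule amounts to a plurality count with a payment tie-break. The sketch below illustrates it on a toy claims table; the column names and data are hypothetical, and the study's actual attribution followed the claims-based procedure in eAppendix 1 in the Supplement:

```python
import pandas as pd

# Toy office-visit claims: enrollee 1 has a visit plurality at TIN A; enrollee
# 2 is tied 1-1 across TINs, so the TIN with greater total payments wins.
visits = pd.DataFrame({
    "enrollee_id": [1, 1, 1, 2, 2],
    "year": [2016] * 5,
    "tin": ["A", "A", "B", "A", "B"],
    "payment": [80.0, 60.0, 200.0, 90.0, 120.0],
})

# Count visits and sum payments for each enrollee-year-TIN combination.
agg = (visits.groupby(["enrollee_id", "year", "tin"])
       .agg(n_visits=("tin", "size"), total_pay=("payment", "sum"))
       .reset_index())

# Rank TINs by visit count, breaking ties by total payments, and keep the top.
agg = agg.sort_values(["enrollee_id", "year", "n_visits", "total_pay"],
                      ascending=[True, True, False, False])
attributed = agg.groupby(["enrollee_id", "year"]).head(1)
print(attributed[["enrollee_id", "year", "tin"]])  # 1 -> A, 2 -> B
```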

    To ensure sufficient precision in group-level estimates, we restricted our sample to groups with at least 40 attributed enrollees with diabetes and at least 40 with CVD, of whom at least 20 had to have laboratory and pharmacy data for every relevant measure in a given year.

    We obtained institutional review board approval from Harvard University’s Committee on the Use of Human Subjects. Informed consent was not required because data sets were deidentified. Analysis was conducted between September 2020 and May 2021. This article is compliant with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cross-sectional studies.

    Study Variables
    Quality Measures

    We constructed 10 quality measures, including HEDIS-based process and disease control measures and outcome measures commonly used in the literature (see eAppendix 2 in the Supplement). For diabetes, we constructed 3 process measures (hemoglobin A1c [HbA1c] testing, low-density lipoprotein [LDL] testing, and statin use [1 or more fills during the measurement year]), 2 disease control measures (HbA1c control [less than 8%] and LDL control [below 100 mg/dL; to convert to millimoles per liter, multiply by 0.0259]), and 1 utilization-based outcome measure (no emergency department [ED] visits, observation stays, or hospital admissions for diabetes or major adverse cardiovascular events [MACE]17). For CVD, we included measures for LDL testing, statin use, LDL control, and no ED visits, observation stays, or hospital admissions for MACE. Some measures were limited to subsets of enrollees with relevant pharmacy benefits with the same insurer (42.7% for diabetes cohort [235 546 of 551 415 enrollees] and 41.2% for CVD cohort [291 597 of 708 171 enrollees]) and available laboratory results (54.3% for diabetes cohort [299 215 of 551 415] and 42.6% for CVD cohort [301 671 of 708 171]). The clinical and area-level characteristics of enrollees with and without laboratory or pharmacy data were similar (eTable 1 in the Supplement).
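
    For instance, the disease control flags reduce to threshold tests on laboratory results. A hypothetical sketch with made-up column names and values (the study's measure specifications are in eAppendix 2 in the Supplement):

```python
import pandas as pd

# Toy lab extract: the relevant result for each enrollee-year.
labs = pd.DataFrame({
    "enrollee_id": [1, 2, 3],
    "year": [2017, 2017, 2017],
    "hba1c_pct": [7.2, 8.9, 6.5],
    "ldl_mg_dl": [88.0, 120.0, 95.0],
})

# HEDIS-style control flags: HbA1c < 8% and LDL < 100 mg/dL.
labs["hba1c_controlled"] = (labs["hba1c_pct"] < 8.0).astype(int)
labs["ldl_controlled"] = (labs["ldl_mg_dl"] < 100.0).astype(int)
print(labs)
```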

    Covariates

    Because patient characteristics are often associated with performance,18-25 we adjusted for both patient-level clinical and area-level social risk variables. Clinical covariates included a set of comorbidities (atrial fibrillation, hypertension, chronic obstructive pulmonary disease, heart failure, and chronic kidney disease) that were coded using the CMS Chronic Condition Warehouse definitions,26 and a comorbidity score based on a DxCG Intelligence version 5.0.0 (Cotiviti) prediction model. We used zip code–level sociodemographic characteristics from the 2010 US Census (percentage of population who were Black, Hispanic, and college educated) and 5-year estimates from the 2010-2014 American Community Survey (percentage of population below poverty). Using methods from a previous study, we also assigned enrollees to urban, suburban, and rural classifications based on zip code.27

    Statistical Analysis

    We used inverse probability weighting to account for observed differences between enrollees with and without pharmacy or laboratory data. Specifically, to standardize the samples across measures that required laboratory or pharmacy data, we weighted each patient by the inverse of the probability that they had laboratory data (for the disease control measures) or pharmacy data (for the statin use measures). These weights were estimated using a logit model that included all of the covariates described above, with the availability of the relevant data as the dependent variable.
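
    A minimal sketch of this weighting step on simulated data, with a generic covariate matrix standing in for the adjusters described above (the study fit its models in Stata; statsmodels is used here purely for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical covariates standing in for age, sex, comorbidities, the DxCG
# score, and zip code-level characteristics.
X = sm.add_constant(rng.normal(size=(n, 4)))
true_beta = np.array([0.2, 0.5, -0.3, 0.1, 0.4])
has_data = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

# Logit model of whether the enrollee has laboratory (or pharmacy) data.
fit = sm.Logit(has_data, X).fit(disp=0)
p_hat = fit.predict(X)

# Enrollees with data are weighted by the inverse of that probability;
# enrollees without it do not contribute to the measure.
ipw = np.where(has_data == 1, 1.0 / p_hat, 0.0)
print(ipw[has_data == 1][:5])
```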

    Because low-performing groups may be more likely to treat high-risk patients, we followed a 2-step social risk adjustment methodology from earlier studies22,28,29 for computing adjusted quality performance. Prior work has shown that social risk adjustment can affect the variance and rankings of physician group performance on disease control and outcome measures, but is less of a factor for process measures.22 For consistency, we adjusted all quality measures, including process measures. In this approach, we removed the association of high-risk patients sorting to low-performing physician groups by basing our adjustment on within-group associations. In the first step, we fit inverse probability weighted linear regression models estimating the adherence of an enrollee attributed to a physician group in a given year as a function of patient-level age and gender, patient-level comorbidities and DxCG composite, zip code–level sociodemographic characteristics, and physician group fixed effects. We then computed an enrollee-year–level risk score as the projected performance for each measure estimated from only the coefficients on the enrollee characteristics (ie, not including the coefficients on the group fixed effects in the estimate).

    In the second step, we computed group-level performance scores. We estimated patient-level mixed-effects linear probability regression models that related the performance on a measure to the risk-score computed in step 1 and to physician group random effects. We then computed an adjusted score that represented a group’s estimated performance in a given year for an enrollee with average clinical and social risk. A more detailed explanation of this 2-step adjustment approach can be found in eAppendix 3 in the Supplement.
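
    The two steps can be summarized in code. The sketch below runs on simulated data and uses statsmodels as a stand-in for the Stata models documented in eAppendix 3; it omits the inverse probability weights and most covariates for brevity, so it is schematic rather than a reproduction of the study's models:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated enrollee-year data: y is a 0/1 adherence flag for one measure;
# risk1 and risk2 stand in for the clinical and social adjusters.
rng = np.random.default_rng(1)
n, n_groups = 5_000, 50
df = pd.DataFrame({
    "group": rng.integers(0, n_groups, n).astype(str),
    "risk1": rng.normal(size=n),
    "risk2": rng.normal(size=n),
})
true_fx = {str(g): fx for g, fx in enumerate(rng.normal(0, 0.05, n_groups))}
p = 0.7 + 0.05 * df["risk1"] - 0.04 * df["risk2"] + df["group"].map(true_fx)
df["y"] = (rng.random(n) < p).astype(int)

# Step 1: linear probability model with group fixed effects. The enrollee risk
# score uses only the patient-level coefficients, so the adjustment rests on
# within-group associations rather than sorting of patients across groups.
step1 = smf.ols("y ~ risk1 + risk2 + C(group)", data=df).fit()
b = step1.params
df["risk_score"] = b["Intercept"] + b["risk1"] * df["risk1"] + b["risk2"] * df["risk2"]

# Step 2: mixed-effects linear probability model with group random effects.
step2 = smf.mixedlm("y ~ risk_score", data=df, groups=df["group"]).fit()

# Adjusted score: the group's predicted performance for an enrollee with
# average risk, ie, the fixed-effect prediction plus the group's (shrunken)
# random effect.
avg_risk = df["risk_score"].mean()
re = pd.Series({g: v.iloc[0] for g, v in step2.random_effects.items()})
adjusted = step2.params["Intercept"] + step2.params["risk_score"] * avg_risk + re
print(adjusted.sort_values().head())
```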

    To estimate the degree of variation in performance at the physician group level, we computed intraclass correlation coefficients (ICCs). The ICC represents the fraction of total variation in performance that is explained by differences between physician groups. The ICC can be low if there is little variation between groups or if the within-group variance is high. Measures with low ICCs generally have less ability to distinguish performance at the group level and are thus less useful for identifying low-performing clinicians. We also computed reliability for each measure. Reliability represents a measure's reproducibility and is a function of the variation within and between physician groups as well as the sample size. We report reliability for a practice with the median number of attributed enrollees.
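
    Both quantities follow from the between-group and within-group variance components. A small illustrative helper (the variance inputs are made-up numbers, not estimates from this study):

```python
def icc_and_reliability(between_var: float, within_var: float, n: int):
    """ICC is the share of total variance that lies between groups;
    reliability applies the same ratio after the within-group variance is
    averaged over the n enrollees attributed to a group."""
    icc = between_var / (between_var + within_var)
    reliability = between_var / (between_var + within_var / n)
    return icc, reliability

# Even a measure with an ICC near 1% can have moderate reliability once a
# group has a few hundred attributed enrollees.
icc, rel = icc_and_reliability(between_var=0.002, within_var=0.198, n=200)
print(f"ICC = {icc:.3f}, reliability at n = 200: {rel:.2f}")  # 0.010, 0.67
```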

    To examine consistency across measures and years, we computed pairwise and year-to-year (eg, 2016 vs 2017, 2017 vs 2018) correlations among physician groups’ risk-adjusted performance on measures. We tabulated the proportion of physician groups that performed in the bottom quartile of scores across multiple measures or multiple years. We defined low-performing physician groups as falling into the bottom quartile of quality performance. After excluding those that did not have complete data for all 4 years, 353 physician groups were included in the analysis of consistency in performance across years. All analyses were performed using Stata statistical software version 15.1 (StataCorp). P < .05 was treated as statistically significant, and all statistical tests were 2-sided.
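
    A sketch of this tabulation on simulated scores (the 353-group, 4-year panel mirrors the text, but the scores here are random draws, so the output will show roughly the independent-case benchmark rather than the study's results):

```python
import numpy as np
import pandas as pd

# Simulated adjusted performance: rows are physician groups with complete
# data in all 4 years; columns are years.
rng = np.random.default_rng(2)
scores = pd.DataFrame(rng.normal(size=(353, 4)),
                      columns=[2016, 2017, 2018, 2019])

# Year-to-year correlations of group performance (cf. Table 2).
print(scores.corr().round(2))

# Bottom-quartile flags within each year, then the share of groups falling
# into the bottom quartile in 0, 1, ..., 4 of the 4 years (cf. Figure 2).
bottom = scores.apply(lambda col: col <= col.quantile(0.25))
print(bottom.sum(axis=1).value_counts(normalize=True).sort_index())
```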

    Results

    Our final sample included 786 641 unique enrollees (1 189 367 enrollee-years: 481 196 for patients with diabetes only, 637 952 for patients with CVD only, and 70 219 for patients with both) treated by 890 physician groups (634 in 2016, 564 in 2017, 593 in 2018, and 558 in 2019). A total of 414 655 enrollees (52.7%) were men, and the mean (SD) age was 53 (9.5) years.

    Performance on Individual Measures

    For diabetes, median (interquartile range [IQR]) rates of adjusted performance were high on HbA1c testing (90.9% [89.2%-94.5%]), LDL testing (89.4% [86.9%-92.5%]), and avoidance of hospital-based utilization (78.9% [77.6%-80.3%]) (Table 1). Performance was lower on HbA1c control (65.2% [62.9%-67.5%]), LDL control (66.4% [64.5%-68.3%]), and statin use (57.8% [56.5%-62.0%]). The IQRs, which illustrate the variability of adjusted performance across measures, were narrow: the difference between the upper and lower quartiles ranged from 2.7% for hospital-based utilization to 5.6% for LDL testing. Group-level variation explained a small proportion of the total variation in performance on the diabetes measures: the ICCs, which indicate ability to distinguish performance, were lowest for hospital-based utilization (ranging from 0.6% to 0.7% between 2016 and 2019) and highest for HbA1c testing (4.7% to 5.8%). Measure reliability was high for testing measures but moderate for hospital-based utilization for a group with the median number of attributed enrollees.

    Similar to diabetes, median (IQR) rates of performance were high on LDL testing (79.3% [74.3%-84.9%]) and utilization-based outcomes (94.6% [92.2%-96.4%]) for CVD. Physician groups had lower performance on LDL control (40.3% [35.5%-44.7%]) and statin use (48.8% [41.1%-59.6%]). The IQR was lowest for hospital-based utilization (a range of 4.2%) and highest for statin use (18.5%). Again, ICCs were low; they were lowest for hospital-based utilization (0.3%-0.6%) and highest for LDL testing (3.1%-4.3%). Measure reliability was highest for testing and lowest for hospital-based utilization.

    Correlations Between Performance on Individual Measures

    There were weak to moderate correlations between most of the individual quality measures, and only 4 of 45 pairwise correlations were greater than 0.5 (eTable 2 in the Supplement). Correlations were weak between different types of individual measures (eg, r = 0.01 between statin use and LDL testing for diabetes) and moderate between the same type of measure across the 2 disease cohorts (eg, r = 0.43 between LDL testing for diabetes and LDL testing for CVD). Testing measures had low correlations with their corresponding control measures. The highest positive correlation was in the CVD cohort, between LDL control and statin use. Avoidance of hospital-based utilization was negatively correlated with LDL control and statin use for CVD.

    We observed stronger consistency in performance on individual measures across years. Year-to-year correlations were highest for testing measures for both diabetes (mean r = 0.81 for HbA1c testing and 0.68 for LDL testing) and CVD (mean r = 0.82 for LDL testing) (Table 2). They were lowest for LDL control for both diabetes (mean r = 0.51) and CVD (mean r = 0.46).

    Consistency of Low Performance Across Multiple Measures or Multiple Years

    We observed minimal consistency in poor adjusted performance across multiple measures. Fewer than 0.2% of groups performed in the bottom quartile for all 6 diabetes measures in any given year (Figure 1). Similarly, 1% or fewer of groups performed in the bottom quartile for all 4 CVD measures in any given year.

    Considering all 10 diabetes and CVD measures together, consistency in low performance remained weak (eFigure 1 in the Supplement). Fewer than 2% of groups were low performing on 7 or more measures. However, some groups could be flagged as consistently low quality based on testing and hospital-based utilization measures, which had overall high performance and low variation across groups; being in the bottom quartile on those measures was therefore less meaningful. Although the variances for some of the control measures were low, we included them in the analysis because their mean performance was low to moderate and would thus be important to consider in quality assessment and improvement efforts. To examine consistency of low performance on potentially more meaningful measures, we removed HbA1c testing for diabetes and LDL testing and hospital-based utilization for both diabetes and CVD. Including only the disease control and statin use measures, 1% or fewer of groups were consistently low performing on all 5 (eFigure 2 in the Supplement).

    We found higher consistency of low performance across multiple years (Figure 2). The percentage of groups that performed in the bottom quartile in all 4 years ranged from 25 (7.1%) to 39 groups (11.1%) for testing measures, 14 (4.0%) to 18 groups (5.1%) for disease control measures, 14 (4.0%) to 16 groups (4.5%) for statin use measures, and 4 (1.1%) to 23 groups (6.5%) for avoidance of hospital-based utilization measures. These rates were higher than the expected 0.4% if performance in each year was independent.

    As expected, consistency in bottom-quartile performance mirrored measure reliability. The percentage of groups that were consistently low performing (ie, falling into the bottom quartile in all 4 years) was highest for testing measures and lowest for hospital-based utilization, particularly in the CVD cohort. ICCs and reliability were lowest for the hospital-based utilization measures, which are largely outside physicians' control.

    In sensitivity analyses, we found that consistency of low performance across multiple measures or multiple years was similar when performance was unadjusted (ie, without controls for age, sex, and clinical and social risk factors) (eFigure 3 in the Supplement). In addition, a similar approach focusing on consistency of high performance across multiple years could be used to identify high-quality groups (eFigure 4 in the Supplement).

    Discussion

    In this study of physician groups caring for a commercially insured population with diabetes or CVD, we found that consistent low performance across multiple years could identify a subset of low-quality physician groups. Consistency in low performance across multiple measures was less useful in this setting. Our study expands previous research focused on hospital or health plan quality performance by examining performance at the physician group level and by introducing this novel, targeted approach to identify low-performing practices.

    Variation across groups and measure reliability were highest among testing measures. Testing rates were generally high, particularly in the diabetes cohort. Measure reliability was low to moderate among the potentially more clinically relevant measures of disease control, statin use, and avoidance of hospital-based utilization. Classifying a group as low performing using a single year of performance on these measures would thus risk identifying groups as poor performing purely by chance. Examining performance across multiple years provides a method to identify poor performance on clinically relevant measures that cannot be reliably assessed in a single cross-section. Consistency across multiple measures could be useful in settings where individual measures are more correlated or to target groups with consistently poor performance on subsets of measures (eg, low rates of LDL testing for both diabetes and CVD). Pooling data across multiple years and/or creating composite measures are alternative approaches to improve precision in inherently noisy measures.30-36

    As in previous studies,13,37-39 we observed low to moderate correlations between most individual measures. Although we did not find many groups that were consistently low performing across multiple measures, we did find a subset that was consistently low performing across multiple years. Rather than spreading efforts to track performance over a large number of practices, using this methodology to target a smaller number of underperforming groups could be the first line of defense against further declining or sustained low quality of care.40,41 Although not all of the groups in this subset are necessarily low quality (because of statistical noise or unmeasured confounders), those that are truly low quality are most likely to be in this subset. Further scrutiny of these groups, perhaps through medical record reviews or site visits, could inform actionable initiatives that include financial incentives (or additional resources) or nonfinancial approaches to improve care. Similarly, high performers could be examined for best practices and possibly receive leniency on some data-gathering requirements, allowing them to redirect some of their data collection funds toward other areas of care.

    Limitations

    This study has several limitations. First, the data came from 1 commercial health insurance plan that may have higher proportions of enrollees in certain regions of the US. This health plan may not include a large portion of US physician groups, and our quality metrics were computed based on a subset of each group's patient panel. However, commercial data made it possible to examine quality in nonelderly populations and to construct disease control measures, which are typically unavailable in administrative data. Second, because we focused on quality of care for 2 chronic diseases, our analysis included only 10 measures. Third, following common practice, we used billing arrangements to identify physician groups. Fourth, because we did not have access to enrollee sociodemographic characteristics, we used zip code–level characteristics to adjust for social risk. Although we adjusted for both clinical (enrollee-level) and sociodemographic (area-level) characteristics, there may still be residual effects that we were unable to capture. Fifth, our results cannot be generalized to small practices, and to evaluate consistency of performance across multiple years we could only include physician groups with complete data across all 4 years. Sixth, we used relative thresholds to define poor performance. This approach is commonly used in value-based payment programs. However, performance classification can be sensitive to the approach used,30,42 and absolute thresholds may be more appropriate in some settings.43

    Conclusions

    As quality performance is increasingly used to assess and reward health care professionals, it is important to consider the noisy nature of measures. In this article, we identified a subset of physician groups based on their consistently low performance across multiple years. Consistency in performance could be applied in many other settings and could also be used to identify high-quality physicians. As quality measurement and incentives continue to be developed and are often directed at the organizational level, considering the consistency of group performance could afford a novel method to identify groups most likely to benefit from limited resources.

    Article Information

    Accepted for Publication: May 18, 2021.

    Published: July 28, 2021. doi:10.1001/jamanetworkopen.2021.17954

    Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2021 Nguyen CA et al. JAMA Network Open.

    Corresponding Author: Mary Beth Landrum, PhD, Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115 (landrum@hcp.med.harvard.edu).

    Author Contributions: Dr Landrum had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

    Concept and design: All authors.

    Acquisition, analysis, or interpretation of data: Nguyen, Gilstrap, McWilliams, Landon, Landrum.

    Drafting of the manuscript: Nguyen.

    Critical revision of the manuscript for important intellectual content: All authors.

    Statistical analysis: Nguyen, Gilstrap, McWilliams, Landrum.

    Administrative, technical, or material support: Nguyen, Gilstrap.

    Supervision: Nguyen, Chernew.

    Conflict of Interest Disclosures: Dr Chernew reported service as a board member of the Blue Cross Blue Shield Association advisory board, the Blue Health Intelligence advisory board, the HCCI board, the NIHCM board, and as MedPAC Chair; he reported equity in Virta Health and VBID Health; and he reported receiving speaking honoraria from America’s Health Insurance Plans, Blue Cross and Blue Shield of Florida, HealthEdge, Humana, Massachusetts Association of Health Plans, American Medical Association, GI Roundtable, and American College of Cardiology outside the submitted work. Dr McWilliams reported receiving grants from Arnold Ventures during the conduct of the study; he reported serving as an unpaid member of the board of directors for the Institute for Accountable Care. No other disclosures were reported.

    Funding/Support: This research was supported by a grant from the Laura and John Arnold Foundation.

    Role of the Funder/Sponsor: The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

    Disclaimer: The views presented here are those of the author and not necessarily those of the Laura and John Arnold Foundation, its directors, officers, or staff.

    Meeting Presentation: An early version of this article was presented at the AcademyHealth Annual Research Meeting on June 25, 2017, in New Orleans, LA.

    Additional Contributions: We thank Sartaj Alam, MS, and Andrew L. Hicks, MS, employees of Harvard Medical School, for assistance with the data. Contributors did not receive additional compensation beyond the terms of their employment.

    References
    1.
    Khullar D, Bond AM, O'Donnell EM, Qian Y, Gans DN, Casalino LP. Time and financial costs for physician practices to participate in the Medicare merit-based incentive payment system: a qualitative study. JAMA Health Forum. 2021;2(5):e210527. doi:10.1001/jamahealthforum.2021.0527
    2.
    Schuster MA, Onorato SE, Meltzer DO. Measuring the cost of quality measurement: a missing link in quality strategy. JAMA. 2017;318(13):1219-1220. doi:10.1001/jama.2017.11525
    3.
    Chen J, Patel MM. Costs of quality measurement. JAMA. 2018;319(6):615-616. doi:10.1001/jama.2017.20288
    4.
    Casalino LP, Gans D, Weber R, et al. US physician practices spend more than $15.4 billion annually to report quality measures. Health Aff (Millwood). 2016;35(3):401-406. doi:10.1377/hlthaff.2015.1258
    5.
    Austin JM, Jha AK, Romano PS, et al. National hospital ratings systems share few common scores and may generate confusion instead of clarity. Health Aff (Millwood). 2015;34(3):423-430. doi:10.1377/hlthaff.2014.0201
    6.
    Jha A. Hospital rankings get serious. An Ounce of Evidence. Harvard T.H. Chan School of Public Health blog. August 14, 2012. Accessed June 23, 2021. https://blogs.sph.harvard.edu/ashish-jha/2012/08/14/hospital-rankings-get-serious/
    7.
    Rosenthal E. The hype over hospital rankings. New York Times. Published July 27, 2013. Accessed June 23, 2021. https://www.nytimes.com/2013/07/28/sunday-review/the-hype-over-hospital-rankings.html
    8.
    Osborne NH, Nicholas LH, Ghaferi AA, Upchurch GR Jr, Dimick JB. Do popular media and internet-based hospital quality ratings identify hospitals with better cardiovascular surgery outcomes? J Am Coll Surg. 2010;210(1):87-92. doi:10.1016/j.jamcollsurg.2009.09.038
    9.
    Ahluwalia SC, Damberg CL, Silverman M, Motala A, Shekelle PG. What defines a high-performing health care delivery system: a systematic review. Jt Comm J Qual Patient Saf. 2017;43(9):450-459. doi:10.1016/j.jcjq.2017.03.010
    10.
    Jha AK, Orav EJ, Epstein AM. Low-quality, high-cost hospitals, mainly in South, care for sharply higher shares of elderly Black, Hispanic, and Medicaid patients. Health Aff (Millwood). 2011;30(10):1904-1911. doi:10.1377/hlthaff.2011.0027
    11.
    Leapfrog Group. About the Grade. Hospital Safety Grade. Published 2013. Accessed June 23, 2021. https://www.hospitalsafetygrade.org/your-hospitals-safety-grade/about-the-grade
    12.
    Dimick JB, Staiger DO, Birkmeyer JD. Ranking hospitals on surgical mortality: the importance of reliability adjustment. Health Serv Res. 2010;45(6 Pt 1):1614-1629. doi:10.1111/j.1475-6773.2010.01158.x
    13.
    Druss B, Rosenheck R. Evaluation of the HEDIS measure of behavioral health care quality. Psychiatr Serv. 1997;48(1):71-75. doi:10.1176/ps.48.1.71
    14.
    Gilstrap LG, Chernew ME, Nguyen CA, et al. Association between clinical practice group adherence to quality measures and adverse outcomes among adult patients with diabetes. JAMA Netw Open. 2019;2(8):e199139. doi:10.1001/jamanetworkopen.2019.9139
    15.
    Marshall EC, Spiegelhalter DJ. Reliability of league tables of in vitro fertilisation clinics: retrospective analysis of live birth rates. BMJ. 1998;316(7146):1701-1704. doi:10.1136/bmj.316.7146.1701
    16.
    US Centers for Medicare & Medicaid Services. Two-Step Attribution for Measures Included in the Value Modifier. Published August 2015. Accessed June 23, 2021. https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/PhysicianFeedbackProgram/Downloads/Attribution-Fact-Sheet.pdf
    17.
    Agency for Healthcare Research and Quality. Prevention Quality Indicators Technical Specifications Updates–Version 6.0 (ICD-9). Updated October 2016. Accessed June 23, 2021. https://www.qualityindicators.ahrq.gov/Archive/PQI_TechSpec_ICD09_v60.aspx
    18.
    National Quality Forum. Risk Adjustment for Socioeconomic Status or Other Sociodemographic Factors. Published August 2014. Accessed June 23, 2021. http://www.qualityforum.org/risk_adjustment_ses.aspx
    19.
    US Department of Health and Human Services. Report to Congress: Social Risk Factors and Performance Under Medicare's Value-Based Payment Programs. Published December 21, 2016. Accessed June 23, 2021. https://aspe.hhs.gov/pdf-report/report-congress-social-risk-factors-and-performance-under-medicares-value-based-purchasing-programs
    20.
    Krumholz HM, Bernheim SM. Considering the role of socioeconomic status in hospital outcomes measures. Ann Intern Med. 2014;161(11):833-834. doi:10.7326/M14-2308
    21.
    National Academies of Sciences, Engineering, and Medicine. Accounting for Social Risk Factors in Medicare Payment: Identifying Social Risk Factors. National Academies Press; 2016.
    22.
    Nguyen CA, Gilstrap LG, Chernew ME, McWilliams JM, Landon BE, Landrum MB. Social risk adjustment of quality measures for diabetes and cardiovascular disease in a commercially insured US population. JAMA Netw Open. 2019;2(3):e190838. doi:10.1001/jamanetworkopen.2019.0838
    23.
    Zaslavsky AM, Hochheimer JN, Schneider EC, et al. Impact of sociodemographic case mix on the HEDIS measures of health plan quality. Med Care. 2000;38(10):981-992. doi:10.1097/00005650-200010000-00002
    24.
    Kim M, Zaslavsky AM, Cleary PD. Adjusting Pediatric Consumer Assessment of Health Plans Study (CAHPS) scores to ensure fair comparison of health plan performances. Med Care. 2005;43(1):44-52.
    25.
    Zaslavsky AM, Zaborski LB, Ding L, Shaul JA, Cioffi MJ, Cleary PD. Adjusting performance measures to ensure equitable plan comparisons. Health Care Financ Rev. 2001;22(3):109-126.
    26.
    US Centers for Medicare & Medicaid Services. CCW Chronic Conditions: Combined Medicare and Medicaid Data. Published 2012. Accessed June 29, 2021. http://www.ccwdata.org
    27.
    Nguyen CA, Chernew ME, Ostrer I, Beaulieu ND. Comparison of healthcare delivery systems in low- and high-income communities. Am J Accountable Care. 2019;7(4):11-18.
    28.
    Roberts ET, Zaslavsky AM, McWilliams JM. The value-based payment modifier: program outcomes and implications for disparities. Ann Intern Med. 2018;168(4):255-265. doi:10.7326/M17-1740
    29.
    Roberts ET, Zaslavsky AM, Barnett ML, Landon BE, Ding L, McWilliams JM. Assessment of the effect of adjustment for patient characteristics on hospital readmission rates: implications for pay for performance. JAMA Intern Med. 2018;178(11):1498-1507. doi:10.1001/jamainternmed.2018.4481
    30.
    Ahluwalia SC, Damberg CL, Haas A, Shekelle PG. How are medical groups identified as high-performing? The effect of different approaches to classification of performance. BMC Health Serv Res. 2019;19(1):500. doi:10.1186/s12913-019-4293-9
    31.
    Landon BE, Hicks LS, O'Malley AJ, et al. Improving the management of chronic disease at community health centers. N Engl J Med. 2007;356(9):921-934. doi:10.1056/NEJMsa062860
    32.
    Zaslavsky AM, Beaulieu ND, Landon BE, Cleary PD. Dimensions of consumer-assessed quality of Medicare managed-care health plans. Med Care. 2000;38(2):162-174. doi:10.1097/00005650-200002000-00006
    33.
    Jha AK, Orav EJ, Dobson A, Book RA, Epstein AM. Measuring efficiency: the association of hospital costs and quality of care. Health Aff (Millwood). 2009;28(3):897-906. doi:10.1377/hlthaff.28.3.897
    34.
    O'Brien SM, Shahian DM, DeLong ER, et al. Quality measurement in adult cardiac surgery: part 2—statistical considerations in composite measure scoring and provider rating. Ann Thorac Surg. 2007;83(4)(suppl):S13-S26. doi:10.1016/j.athoracsur.2007.01.055
    35.
    Zaslavsky AM, Zaborski LB, Cleary PD. Plan, geographical, and temporal variation of consumer assessments of ambulatory health care. Health Serv Res. 2004;39(5):1467-1485. doi:10.1111/j.1475-6773.2004.00299.x
    36.
    Hatfield LA, Zaslavsky AM. Separable covariance models for health care quality measures across years and topics. Stat Med. 2018;37(12):2053-2066. doi:10.1002/sim.7656
    37.
    Bradley EH, Herrin J, Elbel B, et al. Hospital quality for acute myocardial infarction: correlation among process measures and relationship with short-term mortality. JAMA. 2006;296(1):72-78. doi:10.1001/jama.296.1.72
    38.
    Jha AK, Li Z, Orav EJ, Epstein AM. Care in US hospitals—the Hospital Quality Alliance program. N Engl J Med. 2005;353(3):265-274. doi:10.1056/NEJMsa051249
    39.
    Wilson IB, Landon BE, Marsden PV, et al. Correlations among measures of quality in HIV care in the United States: cross sectional study. BMJ. 2007;335(7629):1085-1088. doi:10.1136/bmj.39364.520278.55
    40.
    Chernew ME, Landrum MB. Targeted supplemental data collection–addressing the quality-measurement conundrum. N Engl J Med. 2018;378(11):979-981. doi:10.1056/NEJMp1713834
    41.
    Landrum MB, Nguyen C, O'Rourke E, Jung M, Amin T, Chernew M. Measurement Systems: A Framework for Next Generation Measurement of Quality in Healthcare. National Quality Forum; 2019.
    42.
    McDowell A, Nguyen CA, Chernew ME, et al. Comparison of approaches for aggregating quality measures in population-based payment models. Health Serv Res. 2018;53(6):4477-4490. doi:10.1111/1475-6773.13031
    43.
    Medicare Payment Advisory Commission. Applying the commission's principles for measuring quality: population-based measures and hospital quality incentives. In: Report to the Congress: Medicare and the Health Care Delivery System. MedPAC; 2018:175-207.