Count refers to unique incidences of service provision; overall spending, total spending on all services covered by Medicare Parts A and B (see Table 1 for services included in each category and for operational definitions of all measures).
eAppendix. Methods and Results
eTable 1. Codes Used for Measures of Low-Value Services
eTable 2. Use and Associated Spending of Services Detected by Low-Value Service Measures, by Category
Schwartz AL, Landon BE, Elshaug AG, Chernew ME, McWilliams JM. Measuring Low-Value Care in Medicare. JAMA Intern Med. 2014;174(7):1067–1076. doi:10.1001/jamainternmed.2014.1541
Copyright 2014 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Importance
Despite the importance of identifying and reducing wasteful health care use, few direct measures of overuse have been developed. Direct measures are appealing because they identify specific services to limit and can characterize low-value care even among the most efficient providers.
Objective
To develop claims-based measures of low-value services, examine service use (and associated spending) detected by these measures in Medicare, and determine whether patterns of use are related across different types of low-value services.
Design, Setting, and Participants
Drawing from evidence-based lists of services that provide minimal clinical benefit, we developed 26 claims-based measures of low-value services. Using 2009 claims for 1 360 908 Medicare beneficiaries, we assessed the proportion of beneficiaries receiving these services, mean per-beneficiary service use, and the proportion of total spending devoted to these services. We compared the amount of use and spending detected by versions of these measures with different sensitivity and specificity. We also estimated correlations between use of different services within geographic areas, adjusting for beneficiaries’ sociodemographic and clinical characteristics.
Main Outcomes and Measures
Use and spending detected by 26 measures of low-value services in 6 categories: low-value cancer screening, low-value diagnostic and preventive testing, low-value preoperative testing, low-value imaging, low-value cardiovascular testing and procedures, and other low-value surgical procedures.
Results
Services detected by more sensitive versions of measures affected 42% of beneficiaries and constituted 2.7% of overall annual spending. Services detected by more specific versions of measures affected 25% of beneficiaries and constituted 0.6% of overall spending. In adjusted analyses, low-value spending detected in geographic regions at the 5th percentile of the regional distribution of low-value spending ($227 per beneficiary) exceeded the difference in detected low-value spending between regions at the 5th and 95th percentiles ($189 per beneficiary). Adjusted regional use was positively correlated among 5 of 6 categories of low-value services (mean r for pairwise, between-category correlations, 0.33; range, 0.14-0.54; P ≤ .01).
Conclusions and Relevance
Services detected by a limited number of measures of low-value care constituted modest proportions of overall spending but affected substantial proportions of beneficiaries and may be reflective of overuse more broadly. Performance of claims-based measures in supporting targeted payment or coverage policies to reduce overuse may depend heavily on how the measures are defined.
Several recent initiatives, including the Choosing Wisely campaign by the American Board of Internal Medicine Foundation,1 have focused on directly defining wasteful health care services that provide little or no health benefit to patients. It is challenging, however, to translate evidence-based lists of low-value services generated by such initiatives into meaningful metrics that can be applied to available data sources, such as insurance claims.2 The value of most services depends on the clinical situation in which they are provided, and administrative data often lack the clinical detail necessary to distinguish appropriate from inappropriate use. Consequently, the number of low-value services that can be reliably identified in claims data may be limited, and the amount of low-value care detected by claims-based measures may be highly sensitive to how the measures are defined.
Direct approaches to measuring overuse may nevertheless be useful for characterizing the potential extent of wasteful care and informing policies to address low-value practices. Indirect approaches to measuring care efficiency, such as comparing total risk-adjusted spending per patient across geographic areas or provider organizations,3 may be challenging for policy makers and providers to act on because specific services contributing to wasteful spending are not identified.4 Furthermore, such relative measures may fail to characterize the full extent of low-value practices if they are widespread. In contrast, direct measures could be used to identify specific instances of overuse and assess their frequency among even the most efficient providers. In addition, even a limited set of direct measures could be useful for monitoring low-value care if it reflects underlying drivers of overuse more broadly. For analogous reasons, many quality measures relating to underuse have been developed and applied widely in quality improvement initiatives despite similar measurement challenges.5,6
Drawing from evidence-based lists and the medical literature, we created algorithms to measure selected low-value services that could be applied to insurance claims data with reasonable accuracy despite the limited clinical information in claims. Using 2009 Medicare claims, we examined the use of these services and their associated spending, varying the sensitivity and specificity with which the measures likely identified overuse. We also examined whether use of different types of low-value care was correlated within regions; positive correlations might suggest that the measures reflect common drivers of overuse.
We analyzed 2008-2009 claims data for a random 5% sample of Medicare beneficiaries, as well as demographic information from enrollment files and chronic conditions from the Chronic Conditions Data Warehouse (CCW).7 We applied measures of low-value services to 2009 claims, using 2008 claims and the CCW for relevant clinical history. Our study population consisted of 1 360 908 beneficiaries who were continuously enrolled in Parts A and B of traditional fee-for-service Medicare in 2008 and while alive in 2009. We further restricted the study population to individuals who, in 2009, were living in a US state or Washington, DC, and were at least 65 years old. Our study was approved by the Harvard Medical School Committee on Human Studies and the Privacy Board of the Centers for Medicare & Medicaid Services.
We considered services that have been characterized as low value by the American Board of Internal Medicine Foundation’s Choosing Wisely initiative,8 the US Preventive Services Task Force “D” recommendations,9 the National Institute for Health and Care Excellence “do not do” recommendations,10 the Canadian Agency for Drugs and Technologies in Health health technology assessments,11 or peer-reviewed medical literature.12 These services have been found to provide little to no clinical benefit on average, either in general or in specific clinical scenarios. From these services, we selected a subset that is relevant to the Medicare population and could be detected using Medicare claims with reasonable specificity, meaning that major clinical factors distinguishing likely overuse from appropriate use could be identified or approximated with claims and enrollment data (eAppendix in the Supplement). We also required the evidence base characterizing each service as low value to have been established before 2009. Many low-value services were not selected (eg, imaging for pulmonary embolism without moderate or high pretest probability8) because of difficulty distinguishing inappropriate from appropriate use with claims data.
For each selected service, we developed an operational definition of low-value occurrences using Current Procedural Terminology (CPT) codes, Berenson-Eggers Type of Service codes, International Classification of Diseases, Ninth Revision diagnostic codes, CCW indicators, timing of care, site of care, and demographic information (eTable 1 in the Supplement). When supported by clinical evidence or guidelines, we broadened the scope of some recommendations featured in lists of low-value services. For example, we expanded the Choosing Wisely definition of low-value preoperative pulmonary testing before cardiac surgery to include preoperative pulmonary testing before low- or intermediate-risk surgical procedures more broadly.13 We also combined similar low-value services (eg, various laboratory tests for hypercoagulable states) into single measures. Table 1 presents the operational definitions for the 26 measures of low-value care we developed and applied to claims.
Inherent in most of our claims-based measures of low-value care was a trade-off between sensitivity (greater capture of inappropriate use) and specificity (less misclassification of appropriate use as inappropriate). To assess the variability of our findings across a spectrum of these important measurement properties, we specified 2 versions of each measure, one with higher sensitivity (and lower specificity) and the other with higher specificity (and lower sensitivity) for detecting low-value care (Table 1). Even without a gold standard for assessing service appropriateness, the relative sensitivity and specificity of our measures can be inferred from the clinical criteria we applied. For example, limiting the colorectal cancer screening measure to beneficiaries older than 85 years instead of older than 75 years decreases its sensitivity (fewer low-value instances detected) but increases its specificity (smaller proportion of appropriate services misclassified as inappropriate).
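The colorectal cancer screening example can be sketched in code. The following Python fragment is an illustration only, not the study's implementation: the `ScreeningClaim` record, its fields, and the toy ages are hypothetical, and the full operational definitions in Table 1 involve additional criteria not modeled here. It shows only that raising the age threshold from older than 75 years to older than 85 years yields a subset of the claims flagged by the broader rule.

```python
from dataclasses import dataclass


@dataclass
class ScreeningClaim:
    """Hypothetical screening claim; field names are assumptions for illustration."""
    beneficiary_id: str
    age: int  # age at time of service


def detect_low_value(claims, min_age_exclusive):
    """Flag screening claims for beneficiaries older than the age threshold."""
    return [c for c in claims if c.age > min_age_exclusive]


# Toy data, not drawn from the study.
claims = [
    ScreeningClaim("A", 72),
    ScreeningClaim("B", 78),
    ScreeningClaim("C", 84),
    ScreeningClaim("D", 88),
]

# More sensitive version: older than 75 years (captures more, misclassifies more).
sensitive_ids = [c.beneficiary_id for c in detect_low_value(claims, 75)]

# More specific version: older than 85 years (captures less, misclassifies less).
specific_ids = [c.beneficiary_id for c in detect_low_value(claims, 85)]
```

In this toy example every claim flagged by the more specific version is also flagged by the more sensitive version, which is the sense in which the two versions bracket the amount of low-value care detected.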
We calculated spending on low-value services using standardized prices to adjust for regional differences in Medicare payments. We used the median spending per service nationally as the standardized price for each service, including payments from Medicare, beneficiary coinsurance amounts, and any payments from other primary payers. We included related services typically bundled with the low-value service in these price estimates (eg, contrast medium administration for an imaging study or anesthesia for a procedure). These bundles were defined based on examination of the most frequent CPT codes appearing during the day a low-value service was provided and thus would not include subsequent care prompted by the service (eg, further imaging for incidental findings on preoperative chest radiographs). Additional information on service detection and pricing, including the specific codes (eg, CPT, Berenson-Eggers Type of Service) used, is available in the eAppendix (Supplement).
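A minimal sketch of the standardized pricing step follows, assuming each observed payment has already been summed across Medicare, coinsurance, and other primary payers and already includes the bundled same-day services. The service labels and dollar amounts are invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def standardized_prices(payments):
    """Map each service to the national median of its observed total payments.

    payments: iterable of (service, total_payment) pairs.
    """
    by_service = defaultdict(list)
    for service, amount in payments:
        by_service[service].append(amount)
    return {service: median(amounts) for service, amounts in by_service.items()}


def standardized_spending(counts, prices):
    """Spending = number of detected services x standardized price."""
    return {service: n * prices[service] for service, n in counts.items()}


# Toy payments, not study data.
payments = [
    ("head_ct", 210.0), ("head_ct", 250.0), ("head_ct", 400.0),
    ("chest_xray", 30.0), ("chest_xray", 50.0),
]
prices = standardized_prices(payments)
spending = standardized_spending({"head_ct": 2, "chest_xray": 3}, prices)
```

Using a single national median per service means that two regions with the same utilization are assigned the same spending, regardless of local payment rates.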
We counted the number of times each beneficiary experienced each low-value service and calculated the per-beneficiary spending for each service. From these values, we calculated the percentage of beneficiaries receiving at least 1 low-value service and the aggregate spending for all beneficiaries for each service and in each of 6 service categories: low-value cancer screening; low-value diagnostic and preventive testing; low-value preoperative testing; low-value imaging; low-value cardiovascular testing and procedures; and other low-value surgical procedures. Aggregate spending estimates were multiplied by 20 to approximate spending for the entire Medicare population from 5% samples. We also calculated the proportion of total spending for services covered by Medicare Parts A and B (including coinsurance amounts and payments from other primary payers) devoted to services detected by low-value care measures.
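The aggregation described above can be sketched as follows. This is an illustrative fragment under stated assumptions: the `summarize` function, its inputs, and the toy counts are hypothetical, and a single standardized price is applied to one service for simplicity.

```python
SCALE = 20  # a 5% sample is multiplied by 20 to approximate the full population


def summarize(service_counts, price, n_beneficiaries):
    """Summarize one low-value service in the sample.

    service_counts: dict mapping beneficiary id -> number of detected services.
    price: standardized price for the service.
    n_beneficiaries: number of beneficiaries in the sample.
    """
    n_services = sum(service_counts.values())
    n_with_any = sum(1 for n in service_counts.values() if n > 0)
    return {
        "pct_with_any": 100 * n_with_any / n_beneficiaries,
        "sample_spending": n_services * price,
        "national_spending": n_services * price * SCALE,
    }


# Toy data, not study data.
summary = summarize({"A": 0, "B": 2, "C": 1, "D": 0}, price=100.0, n_beneficiaries=4)
```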
We used hospital referral regions (HRRs) to examine how use of different types of low-value services was related among the same groupings of providers. Although we were not interested in geographic areas per se and although practice patterns vary within and between areas,4 HRRs nevertheless served as a useful unit of comparison to determine whether groups of providers that were more likely to provide one type of low-value service were more likely to provide another. First, we estimated mean per-beneficiary utilization counts in each service category at the HRR level using linear regression models with HRR fixed effects. To control for beneficiaries’ sociodemographic and clinical characteristics, we included as covariates age, age squared, sex, race, indicators of 21 CCW diagnoses present before 2009 (derived from claims dating back to 1999), indicators of having multiple comorbid conditions (2 to ≥7), the Rural-Urban Continuum Code for beneficiaries’ county of residence, and several socioeconomic measures of the elderly population at the zip code tabulation area level (median income, percentage below the federal poverty level, and percentage with a high school diploma). To account for additional dimensions of case mix not captured by the CCW, we included indicators of conditions that qualified patients for potential receipt of several low-value services (eg, a diagnosis of headache in 2009 qualifying beneficiaries for potentially inappropriate head imaging; see the eAppendix in the Supplement for details). For each pair of low-value service categories, we then estimated correlations between regional means in adjusted use weighted by the number of traditional fee-for-service Medicare beneficiaries in each HRR. Correlations were not substantially altered by use of random effects to estimate regional means or by the addition of indicators of qualifying conditions.
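The final step, a correlation between regional means weighted by the number of beneficiaries in each HRR, can be illustrated with a weighted Pearson correlation. The sketch below uses only the Python standard library and toy values; it omits the regression adjustment that produces the adjusted regional means, and the function name is hypothetical rather than taken from the study's code.

```python
from math import sqrt


def weighted_corr(x, y, w):
    """Weighted Pearson correlation between x and y with nonnegative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / sqrt(var_x * var_y)


# Toy regional means for two service categories across four regions,
# weighted equally here; in practice w would be beneficiaries per HRR.
r = weighted_corr([1, 2, 3, 4], [1, 3, 2, 4], [1, 1, 1, 1])
```

With equal weights the function reduces to the ordinary Pearson correlation; unequal weights give larger regions more influence on the estimate.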
Among 1 360 908 beneficiaries in the study sample, 1 094 374 instances of care provision (80 services per 100 beneficiaries) were detected by the more sensitive measures of low-value services, corresponding to 21.9 million instances for the entire traditional Medicare population in 2009. Forty-two percent of beneficiaries received at least 1 service detected by the more sensitive measures. Our more specific but less sensitive measures of low-value care detected 454 783 services (33 per 100 beneficiaries), corresponding to 9.1 million services for the entire Medicare population. Twenty-five percent of beneficiaries received at least 1 of these services.
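The rates and national totals above follow from the sample counts and the factor of 20 used to scale the 5% sample. The short check below reproduces them; the helper names are ours.

```python
N_BENEFICIARIES = 1_360_908  # study sample size
SCALE = 20                   # 5% sample scaled to the full population


def per_100(n_services):
    """Services per 100 beneficiaries in the sample."""
    return 100 * n_services / N_BENEFICIARIES


def national_millions(n_services):
    """Approximate national count, in millions."""
    return n_services * SCALE / 1e6


sensitive = 1_094_374  # services detected by the more sensitive measures
specific = 454_783     # services detected by the more specific measures

sensitive_rate = round(per_100(sensitive))              # 80
specific_rate = round(per_100(specific))                # 33
sensitive_total = round(national_millions(sensitive), 1)  # 21.9
specific_total = round(national_millions(specific), 1)    # 9.1
```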
Spending for services detected by our more sensitive measures of low-value care totaled $8.5 billion for the entire Medicare population, or $310 per beneficiary, whereas spending for services detected by our more specific measures totaled $1.9 billion, or $71 per beneficiary. These amounts comprised 2.7% and 0.6%, respectively, of total annual spending in 2009 on services covered by Medicare Parts A and B.
The Figure presents utilization rates and their associated spending, decomposed by category of low-value care measures. Imaging, cancer screening, and diagnostic and preventive testing measures detected most of the use, whereas measures of imaging and cardiovascular testing and procedures detected most of the spending (see eTable 2 in the Supplement for these results in tabular form). Table 2 presents utilization rates and associated spending captured by each of the 26 measures of low-value care. Individual measures with major contributions to spending included both high-price, low-use items, such as percutaneous coronary intervention for stable coronary disease, and low-price, high-use items, such as screening for asymptomatic carotid artery disease.
Table 3 presents correlations between adjusted levels of regional service use in different categories of low-value care as detected by our more sensitive measures. Per-beneficiary utilization counts were positively correlated with one another for 5 of the 6 categories. Correlation coefficients ranged from 0.14 to 0.54 across all pairwise combinations of these 5 categories (P ≤ .01), with a mean of 0.33. Noncardiovascular surgical procedures were not positively correlated with use in other categories of measures. The measures exhibited good internal consistency across all categories (Cronbach α = 0.68).
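For readers unfamiliar with the internal consistency statistic, the standard formula is α = k/(k − 1) × (1 − Σ item variances / variance of the summed score), where k is the number of items. The sketch below applies that formula to toy data. The text does not describe how α was computed, so treating service categories as items and regions as observations is an assumption, and the function is illustrative rather than the study's code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists, each holding one score per observation
    (assumed here: one list per service category, one score per region).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summed score per observation
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy scores for two items across four observations, not study data.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
```

Alpha approaches 1 as the items move together and falls toward 0 as they become unrelated.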
Adjusted regional spending on services detected by more sensitive measures of low-value care ranged from $227 per beneficiary in the 5th percentile to $416 per beneficiary in the 95th percentile of HRRs (median, $304; interquartile range, $272-$343). Thus, low-value spending detected in regions at the 5th percentile of the regional distribution exceeded the difference in detected low-value spending between regions at the 5th and 95th percentiles ($189 per beneficiary).
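The comparison in the preceding sentence is simple arithmetic on the reported percentiles, shown here as a short check (variable names are ours):

```python
p5 = 227   # adjusted low-value spending per beneficiary, 5th-percentile HRR
p95 = 416  # adjusted low-value spending per beneficiary, 95th-percentile HRR

spread = p95 - p5                  # 189: difference between high and low regions
floor_exceeds_spread = p5 > spread  # True: the "floor" is larger than the spread
```

In other words, a comparison that looks only at differences between regions would capture $189 per beneficiary while missing the $227 per beneficiary detected even in low-spending regions.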
In this national study of selected low-value services, Medicare beneficiaries commonly received care that was likely to provide minimal or no benefit on average. Even when applying narrower versions of our limited number of measures of overuse, we identified low-value care affecting one-quarter of Medicare beneficiaries. These findings are consistent with the notion that wasteful practices are pervasive in the US health care system.
Within regions, different types of low-value use generally exhibited significantly positive correlations with one another, ranging from weak to moderate in strength, although 1 category of low-value use (noncardiovascular surgical procedures) was not positively correlated with the others. These findings suggest that many low-value services may be driven by common factors. Therefore, claims-based measures, although limited in number and the amount of wasteful spending they detect, could be useful for monitoring low-value care more broadly, including some care that may be difficult to measure with claims.
Although these findings suggest that direct approaches to measuring wasteful care may be tractable and informative, other findings underscore potential challenges in developing and applying direct measures of overuse. In particular, the amount of low-value care we detected varied substantially with the clinical specificity of our measures. Estimates of the proportion of Medicare beneficiaries receiving at least 1 measured low-value service decreased from 42% to 25% when we used more restrictive definitions that traded off sensitivity for specificity, and the contribution of low-value spending to total spending decreased from 2.7% to 0.6%. For example, our more sensitive measure of low-value imaging for low back pain captured more inappropriate use of imaging studies at the expense of including some appropriate use. Our more specific measure was less likely to include appropriate use but probably excluded many low-value studies, as suggested by the 3-fold reduction in the number of studies captured.
Thus, the performance of administrative rules to reduce overuse through coverage policy, cost sharing, or value-based payment (eg, pay for performance) may depend heavily on measure definition. Such strategies may be appropriate for select services whose value is invariably low or whose low-value applications can be identified with high reliability. For other services, however, more sensitive measures could result in unintended restriction of appropriate tests and procedures by coverage and payment policies, whereas more specific measures could substantially limit the effect of these strategies. Provider groups seeking to minimize wasteful spending (for example, in response to global budgets) may be able to distinguish appropriate from inappropriate practices at the point of care without having to use rigid rules derived from incomplete clinical data.
We also found that, although spending on low-value services varied considerably across regions, spending on low-value services was substantial even in regions where it was lowest. For example, low-value spending at the 5th percentile of the regional distribution of low-value spending was greater than the difference in low-value spending between the 5th and 95th percentiles. This finding suggests potential advantages of direct measurement over relative spending comparisons as a basis for detecting overuse because overuse may be substantial even among more efficient providers.
Our study has several limitations. Most notably, we analyzed only 26 measures of low-value services. In selecting these measures, we emphasized the specificity with which overuse could be detected with claims data and created more restrictive versions that limited contributions of potentially valuable service use to low-value spending totals and utilization counts. Despite the limited number of services we examined, their frequency and correlations with one another suggest substantial and widespread wasteful care. Use of a broader set of less specific and more sensitive measures would capture more low-value care. Similarly, broader definitions of wasteful spending that include downstream costs of low-value service use (eg, repeat imaging for incidental findings) would capture more spending than our measures did. For example, one study estimated that testing costs may account for just 2% of the lifetime costs of prostate-specific antigen screening.48
Clinical data from linked medical records might support a more extensive assessment of the properties of claims-based measures. However, we would not expect the incorporation of more detailed data to substantially alter the amount of low-value care captured by many of our measures (eg, cancer screening in patients above certain ages, inappropriately frequent bone mineral density testing, homocysteine testing for cardiovascular disease, renal artery stenting, and vertebroplasty). Furthermore, by varying the definitions of our measures, we were able to demonstrate potential limitations of claims-based measures without having to use medical record data; any inconsistencies between claims and medical records in the amount of low-value care detected would have similar implications for strategies to address wasteful practices. Moreover, we focused on the potential utility of claims-based measures because medical record review as a means to measure and monitor wasteful care is costly and thus not feasible on a large scale. Nevertheless, validation of claims-based measures against a gold standard of clinical appropriateness will be needed to more precisely define their strengths and weaknesses and assess their utility for different purposes, such as monitoring, profiling, payment policy, or coverage design.
Although our analysis suggests that common drivers of low-value care exist, our study did not identify specific determinants of wasteful care. Factors associated with low-value care may also be associated with high-value care.49,50 Coupling measures of overuse with measures of underuse may therefore be important when evaluating programs intended to achieve more cost-effective care.
Finally, unmeasured variation in diagnostic coding practices or case mix may have contributed to positive correlations between regional use of different low-value services in our study. These were not likely sources of significant bias, however, because we found a significant positive correlation between categories of low-value services whose definitions did not rely on diagnosis codes (ie, age-inappropriate cancer screening and preoperative testing) and because our results were not sensitive to adjustment for additional conditions qualifying beneficiaries for potential receipt of several low-value services.
Many quality measures have been developed to assess underuse but few to assess overuse. Our study findings illustrate the potential utility and limitations of a direct approach to detect wasteful care. Despite their imperfections, claims-based measures of low-value care could be useful for tracking overuse and evaluating programs to reduce it. However, many direct claims-based measures of overuse may be insufficiently accurate to support targeted coverage or payment policies that have a meaningful effect on use without resulting in unintended consequences. Broader payment reforms, such as global or bundled payment models, could allow greater provider discretion in defining and identifying low-value services while incentivizing their elimination.
Accepted for Publication: February 7, 2014.
Corresponding Author: J. Michael McWilliams, MD, PhD, Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115 (firstname.lastname@example.org).
Published Online: May 12, 2014. doi:10.1001/jamainternmed.2014.1541.
Author Contributions: Mr Schwartz and Dr McWilliams had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: All authors.
Acquisition of data: Schwartz, McWilliams.
Analysis and interpretation of data: Elshaug, Chernew, McWilliams.
Drafting of the manuscript: Schwartz, Elshaug, McWilliams.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Schwartz, McWilliams.
Obtained funding: McWilliams.
Administrative, technical, or material support: Schwartz, McWilliams.
Study supervision: Elshaug, Chernew, McWilliams.
Conflict of Interest Disclosures: Dr Chernew reports that he is a partner in VBID Health, LLC, which has a contract with Milliman to develop and market a tool to help insurers and employers quantify spending on low-value services. Dr Elshaug reports that he provides advice to the Australian Government Department of Health on policy responses to low-value health care. Mr Schwartz and Drs Landon and McWilliams report no conflicts.
Funding/Support: This work was supported by grants from the Beeson Career Development Award Program (National Institute on Aging grant K08 AG038354 and the American Federation for Aging Research), the Doris Duke Charitable Foundation (Clinical Scientist Development Award 2010053), the National Institute on Aging (grant P01 AG032952), the Agency for Healthcare Research and Quality (Institutional Training Grant 2T32HS000055-20), Harvard University (Christopher G. P. Walker Fellowship), and the Australian National Health and Medical Research Council (Sidney Sax Public Health Fellowship 627061).
Role of the Sponsor: The funding sources did not play a role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, and approval of the manuscript; or decision to submit the manuscript for publication.
Additional Contributions: We are grateful to Joseph P. Newhouse, PhD, and Frank Levy, PhD, for comments on an earlier draft of the manuscript. Drs Newhouse and Levy were not compensated for their contributions.