Figure. Flow of articles in the review. *Many articles include multiple studies of different interventions; the total number of studies is 241.
Korenstein D, Falk R, Howell EA, Bishop T, Keyhani S. Overuse of Health Care Services in the United States: An Understudied Problem. Arch Intern Med. 2012;172(2):171–178. doi:10.1001/archinternmed.2011.772
Author Affiliations: Division of General Internal Medicine, Department of Medicine (Dr Korenstein), and Departments of Health Evidence & Policy, Obstetrics, Gynecology, and Reproductive Science, and Psychiatry (Dr Howell), Mount Sinai School of Medicine, New York, New York; Department of Emergency Medicine (Dr Falk) and Division of General Internal Medicine, Department of Medicine (Dr Keyhani), University of California, San Francisco; Departments of Public Health and Medicine, Weill Cornell Medical College, New York (Dr Bishop); and Department of Veterans Affairs Health Services Research & Development Service, Research Enhancement Award Program (REAP), San Francisco (Dr Keyhani).
Background: Overuse, the provision of health care services for which harms outweigh benefits, represents poor quality and contributes to high costs. A better understanding of overuse in US health care could inform efforts to reduce inappropriate care. We performed an extensive search for studies of overuse of therapeutic procedures, diagnostic tests, and medications in the United States and describe the state of the literature.
Methods: We searched MEDLINE (1978-2009) for studies measuring US rates of overuse of procedures, tests, and medications, augmented by author tracking, reference tracking, and expert consultation. Four reviewers screened titles; 2 reviewers screened abstracts and full articles and extracted data including overuse rate, type of service, clinical area, and publication year.
Results: We identified 172 articles measuring overuse: 53 concerned therapeutic procedures; 38, diagnostic tests; and 81, medications. Eighteen unique therapeutic procedures and 24 diagnostic services were evaluated, including 10 preventive diagnostic services. The most commonly studied services were antibiotics for upper respiratory tract infections (59 studies), coronary angiography (17 studies), carotid endarterectomy (13 studies), and coronary artery bypass grafting (10 studies). Overuse of carotid endarterectomy and antibiotics for upper respiratory tract infections declined over time.
Conclusions: The robust evidence about overuse in the United States is limited to a few services. Reducing inappropriate care in the US health care system likely requires a more substantial investment in overuse research.
There are 3 categories of quality problems in health care: underuse is the lack of provision of necessary care (eg, no aspirin prescribed after myocardial infarction), misuse is the provision of wrong care (eg, incorrect medication dosing), and overuse is the provision of medical services with no benefit or for which harms outweigh benefits.1 Overuse contributes to high costs2; some estimates attribute up to 30% of US health spending to overuse.3 To reduce health care costs and improve patient care, eliminating overuse is a major health care reform goal of the National Priorities Partnership.2
Despite broad acknowledgment that overuse is common and costly, overuse research has been underemphasized compared with research on underuse of health services,4 which may limit our understanding. Defining overuse requires defining appropriate care. The most widely accepted method for defining appropriate care is the RAND Appropriateness Method (RAM),5 which uses an iterative process to integrate the best evidence with the opinions of a multispecialty expert panel, resulting in appropriateness ratings for interventions in a large number of specific clinical situations, including situations in which intervention is necessary (ie, failure to intervene represents underuse) and situations in which intervention is inappropriate (ie, performing the intervention represents overuse). National guidelines can also be used to define appropriate care, though some lack adequate specificity for application to the broad universe of clinical situations.6-8 Perhaps because it is difficult to define, study, and document, overuse has not become a standard component of quality of care assessments.4 Among the 39 measures of health care quality in the 2011 Healthcare Effectiveness Data and Information Set (HEDIS), 4 explicitly relate to overuse.9 Experts have advocated increased incorporation of overuse measures into quality assessments.1,10
An understanding of the prevalence of overuse of health care services across the US health system is needed to improve health care quality and eliminate waste. To help define the scope of health care overuse, document trends in overuse over time, and inform discussions about reducing overuse, we performed an extensive search for published articles using established systematic review methods11 and determined the rate of overuse of therapeutic procedures, diagnostic tests, and medications in the United States.
Studies included in the review were primary research directly documenting rates of overuse of medical and surgical procedures, diagnostic tests, or medications in the United States, published in English, with a sample size of 50 or more persons and an acceptable standard to define overuse. Acceptable standards included (1) recommendations based on literature review and a multidisciplinary iterative panel process (eg, RAM), (2) guidelines from a regional or national organization, or (3) a universally accepted well-referenced standard of care (eg, antibiotics are not indicated for the treatment of viral respiratory tract infections). Studies were excluded if the standard for assessing appropriateness was generated by author consensus or a single discipline panel or if supporting literature was not described. We excluded studies that ascertained overuse indirectly (through patient or health care provider self-report) and those with potentially biased (nonrandom, nonconsecutive, or poorly defined) or nongeneralizable (from one physician's panel) patient populations. We also excluded studies measuring misuse of care (eg, use of an antibiotic with overly broad spectrum) or inefficient care (use of a brand name drug instead of an equivalent generic). We included studies of interventions to decrease overuse meeting inclusion criteria if rates of overuse were presented for the control group. Given the diverse methodologies, there is no appropriate method for quality assessment of all included studies, so we did not perform formal assessments. However, our selection process excluded poor-quality studies as described herein.
We used MEDLINE (PubMed interface) for searches, limiting to human subjects, titles with abstracts, and publication after 1978 (the publication year of the first landmark article on quality measurement).12 We used an iterative process to identify search terms. We began by extracting relevant Medical Subject Heading (MeSH) terms from articles on health care overuse known to the authors. Using those terms, we searched MEDLINE to locate other articles on overuse, from which we extracted additional unique MeSH terms, repeating the process until identifying no new relevant MeSH terms. We streamlined the resulting list of terms by selecting more proximal terms within each MeSH tree. We added important non-MeSH terms including overuse, inappropriate, and unnecessary. We empirically attempted to eliminate search terms to minimize irrelevant results while retaining relevant ones. However, exclusion of terms resulted in loss of target articles, so we retained the broad search and eliminated irrelevant articles through manual review. Search terms included the following:
Medical Subject Headings: guidelines as topic; physician's practice patterns; utilization review; clinical audit; guideline adherence; health services misuse; small area analysis; Delphi technique; diagnostic techniques/procedures and utilization; laboratory techniques/procedures and utilization; prescriptions, drug and utilization; procedures and utilization; surgery and utilization.
Nonmedical Subject Headings: overuse; appropriateness; inappropriate procedure; inappropriate surgery; inappropriate test; inappropriate utilization; inappropriate medication.
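The iterative term-identification process described above can be read as a loop that runs until no new terms appear. The following is a minimal sketch under stated assumptions, not the tooling used in the review: the `search` function and the toy `CORPUS` are hypothetical stand-ins for a MEDLINE query and its indexed articles, and the manual relevance judgment is reduced to a simple flag.

```python
# Hypothetical toy corpus: article id -> (MeSH terms, judged relevant to overuse?)
CORPUS = {
    "a1": ({"utilization review", "guideline adherence"}, True),
    "a2": ({"guideline adherence", "clinical audit"}, True),
    "a3": ({"clinical audit", "health services misuse"}, True),
    "a4": ({"utilization review", "hospital costs"}, False),
}


def search(terms):
    """Stand-in for a MEDLINE search: ids of articles indexed with any of the terms."""
    return {aid for aid, (mesh, _) in CORPUS.items() if mesh & terms}


def expand_terms(seed_terms):
    """Repeat search and term extraction until no new relevant terms are found."""
    terms = set(seed_terms)
    while True:
        new_terms = set()
        for aid in search(terms):
            mesh, relevant = CORPUS[aid]
            if relevant:  # stands in for manual relevance review
                new_terms |= mesh - terms
        if not new_terms:
            return terms
        terms |= new_terms


final_terms = expand_terms({"utilization review"})
print(sorted(final_terms))
```

In this toy example the term attached only to the irrelevant article is never added, which mirrors the idea of harvesting terms only from articles judged to concern overuse.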
We performed author tracking, examining publications of every first and last author of included studies, searched reference lists of included articles to identify additional articles, and consulted 2 experts in the field to supply additional references.
Each title was reviewed by 1 of 4 investigators (D.K., E.A.H., T.B., or S.K.) to identify titles for abstract review. All abstracts were reviewed by 1 of 2 investigators (D.K. or S.K.) for possible inclusion, and a randomly selected set of 100 abstracts was reviewed by both investigators for determination of interrater reliability (Cohen κ). The same 2 investigators reviewed all full-text articles (including those identified through author tracking, reference tracking, and expert consultation), determined inclusion or exclusion in the review, and extracted data. We again measured interrater reliability on 40 randomly selected articles for the decision to include in the review.
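For readers unfamiliar with Cohen κ, the statistic compares observed agreement between 2 raters with the agreement expected by chance. The sketch below uses the standard formula, κ = (p_o − p_e) / (1 − p_e); the include/exclude ratings are hypothetical and are not data from this review.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical decisions on 10 abstracts (not data from the review).
rater_1 = ["include"] * 3 + ["exclude"] * 7
rater_2 = ["include"] * 2 + ["exclude"] * 8
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

Here the raters agree on 9 of 10 abstracts, but because most abstracts are excluded by both, chance agreement is high (0.62), so κ is noticeably lower than the raw agreement of 0.90.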
We collected information regarding study design, population, and results from each included study, including the specific intervention of concern, the standard for determining overuse, study size, prospective vs retrospective approach, insurance status, and geographic scope. We classified each article by type of service (therapeutic procedure, diagnostic test, or medication) and clinical area (cardiac, gastrointestinal, genitourinary, hematologic, infectious diseases, musculoskeletal, neurologic, oncologic, respiratory, vascular, or other) and specifically identified studies addressing trends in overuse over time and preventive measures. We classified study samples as local (within 1 city or county), regional (multiple counties or 1 state), multistate, or national. Rate of overuse in each study was recorded; if rates were not directly presented, they were calculated using all available data from the original publication. If multiple rates of overuse were presented (eg, across specific sites of care) with no calculable overall rate, we recorded the range of overuse rates. Data presented only visually (eg, a bar graph without numeric labels) were extracted independently by 3 investigators (D.K., R.F., and S.K.), with results averaged to determine the rate of overuse. If more than 1 appropriateness standard was applied, we defined the overuse rate as the rate observed using the highest quality standard, considering RAM as highest, followed by a national guideline. Services that function as either diagnostic tests or therapeutic procedures (eg, endoscopy) were classified according to their primary function in the original article. For descriptive clarity, we grouped specific medications by class (eg, proton pump inhibitors) or therapeutic function (eg, antibiotics).
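Two of the extraction rules above lend themselves to a short illustration: choosing the rate measured against the highest-quality standard, and averaging independent readings of a figure. This is a minimal sketch; the label strings, the "other" tier, and all numeric values are hypothetical, since the text states only that RAM ranks highest, followed by a national guideline.

```python
from statistics import mean

# Lower rank = higher-quality standard. Only the RAM > national guideline
# ordering comes from the text; the "other" tier is an assumption.
STANDARD_RANK = {"RAM": 0, "national_guideline": 1, "other": 2}


def select_overuse_rate(rates_by_standard):
    """Return the rate measured against the highest-ranked standard available."""
    best = min(rates_by_standard, key=lambda s: STANDARD_RANK[s])
    return rates_by_standard[best]


def average_visual_readings(readings):
    """Average independent readings of a value shown only in a figure."""
    return mean(readings)


# Hypothetical study reporting rates (%) under two standards.
chosen = select_overuse_rate({"national_guideline": 18.0, "RAM": 14.0})
# Hypothetical readings (%) of an unlabeled bar by 3 investigators.
averaged = average_visual_readings([22.0, 23.0, 24.0])
print(chosen, averaged)
```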
Interrater reliability was excellent for both the abstract selection process (Cohen κ, 0.93) and the decision to include the article in the review (Cohen κ, 0.85). We report rates and/or ranges of rates of overuse of procedures, diagnostic tests, and medications, divided by year of publication. Results across time periods can provide a more detailed overview of the literature but cannot be directly compared.
Our search yielded 114 831 articles, of which 112 467 were excluded in title review and 172 were included in the final sample of articles (Figure). Fifty-three articles (30.8%) related to therapeutic procedures; 38 (22.1%), to diagnostic tests; and 81 (47.1%), to medications. Eighteen unique therapeutic procedures, 24 diagnostic tests, and 13 medications were evaluated. The eTable gives the characteristics of all included studies.
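The proportions reported above follow directly from the article counts, and the number of titles advancing past title review follows from the 2 screening totals. The short check below reproduces that arithmetic; the variable names are illustrative only.

```python
# Article counts by type of service, as reported in the text.
counts = {"therapeutic procedures": 53, "diagnostic tests": 38, "medications": 81}
total = sum(counts.values())

# Share of included articles per service type, in percent, to 1 decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Titles remaining after title review (articles retrieved minus titles excluded).
retrieved = 114_831
excluded_at_title = 112_467
advanced_to_abstract = retrieved - excluded_at_title

print(total, shares, advanced_to_abstract)
```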
The 172 articles included 241 distinct studies of overuse; many articles contained evaluations of multiple services. Nearly half of the studies (n = 99) involved infectious diseases, with the majority evaluating the use of antibiotics in upper respiratory tract infections (URIs). Sixty-six were cardiac studies; 17, gastrointestinal studies; 8, genitourinary studies; 5, hematologic studies; 9, musculoskeletal studies; 3, neurologic studies; 14, oncologic studies; 22, respiratory studies; and 15, vascular studies; 3 studies were on other diseases (eTable). Sixty-three studies (26.1%) evaluated a national patient sample; 42 (17.4%), a multistate sample; 55 (22.8%), a regional sample; and 78 (32.4%), a local sample. Seventy-two (29.9%) studies used a standard to define appropriate care developed through a panel consensus process (eg, RAM), and 104 (43.2%) used guidelines. Among guideline standards, 62 (25.7%) were sponsored by a specialty society; 38 (15.8%), by government; and 15 (6.2%), by not-for-profit and other organizations. The remaining studies, all relating to the use of antibiotic or antiviral medication, used an accepted well-referenced standard of care to define overuse. Insurance status was specified in 140 (58.1%) studies, among which 31 (12.9%) reported rates of overuse in Medicare populations, 13 (5.4%) in the Department of Veterans Affairs (VA), 9 (3.7%) in Medicaid populations, 17 (7.1%) in patients in private plans, and 69 (28.6%) in mixed payer populations. Fifteen articles addressed overuse trends over time; 11 focused on antibiotics for URI, 2 on carotid endarterectomy (CEA), 1 on antibiotics for bronchiolitis, and 1 on computed tomographic scanning for epilepsy. Study characteristics for each clinical area are given in Table 1 and the eTable.
Antibiotics for URI was the most commonly investigated clinical service (59 articles), followed by noninterventional coronary angiography (CA) (16 articles), CEA (13 articles), and coronary artery bypass grafting (CABG) (10 articles). Seven additional studies addressed interventional CA. Table 2 gives rates of overuse before and after 2000 for all therapeutic procedures, diagnostic tests, and medications for which there were 4 or more studies.
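The layout described for Table 2 (ranges of overuse rates before and after 2000, restricted to services with 4 or more studies) can be sketched as a simple grouping step. The records below are hypothetical, and treating a publication year of 2000 as "after" is an assumption, because the text does not state how that boundary year was handled.

```python
from collections import defaultdict

# Hypothetical records: (service, publication year, overuse rate in %).
STUDIES = [
    ("service A", 1993, 33.0),
    ("service A", 1998, 10.0),
    ("service A", 2003, 9.0),
    ("service A", 2007, 4.0),
    ("service B", 1995, 20.0),
    ("service B", 2005, 15.0),
]


def summarize(studies, min_studies=4, cutoff=2000):
    """Range of overuse rates per period for services with enough studies."""
    by_service = defaultdict(list)
    for service, year, rate in studies:
        by_service[service].append((year, rate))

    table = {}
    for service, rows in by_service.items():
        if len(rows) < min_studies:
            continue  # too few studies to tabulate
        before = [rate for year, rate in rows if year < cutoff]
        after = [rate for year, rate in rows if year >= cutoff]  # assumption: 2000 counts as "after"
        periods = {}
        if before:
            periods["before"] = (min(before), max(before))
        if after:
            periods["after"] = (min(after), max(after))
        table[service] = periods
    return table


summary = summarize(STUDIES)
print(summary)
```

Reporting a range rather than a pooled rate matches the descriptive approach of the review, in which rates from different studies are not directly combined.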
Eighteen procedures were investigated in 68 studies. Three procedures (CA, CEA, and CABG) were the focus of 68.7% of all studies of therapeutic procedures. Overuse rates for CABG were generally lower than 15%, and overuse rates for CA were generally lower than 20%; both were fairly consistent over time. Rates of inappropriate CEA ranged from 1.0% to 33.0% prior to 2000 and were lower than 11% in all studies published after 2000 (Table 2).
Twenty-four diagnostic services were assessed. The most commonly assessed diagnostic tests included upper endoscopy (7 studies), colonoscopy (4 studies), plain film radiography in URI (6 studies), diagnostic imaging in low back pain (5 studies), and prostate-specific antigen testing (4 studies). No other diagnostic service was evaluated in 4 or more separate studies. Rates of overuse varied widely (Table 2 and eTable).
Eight articles comprising 10 evaluations of preventive diagnostic services were included in the review13-21 (Table 3). Overuse rates for preventive services ranged from 7.6% to 60.8%. Six studies addressed cancer screening (Table 3). Overuse of colon cancer screening was 8.0% in a VA study of fecal occult blood testing17 and 60.8% in a study of repeat screening colonoscopy in primary care.19 Overuse rates for prostate cancer screening with PSA ranged from 16.1% to 36.1% in 3 studies,13,18,21 and a study addressing Papanicolaou smears for cervical cancer screening15 found that 58.0% of ineligible women were screened. We identified no studies evaluating the overuse of screening mammography; 1 study of mammography in women with a preexisting diagnosis of breast cancer22 found an overuse rate of 29.9%.
Antibiotics were the most commonly investigated medication; 59 of the 81 articles on medications focused on antibiotics for URI (Table 2), with overuse rates between 2.0% and 89.0%. Of 11 studies addressing temporal patterns, 9 found declines in antibiotic overuse over time23-32; 2 found no change.33,34 Bronchodilators (6 studies) and acid blockers (3 studies) were the next most commonly studied therapeutics. Rates of overuse varied from 12.0% to 81.0% for bronchodilators33,35-39 and from 22.1% to 54.9%40-42 for acid blockers (including proton pump inhibitors and histamine blockers). One study of chemotherapy for colon cancer43 and 1 study of antiviral medications44 demonstrated infrequent overuse (<10%), while a study of 3-hydroxy-3-methylglutaryl coenzyme A (HMG CoA) reductase inhibitors45 found an overuse rate higher than 47%.
We found 172 articles encompassing 241 studies of overuse, evaluating multiple medications, procedures, and diagnostic tests. The majority of studies focused on 4 interventions: antibiotics for URI and 3 cardiovascular procedures (CEA, CA, and CABG). Rates of overuse varied among studies and among services studied; overuse rates for CA and CABG were low.
Some overuse has declined over time, including rates of inappropriate CEA.46 Inappropriate antibiotic use for viral URI has generally persisted despite universally accepted guidelines and many physician-directed, patient- and/or parent-directed, and public health interventions47,48 but appears to have declined over time.23-32 Reductions in inappropriate use of health services over time may be related to publication of guidelines for their appropriate use, as in the case of CEA,46 specific interventions to reduce overuse,47,48 or national educational campaigns, as for antibiotics.49 The possibility that high-quality guidelines and targeted interventions can reduce overuse is encouraging, but the persistence of overuse of some services demonstrates that reducing inappropriate care can be challenging.
Despite interest in reducing inappropriate care in the United States, the overuse literature includes relatively few procedures and diagnostic tests, and particularly few newer costly ones.50 The limited overuse literature is understandable given the challenges of developing standards to measure overuse. High-quality national guidelines, optimally informed by an iterative process such as RAM, are necessary to facilitate both appropriate care and overuse studies. Such guidelines exist for some high-intensity services like CEA and CA. However, the process of defining appropriateness for many services remains incomplete owing to both gaps in the evidence51 and failure to translate evidence into appropriateness criteria. As a result, much care cannot be defined as either appropriate or inappropriate, precluding sophisticated study of overuse. Many overuse studies in our sample represent convenient measurements in simple situations in which any utilization is by definition overuse, for example antibiotics for colds or prostate cancer screening in very elderly men.
Recognizing the importance of appropriateness criteria to guide clinical care and research, the American College of Cardiology and the American Heart Association have invested in developing appropriateness criteria for cardiac tests and procedures, even in the absence of optimal evidence.50 Similar investments in other clinical areas are needed. These efforts would be aided and improved by parallel efforts to improve the quality of the underlying evidence and might serve ultimately to reduce overuse of health care services.
The small number of studies of preventive diagnostic services is notable. Underuse of some screening tests is common, and efforts to increase screening rates are widespread. However, it is important from a quality and cost standpoint to address both underuse and overuse of screening. We found only 8 publications addressing preventive diagnostic services, incorporating 10 evaluations. Overuse rates varied widely, but overuse of preventive services appears prevalent. Since preventive services are broadly applied, their overuse could lead both to substantial harms to patients (even if harm were rare)34 and to substantial costs. Furthermore, developing appropriateness standards for preventive services might be easier than for other services given their broad applicability, with differences in appropriate care based on relatively few prognostic factors rather than a large number of specific clinical situations. More investigation into the appropriateness of preventive services would allow for alignment of diverse guidelines and the incorporation of overuse indicators into standard quality measures.
Our study has several important limitations. First, the lack of MeSH terminology for overuse made our search challenging, and we may have missed relevant articles. However, our exhaustive reference and author tracking and expert consultation minimized the risk of our missing major publications. The large number of abstracts and articles for review precluded 2 reviewers examining each, which may have led to errors or lack of reproducibility. However, our high interrater reliability for a randomly selected subset of abstracts and full-text articles suggests methodological consistency. We excluded articles without a generally accepted standard, including those in which a few authors reviewed the literature and developed guidelines for practice. Since members of formal expert panels often disagree when defining inappropriate care,5 definitions of appropriate care by a few like-minded individuals are probably biased. Because our approach excluded some articles without standards52 and articles measuring overuse indirectly (eg, through self-report), we excluded some well-known and well-done studies53 and may have underestimated the breadth and depth of the overuse literature. However, we provide an assessment of rates of clearly inappropriate care. In addition, we included multiple studies using the same patient sample to explore overuse in different populations (eg, differences by sex or race), so our sample may overrepresent the data on some clinical services. Finally, the scope and complexity of our article selection process precluded including very recent literature, so the review does not include articles published after 2009. We performed an updated targeted search and identified 12 publications from the past 2 years that met our inclusion criteria.
We found studies addressing antibiotics for URI,54-57 antibiotics in neonates,58 IVC filters,59 preoperative testing,60 colon cancer screening,61 CA,62 and echocardiography63; rates of overuse were consistent with our review. One study of positron emission tomography with sestamibi found frequent overuse,64 and a single study of scheduled preterm births found low rates of recent inappropriate use.65 While this search was far less complete than our primary review, its results suggest that inclusion of the most recent literature in our review would not substantially change our findings.
In conclusion, our extensive review of the literature on the overuse of health care services in the United States suggests that inappropriate use of investigated services is often a problem and that rates of overuse vary widely. While rates of inappropriate use of a few specific services such as antibiotics for URI, CABG, CEA, and CA have been well described, and there is evidence that overuse of some services has declined over time, there are gaps in our understanding of the appropriateness of use of many other health services. Expanding the evidence base and establishing appropriateness criteria for a broader range of services could help target and eliminate overuse in health care services, which could reduce health care spending without adversely affecting the health of the public.
Correspondence: Deborah Korenstein, MD, Mount Sinai School of Medicine, One Gustave L. Levy Pl, PO Box 1087, New York, NY 10029 (firstname.lastname@example.org).
Accepted for Publication: October 11, 2011.
Author Contributions: Study concept and design: Korenstein, Falk, Bishop, and Keyhani. Acquisition of data: Korenstein, Falk, Howell, Bishop, and Keyhani. Analysis and interpretation of data: Korenstein, Falk, and Keyhani. Drafting of the manuscript: Korenstein and Keyhani. Critical revision of the manuscript for important intellectual content: Falk, Howell, Bishop, and Keyhani. Statistical analysis: Falk, Bishop, and Keyhani. Obtained funding: Falk and Keyhani. Administrative, technical, and material support: Korenstein, Falk, Howell, and Keyhani. Study supervision: Korenstein and Keyhani.
Financial Disclosure: None reported.
Funding/Support: Drs Korenstein and Keyhani received support from the Commonwealth Fund for work on this project. Dr Keyhani is also supported by a VA HSR&D Career Development Award.