Figure 1. Inclusion and exclusion of eligible studies of surgical appropriateness criteria (AC). RUAM indicates RAND-UCLA Appropriateness Method.
Figure 2. Summary of studies on rates of overuse of the following surgical procedures in US populations: carotid endarterectomy,41,42,47,48,53-56 coronary artery bypass grafting,25,45,46,57-59 upper gastrointestinal tract endoscopy,41-43,53,55 and hysterectomy.12,30,44 An asterisk indicates that the year of publication is listed because the year in which the procedure was performed was not reported. n refers to the number of patients studied.
Lawson EH, Gibbons MM, Ingraham AM, Shekelle PG, Ko CY. Appropriateness Criteria to Assess Variations in Surgical Procedure Use in the United States. Arch Surg. 2011;146(12):1433–1440. doi:10.1001/archsurg.2011.581
Author Affiliations: Department of Surgery, David Geffen School of Medicine, University of California, Los Angeles (Drs Lawson, Gibbons, and Ko); Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, Illinois (Drs Lawson, Ingraham, and Ko); Department of Surgery, Olive View–UCLA (University of California, Los Angeles) Medical Center, Sylmar, California (Dr Gibbons); Department of Surgery, University of Cincinnati College of Medicine, Cincinnati, Ohio (Dr Ingraham); RAND Health, Santa Monica, California (Dr Shekelle); and VA Greater Los Angeles Healthcare System, Los Angeles, California (Drs Shekelle and Ko).
Objectives To systematically describe appropriateness criteria (AC) developed in the United States for surgical procedures and to summarize how these criteria have been applied to identify overuse and underuse of procedures in US populations.
Data Sources MEDLINE literature search performed in February 2010 and May 2011.
Study Selection Studies were included if they addressed the appropriateness of a surgical procedure using the RAND-UCLA Appropriateness Method. Non-US studies were excluded.
Data Extraction Information was abstracted on study design, surgical procedure, and reported rates of appropriate use, overuse, and underuse. Identified AC were cross-referenced with lists of common procedures from the Nationwide Inpatient Sample and the State Ambulatory Surgery databases.
Data Synthesis A total of 1601 titles were identified; 39 met the inclusion criteria. Of these, 17 developed AC and 27 applied AC to US populations. Appropriateness criteria have been developed for 16 surgical procedures. Underuse has only been studied for coronary artery bypass graft surgery, and rates range from 24% to 57%. Overuse has been more broadly studied, with rates ranging from 9% to 53% for carotid endarterectomy, 0% to 14% for coronary artery bypass graft, 11% to 24% for upper gastrointestinal tract endoscopy, and 16% to 70% for hysterectomy. Appropriateness criteria exist for 10 of the 25 most common inpatient procedures and 6 of the 15 top ambulatory procedures in the United States. Most studies are more than 5 years old.
Conclusions Most existing AC are outdated, and AC have never been developed for most common surgical procedures. A broad and coordinated effort to develop and maintain AC would be required to implement this tool to address variation in the use of surgical procedures.
The US health care system is increasingly focused on promoting patient-centered care, improving clinical outcomes, and reducing variations in the provision of care. For surgery, the goal of these quality improvement efforts is to perform the right procedure for the right patient in the right way.
Considerable evidence shows widespread variations in rates of surgical procedures in the United States. Well-documented racial disparities and geographic variations in the use of surgical procedures are not fully explained by disease incidence or patient preferences.1-4 These findings more likely reflect overuse and underuse of surgical procedures.
Recent commentaries have proposed using the RAND-UCLA Appropriateness Method (RUAM) to improve the quality of surgical care by addressing variations in the use of surgical procedures.5,6 The RUAM synthesizes the best available evidence with expert clinical judgment to produce appropriateness criteria (AC), which weigh the relative risks and benefits of a procedure for specific clinical scenarios. These AC can be applied to patient populations to assess for overuse and underuse.
The purposes of this study are to systematically describe AC developed in the United States for surgical procedures and to summarize how these criteria have been applied to identify overuse and underuse. Our goals were to elucidate what is known and not known about the appropriate use of common inpatient and ambulatory procedures in the United States and to determine whether there is a need for future development and application of AC.
The RUAM was developed to systematically assess variation in the use of surgical procedures by defining which patients should and should not undergo surgical intervention vs medical therapy. An appropriate indication for a procedure is one for which “the expected health benefit (eg, increased life expectancy, relief of pain, reduction in anxiety, improved functional capacity) exceeds the expected negative consequences (eg, mortality, morbidity, anxiety, pain, time lost from work) by a sufficiently wide margin that the procedure is worth doing, exclusive of cost.”7(p55) This method starts with an extensive review of the literature on the risks and benefits of the procedure. A comprehensive and mutually exclusive set of clinical scenarios or indications for the procedure is then compiled, complete with specific definitions for any potentially ambiguous terms (eg, failed medical therapy would be explicitly defined). Because the list must be all-inclusive, it typically includes many hundreds of specific clinical circumstances.
An expert panel then rates each indication in 2 rounds, with the second round occurring after an in-person discussion of the first-round results. Indications are classified as “appropriate” (the expected benefits of the procedure outweigh the expected harms), “equivocal” (the expected benefits and harms are roughly equal or there is disagreement among the panelists), or “inappropriate” (the expected harms outweigh the expected benefits). Appropriate indications are sometimes further classified as “necessary” by the panel, usually in a third round. An indication is considered necessary if it would be improper care not to offer the procedure to the patient, there is a reasonable chance the procedure will benefit the patient, and the magnitude of the benefit is not small.8 Table 1 lists examples of indications with each of these classifications.
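The classification step can be sketched in Python. The article does not state the rating scale or the rule used to detect panel disagreement; this sketch assumes the convention commonly described for the RUAM, in which each panelist rates an indication from 1 to 9 and a 9-member panel is said to disagree when at least 3 panelists rate in the 1-3 range and at least 3 rate in the 7-9 range. The function name `classify_indication` is hypothetical, and the later round that designates indications as necessary is not modeled.

```python
from statistics import median


def classify_indication(ratings, disagreement_threshold=3):
    """Classify one indication from panelists' 1-9 ratings.

    Assumed convention (not stated in the article): median 7-9 is
    appropriate, median 1-3 is inappropriate, anything else is equivocal,
    and any indication with panel disagreement is equivocal regardless of
    its median. Disagreement means at least `disagreement_threshold`
    ratings fall in 1-3 AND at least that many fall in 7-9 (3 suits a
    9-member panel).
    """
    if not ratings or any(not 1 <= r <= 9 for r in ratings):
        raise ValueError("ratings must be a non-empty sequence of values 1-9")
    low = sum(1 for r in ratings if r <= 3)
    high = sum(1 for r in ratings if r >= 7)
    if low >= disagreement_threshold and high >= disagreement_threshold:
        return "equivocal"
    m = median(ratings)
    if m >= 7:
        return "appropriate"
    if m <= 3:
        return "inappropriate"
    return "equivocal"


# Example: a 9-member panel (ratings invented for illustration).
print(classify_indication([8, 8, 9, 7, 8, 9, 7, 8, 6]))  # appropriate
print(classify_indication([1, 2, 3, 7, 8, 8, 9, 9, 9]))  # equivocal (disagreement)
```

Checking disagreement before the median reflects the definition in the text: a polarized panel yields an equivocal rating even when its median falls in the appropriate range.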
Overuse and underuse of a surgical procedure are determined by applying these theoretical indications to actual patients. Underuse occurs when a patient with a necessary indication does not receive the procedure. The study sample for underuse is therefore derived from a pool of patients who may or may not undergo the procedure. For example, patients who are diagnosed with coronary artery disease at coronary angiography then proceed to either coronary revascularization or medical management. Overuse occurs when a patient undergoes a procedure for an inappropriate indication. The study sample for overuse (and for appropriate use) is thus derived from patients who underwent the procedure.
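Because overuse and underuse are measured in different samples, their rates have different denominators. A minimal sketch, assuming each hypothetical patient record carries the panel classification of the patient's indication and whether the procedure was performed; necessary indications are stored under their own label even though they are a subset of appropriate ones.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record: the panel classification of the patient's
    # indication and whether the procedure was actually performed.
    indication: str  # "necessary", "appropriate", "equivocal", or "inappropriate"
    received: bool


def overuse_rate(patients):
    """Share of patients who underwent the procedure for an inappropriate indication."""
    treated = [p for p in patients if p.received]
    if not treated:
        return None
    return sum(p.indication == "inappropriate" for p in treated) / len(treated)


def underuse_rate(patients):
    """Share of patients with a necessary indication who did not receive the procedure."""
    necessary = [p for p in patients if p.indication == "necessary"]
    if not necessary:
        return None
    return sum(not p.received for p in necessary) / len(necessary)


# Illustrative cohort (not study data), e.g. patients after coronary angiography.
cohort = [
    Patient("necessary", True),
    Patient("necessary", False),
    Patient("appropriate", True),
    Patient("inappropriate", True),
    Patient("equivocal", False),
]
print(overuse_rate(cohort))   # 0.3333333333333333
print(underuse_rate(cohort))  # 0.5
```

The overuse denominator counts only treated patients, while the underuse denominator counts only patients with a necessary indication, whether or not they were treated.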
A substantial amount of research has been performed regarding the reliability and validity of the RUAM. Studies of test-retest reliability of the same panelists yielded correlation coefficients greater than 0.9.11 In addition, independent panels with the same composition of panelist specialties generate results that are about as reproducible as some common diagnostic tests (κ = approximately 0.5-0.7).12,13 The results of the RUAM are sensitive to panel composition, with physicians who perform the procedure being more enthusiastic about its appropriateness than nonperformers.14,15 The sensitivity and specificity of the RUAM for identifying overuse and underuse of hysterectomy and coronary revascularization have been estimated at 68% to 99% and 94% to 97%, respectively.16 Content and construct validity have been demonstrated,11 and there is evidence supporting the predictive validity of the AC for coronary revascularization.17-19 For example, a prospective study19 on the appropriate use of coronary revascularization found that underuse was significantly associated with the adverse clinical outcomes of nonfatal myocardial infarction and mortality. The study further demonstrated a graded relationship between appropriateness rating and outcome that persisted across the entire scale of appropriateness.
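Agreement between independent panels, as in the κ values cited above, is usually summarized with Cohen's kappa, which corrects the observed proportion of matching classifications (p_o) for the agreement expected by chance (p_e): κ = (p_o - p_e) / (1 - p_e). The sketch below assumes two panels have each classified the same list of indications; the classifications in the example are invented for illustration.

```python
from collections import Counter


def cohens_kappa(panel_a, panel_b):
    """Cohen's kappa for two panels' classifications of the same indications."""
    if len(panel_a) != len(panel_b) or not panel_a:
        raise ValueError("need two non-empty classification lists of equal length")
    n = len(panel_a)
    # Observed agreement: share of indications both panels classified identically.
    p_o = sum(a == b for a, b in zip(panel_a, panel_b)) / n
    # Chance agreement: product of each panel's marginal share, summed over categories.
    count_a, count_b = Counter(panel_a), Counter(panel_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both panels used one identical category; kappa is undefined, treat as perfect
    return (p_o - p_e) / (1 - p_e)


# A = appropriate, E = equivocal, I = inappropriate (invented ratings).
panel_a = ["A", "A", "A", "I", "E", "A", "I", "E", "A", "A"]
panel_b = ["A", "A", "E", "I", "E", "A", "I", "A", "A", "A"]
print(round(cohens_kappa(panel_a, panel_b), 2))  # 0.64
```

Here the panels agree on 80% of indications, but because both classify most indications as appropriate, 44% agreement would be expected by chance, so κ is about 0.64.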
We searched MEDLINE in February 2010 and May 2011 for articles related to the appropriateness of surgical procedures using the following keywords in the title or abstract: (transplantation OR surgery OR surgical procedures, operative) AND appropriateness. To be included in the study, articles had to report an original research study addressing the appropriateness of a surgical procedure using the RUAM. Two physician reviewers (E.H.L. and M.M.G.) reviewed each study, and disagreements were resolved by consensus.
Study results were abstracted into data tables. Data abstracted included dates of study and publication, surgical procedure, details of the study design, and reported rates of appropriate use, overuse, and underuse.
Studies that developed AC are reported if they included a comprehensive, mutually exclusive set of clinical scenarios rated by a panel in at least 2 rounds, with at least 1 round occurring after an in-person discussion. We excluded guidelines and consensus statements that were not developed using the RUAM. We also excluded studies that used a non-US panel. Studies assessing appropriate use, overuse, and/or underuse of a surgical procedure were included if they applied AC developed by a US panel to a random or stratified sample of the US population.
To identify what is known and not known about the appropriateness of surgical care in the United States, we identified the most commonly performed inpatient and ambulatory procedures and cross-referenced these lists with the results of our literature search. The Nationwide Inpatient Sample,20,21 a Healthcare Cost and Utilization Project data set, was used to identify the most frequently performed inpatient surgical procedures in 2008 and the associated mean costs for these procedures. The 2008 Nationwide Inpatient Sample, the latest year for which summarized data were available, is a 20% stratified sample of US community hospitals and contains all discharge data on 8 million hospitalizations from 1056 hospitals located in 42 states. International Classification of Diseases, Ninth Revision (ICD-9) codes22 for procedures were grouped into clinically meaningful categories using the Clinical Classification Software (http://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp).
The State Ambulatory Surgery Databases,21,23 a set of Healthcare Cost and Utilization Project data sets from 28 participating states, were used to identify the most frequently performed ambulatory surgical procedures in 2007 and the associated mean charges for these procedures. The latest year for which summarized data from these databases are available is 2007. These databases capture discharge data for surgical procedures performed on the same day that patients are admitted and released. Some of the databases contain the ambulatory surgery encounter abstracts for their state (including records from both hospital-affiliated and freestanding surgery centers). Procedures were grouped by the Clinical Classification Software, as was done with the inpatient procedures.
Lists of common inpatient and ambulatory surgical procedures were created using these sources. We excluded procedures that are not commonly performed by surgeons (eg, bronchoscopy and percutaneous coronary angioplasty), procedures related to pregnancy or childbirth (eg, cesarean section and circumcision), relatively low-risk procedures (eg, incision and drainage of skin lesions and wound debridement), and bedside procedures (eg, thoracentesis and abdominal paracentesis). Some categories defined by the Clinical Classification Software were combined to form new categories: low back surgery combines spinal fusion with laminectomy and excision of intervertebral disk, and transurethral prostatectomy is combined with excision, drainage, or removal of urinary obstruction. Other overly broad categories were excluded (eg, other operating room upper gastrointestinal [GI] therapeutic procedures). Because bariatric procedures fall within this broad category, a different source was used to determine the frequency and cost of these procedures.24
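The category cleaning described above amounts to dropping excluded categories, merging combined ones, and re-ranking by frequency. A sketch in Python, assuming the counts are available as a mapping from category label to discharge count; the labels paraphrase the text and are not official Clinical Classification Software codes or names, and the counts in the example are made up.

```python
from collections import Counter

# Illustrative labels only; they paraphrase the text and are not the
# official Clinical Classification Software category codes or names.
COMBINE = {
    "spinal fusion": "low back surgery",
    "laminectomy; excision intervertebral disk": "low back surgery",
    "transurethral prostatectomy": "TURP/urinary obstruction",
    "excision, drainage, or removal of urinary obstruction": "TURP/urinary obstruction",
}
EXCLUDE = {
    "bronchoscopy", "percutaneous coronary angioplasty",  # not commonly done by surgeons
    "cesarean section", "circumcision",  # pregnancy or childbirth related
    "incision and drainage of skin lesions", "wound debridement",  # relatively low risk
    "thoracentesis", "abdominal paracentesis",  # bedside procedures
    "other OR upper GI therapeutic procedures",  # overly broad
}


def rank_procedures(category_counts, top_n=25):
    """Drop excluded categories, merge combined ones, and return the top_n by count."""
    totals = Counter()
    for category, count in category_counts.items():
        if category in EXCLUDE:
            continue
        totals[COMBINE.get(category, category)] += count
    return totals.most_common(top_n)


# Made-up counts for illustration.
counts = {
    "spinal fusion": 400,
    "laminectomy; excision intervertebral disk": 300,
    "appendectomy": 500,
    "bronchoscopy": 900,
}
print(rank_procedures(counts))  # [('low back surgery', 700), ('appendectomy', 500)]
```

Merging before ranking matters: separately, neither low back category would outrank appendectomy in this example, but combined they do.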
Our search identified 1601 articles; of these, 395 were screened and 39 were included in this review (Figure 1). Articles were excluded if they did not address the appropriateness of a surgical procedure with AC rated by a US panel according to the RUAM.
Of the included articles, 12 developed AC and 22 applied AC to assess for appropriate use, overuse, and/or underuse. In addition, 5 studies both developed and applied AC. Some articles addressed more than 1 surgical procedure. Articles that evaluate the reliability and validity of the RUAM will be summarized elsewhere.
In the United States, AC have been developed for 16 surgical procedures: coronary artery bypass grafting (CABG),10,12,25,26 bariatric procedures,9 abdominal aortic aneurysm surgery,27,28 carotid endarterectomy,26,28,29 hysterectomy,12,30 nephrectomy for metastatic renal cell cancer,31 cholecystectomy,26 upper GI tract endoscopy,26,32 colonoscopy,26,33 cataract procedures,28,34 tympanostomy tube placement,35 sinus procedures,36 low back surgery for sciatica,37 sentinel lymph node biopsy for melanoma,38 tonsillectomy or adenoidectomy,35 and carpal tunnel surgery.39 For some procedures, AC have been developed multiple times; CABG is the most frequently studied procedure (4 studies), followed by carotid endarterectomy (3 studies). Most studies were published more than 5 years ago, and 5 years has been proposed as a threshold for updating practice guidelines as new evidence accumulates.40
We identified 27 unique studies that applied AC developed by a US panel to a sample of the US population. Coronary artery bypass grafting and carotid endarterectomy were again the most frequently studied (10 and 8 studies, respectively). All identified studies reported on procedures that were performed before the year 2000.
Appropriate use has been assessed for 8 procedures: CABG, carotid endarterectomy, hysterectomy, upper GI tract endoscopy, cataract procedures, tympanostomy tube placement, sinus procedures, and low back surgery for sciatica (Table 2 and Table 3). In contrast, underuse has only been reported in US populations for CABG. Studies analyzing underuse of CABG in varied populations of patients who underwent coronary angiography in the early 1990s report rates ranging from 24% to 25%.15,18,51 In 3 studies, this rate was as high as 42%,52 and in other studies, as high as 57%.46 Overuse has been studied for 9 procedures, with rates ranging from 9% to 53% for carotid endarterectomy, 0% to 14% for CABG, 11% to 24% for upper GI tract endoscopy, and 16% to 70% for hysterectomy (Figure 2 and Tables 2 and 3). Procedures for which overuse has only been reported once include cataract procedures (2%),50 tympanostomy tube placement (23%),35 sinus procedures (16%),36 and low back surgery for sciatica (31%).37
Tables 2 and 3 demonstrate the many gaps in what is known regarding the appropriateness of major inpatient and ambulatory procedures, respectively. Appropriateness criteria exist for 10 of the 25 most commonly performed inpatient procedures and have been applied to assess for overuse and/or underuse for 6. Notably, AC have not been developed for half of the 10 most commonly performed inpatient procedures and have only been applied for 3 of these procedures. Some common inpatient procedures for which information regarding appropriateness is lacking include knee arthroplasty, partial and total hip replacement, appendectomy, colorectal resection, and treatment of hip and femur fractures and dislocations.
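The cross-referencing behind these counts is a set comparison between a ranked procedure list and the procedures for which AC have been developed or applied. A sketch, assuming procedure names have already been harmonized across sources; the five-item list in the example is a hypothetical excerpt, not the actual Nationwide Inpatient Sample ranking, and the developed and applied sets use only procedures named in this review.

```python
def ac_coverage(top_procedures, developed, applied):
    """Count how many of the top procedures have AC developed and applied."""
    top = set(top_procedures)
    has_ac = top & developed
    return {
        "total": len(top),
        "ac_developed": len(has_ac),
        # Only procedures that have AC can have had them applied.
        "ac_applied": len(has_ac & applied),
        "no_ac": sorted(top - developed),
    }


# Hypothetical excerpt of a ranked list (not the actual NIS ranking).
top_list = ["knee arthroplasty", "CABG", "cholecystectomy", "appendectomy", "low back surgery"]
developed = {"CABG", "cholecystectomy", "low back surgery", "hysterectomy"}
applied = {"CABG", "low back surgery", "hysterectomy"}
print(ac_coverage(top_list, developed, applied))
```

Applied to the full top-25 inpatient and top-15 ambulatory lists, the same comparison would yield the counts reported in the text.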
Less is known regarding the appropriate use of common ambulatory procedures. Of the top 15 ambulatory procedures performed in the United States, AC exist for only 6 of them and have been applied for only 3. Some common ambulatory procedures for which information regarding appropriateness is lacking include excision of semilunar cartilage of the knee, inguinal and femoral hernia repair, excision of cervix and uterus, and breast procedures, including biopsy, lumpectomy, and quadrantectomy.
Appropriateness criteria developed in the United States exist for 16 surgical procedures, and application of these criteria to varied US populations has revealed nontrivial rates of overuse and underuse. However, most AC are more than 5 years old, and furthermore, AC have never been developed for most common inpatient and ambulatory procedures.
Quality improvement efforts for the field of surgery, such as the Surgical Care Improvement Project (SCIP) and the National Surgical Quality Improvement Program (NSQIP), focus exclusively on improving processes and outcomes of care, respectively. Much less effort is spent on improving patient selection before surgery. A national effort to develop AC with clinical buy-in and decision support could improve surgical care by addressing disparities in the receipt of surgical procedures and reducing overuse and underuse. More appropriate patient selection could also potentially improve postoperative outcomes because studies15,18,19 on CABG show that patients who are treated in accordance with their appropriateness category have better clinical outcomes compared with patients who are treated inappropriately or who do not receive necessary interventions.
Developing AC and reporting rates of overuse and underuse may in themselves stimulate improved patient selection for surgery but are unlikely to be sufficient. Some groups have developed computer algorithms based on panel appropriateness ratings, which make AC more feasible for implementation in the clinical realm. With use of health information technology rapidly evolving, such algorithms could be integrated into electronic medical records, thus streamlining assessment of appropriateness of surgery for a patient. Surgeons and referring physicians could use AC as a decision support aid, with the algorithm serving as an electronic second opinion. A study by Junghans et al60 found that patient-specific appropriateness ratings were more effective than guidelines in changing the way physicians treated angina, which supports our hypothesis that clinical use of AC could reduce practice variations and improve the overall appropriateness of surgical care.
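A decision support algorithm of this kind can, at its core, be a lookup: because the rated indications are comprehensive and mutually exclusive, each patient's clinical attributes should map to exactly one rated indication. The sketch below assumes a hypothetical indication defined by only two attributes; the attribute names and ratings are placeholders, not any published criteria, and a real tool would need to encode hundreds of indications with the panel's explicit definitions.

```python
from typing import NamedTuple


class Indication(NamedTuple):
    # Hypothetical clinical dimensions. Real AC use many more, each with an
    # explicit definition (e.g., what counts as failed medical therapy).
    symptom_severity: str  # "mild", "moderate", or "severe"
    failed_medical_therapy: bool


# Placeholder ratings for illustration only; NOT taken from any published AC.
RATINGS = {
    Indication("mild", False): "inappropriate",
    Indication("mild", True): "equivocal",
    Indication("moderate", False): "equivocal",
    Indication("moderate", True): "appropriate",
    Indication("severe", False): "appropriate",
    Indication("severe", True): "necessary",
}


def second_opinion(patient):
    """Look up the panel rating for a patient's (hypothetical) indication."""
    key = Indication(patient["symptom_severity"], patient["failed_medical_therapy"])
    if key not in RATINGS:
        # Rated indications should be comprehensive, so a miss signals a gap
        # in the criteria (or an invalid input), not a default answer.
        raise LookupError(f"no rated indication for {key}")
    rating = RATINGS[key]
    if rating == "equivocal":
        return "equivocal: consider multidisciplinary discussion"
    return rating


print(second_opinion({"symptom_severity": "severe", "failed_medical_therapy": True}))
```

Raising an error on an unrated combination, rather than returning a default, keeps the tool from silently offering an opinion the panel never gave.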
Use of AC could improve the current informed consent process by providing an independent assessment of the balance of risks and benefits of a procedure for the patient's specific clinical scenario. Indications classified as equivocal could trigger a multidisciplinary approach to assist patients in deciding whether to undergo surgery or proceed with medical management. Routinely providing this information and service to patients is consistent with the current emphasis on patient-centered care and would not aim to replace a physician's clinical judgment but rather to augment or guide decisions.
Administrators or health agencies seeking to reduce practice variation could use AC as a quality metric. For this purpose, AC could be implemented either as a process measure (ie, whether AC were used in the decision-making process or whether the patient was provided information on appropriateness before surgical intervention) or as an outcome measure (ie, rates of overuse and underuse).
It is unclear what effect more appropriate care would have on cost to the health care system because the balance of overuse and underuse of procedures is unknown. The Health Services Utilization Study demonstrated that differences in overuse did not explain geographic variations in the use of 3 procedures (coronary angiography, carotid endarterectomy, and upper GI tract endoscopy). The researchers assigned appropriateness ratings to patients in areas of high, average, and low use of these procedures and found that the differences in levels of appropriate and inappropriate use among sites were small.53 Reducing variation in the use of surgical procedures therefore cannot be limited to reducing overuse; it must also identify and reduce underuse.
Comparative effectiveness research is proposed as a means of reducing practice variation; however, the results of these studies will likely not be implemented for many years. In contrast, broad development and implementation of AC for common surgical procedures could occur during a relatively short period. The use of AC is an intermediate step and could set the agenda for future comparative effectiveness research by identifying gray areas where not enough is known regarding the balance of risks and benefits of a procedure (ie, equivocal indications).
Our study has possible limitations. First, studies on the development and application of AC may have been overlooked. To reduce this possibility, we started with broad search terms and had 2 physicians perform the screening and reference mining. Second, the implications of our findings may be limited by the possibility that the RUAM may have differing reliability and validity for different procedures. Most of the methodologic studies on the RUAM focus on a relatively small number of procedures (eg, CABG, carotid endarterectomy, and hysterectomy). Further evaluation of the method is warranted and could be performed concurrently with further development, application, and implementation of AC for a broad range of procedures.
Our initial search produced additional articles that developed AC using non-US panels, which we did not report. Studies comparing the results of AC developed in different countries have reported small but not insignificant differences in appropriateness ratings. A study comparing the appropriateness of CABG use in the United States and Canada using AC developed independently in each country found that 6% of US cases and 4% of Canadian cases were rated as inappropriate by Canadian criteria compared with 2% and 3%, respectively, using US criteria.61 Studies that focused on low back surgery and colonoscopy had similar findings.33,37 Although the differences are small, we believe that using only AC developed by US panels enhances the reliability and validity of the RUAM for detecting overuse and underuse in US populations.
In conclusion, AC are a feasible, realistic, and practical tool for addressing the persistent problem of variation in the use of surgical procedures. Apart from CABG, bariatric surgery, and carpal tunnel surgery, most AC are outdated. A coordinated effort to produce and maintain up-to-date AC for surgical procedures that are frequently performed, have elevated risk of morbidity and mortality, are controversial, and/or that use significant resources could potentially greatly improve the quality of surgical care if these criteria are integrated into everyday surgical practice. Some professional societies, such as the American College of Cardiology and the American College of Radiology, have already undertaken such programs. Surgical societies could benefit from such an initiative as well.
Correspondence: Elise H. Lawson, MD, MSHS, Department of Surgery, UCLA Medical Center, CHS 72-215, 10833 LeConte Ave, Los Angeles, CA 90095.
Accepted for Publication: March 9, 2011.
Author Contributions: Study concept and design: Lawson, Gibbons, Ingraham, Shekelle, and Ko. Acquisition of data: Lawson, Gibbons, and Ingraham. Analysis and interpretation of data: Lawson, Gibbons, Ingraham, Shekelle, and Ko. Drafting of the manuscript: Lawson and Gibbons. Critical revision of the manuscript for important intellectual content: Lawson, Gibbons, Shekelle, and Ko. Administrative, technical, or material support: Lawson and Gibbons. Study supervision: Gibbons, Shekelle, and Ko.
Financial Disclosure: None reported.
Funding/Support: Dr Lawson's time was supported by the Robert Wood Johnson Foundation Clinical Scholars Program through the American College of Surgeons.