Table 1. General Surgery Procedures Included in the Analysis by Order of Priority and Prevalence in the 230 769 Patients
Table 2. Baseline Patient, Surgeon, and Hospital Characteristics of Overall Study Population and Study Sample
Table 3. Sample Characteristics by Residency Program Tertile for All Adverse Events
Table 4. Mean Adjusted Adverse Event Rates by Residency Program Tertile
Table 5. Spearman Rank Correlations of Risk-Standardized Adverse Events Among Residency Programs
Original Investigation
February 2016

Using Patient Outcomes to Evaluate General Surgery Residency Program Performance

Author Affiliations
  • 1Perelman School of Medicine at the University of Pennsylvania, Philadelphia
  • 2Department of Surgery, Center for Surgery and Health Economics, University of Pennsylvania Health System, Philadelphia
  • 3Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia
  • 4Department of Veterans Affairs’ Center for Health Equity Research and Promotion, Philadelphia Veterans Affairs Medical Center, Philadelphia, Pennsylvania
JAMA Surg. 2016;151(2):111-119. doi:10.1001/jamasurg.2015.3637
Abstract

Importance  To evaluate and financially reward general surgery residency programs based on performance, performance must first be defined and measurable.

Objective  To assess general surgery residency program performance using the objective clinical outcomes of patients operated on by program graduates.

Design, Setting, and Participants  A retrospective cohort study was conducted of discharge records from 349 New York and Florida hospitals between January 1, 2008, and December 31, 2011. The records comprised 230 769 patients undergoing 1 of 24 general surgical procedures performed by 454 surgeons from 73 general surgery residency programs. Analysis was conducted from June 4, 2014, to June 16, 2015.

Main Outcomes and Measures  In-hospital death; development of 1 or more postoperative complications before discharge; prolonged length of stay, defined as length of stay greater than the 75th percentile when compared with patients undergoing the same procedure type at the same hospital; and failure to rescue, defined as in-hospital death after the development of 1 or more postoperative complications.

Results  Patients operated on by surgeons trained in residency programs that were ranked in the top tertile were significantly less likely to experience an adverse event than were patients operated on by surgeons trained in residency programs that were ranked in the bottom tertile. Adjusted adverse event rates for patients operated on by surgeons trained in programs that were ranked in the top tertile and those who were operated on by surgeons trained in programs that were ranked in the bottom tertile were, respectively, 0.483% vs 0.476% for death, 9.68% vs 10.79% for complications, 16.76% vs 17.60% for prolonged length of stay, and 2.68% vs 2.98% for failure to rescue (all P < .001). The differences remained significant in procedure-specific subset analyses. The rankings were significantly correlated among some but not all outcome measures. The magnitude of the effect of the residency program on the outcomes achieved by the graduates decreased with increasing years of practice. Within the analyses of surgeons within 20, 10, and 5 years of practice, the relative difference in adjusted adverse event rates across the individual models between the top and bottom tertiles ranged from 1.5% to 12.3% (20 years), 9.1% to 33.8% (10 years), and 8.0% to 44.4% (5 years).

Conclusions and Relevance  Objective data were successfully used to rank the clinical outcomes achieved by graduates of general surgery residency programs. Program rankings differed by the outcome measured. The magnitude of differences across programs was small. Careful consideration must be used when identifying potential targets for payment-for-performance initiatives in graduate medical education.

Introduction

The 2014 Institute of Medicine report calls for restructuring of Medicare funding for Graduate Medical Education (GME) to incorporate payment-for-performance methods.1,2 The Institute of Medicine argues that US taxpayers should no longer unconditionally fund physician training but rather fund training that is best able to meet the nation’s health care needs. This call for payment for performance in GME raises the question, “How do we define and measure residency program performance?” To our knowledge, there is no consensus regarding how to evaluate GME. Programs use fellowship match rates, board pass rates, or subjective evaluations of observed encounters as proxy measures of training quality.3,4 However, these measures do not directly capture program performance in the core objective of GME—to train a future generation of physicians to deliver high-quality patient care. Furthermore, experience in measuring hospital performance has shown that process measures are not necessarily correlated with outcomes measures.5,6 The same may be true in measuring GME performance. Funding GME based on performance demands the creation of a system that reliably evaluates residency programs using objective clinical outcomes.

Prior work demonstrated that obstetrics and gynecology residency programs could be ranked by the complication rates of their graduates’ patients.7,8 However, this approach has not yet been applied to fields where there are less clearly defined indications for intervention and more variability in the types of procedures performed. This study expands this work into general surgery, a primary care specialty with a more diverse range of procedures and outcomes. General surgery was selected because there are approximately 2.65 million inpatient admissions for general surgical procedures annually in the United States,9 general surgical procedure outcomes have been widely examined using discharge claims,10-12 and general surgery training is the foundation for many other surgical specialties. Four outcomes were used to examine the care provided by general surgery residency program graduates and to compare performance across programs.

Methods
Patients, Surgeons, Hospitals, and Residency Programs

Patients undergoing 1 of 24 general surgical operations in New York and Florida hospitals between January 1, 2008, and December 31, 2011,13,14 were identified for study inclusion using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) procedure codes.15 Operations were chosen to capture the breadth of inpatient procedures performed by general surgeons (Table 1).16 New York and Florida were selected for the study because of the ability to link patient claims to information on surgeons and hospitals. Physician identifiers were used to obtain current and historical data from the American Medical Association Physician Masterfile.17 Data on hospital-level quality measures were obtained from the 2014 Hospital Compare database.18 To avoid misclassifying a complex operation as a separately listed component procedure, patients undergoing multiple qualifying procedures were classified by the most comprehensive procedure coded in the discharge claim for each admission as determined by 3 of us (N.B., J.B.M., and R.R.K.) (Table 1). For example, a patient who underwent both a pancreatectomy and a cholecystectomy during the same admission was classified under pancreatectomy.
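
A minimal sketch of this priority-based classification is shown below, assuming a hypothetical priority list and ICD-9-CM code sets rather than the study's actual Table 1.

```python
# Illustrative sketch only: classify each admission by the highest-priority
# qualifying procedure among its ICD-9-CM procedure codes. The priority order
# and code sets below are placeholders, not the study's actual Table 1 list.
PROCEDURE_PRIORITY = [
    ("pancreatectomy", {"52.6", "52.7"}),
    ("colectomy", {"45.7", "45.8"}),
    ("cholecystectomy", {"51.22", "51.23"}),
]

def classify_admission(procedure_codes):
    """Return the most comprehensive qualifying procedure, or None if no match."""
    codes = set(procedure_codes)
    for name, qualifying_codes in PROCEDURE_PRIORITY:
        if codes & qualifying_codes:
            return name
    return None

# Example: an admission coded with both a cholecystectomy and a pancreatectomy
# is classified under pancreatectomy, the more comprehensive procedure.
print(classify_admission(["51.22", "52.6"]))  # -> "pancreatectomy"
```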

A total of 952 183 admissions included a qualifying general surgical operation. Patients were excluded if the physician identifier in the state data set could not be linked to a record in the American Medical Association Physician Masterfile (n = 153), the physician did not identify general surgery as his or her primary or secondary specialty (n = 273 426), the recorded residency was at an institution without a general surgery residency program (n = 39 745), the physician was trained outside the United States (n = 195 741), the physician did not have an MD degree (n = 8593), or the residency completion date was after the date of the qualifying operation (n = 1078). To minimize the effects of practice habits developed after training, observations were excluded if the physician was more than 20 years out of residency at the time of the qualifying discharge (n = 132 775). Finally, patients whose surgeon’s residency program could not be identified (n = 63) or whose surgeon trained at a residency program for which fewer than 5 alumni could be identified (n = 69 840) were excluded from the analysis. The final sample included 230 769 patients operated on by 454 general surgeons from 73 general surgery residency programs. The residency programs were located in 24 states, the District of Columbia, and Puerto Rico and represented 28.7% of the 254 currently accredited general surgery residency programs in the United States. The analysis was repeated excluding physicians more than 10 years out of residency and more than 5 years out of residency to examine the program effect at time points closer to the training period. For the analysis of surgeons within 10 years of training, there were 78 575 patients operated on by 319 general surgeons from 36 general surgery residency programs. For the analysis of surgeons within 5 years of training, there were 26 576 patients operated on by 121 general surgeons from 16 general surgery residency programs. Analysis was conducted from June 4, 2014, to June 16, 2015. The study was exempted from review by the University of Pennsylvania Institutional Review Board.
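
A sequential exclusion cascade of this kind can be expressed as a short filtering routine; the sketch below assumes a pandas DataFrame of linked admissions with hypothetical boolean columns and is not the study's actual linkage code.

```python
import pandas as pd

def apply_exclusions(claims: pd.DataFrame) -> pd.DataFrame:
    """Sequentially apply exclusion criteria and report attrition at each step.

    Each column named below is a hypothetical boolean flag that is True when
    the admission satisfies the corresponding inclusion criterion.
    """
    criteria = [
        ("linked to AMA Physician Masterfile", "ama_linked"),
        ("general surgery as primary or secondary specialty", "is_general_surgeon"),
        ("trained at an institution with a GS residency", "residency_has_gs_program"),
        ("trained in the United States", "us_trained"),
        ("holds an MD degree", "has_md"),
        ("residency completed before the operation", "residency_before_operation"),
        ("within 20 years of residency", "within_20_years"),
        ("program identified with at least 5 alumni", "program_with_5_alumni"),
    ]
    kept = claims
    for label, col in criteria:
        n_before = len(kept)
        kept = kept[kept[col]]
        print(f"{label}: excluded {n_before - len(kept)} admissions, {len(kept)} remain")
    return kept
```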

Adverse Events

The adverse events examined were death, development of 1 or more complications, prolonged length of stay (PLOS), and failure to rescue (FTR). Death was defined as death during the same hospital stay. Complications were identified by ICD-9-CM diagnosis codes (eTable 1 in the Supplement)19,20 for individual complications and collapsed into a binary variable representing the occurrence of any postoperative complication. To distinguish between complications and comorbidities, diagnosis codes were not considered if they were designated as present on admission. Prolonged length of stay was defined within each hospital as a binary variable indicating procedure-specific length of stay greater than the 75th percentile. Prolonged length of stay is a well-described measure used to reflect inefficiencies in care and to capture complications that prolong care.21,22 Failure to rescue was coded as a binary variable indicating in-hospital death following any complication.23,24 In defining FTR, death was included as a complication with the assumption that patients who died without a documented complication experienced an undocumented complication. Failure to rescue was defined only for the 11 701 patients (5.1% of cohort) who were admitted electively and died or developed complications following surgery performed on hospital day 0 to reflect the context in which FTR was initially developed.
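
As a minimal sketch of the PLOS definition only (column names are hypothetical and the data are made up), the 75th percentile can be computed within each hospital-procedure stratum and compared against each patient's length of stay.

```python
import pandas as pd

# Toy discharge data; in the study, strata were defined by hospital and procedure.
df = pd.DataFrame({
    "hospital_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "procedure":   ["colectomy"] * 8,
    "los_days":    [3, 4, 5, 12, 2, 3, 3, 9],
})

# PLOS: length of stay greater than the 75th percentile among patients
# undergoing the same procedure at the same hospital.
p75 = df.groupby(["hospital_id", "procedure"])["los_days"].transform(
    lambda s: s.quantile(0.75))
df["plos"] = (df["los_days"] > p75).astype(int)
print(df)

# FTR would analogously be coded as in-hospital death following any
# complication, restricted here (as in the study) to elective admissions
# with surgery performed on hospital day 0.
```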

Statistical Analysis

Owing to the nested nature of the data, with multiple patients associated with each surgeon and multiple surgeons associated with each residency program, hierarchical generalized linear models (HGLM) with a logit link function were used to assess the independent association between residency program and adverse events (eAppendix in the Supplement). A separate model was estimated for each of the 4 adverse events. Candidate covariates were chosen based on a review of the literature and clinical judgment and were selected for inclusion in each model using Pearson χ2 tests with a threshold of P < .10. Patient characteristics included age, sex, race, principal payer (Medicare, Medicaid, private insurance, self-pay, and other), Elixhauser index,25-27 operation type, admission via the emergency department, surgery on the day of admission, operation year, and state. Surgeon characteristics included age, sex, decade of training completion, operative volume in tertiles, and identification of a subspecialty in addition to general surgery (defined as a binary variable). Surgeon subspecialty was included in the analysis to adjust for the effects of advanced training beyond residency. Given the study time frame, many surgeons entered residency before the duty hour requirement was reformed and before the accelerated rate of fellowship enrollment. Therefore, we used surgeon subspecialty as a proxy for fellowship training or focused practice. Hospital characteristics included bed size, ownership, and setting. Hospital surgical quality was examined using data from the Hospital Value-Based Purchasing Program16 to account for the assumptions that better hospitals attract surgeons trained at better residency programs and that the variance in hospital quality in the form of better preoperative or postoperative care may account for the observed variance in clinical outcomes. Hospital surgical quality was defined as the mean performance score in the surgery-specific clinical process of care measures (eTable 2 in the Supplement). For each model, discrimination was assessed using the C statistic, and the proportion of variation explained was measured using Efron’s pseudo R2.
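
The published analysis was run in Stata and SAS; the following is only a rough Python sketch of a random-intercept logistic model of this general kind, fit with statsmodels' variational Bayes mixed GLM on simulated placeholder data. All column names and the (much abbreviated) covariate set are illustrative, not the study's specification.

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Simulated stand-in data: patients associated with surgeons and residency programs.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "complication": rng.binomial(1, 0.15, n),
    "age": rng.integers(30, 90, n),
    "emergency": rng.binomial(1, 0.3, n),
    "surgeon_id": rng.integers(0, 50, n),
    "program_id": rng.integers(0, 10, n),
})
df["age_std"] = (df["age"] - df["age"].mean()) / df["age"].std()

# Logit link with random intercepts for surgeon and residency program and
# fixed effects for patient covariates.
model = BinomialBayesMixedGLM.from_formula(
    "complication ~ age_std + emergency",
    vc_formulas={"surgeon": "0 + C(surgeon_id)",
                 "program": "0 + C(program_id)"},
    data=df,
)
result = model.fit_vb()  # variational Bayes approximation to the posterior
print(result.summary())
```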

Using the analytical framework implemented in obstetrics7,8 and further described in the eAppendix in the Supplement, a risk-standardized adverse event rate (RSAER) for each residency program was calculated for each of the 4 adverse events. The RSAER reflects the program-specific HGLM-predicted adverse event rate divided by the HGLM-predicted adverse event rate of the average residency program. The residency programs were then ranked and grouped into tertiles based on their RSAERs for each adverse event. The 4 sets of program rankings were compared on a pairwise basis with the Spearman rank correlation using the Sidak correction for multiple comparisons.28
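
A small illustration of this comparison step, using made-up RSAER values for 6 programs and 2 outcomes: scipy's Spearman rank correlation is combined with a Sidak adjustment for the 6 pairwise comparisons among the 4 outcome rankings, and programs are grouped into tertiles by RSAER.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical RSAERs (program-specific predicted rate divided by the
# average-program predicted rate) for two outcomes; the study had 73 programs.
rsaer = pd.DataFrame({
    "death": [0.90, 1.10, 0.80, 1.30, 1.00, 0.70],
    "ftr":   [1.00, 1.20, 0.90, 1.10, 1.05, 0.80],
})

# Tertile grouping: a lower RSAER indicates better performance (top tertile).
rsaer["death_tertile"] = pd.qcut(rsaer["death"], 3, labels=["top", "middle", "bottom"])

# Pairwise Spearman rank correlation with the Sidak correction for the
# 6 pairwise comparisons among the 4 outcome rankings.
rho, p = spearmanr(rsaer["death"], rsaer["ftr"])
m = 6
p_sidak = 1 - (1 - p) ** m
print(f"rho = {rho:.2f}, raw p = {p:.3f}, Sidak-adjusted p = {min(p_sidak, 1):.3f}")
```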

Using the results of fitting each HGLM, the adjusted adverse event rate (AAER) for each residency program was estimated as the HGLM prediction for the average patient treated by the average surgeon if the average surgeon had attended that specific residency program (eAppendix in the Supplement). Unlike the RSAER, the AAER differs between programs only in the inclusion of the predicted program effects; the characteristics of each program’s graduates and those graduates’ patients do not affect the AAER. The mean AAER was calculated for each tertile. The difference between the top and bottom tertiles was calculated to reflect the absolute risk reduction associated with operations performed by a surgeon from a program ranked in the top tertile compared with operations performed by a surgeon from a program ranked in the bottom tertile. The relative risk reduction was also calculated. To control for differences in case selection by alumni, AAERs were calculated in subset analyses of specific procedures linked to specific indications: emergency appendectomy for appendicitis and elective pancreatectomy for neoplasm. These subset analyses were limited to procedures performed on the day of admission to reduce the heterogeneity of the patient cohorts. In addition, a cross-validation analysis was performed in which half the patients were used to compute RSAERs and rank programs and the other half were used to compute AAERs (eTable 3 in the Supplement).
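
As a worked example of these two quantities, using the tertile complication rates reported in the abstract (9.68% for the top tertile vs 10.79% for the bottom tertile):

```python
# Complication AAERs from the abstract: top tertile 9.68%, bottom tertile 10.79%.
top, bottom = 0.0968, 0.1079

absolute_risk_reduction = bottom - top              # 0.0111, i.e., 1.11 percentage points
relative_risk_reduction = (bottom - top) / bottom   # ~0.103, i.e., about 10.3%

print(f"ARR = {absolute_risk_reduction:.4f} ({absolute_risk_reduction * 100:.2f} percentage points)")
print(f"RRR = {relative_risk_reduction:.3f} ({relative_risk_reduction * 100:.1f}%)")
```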

All analyses were performed using Stata/MP, version 13.1, statistical software (StataCorp) and SAS, version 9.4, software (SAS Institute Inc).

Results

Descriptive statistics are shown in Table 2. Characteristics were clinically similar across included and excluded cohorts. In the study population, the observed rates of adverse events were 1.8% for death, 15.0% for complications, 20.9% for PLOS, and 6.8% for FTR. Complete models for each adverse event are shown in eTable 4 in the Supplement. The model C statistics ranged from 0.74 (FTR) to 0.90 (death). The proportion of variation explained by the models ranged from 8.9% (FTR) to 22.2% (complications). Observed adverse event rates, RSAERs, and selected patient and surgeon characteristics by residency program tertile are shown in Table 3. Adjusted adverse event rates for each program tertile are shown in Table 4. Adjusted adverse event rates for programs ranked in the top tertile were significantly lower than those for programs ranked in the bottom tertile for all procedures as well as for subset populations.

Among the cohort of surgeons within 10 years of graduation from residency, the program effect was notably larger as evidenced by the larger absolute differences between the top and bottom tertiles across all outcomes and models. The relative difference in AAERs between the top and bottom tertiles ranged from 9.1% in the complication model to 33.8% in the FTR model. The RSAERs and AAERs were similar in magnitude to those computed from the full 20-year cohort (eTable 5 and eTable 6 in the Supplement). Among the cohort of surgeons within 5 years of graduation from residency, the program effect was even larger, with the relative difference between the top and bottom tertiles ranging from 8.0% in the PLOS model to 44.4% in the mortality model (eTable 7 in the Supplement).

The tertile rankings of the individual programs were consistent between death and FTR and between complications and PLOS. When comparing death and FTR, 52.1% of the 73 programs remained within the same tertile, 38.4% moved by 1 tertile, and 9.6% moved by 2 tertiles. Similarly, when comparing complications and PLOS, 50.7% of the programs remained within the same tertile, 38.4% moved by 1 tertile, and 11.0% moved by 2 tertiles. Rankings were not consistent between FTR and complications or PLOS. Table 5 shows the pairwise Spearman rank correlations comparing rankings for the individual adverse events.

Discussion

The call to restructure GME funding aligns with a broader movement across the health care industry toward models of payment for performance.2932 However, to our knowledge, a national standard for measuring GME performance does not exist. Attempts have been made to rank residency programs based on perception by experts in the field,33 but public perception of program prestige is not a reliable indicator of quality of clinical training.34,35 Given that the ultimate goal of general surgery residency is to prepare surgeons to achieve optimal patient outcomes after graduation, an intuitive measure of performance would be the clinical outcomes of patients of program graduates. Information on program performance in achieving this mission is important to the health care system, residency programs, surgical trainees, and patients.

This study demonstrates that general surgery residency programs can be ranked by the outcomes achieved by their graduates but that the selected measures affect the rank ordering of the programs. Patients whose procedures were performed by surgeons trained in the top and bottom tertiles of general surgery residency programs experienced different rates of adverse events. The differences across the program tertiles were relatively small among the cohort of surgeons with up to 20 years of practice. However, differences tended to be greater among surgeons with less than 10 years of experience and most pronounced among the cohort of surgeons with less than 5 years of experience. This finding suggests that the effects of training on outcomes are greatest at the onset of independent practice.

This article serves as a proof of concept that patient outcomes can be used to rank general surgery residency programs. Similar ranking systems have been attempted previously only in obstetrics and gynecology,7 where programs were ranked by graduates’ rates of patient complications during delivery. That study examined 2 procedures (vaginal and cesarean delivery) with a single indication and discrete outcomes (laceration, hemorrhage, and infection). Our study shows that such a method can be applied to a primary care specialty—general surgery—with a much broader range of procedures. Program rankings were consistent across cohorts of surgeons within 5, 10, or 20 years of practice, suggesting that the analytic strategy can produce stable estimates of the programs’ performances and that the effect of the programs on their graduates’ outcomes is strongest in the early years of independent practice. However, the study was unable to define a single metric for use in program assessment owing to the lack of consistency across all the outcome measures examined.

There are several limitations to this study. First, successful surgical outcomes are determined not just by technical excellence but also by good clinical judgment in determining candidacy for surgery. Selecting the right surgical procedure for the right patient at the right time is a clinical skill taught in residency. By comparing outcomes for the average patient operated on by the average surgeon at each residency program, the clinical judgment required to selectively operate on patients most likely to benefit from surgical rather than medical treatment options is penalized rather than rewarded. Once researchers define a method to assess appropriateness of the surgical intervention, it will be important to include it in the model. Despite this limitation, this study demonstrates that “better” residency programs can be defined based on what matters most—how graduates’ patients fare clinically after surgery.

Second, this study did not include the baseline caliber of the entering trainees. It is possible that the more highly ranked residency programs select more talented trainees with a greater aptitude for excellence in surgery, and the program itself had a minimal effect. In this case, the ranking system would remain an important metric for patients and hospitals when selecting surgeons but would lose its utility in guiding improvements in the training process.

Third, we were not able to directly measure fellowship status. Self-reported specialization was used as a proxy but may reflect a spectrum of additional training and/or narrowing of practice patterns without a formal fellowship. This limitation should not significantly affect the results of the study, as many surgeons with additional fellowship training continue to perform procedures outside their area of specialization, and skills learned during residency form the foundation for any additional training or experience gained during fellowship. In addition, the first cohort of surgeons trained in the modern era began to enter practice only in 2008. Given the study time frame, many procedures were performed by surgeons who completed most or all of their residency training before the implementation of the new duty hour standards and the accelerated rate of fellowship enrollment. Thus, the effect of fellowship training is likely to be less important in this study than it will be in the future.

Fourth, the study is limited to information contained in administrative data across 2 states. Therefore, the results are subject to the same limitations common to all studies performed using inpatient claims data. Moreover, we were only able to examine program rankings for 28.7% of general surgery residency programs, and the desire for trainees to practice in certain areas of the country may have influenced the results.

Finally, program graduates were grouped together across a 20-year period, an approach that does not account for potential changes in a given program over time. While subset analyses suggest that a focused analysis of surgeons who graduated more recently would give similar rankings, these analyses were limited by the low numbers of programs included. Future studies designed to control for some of these limitations will help to develop a system that appropriately incentivizes general surgery residency programs to train surgeons to achieve optimal patient outcomes that meet population needs.

The study has several strengths. Results include outcomes across a broad array of surgical procedures performed by general surgeons following the residency period. The study considers the role of advanced training by adjusting for surgeon specialty and examines 4 medical and surgical outcomes that can be influenced by the quality of the care provided to the patients. Results are also adjusted for the major patient, surgeon, and hospital characteristics known to influence outcomes.

Conclusions

This study demonstrates the feasibility of ranking general surgery residency programs using the outcomes of patients treated by the programs’ graduates. The ranking system was able to successfully classify programs based on outcomes achieved by surgeons with variable amounts of clinical experience beyond the training period. However, as the rankings differed by the individual measures tested, careful consideration must be given to the choice of metrics used in any residency program assessment system.

Article Information

Accepted for Publication: July 4, 2015.

Corresponding Author: Rachel R. Kelz, MD, MSCE, Department of Surgery, Center for Surgery and Health Economics, University of Pennsylvania Health System, 3400 Spruce St, 4 Silverstein Bldg, Philadelphia, PA 19104 (rachel.kelz@uphs.upenn.edu).

Published Online: October 28, 2015. doi:10.1001/jamasurg.2015.3637.

Author Contributions: Dr Simmons had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Drs Bansal and Simmons contributed equally to this work.

Study concept and design: Bansal, Simmons, Epstein, Kelz.

Acquisition, analysis, or interpretation of data: All Authors.

Drafting of the manuscript: Bansal, Simmons, Kelz.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Simmons.

Administrative, technical, or material support: Epstein, Morris, Kelz.

Study supervision: Kelz.

Conflict of Interest Disclosures: None reported.

References
1. Institute of Medicine. Graduate medical education that meets the nation’s health needs. http://iom.nationalacademies.org/~/media/Files/Report%20Files/2014/GME/GME-RB.pdf. Published July 2014. Accessed March 10, 2015.
2. Institute of Medicine. Graduate medical education that meets the nation’s health needs: recommendations, goals, and next steps. http://iom.nationalacademies.org/~/media/Files/Report%20Files/2014/GME/GME-REC.pdf. Published July 2014. Accessed March 10, 2015.
3. Peterson LE, Carek P, Holmboe ES, Puffer JC, Warm EJ, Phillips RL. Medical specialty boards can help measure graduate medical education outcomes. Acad Med. 2014;89(6):840-842.
4. Dorfsman ML, Wolfson AB. Direct observation of residents in the emergency department: a structured educational program. Acad Emerg Med. 2009;16(4):343-351.
5. Werner RM, McNutt R. A new strategy to improve quality: rewarding actions rather than measures. JAMA. 2009;301(13):1375-1377.
6. Werner RM, Bradlow ET. Relationship between Medicare’s Hospital Compare performance measures and mortality rates. JAMA. 2006;296(22):2694-2702.
7. Asch DA, Nicholson S, Srinivas S, Herrin J, Epstein AJ. Evaluating obstetrical residency programs using patient outcomes. JAMA. 2009;302(12):1277-1283.
8. Asch DA, Epstein A, Nicholson S. Evaluating medical training programs by the quality of care delivered by their alumni. JAMA. 2007;298(9):1049-1051.
9. Agency for Healthcare Research and Quality. Welcome to HCUPNet. http://hcupnet.ahrq.gov/. Accessed March 10, 2015.
10. Neuman MD, Rosenbaum PR, Ludwig JM, Zubizarreta JR, Silber JH. Anesthesia technique, mortality, and length of stay after hip fracture surgery. JAMA. 2014;311(24):2508-2517.
11. Miller DC, Ye Z, Gust C, Birkmeyer JD. Anticipating the effects of accountable care organizations for inpatient surgery. JAMA Surg. 2013;148(6):549-554.
12. Osborne NH, Nicholas LH, Ryan AM, Thumma JR, Dimick JB. Association of hospital participation in a quality reporting program with surgical outcomes and expenditures for Medicare beneficiaries. JAMA. 2015;313(5):496-504.
13. Agency for Health Care Administration. Florida Health Finder: order data. http://www.floridahealthfinder.gov/researchers/researchers.aspx. Accessed March 10, 2015.
14. Bureau of Health Informatics, New York State Department of Health. Statewide Planning and Research Cooperative System (SPARCS). http://www.health.ny.gov/statistics/sparcs/#datainfo. Revised February 2015. Accessed March 10, 2015.
15. Centers for Disease Control and Prevention. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). http://www.cdc.gov/nchs/icd/icd9cm.htm. Updated June 18, 2013. Accessed March 10, 2015.
16. Decker MR, Dodgion CM, Kwok AC, et al. Specialization and the current practices of general surgeons. J Am Coll Surg. 2014;218(1):8-15.
17. American Medical Association. AMA Physician Masterfile. http://www.ama-assn.org/ama/pub/about-ama/physician-data-resources/physician-masterfile.page. Accessed March 10, 2015.
18. Centers for Medicare & Medicaid Services. Hospital Compare datasets. https://data.medicare.gov/data/hospital-compare. 2015. Accessed March 10, 2015.
19. Romano PS, Campa DR, Rainwater JA. Elective cervical discectomy in California: postoperative in-hospital complications and their risk factors. Spine (Phila Pa 1976). 1997;22(22):2677-2692.
20. Tuinen MV, Elder S, Link C, Li S, Song JH, Pritchett T. Surveillance of surgery-related adverse events in Missouri using ICD-9-CM codes. In: Henriksen K, Battles JB, Marks ES, Lewin DI, eds. Advances in Patient Safety: From Research to Implementation (Volume 1: Research Findings). Rockville, MD: Agency for Healthcare Research and Quality; 2005. http://www.ncbi.nlm.nih.gov/books/NBK20458/. Accessed March 10, 2015.
21. Silber JH, Rosenbaum PR, Koziol LF, Sutaria N, Marsh RR, Even-Shoshan O. Conditional length of stay. Health Serv Res. 1999;34(1, pt 2):349-363.
22. Silber JH, Rosenbaum PR, Even-Shoshan O, et al. Length of stay, conditional length of stay, and prolonged stay in pediatric asthma. Health Serv Res. 2003;38(3):867-886.
23. Silber JH, Romano PS, Rosen AK, Wang Y, Even-Shoshan O, Volpp KG. Failure-to-rescue: comparing definitions to measure quality of care. Med Care. 2007;45(10):918-925.
24. Sheetz KH, Krell RW, Englesbe MJ, Birkmeyer JD, Campbell DA Jr, Ghaferi AA. The importance of the first complication: understanding failure to rescue after emergent surgery in the elderly. J Am Coll Surg. 2014;219(3):365-370.
25. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care. 1998;36(1):8-27.
26. Agency for Healthcare Research and Quality. HCUP comorbidity software. http://www.hcup-us.ahrq.gov/toolssoftware/comorbidity/comorbidity.jsp. Accessed March 10, 2015.
27. Agency for Healthcare Research and Quality. National (nationwide) inpatient sample. http://www.hcup-us.ahrq.gov/nisoverview.jsp. Modified March 6, 2015. Accessed March 10, 2015.
28. Sidak Z. Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc. 1967;62(318):626-633.
29. Hsieh HM, Tsai SL, Shin SJ, Mau LW, Chiu HC. Cost-effectiveness of diabetes pay-for-performance incentive designs. Med Care. 2015;53(2):106-115.
30. Rosenthal MB, Li Z, Robertson AD, Milstein A. Impact of financial incentives for prenatal care on birth outcomes and spending. Health Serv Res. 2009;44(5, pt 1):1465-1479.
31. Russ-Sellers R. The cost-effectiveness of pay-for-performance: a multidimensional approach to analysis. Med Care. 2015;53(2):104-105.
32. Young GJ, Meterko M, Beckman H, et al. Effects of paying physicians based on their relative performance for quality. J Gen Intern Med. 2007;22(6):872-876.
33. Doximity. Doximity residency navigator: our residency research methodology. https://s3.amazonaws.com/s3.doximity.com/mediakit/Doximity_Residency_Navigator_Survey_Methodology.pdf. Updated August 2015. Accessed March 10, 2015.
34. Hartz AJ, Kuhn EM, Pulido J. Prestige of training programs and experience of bypass surgeons as factors in adjusted patient mortality rates. Med Care. 1999;37(1):93-103.
35. Mellinger JD, Damewood R, Morris JB. Assessing the quality of graduate surgical training programs: perception vs reality. J Am Coll Surg. 2015;220(5):785-789.