Figure 1.  Histograms of Distributions of Resident Annual Evaluation, American Board of Surgery In-Training Examination (ABSITE) Percentile, and ABSITE Percentage Correct Scores

Histograms of the distributions of resident annual evaluation scores (A), ABSITE percentile (B), and ABSITE percentage correct (C).

Figure 2.  Regression Models of American Board of Surgery In-Training Examination (ABSITE) Percentile and Evaluation Scores by Postgraduate Year (PGY) and of ABSITE Percentage Correct and Evaluation Scores by PGY

Regression models of ABSITE percentile and annual (A) and medical knowledge (B) evaluation scores by PGY, and regression models of ABSITE percentage correct and annual (C) and medical knowledge (D) evaluation scores by PGY.

Figure 3.  Comparison of Mean Evaluation Scores Between Residents Who Passed and Failed the American Board of Surgery In-Training Examination (ABSITE)

Diamond indicates mean; horizontal line in center of box, median; top and bottom borders of box, upper and lower quartiles, respectively; error bars, maximum and minimum values; and notch, median 95% confidence interval.

Figure 4.  Comparison of Mean Evaluation Scores Between Residents Who Scored in the Top 30% of All American Board of Surgery In-Training Examination (ABSITE) Scorers and Those Who Did Not

Diamond indicates mean; horizontal line in center of box, median; top and bottom borders of box, upper and lower quartiles, respectively; error bars, maximum and minimum values; and notch, median 95% confidence interval.

Original Investigation
Association of VA Surgeons
January 2016

Association Between American Board of Surgery In-Training Examination Scores and Resident Performance

Author Affiliations
  • 1DeWitt Daughtry Family Department of Surgery, University of Miami Miller School of Medicine, Miami, Florida
  • 2Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, Florida
  • 3Department of Surgery, Bruce W. Carter Department of Veterans Affairs Medical Center, Miami, Florida
JAMA Surg. 2016;151(1):26-31. doi:10.1001/jamasurg.2015.3088
Abstract

Importance  The American Board of Surgery In-Training Examination (ABSITE) is designed to measure progress, applied medical knowledge, and clinical management; results may determine promotion and fellowship candidacy for general surgery residents. Evaluations are mandated by the Accreditation Council for Graduate Medical Education but are administered at the discretion of individual institutions and are not standardized. It is unclear whether the ABSITE and evaluations form a reasonable assessment of resident performance.

Objective  To determine whether favorable evaluations are associated with ABSITE performance.

Design, Setting, and Participants  Cross-sectional analysis of preliminary and categorical residents in postgraduate years (PGYs) 1 through 5 who trained in a single university-based general surgery program from July 1, 2011, through June 30, 2014, and took the ABSITE.

Exposures  Evaluation overall performance and subset evaluation performance in the following categories: patient care, technical skills, problem-based learning, interpersonal and communication skills, professionalism, systems-based practice, and medical knowledge.

Main Outcomes and Measures  Passing the ABSITE (≥30th percentile) and ranking in the top 30% of scores at our institution.

Results  The study population comprised residents in PGY 1 (n = 44), PGY 2 (n = 31), PGY 3 (n = 26), PGY 4 (n = 25), and PGY 5 (n = 24) during the 4-year study period (N = 150). Evaluations had less variation than the ABSITE percentile (SD = 5.06 vs 28.82, respectively). Neither annual nor subset evaluation scores were significantly associated with passing the ABSITE (n = 102; for annual evaluation, odds ratio = 0.949; 95% CI, 0.884-1.019; P = .15) or receiving a top 30% score (n = 45; for annual evaluation, odds ratio = 1.036; 95% CI, 0.964-1.113; P = .33). There was no difference in mean evaluation score between those who passed vs failed the ABSITE (mean [SD] evaluation score, 91.77 [5.10] vs 93.04 [4.80], respectively; P = .14) or between those who received a top 30% score vs those who did not (mean [SD] evaluation score, 92.78 [4.83] vs 91.92 [5.11], respectively; P = .33). There was no correlation between annual evaluation score and ABSITE percentile (r2 = 0.014; P = .15), percentage correct unadjusted for PGY level (r2 = 0.019; P = .09), or percentage correct adjusted for PGY level (r2 = 0.429; P = .91).

Conclusions and Relevance  Favorable evaluations do not correlate with ABSITE scores, nor do they predict passing. Evaluations do not show much discriminatory ability. It is unclear whether individual resident evaluations and ABSITE scores fully assess competency in residents or allow comparisons to be made across programs. Creation of a uniform evaluation system that encompasses the necessary subjective feedback from faculty with the objective measure of the ABSITE is warranted.

Introduction

The American Board of Surgery In-Training Examination (ABSITE) is an annual multiple-choice examination used to assess the medical and applied knowledge of general surgery residents. It was originally designed as a tool for program directors to assess residents’ progress and is not a requirement for certification.1 Studies show that ABSITE performance is predictive of passing the American Board of Surgery Qualifying Examination2,3; therefore, program directors turn to this tool as an objective, comparable measure. Extensive variability exists in how each program uses the scores and in whether they affect promotion, and no uniform standard has been set by the Accreditation Council for Graduate Medical Education (ACGME).4 Regardless of its original design, the ABSITE is now often used in ways beyond its original intent owing to a lack of other standardized evaluation techniques.

Semiannual review of residents is mandated and outlined by the General Surgery Milestone Project, a joint initiative of the ACGME and the American Board of Surgery.5 Evaluations are administered by individual institutions according to resident rotations. At our university-affiliated hospital, a comprehensive rotation-specific evaluation system has been used to assess essential qualities not readily testable on standardized examinations. The 7 categories evaluated are adapted from the General Surgery Milestone Project and comprise technical skill plus the 6 core competencies: medical knowledge, patient care, interpersonal and communication skills, professionalism, practice-based learning and improvement, and systems-based practice. End-of-rotation evaluations in these competencies are a common approach at many institutions, but their structure varies. Furthermore, faculty training in the use of the scoring system and in the assessment of resident knowledge and skills is not uniform.

Studies on the relationship between evaluations and board performance are limited and conflicting. In the medical student population, one study showed high interrater reliability between resident and attending evaluations of medical students on the surgical clerkship but poor correlation with standardized examination scores.6 Conversely, another study showed a strong positive correlation between ward evaluations and National Board of Medical Examiners examination performance.7 To our knowledge, only 2 studies have examined this relationship in surgical residents. One showed that faculty evaluations of resident medical knowledge correlated poorly with ABSITE performance and could not predict which residents would perform poorly8; notably, that study examined only the medical knowledge component of the evaluation. The other showed that, with a standardized assessment of ACGME core competencies, faculty ratings were internally consistent and correlated with ABSITE and United States Medical Licensing Examination scores.9

Our study aims to add to the limited body of literature regarding the relationship between evaluations and ABSITE scores. It is unclear whether the ABSITE and evaluations form a reasonable assessment of residents. We hypothesize that there is no relationship between rotation-specific evaluation and ABSITE scores as measures of resident performance.

Methods

A retrospective study of deidentified evaluation and ABSITE scores was conducted at our institution from July 1, 2011, through June 30, 2014. Evaluations in postgraduate years (PGYs) 1 through 5 were reviewed for each rotation completed during that academic year, ranging from 6 to 11 evaluations per resident depending on PGY. A mean annual score and a mean score for each of the 7 evaluation subsets were calculated for each resident: patient care, technical skill, practice-based learning, medical knowledge, interpersonal and communication skills, professionalism, and systems-based practice. Evaluations are completed by faculty through the New Innovations Residency Management Suite online system. Residents are required to review these nonanonymous evaluations semiannually with the program director. Data for each resident, separated by PGY at the time of testing, were acquired from the evaluation and ABSITE reports for the academic year corresponding to the examination. This study was approved by the Institutional Review Board of the University of Miami. Informed consent was waived owing to the nature of the study, which involved a retrospective analysis of previously collected and stored deidentified data.
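
To make this aggregation step concrete, a minimal SAS sketch is shown below (SAS was the package used for the analysis). The dataset name evaluations, the identifier resident_id, and all variable names are hypothetical illustrations and are not taken from the study's actual code.

/* Hypothetical layout: one row per completed rotation evaluation, with an   */
/* overall score and the 7 subset scores recorded for that rotation.         */
proc means data=evaluations noprint nway;
  class resident_id pgy;   /* one summary row per resident per academic year */
  var overall patient_care technical_skill practice_based_learning
      medical_knowledge interpersonal_comm professionalism systems_based;
  output out=resident_means
    mean=annual_eval mean_pc mean_ts mean_pbl mean_mk mean_ic mean_prof mean_sbp;
run;

The resulting resident_means dataset would then be merged with the corresponding year's ABSITE report before modeling.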

All data were analyzed in SAS version 9.3 statistical software (SAS Institute, Inc). The type I error rate was set to 5%, and P < .05 was considered statistically significant. For continuous variables, normally distributed data are reported as mean (standard deviation). Continuous variables were compared with the t test for parametric data. Evaluations were compared with the corresponding year’s ABSITE score. The ABSITE percentile and percentage correct scores were considered in the analyses. Binary outcomes included passing the ABSITE with a score at or above the national 30th percentile and ranking in the top 30% of all scores at our institution during the combined years of the study. Multivariable regression was performed to predict passing the ABSITE or achieving a top 30% score. Data were also analyzed in terms of percentage correct score adjusted for PGY. All components of the ABSITE and all evaluation subsets were regressed to determine whether an association existed.
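
As an illustration of the modeling described above, a minimal SAS sketch follows. The dataset resident_scores (the per-resident evaluation means merged with ABSITE results) and the variable names pass_absite, top30, annual_eval, absite_percentile, absite_pct_correct, and pgy are assumptions for the example, not the study's actual code.

/* Odds of passing the ABSITE (at or above the national 30th percentile) and */
/* of a top 30% institutional score as a function of annual evaluation score */
proc logistic data=resident_scores;
  model pass_absite(event='1') = annual_eval;
run;

proc logistic data=resident_scores;
  model top30(event='1') = annual_eval;
run;

/* Linear association of annual evaluation score with ABSITE percentile,     */
/* and with percentage correct adjusted for PGY level                        */
proc reg data=resident_scores;
  model absite_percentile = annual_eval;
run;

proc glm data=resident_scores;
  class pgy;
  model absite_pct_correct = annual_eval pgy;
run;

/* Comparison of mean evaluation scores between residents who passed and     */
/* failed the ABSITE                                                          */
proc ttest data=resident_scores;
  class pass_absite;
  var annual_eval;
run;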

Results

The population comprised residents in PGY 1 (n = 44), PGY 2 (n = 31), PGY 3 (n = 26), PGY 4 (n = 25), and PGY 5 (n = 24) during the 4-year study period (N = 150). One hundred fifty ABSITE scores and 1131 evaluations were included for analysis. The distributions of resident scores for annual evaluation (mean [SD], 92.24 [5.06]; median, 92.65; interquartile range, 7.82), ABSITE percentile (mean [SD], 49.18 [28.82]; median, 49.50; interquartile range, 53.00), and ABSITE percentage correct (mean [SD], 72.51 [8.07]; median, 42.00; interquartile range, 11.00) are shown in Figure 1. Evaluations had less variation compared with the ABSITE percentile (SD = 5.06 vs 28.82, respectively). Overall, there was no correlation between annual evaluation score and ABSITE percentile (r2 = 0.014; P = .15), percentage correct unadjusted for PGY level (r2 = 0.019; P = .09), or percentage correct adjusted for PGY level (r2 = 0.429; P = .91).

On binary logistic regression, the annual evaluation score was not significantly associated with passing the ABSITE (odds ratio = 0.949; 95% CI, 0.884-1.019; P = .15) or receiving a top 30% score (odds ratio = 1.036; 95% CI, 0.964-1.113; P = .33). There was no significant relationship between annual or any of the subset evaluation scores and ABSITE scores adjusted for PGY on multivariable linear regression. Figure 2A and B show the lack of correlation between annual or medical knowledge evaluations and ABSITE percentile by PGY, and Figure 2C and D show this for ABSITE percentage correct. No evaluation subset scores were predictive of passing the ABSITE on regression models.

There was no difference in mean evaluation score between those who passed vs failed the ABSITE (mean [SD] evaluation score, 91.77 [5.10] vs 93.04 [4.80], respectively; P = .14) (Figure 3) or between those who received a top 30% score vs those who did not (mean [SD] evaluation score, 92.78 [4.83] vs 91.92 [5.11], respectively; P = .33) (Figure 4).

Discussion

In the era of William Stewart Halsted, MD, surgical residents were continuously and rigorously evaluated through an apprenticeship model that was perhaps more reflective of true medical knowledge and skill than the model we have today. The modern residency model has shifted: residents frequently rotate between services and work with a multitude of attending surgeons. Perhaps now more than ever, a comprehensive evaluation is critical, especially to identify struggling residents early in their training.10 Such a reform may help create a standard by which to compare residents during the fellowship application and promotion process, but it would require faculty education and “buy-in” to ensure that the evaluation method is consistent within and across programs.

Our study found that favorable evaluation scores could not be used to predict ABSITE performance and that there was no difference in mean evaluation scores between those who passed and failed the ABSITE. We also showed that evaluations have low variability, which implies that they are not being used to their full potential to assess a resident’s competency and standing among their peers. These findings illustrate a concerning deficiency in the way we currently evaluate surgical trainees. The creation of a standardized, valid, and reliable system of evaluation is imperative to the future of surgical education.

The ACGME has required residency programs to base evaluations on the 6 core competencies for more than a decade,11 but the scales used are diverse. In addition to the structural variability inherent in these evaluations, we know that outside factors, such as personality, influence evaluation scores; therefore, assessments of the reliability and validity of these evaluations are paramount.12,13 One study using a standard web-based evaluation system at 5 different surgical training program sites found that faculty evaluations were internally consistent among the sites and that there was a correlation between competency ratings and ABSITE scores.9 These results show promise that a standardized and reliable system can be implemented across institutions.

Prior to 2014, there was a junior version and a senior version of the ABSITE. The content differed on these examinations based on resident PGY. As of 2014, the ABSITE structure changed so that all residents now receive a single examination that is compared nationally against residents of the same PGY.1 The 2014 ABSITE provided overall percentage correct and percentile scores in addition to percentage correct scores in the subcategories of patient care and medical knowledge. The 2011 to 2013 examinations also reported percentage correct scores for individual organ system subsets. For the purposes of our study, only overall percentage correct and percentile scores were considered, to account for the change in structure of the test.

Our study should be considered in the context of certain limitations. First, the possibility of type II error exists, as a sample size of 150 may not be large enough to detect an association. The change in structure of the ABSITE during our study period is also a limitation because test results from before and after the change were analyzed together. Furthermore, the status of each resident (ie, categorical vs preliminary) was not factored into the analysis. Finally, we were unable to control for inconsistency in how faculty members complete the evaluations, which may reduce internal validity. A limitation on external validity should also be considered, as other programs’ systems for evaluating residents have their own sets of strengths and weaknesses.

Conclusions

Favorable evaluations do not correlate with ABSITE scores. Evaluations do not show discriminatory ability. It is unclear whether resident evaluations and ABSITE scores fully assess competency in residents or whether these tools allow comparisons to be made across programs. Creation of a uniform evaluation system that encompasses the necessary feedback from faculty with the objective measure of the ABSITE is warranted. This tool will be vital to create a fair and effective method to determine resident promotion and ensure timely intervention for residents with deficiencies. Furthermore, it would potentially allow evaluation of surgery programs across institutions and influence surgical training paradigms.

Article Information

Corresponding Author: Juliet J. Ray, MD, DeWitt Daughtry Family Department of Surgery, University of Miami Miller School of Medicine, Ryder Trauma Center, 1800 NW 10th Ave, Ste T 215 (D40), Miami, FL 33136 (jray@med.miami.edu).

Accepted for Publication: June 8, 2015.

Published Online: November 4, 2015. doi:10.1001/jamasurg.2015.3088.

Author Contributions: Drs Spector and Schulman had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Dr Ray is the first author and Dr Schulman is the senior author.

Study concept and design: Ray, Meizoso, Allen, Namias, Pizano, Spector, Schulman.

Acquisition, analysis, or interpretation of data: Ray, Sznol, Teisch, Meizoso, Allen, Namias, Sleeman, Spector, Schulman.

Drafting of the manuscript: Ray, Sznol, Teisch, Meizoso, Schulman.

Critical revision of the manuscript for important intellectual content: Teisch, Meizoso, Allen, Namias, Pizano, Sleeman, Spector, Schulman.

Statistical analysis: Ray, Sznol, Teisch, Meizoso, Allen, Schulman.

Administrative, technical, or material support: Namias, Spector.

Study supervision: Allen, Namias, Pizano, Sleeman, Spector, Schulman.

Conflict of Interest Disclosures: None reported.

Previous Presentation: This paper was presented at the 39th Annual Meeting of the Association of VA Surgeons; May 3, 2015; Miami Beach, Florida.

Additional Contributions: Tanya Spencer, Leela Mundra, BA, and Manasa Narasimman, University of Miami, Miami, Florida, provided assistance in data collection; they received no compensation.

References
1. American Board of Surgery. American Board of Surgery In-Training Examination (ABSITE). http://www.absurgery.org/default.jsp?certabsite. Accessed January 20, 2015.
2. de Virgilio C, Yaghoubian A, Kaji A, et al. Predicting performance on the American Board of Surgery qualifying and certifying examinations: a multi-institutional study. Arch Surg. 2010;145(9):852-856.
3. Jones AT, Biester TW, Buyske J, Lewis FR, Malangoni MA. Using the American Board of Surgery In-Training Examination to predict board certification: a cautionary study. J Surg Educ. 2014;71(6):e144-e148.
4. Abdu RA. Survey analysis of the American Board of Surgery In-Training Examination. Arch Surg. 1996;131(4):412-416.
5. Accreditation Council for Graduate Medical Education; American Board of Surgery. The General Surgery Milestone Project. https://www.acgme.org/acgmeweb/Portals/0/PDFs/Milestones/SurgeryMilestones.pdf. Accessed December 21, 2014.
6. Goldstein SD, Lindeman B, Colbert-Getz J, et al. Faculty and resident evaluations of medical students on a surgery clerkship correlate poorly with standardized exam scores. Am J Surg. 2014;207(2):231-235.
7. Reid CM, Kim DY, Mandel J, Smith A, Bansal V. Correlating surgical clerkship evaluations with performance on the National Board of Medical Examiners examination. J Surg Res. 2014;190(1):29-35.
8. Elfenbein DM, Sippel RS, McDonald R, Watson T, Scarborough JE, Migaly J. Faculty evaluations of resident medical knowledge: can they be used to predict American Board of Surgery In-Training Examination performance? Am J Surg. 2015;209(6):1095-1101.
9. Tabuenca A, Welling R, Sachdeva AK, et al. Multi-institutional validation of a web-based core competency assessment system. J Surg Educ. 2007;64(6):390-394.
10. Minter RM, Dunnington GL, Sudan R, Terhune KP, Dent DL, Lentz AK. Can this resident be saved? identification and early intervention for struggling residents. J Am Coll Surg. 2014;219(5):1088-1095.
11. Swing SR. The ACGME outcome project: retrospective and prospective. Med Teach. 2007;29(7):648-654.
12. Schell RM, Dilorenzo AN, Li HF, Fragneto RY, Bowe EA, Hessel EA II. Anesthesiology resident personality type correlates with faculty assessment of resident performance. J Clin Anesth. 2012;24(7):566-572.
13. Lacorte MA, Risucci DA. Personality, clinical performance and knowledge in paediatric residents. Med Educ. 1993;27(2):165-169.