eFigure. Distribution of WiSCoR ratings overall and by domain
Vande Walle KA, Quamme SRP, Beasley HL, et al. Development and Assessment of the Wisconsin Surgical Coaching Rubric. JAMA Surg. 2020;155(6):486–492. doi:10.1001/jamasurg.2020.0424
How is peer surgical coach performance evaluated?
In this study, the Wisconsin Surgical Coaching Rubric (WiSCoR), a novel tool for assessing peer surgical coach performance, was developed and assessed. Evidence to support the validity of WiSCoR includes systematic content development, consistent rater training, and high interrater reliability.
WiSCoR reliably assesses the performance of a surgical coach and can provide formative feedback to surgical coaches or assess the fidelity of coaching interventions to coaching principles.
Surgical coaching continues to gain momentum as an innovative method for continuous professional development. A tool to measure the performance of a surgical coach is needed to provide formative feedback to coaches for continued skill development and to assess the fidelity of a coaching intervention for future research and dissemination.
To evaluate the validity of the Wisconsin Surgical Coaching Rubric (WiSCoR), a novel tool to assess the performance of a peer surgical coach.
Design, Setting, and Participants
Surgical coaching sessions from November 2014 through February 2018 conducted by 2 statewide peer surgical coaching programs were audio recorded and transcribed. Twelve raters used WiSCoR to rate the performance of the surgical coach for each session. The study included peer surgical coaches in the Wisconsin Surgical Coaching Program (n = 8) and the Michigan Bariatric Surgery Collaborative coaching program (n = 15). The data were analyzed in 2019.
Interventions or Exposures
Use of WiSCoR to rate peer surgical coaching sessions.
Main Outcomes and Measures
There were 282 WiSCoR ratings from the 106 coaching sessions included in the study. WiSCoR was evaluated using a validity framework, including interrater reliability assessed with the Gwet weighted agreement coefficient. Descriptive statistics of WiSCoR were calculated.
Eight coaches (35%) and 11 coachees (29%) were from the Wisconsin Surgical Coaching Program and 15 coaches (65%) and 27 coachees (71%) were from the Michigan Bariatric Surgery Collaborative. The validity of WiSCoR is supported by high interrater reliability (Gwet weighted agreement coefficient, 0.87) as well as a weakly positive correlation of WiSCoR to coachee ratings of coaches (r = 0.22; P = .04), rigorous content development, consistent rater training, and the association of WiSCoR with coach and coaching program development. The mean (SD) overall coach performance rating using WiSCoR was 3.23 (0.82; range, 1-5).
Conclusions and Relevance
WiSCoR is a reliable measure that can assess the performance of a surgical coach, inform fidelity to coaching principles, and provide formative feedback to surgical coaches. While coachee ratings may reflect coachee satisfaction, they are not able to determine the quality of a coach.
Peer surgical coaching is an approach to continuous professional development that uses adult learning theory to support a surgeon’s individual performance improvement.1-4 In peer surgical coaching, a practicing surgeon is paired with a trained surgeon coach. This partnership uses coaching sessions for collaborative analysis and constructive feedback to improve technical, cognitive, interpersonal, and stress management skills through goal setting and action planning. These coaching interactions provide an evidence-based approach to practice change.1-4
The success of this approach to continuous professional development requires adherence to coaching principles. However, to our knowledge, there is currently no tool to assess a surgical coach’s performance during a coaching session as measured by adherence to coaching principles. Without an instrument to evaluate a surgical coach, it is challenging to determine the quality of a coaching session or provide formative feedback to coaches on their performance. We aimed to fill this gap by developing and evaluating the validity of the Wisconsin Surgical Coaching Rubric (WiSCoR), a novel tool for assessing coach performance during a single peer surgical coaching session.
Building off the Wisconsin Surgical Coaching Program’s (WSCP) framework, core competencies for surgical coaching were identified.1 Through an iterative process, our multidisciplinary team developed WiSCoR as a tool to evaluate coach performance in these competencies during individual coaching sessions. We used the validity framework presented by Cook and Beckman5 that applies Messick’s6 5 sources of evidence for validity to medicine: content, response process, internal structure, associations with other variables, and consequences. This study was approved by the University of Wisconsin–Madison Health Sciences institutional review board and the University of Michigan institutional review board. This study was considered a quality improvement project with a waiver of consent.
First, we assessed how accurately WiSCoR’s content reflects the construct we are trying to measure, namely the performance of a surgical coach.5-8 This aspect of validity includes detailing the steps undertaken for rubric development and who was involved.5,7 The first step was a structured literature review from organizational psychology, coaching psychology, education, business, athletics, medicine, and other associated fields.9-20 To understand the activities of coaching and what makes a coach effective, we next conducted a comparative case analysis consisting of field observations and semistructured interviews of successful coaches in music, education, and athletics.2 These data were combined with prior literature to generate the WSCP’s framework. Experts in surgery, education, cognitive psychology, and executive coaching synthesized this information and the components of the WSCP’s framework to identify the key categories of coaching. These categories formed the domains of WiSCoR and are observable aspects of coaching.
WiSCoR consists of 4 domains and an overall rating. These include: (1) shares responsibility and contributes to equal exchange; (2) uses questions/prompts to guide coachee self-reflection/analysis; (3) provides constructive feedback and encouragement; and (4) guides goal setting and action planning (Figure 1). Markers of performance in domain 1 (shares responsibility and contributes to equal exchange) include how the coach provides the coachee with opportunities to contribute to the agenda and supports the coachee’s self-directed learning. Domain 2 (uses questions/prompts to guide coachee self-reflection/analysis) behavioral markers include whether and how the coach uses questions to facilitate a collaborative analysis of the coachee’s performance, including closed vs open questions, timing, number, and missed opportunities. Domain 3 (provides constructive feedback and encouragement) was assessed by whether and how often the coach provided feedback and if it was constructive, specific, nonjudgmental, and actionable.20 Finally, domain 4 (guides goal setting and action planning) evaluates how the coach supports the identification of coachee goals and development of an action plan.
Each domain is rated from low to high alignment with principles of coaching using a 5-point scale and anchoring descriptions. The highest rating is 5 (exemplary) followed by 4 (proficient), 3 (developing), 2 (neutral/ineffective), and 1 (counterproductive). A 5 is expected for a professional coach. We anticipated most coaches would be rated a 3, with 4 for those who excelled, 2 for those needing help, and 1 for those who should not continue serving as a coach. Once scores were assigned to each domain, an overall score was generated by the rater on a 5-point scale. This score was based on their overall impression of the coaching session after considering a coach’s performance in each domain, similar to the National Institutes of Health scoring system to generate an overall impact score.21 This overall score represents the construct of surgical coach performance.
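As an illustration of the scoring structure described above, the following minimal Python sketch represents one rater's WiSCoR scores for a single session. The class and field names (`Rating`, `SessionRating`, `shares_responsibility`, and so on) are hypothetical labels chosen for this example and are not part of the published rubric. The sketch stores the overall score as its own field because, as described, it is assigned by the rater rather than computed from the domain scores.

```python
from dataclasses import dataclass, fields
from enum import IntEnum


class Rating(IntEnum):
    """The 5-point scale and its anchor labels, as described in the text."""
    COUNTERPRODUCTIVE = 1
    NEUTRAL_INEFFECTIVE = 2
    DEVELOPING = 3
    PROFICIENT = 4
    EXEMPLARY = 5


@dataclass(frozen=True)
class SessionRating:
    """One rater's scores for one coaching session (hypothetical field names)."""
    shares_responsibility: Rating   # domain 1
    guides_self_reflection: Rating  # domain 2
    constructive_feedback: Rating   # domain 3
    goal_setting: Rating            # domain 4
    overall: Rating                 # rater's holistic impression, not an average

    def __post_init__(self) -> None:
        # Coerce plain integers to Rating; values outside 1-5 raise ValueError.
        for f in fields(self):
            object.__setattr__(self, f.name, Rating(getattr(self, f.name)))


example = SessionRating(4, 3, 3, 2, 3)
print(example.overall.name)
```

Keeping the overall score separate from the domain scores mirrors the rubric's design: a rater may weigh the domains unequally when forming an overall impression.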
The response process describes how well the rating process matches the intended construct.5-7 In our study, this is assessed by rater training and the clarity of scoring material.5,7 WiSCoR was piloted in the WSCP from 2014 to 2015 (details published elsewhere).1 The coaching sessions were based on video of the coachee operating that the coach and coachee reviewed together. Coaching sessions from the WSCP were audio-recorded and transcribed. Those involved in rubric development independently rated several coaching sessions using the recording and transcripts. This group of raters then discussed their scores to come to a consensus and determine the meaning of each score. In addition, the wording of the principles for each domain was modified and clarified. The raters included an executive coach, an expert in human performance, an education researcher, and several surgeons. All members contributed to the modifications made in each WiSCoR iteration. Once the final text and anchors were agreed on, raters independently rated the remainder of the coaching sessions in the WSCP.
WiSCoR was then used to assess coaches participating in the Michigan Bariatric Surgery Collaborative (MBSC) coaching program from 2015 to 2018 (details published elsewhere).4 Four of the 8 WiSCoR raters for the MBSC coaching program were also raters for the WSCP. The 4 new raters included surgeons, medical students, and clinical researchers. They were given the WiSCoR rubric and background in its development. These raters then started rating coaching sessions independently. Group discussions were held among the raters to calibrate scoring. Once there was agreement and consistency using WiSCoR, the remaining coaching sessions were rated independently.
The internal structure describes the reliability of the rubric scores.5 We chose the Gwet weighted agreement coefficient (AC2) as the measure of interrater reliability because it does not require the assumption of independent observations, can handle ordinal data, and can account for missing data.22,23 This was important, as many coaches and coachees participated in multiple coaching sessions and did not represent independent observations. Given our rating scale of 1 to 5, ordinal weighting was used in the calculation of the Gwet AC2. An agreement coefficient greater than 0.8 was considered almost perfect.24 The reliability calculations included 106 sessions. Fourteen sessions (13.2%) were from the WSCP and 92 (86.8%) were from the MBSC. There were 12 different raters from varied backgrounds as described previously. Two sessions (1.9%) were excluded because the recording was inaudible. All statistical analyses were performed in SAS, version 9.4 (SAS Institute).
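The analyses in this study were run in SAS, and the authors' code is not shown. For readers who want to see how such a coefficient is put together, the following Python sketch computes a Gwet AC2 with ordinal weights for a varying number of raters per session. It assumes the commonly used multirater formulation (weighted percent agreement over sessions with at least 2 ratings, chance agreement from the average category proportions) and should be read as an illustration under that assumption, not as a reproduction of the study analysis. The function names and the example data are hypothetical.

```python
from typing import Optional, Sequence


def ordinal_weights(q: int) -> list[list[float]]:
    """Ordinal agreement weights for q ordered categories (1.0 on the diagonal)."""
    m_max = q * (q - 1) / 2
    weights = []
    for k in range(q):
        row = []
        for l in range(q):
            span = abs(k - l) + 1  # number of categories spanned by k and l
            row.append(1.0 - (span * (span - 1) / 2) / m_max)
        weights.append(row)
    return weights


def gwet_ac2(ratings: Sequence[Sequence[Optional[int]]], q: int = 5) -> float:
    """Gwet AC2 for ratings coded 1..q; None marks a missing rating."""
    weights = ordinal_weights(q)

    # counts[i][k] = number of raters who gave session i the category k + 1
    counts = []
    for session in ratings:
        c = [0] * q
        for r in session:
            if r is not None:
                c[r - 1] += 1
        if sum(c) > 0:  # skip sessions with no ratings at all
            counts.append(c)

    n = len(counts)
    pi = [0.0] * q      # average proportion of ratings in each category
    agreement = []      # weighted agreement for sessions with >= 2 ratings
    for c in counts:
        r_i = sum(c)
        for k in range(q):
            pi[k] += c[k] / r_i / n
        if r_i >= 2:
            total = 0.0
            for k in range(q):
                weighted = sum(weights[k][l] * c[l] for l in range(q))
                total += c[k] * (weighted - 1.0)
            agreement.append(total / (r_i * (r_i - 1)))

    if not agreement:
        raise ValueError("need at least one session with 2 or more ratings")

    p_a = sum(agreement) / len(agreement)
    t_w = sum(sum(row) for row in weights)
    p_e = t_w / (q * (q - 1)) * sum(p * (1.0 - p) for p in pi)
    return (p_a - p_e) / (1.0 - p_e)


# Hypothetical data: 4 sessions, up to 3 raters each, None = session not rated.
sessions = [[3, 3, None], [4, 5, 4], [2, 2, 3], [4, None, None]]
print(round(gwet_ac2(sessions), 3))
```

With ordinal weights, ratings that differ by 1 point still count as substantial partial agreement, whereas ratings at opposite ends of the scale count as none.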
Associations with other variables evaluates whether WiSCoR correlates with other associated outcomes as would be expected.5 While to our knowledge there is currently no other validated measure of coach performance or criterion standard, we were able to assess the association of WiSCoR with the coachee ratings of their coaches. We would expect a positive correlation between WiSCoR and coachee ratings of coach performance. After coaching sessions in both coaching programs, the coachees were given a survey. The survey included questions about their coach that corresponded to the WiSCoR domains. Coachee ratings were on a 5-point Likert scale, and the 4 questions corresponding to each WiSCoR domain were averaged to obtain an overall coach rating. Descriptive statistics were calculated, and a Pearson correlation coefficient was calculated between the WiSCoR rating and the coachee’s rating of the coach. A P value of <.05 was considered statistically significant.
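The comparison described above can be sketched in a few lines of Python. The sketch assumes that each coachee survey yields 4 Likert answers (read here as 1 per WiSCoR domain) that are averaged into an overall coach rating, and that this average is paired with a single WiSCoR overall rating for the same session. The function names and the data are hypothetical, and no P value is computed.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def coachee_overall(answers: Sequence[float]) -> float:
    """Average the survey answers for one session into one coach rating."""
    return mean(answers)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data for 5 sessions.
survey_answers = [[5, 5, 4, 5], [4, 5, 5, 5], [5, 5, 5, 5], [4, 4, 5, 4], [5, 4, 5, 5]]
wiscor_overall = [3, 4, 4, 2, 3]

coachee_ratings = [coachee_overall(a) for a in survey_answers]
print(round(pearson_r(wiscor_overall, coachee_ratings), 2))
```

When one of the two measures is compressed into a narrow band near the top of its scale, as coachee ratings often are, the correlation coefficient tends to be small even if the two measures agree in direction.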
Consequence evidence considers the effect of WiSCoR.5,6 One intended use of WiSCoR is to give feedback to coaches by evaluating their performance according to clear domains of coaching principles. Coaches can use this feedback to develop their coaching skills, which may translate to more effective coaching sessions that accelerate coachee improvement and ultimately improve patient care. WiSCoR scores can also assist in identifying aspects of coach performance that could be improved through adjusting coach training. Lastly, WiSCoR can provide a measure of fidelity to coaching principles in future research projects and as surgical coaching programs are disseminated across practice settings and geographic locations.
There were 106 coaching sessions included in the study with a total of 282 WiSCoR ratings. The number of raters per coaching session ranged from 1 to 8, with most coaching sessions having 2 raters (Table 1). This includes ratings of coaching sessions done by more raters for consensus scoring to ensure consistency. There were 23 coaches and 38 coachees who participated in coaching sessions. A total of 19 coaches and 23 coachees participated in multiple coaching sessions. Eight coaches (35%) and 11 coachees (29%) were from the WSCP while 15 coaches (65%) and 27 coachees (71%) were from the MBSC coaching program. Most coaches (22 of 23 [96%]) and coachees (32 of 38 [84%]) were men.
Descriptive statistics of the WiSCoR ratings are shown in Table 2 and histograms are included in eFigure in the Supplement. The mean (SD) overall WiSCoR rating was 3.23 (0.82). The domain with the highest mean rating was domain 1: share responsibility (mean [SD] = 3.47 [0.99]). The domain with the lowest mean rating was domain 4: goal setting and action planning (mean [SD] = 2.88 [0.96]). The standard deviations of all domains were less than 1. The mode was also highest for domain 1: share responsibility (mode = 4) and lowest for domain 4: goal setting and action planning (mode = 2). Raters used the entire range of the scale (1-5) for the overall WiSCoR rating as well as for each domain.
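The summary measures reported above (mean, SD, mode, and range) can be computed for any set of ratings with the Python standard library, as in the following sketch. The data are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, because the text does not state which SD formula was used.

```python
from statistics import mean, mode, stdev
from typing import Sequence


def describe(ratings: Sequence[int]) -> dict:
    """Mean, sample SD, mode, and observed range for a list of 1-5 ratings."""
    return {
        "mean": mean(ratings),
        "sd": stdev(ratings),   # sample SD (n - 1 denominator); an assumption
        "mode": mode(ratings),  # most frequent rating
        "min": min(ratings),
        "max": max(ratings),
    }


# Hypothetical ratings for one domain.
summary = describe([2, 3, 3, 4, 5])
print(summary["mode"], summary["min"], summary["max"])
```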
The Gwet AC2 values with 95% CIs for the overall WiSCoR rating, as well as for each domain, are shown in Table 2. Gwet AC2 was 0.87 for the overall WiSCoR rating and ranged from 0.84 to 0.93 for the individual domains. Domain 2, question for self-reflection, had the highest Gwet AC2 while domain 1, share responsibility, had the lowest Gwet AC2 (Table 2). All Gwet AC2 values were greater than 0.8, which corresponds to almost perfect agreement.24
Of the 106 coaching sessions, 89 (84%) had both a WiSCoR rating and a coachee rating of the coach. The mean (SD) rating of coaches by coachees was significantly higher than the mean (SD) WiSCoR rating (4.6 [0.5] vs 3.3 [0.8]; P < .001) (Table 3). The coachees used only the upper half of the range of the scale (range, 2.5-5), whereas WiSCoR scores included the entire scale (range, 1-5). A scatterplot of the coachee rating of the coach and the WiSCoR rating is shown in Figure 2. The correlation between the coachee rating and the WiSCoR rating of the coach was weak but significant and positive (r = 0.22; P = .04) (Table 3).
We describe the development and initial use of WiSCoR, a novel tool to evaluate the quality of peer surgical coaches during individual coaching sessions. Varied sources of evidence throughout the development process contributed to the validity argument for this tool:
Content: the content of WiSCoR was generated through literature review, field observations, and interviews of coaches from diverse fields. A multidisciplinary team of experts used this information to identify the essential components of a coach’s performance during a surgical coaching session, which became the 4 domains of WiSCoR.
Response process: WiSCoR was then ready for pilot testing, which allowed for refinement of the domain principle descriptions and training of raters through discussion.
Internal structure: subsequent use of WiSCoR showed high interrater reliability among a diverse group of raters.
Associations with other variables: while there is no criterion standard measure of coach performance, WiSCoR scores had a statistically significant but weak association with coachee ratings of their coach, likely because of a ceiling effect of the coachee ratings.
Consequences: using WiSCoR can identify opportunities for coach development grounded in coaching principles.
Based on Messick’s6 criteria for validity, these sources of evidence generate support for the use of WiSCoR in evaluating the performance of peer surgical coaches.5-7
We observed that coachee ratings had a significant but weak positive correlation with WiSCoR scores. Coachee ratings were consistently near the top of the scale, so this weak correlation may reflect a ceiling effect of coachee ratings. These high scores likely reflect that coachees enjoyed the coaching experience, valued the feedback and time with their coaches regardless of quality, and received no training in surgical coaching principles or meanings of the ratings. These scores may also reflect the professional respect coachees had for their coaches or prior friendships. Because there is only minor variation in coachee ratings, this score has little use in assessing the quality of a coach as it does not allow for differentiation among coaches. This emphasizes the need for a defined rating scale that can assess adherence to coaching principles.
The use of WiSCoR demonstrated a suitable rating scale. The mean overall coaching score and domain scores were near the middle of the 1-to-5 scale, which gives room to rate coaches above and below the mean. In addition, the entire range of the scale was used by raters for the overall score and each domain. This will permit discrimination among surgical coaches of varying performance levels without ceiling or floor effects. In addition, WiSCoR demonstrated high interrater reliability even when used by raters from various clinical and nonclinical backgrounds. This suggests that WiSCoR can be used reliably by those with experience in any field associated with surgical coaching with appropriate training and calibration. Using this tool does not require formal surgical training because it does not assess the accuracy of a coach’s feedback or guidance. However, WiSCoR does assess whether feedback is well reasoned and if the coachee has the opportunity to respond.
WiSCoR was designed to assess 4 domains: (1) shares responsibility and contributes to equal exchange; (2) uses questions/prompts to guide coachee self-reflection/analysis; (3) provides constructive feedback and encouragement; and (4) guides goal setting and action planning. These 4 domains represent the core competencies of surgical coaching. Fidelity to these fundamental coaching principles can be objectively measured by WiSCoR. As a result, WiSCoR can be used for coach feedback, coach training, and the development of surgical coaching programs. The ability to assess surgical coach performance will provide critical feedback as coaches develop this new skill. Feedback from WiSCoR includes a numeric rating for each coaching domain to help coaches identify areas of weakness. The rubric also describes specific behaviors for each domain that align with high and low ratings. Raters may add comments to the WiSCoR ratings with examples of these specific behaviors that led to their domain ratings. These descriptive details may provide more actionable feedback. Results from WiSCoR can also be used to refine surgical coach training. WiSCoR may identify coaching domains in which many coaches receive low ratings. This will provide an opportunity to emphasize or increase practice of high-performing behaviors in that domain during coach training. For example, in our experience, the goal setting and action planning domain had the lowest scores. This skill seemed to be the most difficult for coaches to learn. This may be because coaches were so engaged in collaborative analysis and feedback during the coaching session that they sometimes neglected to set specific goals and action plans. In future coach training, goal setting could be explicitly practiced or a checklist created to help coaches improve performance in this domain. Lastly, WiSCoR defines the principles of surgical coaching and can evaluate the fidelity of future surgical coaching programs to them. 
When creating a surgical coaching program, the development team can refer to WiSCoR and ensure these 4 principles are incorporated. Subsequently, the WiSCoR domains can be used to understand how a surgical coaching program adheres to surgical coaching principles for the purposes of evaluation of programs and research.
There are several limitations to the development and use of WiSCoR. While WiSCoR was developed and used in 2 different coaching programs, its reliability will need continued evaluation as WiSCoR is used in new coaching programs with different structures and geographic locations. It is also possible that differences in sex may alter the coaching relationship and coach effectiveness, and because most participants in this study were men, further investigation is needed to validate this tool for more diverse populations. In addition, reliable use requires a period of training and calibration of raters. However, this can be accomplished in a relatively short period. Use of WiSCoR requires that the coaching session is either observed or audio-recorded and transcribed. WiSCoR is not intended to be used for all coaching sessions in practice, nor would this be feasible. WiSCoR provides an important tool for research, training, and audit. The ratings of coach performance may also be affected by the characteristics and receptivity of the coachee, which are not accounted for in this study. Coachees were voluntary participants in these coaching programs, which may have increased their responsiveness to coaching. While we have demonstrated evidence for the validity of WiSCoR through a rigorous development and evaluation process, we do not yet have data on the association of a coach’s WiSCoR ratings and their coachee’s performance improvement. In the future, further validation of WiSCoR could be obtained by evaluating the association of WiSCoR ratings with improvement in the coachee’s surgical skills and/or clinical outcomes.
WiSCoR is a novel instrument to assess the performance of peer surgical coaches. Evidence supporting the validity of the measure includes rigorous content development, consistent rater training, high interrater reliability, association with coachee ratings of coaches, and an evaluation of the effect of WiSCoR. Moving forward, WiSCoR can be used to assess the fidelity of sessions to coaching principles during future research and implementation programs and to provide feedback to surgical coaches to optimize their performance.
Accepted for Publication: February 11, 2020.
Corresponding Author: Caprice C. Greenberg, MD, MPH, Wisconsin Surgical Outcomes Research (WiSOR) Program, Department of Surgery, University of Wisconsin-Madison, 600 Highland Ave, K6/100 Clinical Science Center, Madison, WI 53792 (firstname.lastname@example.org).
Published Online: April 22, 2020. doi:10.1001/jamasurg.2020.0424
Author Contributions: Dr Greenberg had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Vande Walle, Pavuluri Quamme, Beasley, Ghousseini, Dombrowski, Fry, Dimick, Wiegmann, Greenberg.
Acquisition, analysis, or interpretation of data: Vande Walle, Pavuluri Quamme, Beasley, Leverson, Ghousseini, Fry, Dimick, Wiegmann, Greenberg.
Drafting of the manuscript: Vande Walle, Pavuluri Quamme, Dimick, Wiegmann, Greenberg.
Critical revision of the manuscript for important intellectual content: Vande Walle, Pavuluri Quamme, Beasley, Leverson, Ghousseini, Dombrowski, Fry, Dimick, Greenberg.
Statistical analysis: Vande Walle, Pavuluri Quamme, Leverson, Fry, Wiegmann.
Obtained funding: Pavuluri Quamme, Dimick, Greenberg.
Administrative, technical, or material support: Pavuluri Quamme, Dombrowski, Dimick, Wiegmann, Greenberg.
Supervision: Pavuluri Quamme, Wiegmann, Greenberg.
Conflict of Interest Disclosures: Dr Dimick reported personal fees from ArborMetrix, Inc during the conduct of the study. Dr Wiegmann reported personal fees from HFACS, Inc outside the submitted work. Dr Greenberg reported grants from Wisconsin Partnership Program and the National Institutes of Health/National Institute of Diabetes and Digestive and Kidney Diseases (NIH/NIDDK) during the conduct of the study and serving on the Johnson and Johnson Institute's Global Education Council and as president/founder of the Institute for Surgical Coaching. She is not paid for either of these positions. No other disclosures were reported.
Funding/Support: Funding for this work was provided by NIH/NIDDK grant R01 DK101423-01 and the Wisconsin Partnership Program Education and Research Committee grant 2357. Dr Vande Walle was supported by the NIH training grant T32 CA090217.
Role of the Funder/Sponsor: The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: Dr Dimick is the Surgical Innovation Editor of JAMA Surgery, but he was not involved in any of the decisions regarding review of the manuscript or its acceptance.