Break around 3.6 hours separates the short training time group (n = 10) on the left from the long training time group (n = 6) on the right. Anderson-Darling test did not reject normal distribution (P > .05).
A, Each line represents a follow-up group mean (SD) initial training score (71.18% [6.96%]), final training score (93.20% [0.39%]), initial follow-up score (86.37% [3.41%]), and final follow-up score (93.90% [0.61%]). B, Individual initial follow-up scores only. Lines of best fit for the short training time group (open squares, R2 = 0.103) and long training time group (filled circles, R2 = 0.742).
Open squares represent the short training time (STT) group, and filled circles represent the long training time (LTT) group. Lines of best fit for the STT (R2 = 0.216) and LTT (R2 = 0.945) groups.
Open squares represent the short training time (STT) group, and filled circles represent the long training time (LTT) group. Lines of best fit for the STT (R2 = 0.130) and LTT (R2 = 0.940) groups.
F1, F3, F5, and F7 represent follow-up groups 1, 3, 5, and 7 weeks after training, respectively, with lines of best fit (R2 = 0.367, 0.225, 0.871, and 0.742, respectively). Circle outlines indicate individuals in the long training time group.
Zhang N, Sumer BD. Transoral Robotic Surgery: Simulation-Based Standardized Training. JAMA Otolaryngol Head Neck Surg. 2013;139(11):1111-1117. doi:10.1001/jamaoto.2013.4720
Importance
Simulation-based standardized training is important for the clinical training of physicians practicing robotic surgery.
Objective
To train robotic surgery–naïve student volunteers using the da Vinci Skills Simulator (dVSS) for transoral robotic surgery (TORS).
Design
Prospective inception cohort in 2012.
Setting
Academic referral center.
Participants
Sixteen medical student volunteers lacking experience in robotic surgery.
Interventions
Participants trained with the dVSS in 12 exercises until competent, defined as an overall score of at least 91%. After a 1-, 3-, 5-, or 7-week postinitial training hiatus (n = 4 per group), participants reachieved competence on follow-up.
Main Outcomes and Measures
Total training time (TTT) to achieve competency, total follow-up time (TFT) to reachieve competency, and performance metrics.
Results
All participants became competent. The TTT distribution was normal based on the Anderson-Darling normality test (P > .05); for analysis, the sample was divided into a short training time (STT) group (n = 10 [63%]) and a long training time (LTT) group (n = 6 [37%]). The mean (SD) TTT was 2.4 (0.6) hours for the STT group and 4.7 (0.5) hours for the LTT group. All participants reachieved competence with a mean TFT that was significantly shorter than TTT. There was no significant difference between STT and LTT in mean TFT at 1 and 3 weeks (P = .79), whereas the LTT group had a longer TFT at 5 and 7 weeks (P = .04), with no difference in final follow-up scores (P = .12).
Conclusions and Relevance
Physicians in training can acquire robotic surgery competency. Participants who acquire skills faster regain robotic skills faster after a training hiatus, but, on retraining, all participants can regain equivalent competence. This information provides a benchmark for a simulator training program.
The incidence of oropharyngeal squamous cell carcinomas (OPSCCs) in the United States has been rising and is mostly due to human papillomavirus–related OPSCC cases, which from 1973 to 2004 had an overall annual percentage increase of 0.80 per 100 000 people.1 While nonsurgical therapy is commonly used in the treatment for OPSCC,2 advances in surgical techniques, such as transoral laser microsurgery and transoral robotic surgery (TORS), have allowed minimally invasive, transoral approaches for the resection of OPSCC. Transoral robotic surgery is a minimally invasive technique3,4 that maximizes functional and oncologic outcomes.5,6 Because it is a relatively new surgical modality, there is a corresponding need for optimal tools for resident training in TORS.
Surgical training for a variety of endoscopic and minimally invasive procedures includes the use of surgical computer-based simulators that can provide valuable experience for surgeons in training.7 One such system is the da Vinci Skills Simulator (dVSS) (Intuitive Surgical), which is designed to provide familiarity with the da Vinci console, one component of TORS. The simulator provides a computer-generated 3-dimensional image for the surgeon at the console and has a series of exercises that consist of a variety of tasks that the surgeon completes using the console controls. The simulator records metrics, such as time to completion, drops, and economy of movement, and scores each metric to generate a composite overall score for each exercise. While previous studies have verified that the dVSS scores do correlate with surgeon experience8 and that simulator training does improve robotic surgery skills,9,10 which are then transferable to the operating room,11 to our knowledge, no previous study has identified the dVSS exercises that are most relevant for TORS, quantified the amount of simulator time necessary for a previously untrained student to become competent with the da Vinci System, or measured how quickly console skills decline after a period of no training.
To address these questions, we performed a study to evaluate the feasibility of using selected dVSS exercises to prepare physicians in training to perform TORS by improving console skills, estimate the simulator time needed to achieve basic competence with the da Vinci console, and quantify how quickly console skills decline requiring retraining.
After obtaining institutional review board approval from the University of Texas Southwestern Medical Center in April 2012, a volunteer sample of 16 medical students with no experience in robotic surgery was recruited to train with the dVSS. Informed consent was waived by the institutional review board.
All robotic training was performed using the da Vinci Surgical SiHD System installed at Zale Lipshy University Hospital. Participants received a standardized orientation that explained the basics of how the da Vinci Surgical Console and dVSS operated. Participants were then scheduled for individual training times, during which they had unlimited console time and unlimited attempts to perform 12 exercises relevant to TORS until they became competent. Each simulation exercise involves completing a task that trains various skill sets (eg, wrist manipulation, camera manipulation, precision in moving objects, suturing). The exercises that involved tasks relevant to performing TORS clinically were selected as part of the training protocol in our study, meaning that exercises requiring the third robotic arm or requiring needle driving were eliminated. The 12 selected exercises were considered relevant for TORS because they involved tasks requiring precision in grasping, holding, and moving objects, wrist flexibility, or ability to efficiently cauterize vasculature, which are all skills useful for TORS. In addition, the size of the area in which the controllers traveled was measured for each exercise (Master Workspace Range) (Table) with less than 10 cm being the ideal range, which helps to train surgeons to operate in a limited-size space. The 12 selected exercises were performed in a prearranged order: Pick and Place, Camera Targeting 1, Camera Targeting 2, Peg Board 2, Ring Walk 2, Ring and Rail 2, Match Board 1, Match Board 2, Energy Switching 1, Energy Switching 2, Energy Dissection 1, and Energy Dissection 2.
Competence was defined as achieving an overall score of at least 91%, calculated by the simulator program using preprogrammed exercise metrics for each exercise. Previous validation studies reported “expert” mean or median overall scores of 87.0% to 88.3% on identical or similar simulation exercises.8,12 Instead of using a participant’s mean score on each exercise (which in our study would be too easily swayed by low outlier scores) to determine training completion, the study used the score achieved in a single exercise attempt as an end point of training. When the participant was able to achieve a standard of at least 91% in a single exercise attempt, he or she was determined to be sufficiently trained in that exercise and could then move on to the next exercise. A comparatively high value (a score of ≥91%) was used to partially account for the difference between measuring the mean score as opposed to a single final score but is also in part an arbitrary standard chosen for the sake of clearly defining “competence” in the study. Preprogrammed metrics measured for the exercises included time to complete exercise, economy of motion, number of instrument collisions, excessive instrument force time, instruments out of view time, master workspace range, number of drops, misapplied energy time, blood loss volume, and number of broken vessels. Not all performance metrics were applicable to every exercise task (Table).
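The stopping rule described above (training on an exercise ends at the first single attempt scoring at least 91%) can be sketched in a few lines of Python. This is an illustration only, not the simulator's own logic: the function name, the constant name, and the example scores are hypothetical and are not study data.

```python
COMPETENCE_THRESHOLD = 91.0  # percent; the study's cutoff for a single attempt


def attempts_until_competent(scores, threshold=COMPETENCE_THRESHOLD):
    """Return the 1-based number of the first attempt scoring >= threshold.

    Returns None if no attempt in the list reaches the threshold.
    """
    for attempt_number, score in enumerate(scores, start=1):
        if score >= threshold:
            return attempt_number
    return None


# Hypothetical overall scores for one exercise, in attempt order (not study data).
example_scores = [62.0, 78.5, 90.9, 93.0]
print(attempts_until_competent(example_scores))
```

Because the rule looks at a single attempt rather than a running mean, one early low score does not delay completion, which matches the rationale given for not using the mean score.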
Total training time (TTT), defined as the total console time required to achieve competence, was recorded, along with raw values for all metrics. Data on participants’ previous exposure to video games (inclusive of all console, computer, handheld, and arcade-based exposure) were also collected, in the form of a survey on current mean time played weekly and time of the longest duration the participant had ever played in 1 sitting (peak time ever played). Participants were allowed to take breaks during their training ad libitum, and time spent on breaks was excluded from TTT.
Each participant was randomly assigned to follow-up 1, 3, 5, or 7 weeks posttraining (n = 4 per group), and on their assigned follow-up date they repeated the exercises until they regained competence, an overall score of at least 91% on all exercises as previously defined. Participants had no exposure to the console or simulation program between initial training and follow-up. Total follow-up time (TFT), defined as the time to reachieve competence, was recorded, along with raw values for all metrics.
Initial training score and initial follow-up score were defined as the mean scores of the first attempts at all 12 exercises during the training and follow-up periods, respectively. Final training score and final follow-up score were defined as the mean scores of the last attempts at all 12 exercises during the training and follow-up periods, respectively (the attempts when a score ≥91% was achieved). If only 1 attempt was needed to achieve a score of at least 91% on a given exercise, the score of that single attempt counted as both initial and final scores.
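The score definitions above can be expressed as a short Python sketch. It assumes a hypothetical data layout (a dictionary mapping each exercise name to its scores in attempt order); the function name and the example values are illustrative and are not study data.

```python
from statistics import mean


def initial_and_final_scores(attempts_by_exercise):
    """Return (initial score, final score) for one training or follow-up period.

    attempts_by_exercise maps exercise name -> list of scores in attempt order.
    The initial score averages each exercise's first attempt; the final score
    averages each exercise's last attempt. An exercise completed in a single
    attempt contributes the same value to both averages.
    """
    initial = mean(scores[0] for scores in attempts_by_exercise.values())
    final = mean(scores[-1] for scores in attempts_by_exercise.values())
    return initial, final


# Hypothetical scores for 3 of the 12 exercises (not study data).
example = {
    "Pick and Place": [95.0],
    "Peg Board 2": [70.0, 92.0],
    "Ring Walk 2": [60.0, 80.0, 94.0],
}
print(initial_and_final_scores(example))
```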
The Mann-Whitney U test (Wilcoxon rank sum test) was used as appropriate to determine statistical significance (P < .05). When indicated, 2 groups of follow-up data were combined owing to inadequate statistical power (n ≥ 5 necessary to use the Mann-Whitney U test). The Anderson-Darling normality test was used to evaluate the distribution of training time.
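For readers unfamiliar with the rank sum comparison, the following Python sketch computes the Mann-Whitney U statistic with an exact two-sided permutation P value using only the standard library. It is not the authors' analysis code, and the article does not state which software or which P value method (exact or approximate) was used; the function names and the example training times are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U statistic for x, exact two-sided permutation P value)."""
    ranks = midranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    offset = n1 * (n1 + 1) / 2
    u_x = sum(ranks[:n1]) - offset
    center = n1 * n2 / 2  # expected U under the null hypothesis
    observed = abs(u_x - center)
    extreme = total = 0
    # Enumerate every way of labeling n1 of the pooled observations as group x.
    for idx in combinations(range(n1 + n2), n1):
        u = sum(ranks[i] for i in idx) - offset
        total += 1
        if abs(u - center) >= observed - 1e-9:
            extreme += 1
    return u_x, extreme / total


# Hypothetical training times in hours for two groups (not study data).
fast_group = [1.5, 2.0, 2.2, 2.6, 3.1]
slow_group = [4.0, 4.4, 4.8, 5.1, 5.4]
print(mann_whitney_exact(fast_group, slow_group))
```

One way to see why very small groups were pooled: with 3 observations per group there are only 20 possible labelings, so the smallest attainable exact two-sided P value is 2/20 = .10, which can never fall below .05.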
The TTT to achieve competency, TFT to reachieve competency, initial and final training scores, initial and final follow-up scores, number of exercise attempts, and other performance metrics were analyzed and compared.
All participants successfully completed training, becoming competent. The mean (SD) TTT was 3.3 (1.2) hours, with a range of 1.5 to 5.4 hours. The TTT distribution seemed to be bimodal (Figure 1), with a clean break centered at 3.6 hours dividing the participants into short training time (STT) (n = 10 [63%]) and long training time (LTT) (n = 6 [37%]) groups, but the Anderson-Darling test did not reject a normal distribution (P > .05). The results suggest that there are not 2 distinct distributions but rather a single, likely normal distribution in which the STT group by definition represents participants who needed less time to achieve competency than the LTT group. To compare possible differences in performance metrics between participants who trained faster and those who trained slower, data for STT vs LTT were analyzed for statistical significance in the various measured metrics.
The mean (SD) TTT was 2.4 (0.6) hours for the STT group and 4.7 (0.5) hours for the LTT group (P < .001). The difference in mean exercise time (ie, time needed to finish each single exercise attempt) was nonsignificant between STT and LTT (P = .64), but the difference in the total number of exercise attempts needed to complete training was significant between the STT and LTT groups (P = .003), with a mean (SD) of 54.7 (14.8) and 100.2 (16.2) attempts, respectively.
The STT and LTT differences in initial and final training scores were nonsignificant (P = .06 and .64, respectively). The mean (SD) initial and final training scores for all participants were 71.18% (6.96%) and 93.20% (0.39%), respectively. Final training scores for all participants, the STT group, and the LTT group showed a significant improvement over initial scores (P < .001, P = .002, and P < .001, respectively). Video game history, as measured by current total time played per week and peak duration ever played, and sex did not have significant correlations with TTT (P = .79, .37, and .77, respectively).
All participants were able to reachieve competence on follow-up. Differences in mean TTT between the 4 follow-up groups (1, 3, 5, and 7 weeks after training) were nonsignificant (P > .05), and the groups had an even distribution of total, STT, and LTT participants. The mean (SD) initial follow-up scores for all participants were significantly improved over initial training score (86.37% [3.41%] and 71.18% [6.96%], respectively; P < .001). However, retraining was necessary for each participant to reachieve competency; that is, no participant achieved a score of at least 91% on the first attempt of all 12 exercises (ie, mean number of attempts per exercise >1.0 for all participants). Final follow-up scores for all participants were improved significantly over initial follow-up score (93.90% [0.61%] vs 86.37% [3.41%]; P < .001). Group averages of initial and final scores for training and follow-up are illustrated in Figure 2. The mean (SD) TFTs were 44 (5) minutes, 63 (3) minutes, 59 (23) minutes, and 82 (21) minutes for the 1-, 3-, 5-, and 7-week groups, respectively, which were all significantly shorter than their respective TTTs (P = .01, .01, .03, and .01, respectively).
The larger standard deviations at the 5- and 7-week follow-ups compared with those at the 1- and 3-week follow-ups were due to divergence between STT and LTT participants (Figure 3). Lines of best fit demonstrate a stronger positive correlation for LTT (R2 = 0.945) that is steeper than the weaker correlation for STT (R2 = 0.216). While there was no significant difference between STT and LTT in mean TFT for combined follow-up scores at 1 and 3 weeks (P = .79), there were significant differences in mean TFT for combined follow-up scores at 5 and 7 weeks (P = .04). Results for the mean number of attempts per exercise to reachieve competence show the same pattern of divergence at the 5- and 7-week follow-ups (Figure 4). Comparison of individual TTT with TFT by follow-up group further illustrates the divergence that occurs with the 5-week and the 7-week follow-up groups secondary to the influence of STT vs LTT participants (Figure 5). Lines of best fit demonstrate weaker, moderately positive correlations for the 1-week and 3-week follow-up groups (R2 = 0.367 and 0.225, respectively) and much stronger, steeper positive correlations for the 5-week and 7-week follow-up groups (R2 = 0.871 and 0.742, respectively). Similar findings are demonstrated by initial follow-up scores as well (R2 = 0.103 and 0.742 for the STT and LTT groups, respectively) (Figure 2).
Differences between the STT and LTT groups in mean exercise time, total number of exercise attempts, and final scores for follow-up were nonsignificant (P = .79, .09, and .12, respectively).
As the number of robot-assisted operations increases, standardizing and implementing a robotic console training protocol is becoming a critical component of surgical training.6 The current literature on robotic training largely consists of studies in which participants were trained and assessed by having them complete a preset number of attempts (usually 1) of a given set of exercises, possibly with a preset amount of time beforehand for free-range practice and/or a preset amount of time for total training.10,12-16 Our study tries to quantify the minimum necessary training to achieve competency. Each participant’s goal then was simply to achieve a standardized preset level of competency. Furthermore, we wanted to quantify how the newly acquired skills decay after a period of no training since all participants achieved statistically equivalent levels of competency after training. A previously published study17 measured laparoscopic skill retention 5 months after training and showed that laparoscopic proficiency-based simulator training resulted in durable improvement in operative skill even in the absence of practice for up to 5 months. In our study, we quantified trainee retention skills at multiple time points to better assess the rate of decline in skills so as to estimate the volume of robotic cases or simulator training sessions necessary to retain competence. While our study selected for exercises that tested skills relevant to TORS, eliminating exercises requiring the third robotic arm or requiring needle driving, the general findings of the study are likely applicable to those of other types of robotic surgery.
Most important, the training results demonstrate that all participants were able to achieve competency. However, a wide range of TTT was necessary (1.5-5.4 hours); the longest time a participant needed was approximately 3.5 times longer than the shortest time needed to achieve the same level of competency. The range in TTT is the result of the significant difference in the total number of exercise attempts needed to complete training between STT and LTT (P = .003), with a mean (SD) of 54.7 (14.8) attempts vs 100.2 (16.2) attempts. Thus, while all students can achieve competency, some require longer time on the console. This finding suggests that a standard “one-size-fits-all” training time or number of exercise attempts requirement should not be used as a measure of successful training completion.
In the follow-up period, evaluation of retained competency was based not only on initial follow-up scores, but also on the mean number of attempts per exercise needed to reachieve competency, which correlated with time needed to retrain (TFT). First, no participant retained full competency; that is, each participant needed a mean number of attempts per exercise greater than 1.0. However, all participants retained a substantial degree of competency as judged by their initial follow-up scores (Figure 2), which were significantly improved over initial training scores (P < .001). In addition, all were able to reachieve competency in a significantly shorter time than it took for their initial training.
Second, competency as measured by time and number of attempts was stable within each follow-up group up to 3 weeks after training; after 5 weeks, participants started to demonstrate a wide range of retained competency as measured by time and number of attempts. This finding is supported by data showing no significant difference between the STT and LTT groups in mean TFT for follow-up in combined 1- and 3-week scores (P = .79), while there were significant differences in TFT in combined 5- and 7-week scores (P = .04). Such results suggest that simulator training approximately every 4 weeks or a corresponding adequate surgical volume load may be necessary to maintain reasonable surgical competency for all surgeons. While simulator retraining prior to a case after a longer break from the console may be possible, we cannot say how long that refresher training would have to be to regain competency.
Third, the STT participants as a whole demonstrated very little decline in console competency through the entire 7-week follow-up period as measured by initial score as well as by time and attempts, while the LTT participants as a group demonstrated a steeper decline in competency through the same follow-up period. The STT participants seemed to not only acquire the relevant skills faster but also seemed to retain these skills longer with significantly less decline in competency than the LTT participants, who had required longer initial training times. Although both STT and LTT groups retained statistically similar levels of competence at 1 and 3 weeks after training as shown previously by the second finding, the LTT group demonstrated a rate of decline in retained skills approximately 3.60 times steeper than the STT group. This rate difference is demonstrated by the expected TFT values, as derived from lines of best fit (Figure 3). They show that by 7 weeks after training, the LTT group was expected to need 112% more time to complete retraining compared with their 1-week posttraining expected TFT time; yet, during the same interval, the STT group was expected to need only 31.9% more time to complete retraining. This longitudinal trend might suggest then that surgeons who require a longer training time may benefit from more frequent and/or longer retraining despite having achieved the same standardized level of competency after initial training and having initially spent more time with the console during their training.
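The percentage figures above come from comparing values predicted by a fitted line at 1 and 7 weeks. The following Python sketch shows that calculation with an ordinary least-squares fit. It is illustrative only: the article does not report the fitted slopes or intercepts, so the function names and the example follow-up times are hypothetical and do not reproduce the reported 112% or 31.9%.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def percent_increase(slope, intercept, start, end):
    """Percent change in the fitted value between x = start and x = end."""
    first = slope * start + intercept
    last = slope * end + intercept
    return 100.0 * (last - first) / first


# Hypothetical follow-up weeks and retraining times in minutes (not study data).
weeks = [1, 3, 5, 7]
tft_minutes = [40.0, 50.0, 60.0, 70.0]
slope, intercept = fit_line(weeks, tft_minutes)
print(slope, intercept, percent_increase(slope, intercept, 1, 7))
```

Note that a ratio of slopes and a ratio of percentage increases need not agree, because the percentage also depends on each group's expected value at week 1.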
The limitations of this study include the small sample size, and therefore further investigation is necessary to confirm the study’s findings. Also, we did not follow up with the participants at later time points to see if their skills underwent further decline with time. These data are needed in addition to further studies on quantifying decay in retained surgical skill over time and on standardizing TORS training protocol in the context of a standardized protocol incorporating other skill sets required for TORS, such as preoperative assessment and endoscopy skills.
In conclusion, physicians in training are able to acquire and retain robotic surgery competency using the dVSS virtual reality simulator. While all were able to achieve competency, the findings suggest that a standard one-size-fits-all training time or number of exercise attempts requirement should not be used as a measure of successful training completion. Participants who were able to acquire relevant skills faster tended to retain robotic skills longer than participants requiring longer initial training times. However, on retraining, all participants were able to regain equivalent levels of competence with no significant difference in final scores. This information can help establish a virtual reality simulator–based training program for residents prior to their clinical introduction to TORS. It also can provide a benchmark for determining TORS surgical volume or simulator training necessary to maintain competency. This study then serves as a preliminary report of initial findings, and we intend to continue to expand the study to include a larger sample size as well as to follow up with participants at later time points.
Submitted for Publication: April 3, 2013; final revision received July 9, 2013; accepted July 28, 2013.
Corresponding Author: Baran D. Sumer, MD, Department of Otolaryngology–Head and Neck Surgery, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390.
Published Online: September 19, 2013. doi:10.1001/jamaoto.2013.4720.
Author Contributions: Dr Sumer had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Both authors.
Acquisition of data: Both authors.
Analysis and interpretation of data: Both authors.
Drafting of the manuscript: Both authors.
Critical revision of the manuscript for important intellectual content: Sumer.
Statistical analysis: Both authors.
Obtained funding: Zhang.
Administrative, technical, or material support: Sumer.
Study supervision: Sumer.
Conflict of Interest Disclosures: Dr Sumer has received honoraria from Intuitive Surgical Inc for serving as a surgical proctor. No other disclosures are reported.
Role of the Sponsor: Intuitive Surgical Inc had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Funding/Support: The study was supported by the University of Texas Southwestern Medical Student Research Program.
Previous Presentation: This study was presented at the American Head and Neck Society 2013 Annual Meeting; April 10, 2013; Orlando, Florida.