Arnedt JT, Owens J, Crouch M, Stahl J, Carskadon MA. Neurobehavioral Performance of Residents After Heavy Night Call vs After Alcohol Ingestion. JAMA. 2005;294(9):1025-1033. doi:10.1001/jama.294.9.1025
Author Affiliations: Department of Psychiatry and Human Behavior, Brown Medical School (Drs Arnedt, Owens, and Carskadon); Division of Ambulatory Pediatrics, Rhode Island Hospital (Dr Owens and Mss Crouch and Stahl); and Sleep and Chronobiology Research Laboratory, E. P. Bradley Hospital (Dr Carskadon), Providence, RI. Dr Arnedt is now with the Sleep and Chronophysiology Laboratory, Department of Psychiatry, University of Michigan, Ann Arbor.
Context Concern exists about the effect of extended resident work hours; however, no study has evaluated training-related performance impairments against an accepted standard of functional impairment.
Objectives To compare post-call performance during a heavy call rotation (every fourth or fifth night) to performance with a blood alcohol concentration of 0.04 to 0.05 g% (per 100 mL of blood) during a light call rotation, and to evaluate the association between self-assessed and actual performance.
Design, Setting, and Participants A prospective 2-session within-subject study of 34 pediatric residents (18 women and 16 men; mean age, 28.7 years) in an academic medical center conducted between October 2001 and August 2003, who were tested under 4 conditions: light call, light call with alcohol, heavy call, and heavy call with placebo.
Interventions Residents attended a test session during the final week of a light call rotation (non–post-call) and during the final week of a heavy call rotation (post-call). At each session, they underwent a 60-minute test battery (light and heavy call conditions), ingested either alcohol (light call with alcohol condition) or placebo (heavy call with placebo condition), and repeated the test battery. Performance self-evaluations followed each test.
Main Outcome Measures Sustained attention, vigilance, and simulated driving performance measures; and self-report sleepiness, performance, and effort measures.
Results Participants achieved the target blood alcohol concentration. Compared with light call, heavy call reaction times were 7% slower (242.5 vs 225.9 milliseconds, P<.001); commission errors were 40% higher (38.2% vs 27.2%, P<.001); and lane variability (7.0 vs 5.5 ft, P<.001) and speed variability (4.1 vs 2.4 mph, P<.001) on the driving simulator were 27% and 71% greater, respectively. Speed variability was 29% greater in heavy call with placebo than light call with alcohol (4.2 vs 3.2 mph, P = .01), and reaction time, lapses, omission errors, and off-roads were not different. Correlation between self-assessed and actual performance under heavy call was significant for commission errors (r = –0.45, P = .01), lane variability (r = –0.76, P<.001), and speed variability (r = –0.71, P<.001), but not for reaction time.
Conclusions Post-call performance impairment during a heavy call rotation is comparable with impairment associated with a 0.04 to 0.05 g% blood alcohol concentration during a light call rotation, as measured by sustained attention, vigilance, and simulated driving tasks. Residents’ ability to judge this impairment may be limited and task-specific.
Work-related sleep loss and fatigue in medical training have become a source of increasing concern.1-4 Although some studies demonstrate post-call performance deficits,5-8 other studies do not.9-12 These mixed findings have been attributed to methodological limitations,13,14 such as low power, absence of objective sleep measurement, outcome measures insensitive to sleepiness or lacking ecological validity (having little relevance to real-world demands), absence of control for circadian factors or stimulant use (caffeine), and questionable rested control groups (>4 hours of sleep).10 The serious consequences of resident sleep loss have been demonstrated in an intervention study that found that interns obtained 5.8 hours less sleep, had 50% more attentional failures, and committed 22% more serious errors on critical care units while working a traditional schedule compared with a schedule of reduced hours.15,16
On self-report measures, residents express concern about occupational and interpersonal difficulties stemming from sleep loss.1,2,17-19 Of particular concern, self-reported lifetime rates of motor vehicle near-misses and crashes among residents20 are 2.5 and 3 times those of nonresident drivers, respectively.21 One survey found that compared with faculty members over the previous 3 years, more pediatric house officers reported falling asleep behind the wheel (49% vs 13%) and having motor vehicle crashes (20% vs 11%) during residency.22 A prospective study found that for every extended work shift, the monthly risk of a motor vehicle crash increased by 9.1%.23 Despite the heightened risk, only 2 small studies have examined resident driving impairment experimentally. Both found impairment of simulated driving post-call, but neither provided objective on-call sleep duration24,25 and 1 study used a driving task lacking face validity.24
One approach to measuring the magnitude of performance deficits associated with sleep loss is to compare performance following sleep loss and alcohol consumption. Dawson and Reid26 found that impairment on a tracking task after 17 hours of wakefulness was equivalent to a blood alcohol concentration (BAC) of 0.05 g% (per 100 mL of blood) in a sample of nonresidents. These findings have been replicated with other tasks, including simulated driving, in samples of nonresident university students and truck drivers.27-29 Alcohol serves as a useful index for comparison because it impairs performance, even at lower BACs,30-34 and legal limits of intoxication have been established. At 0.05 g% BAC (3-4 standard drinks), alcohol increases self-confidence; decreases inhibitions; diminishes attention, judgment, and control35; and leads to hazardous driving.36
Our primary goal was to compare post-call performance during a heavy call rotation to non–post-call performance during a light call rotation with a BAC of 0.04 to 0.05 g%, using tests of sustained attention, vigilance, and simulated driving. To maximize rested and sleepy states, residents were tested during the final week of light and heavy call rotations. A prospective within-subject 2-session design was used to enhance feasibility and generalizability, reduce attrition, and experimentally control for time of testing, alcohol expectancy, beverage consumption, and test fatigue for our comparison of primary interest. A second goal was to evaluate the association between self-assessed and actual performance following call-related sleep loss and alcohol ingestion.
Participants were recruited from the Brown University Pediatrics residency program, Providence, RI. Interested residents were screened for the following exclusion criteria: aged older than 40 years; sleep disorder diagnosis; chronic medical condition; current psychiatric illness; current use of medication known to affect the sleep/wake cycle or daytime alertness, or that is a contraindication for alcohol ingestion; no or minimal prior alcohol exposure, defined as responding “never drink alcohol” or “never” to the question “How often do you have 2-4 drinks in one occasion at least once in a while”37; current or prior treatment for alcohol or substance abuse; and positive urine pregnancy screen.
Of the 115 residents potentially eligible for participation between October 2001 and August 2003, 43 (37%) responded to e-mail solicitations and were screened for eligibility. Six were excluded (2 were older than 40 years, 1 had a sleep disorder, 2 were taking medication, and 1 never consumed alcohol) and 2 declined. One enrolled participant completed only 1 test session. The final sample included 16 men and 18 women (mean [SD] age, 28.7 [2.7] years); 14 were interns, 15 were second-year residents, and 5 were third-year residents. Participants and nonparticipants or dropouts (n = 81) did not differ by age (P = .74), sex (P = .21), or residency type (pediatric, medicine/pediatrics, triple board; P = .94). Participants were told that the purpose of the study was to compare performance following call-related sleep loss and following alcohol, but they were not told the primary study hypothesis. Participants provided written informed consent and received US $200 upon successful completion of each test session. The study was approved by the institutional review boards at Rhode Island Hospital and Brown University, Providence, RI, and the University of Michigan, Ann Arbor.
Our study was a prospective 2-session within-subject study with 4 conditions: light call, light call with alcohol, heavy call, and heavy call with placebo. Light call and light call with alcohol occurred consecutively during a single session and followed a non-call night during the final week of a 4-week light call rotation. Heavy call and heavy call with placebo were completed post-call after 4 weeks of a heavy call rotation. To maximize the generalizability of our findings, we tested residents during actual rotations that are present in the training program rather than using a no-call condition as a control. We did not include a light call with placebo condition because our primary comparison of interest was between the heavy call with placebo and light call with alcohol conditions, and a third testing session would likely have had a high attrition rate.
Light call rotations (behavioral, elective, or selective) were 4-week daytime clinic rotations averaging 44 hours per week, along with sick-call, which requires night call only if the on-call resident becomes ill. Heavy call rotations (neonatal intensive care, pediatric intensive care, or wards) averaged 90 hours per week (80 hours per week after July 2003) and mandated call every fourth or fifth night (34-36 consecutive hours per overnight call). Residents were allowed to work outside the authorized training program (moonlight) only with written permission from the residency director.
One week before the first session, participants attended a 90-minute orientation during which all participants practiced the tests and received a daily sleep/activity diary and wrist activity monitor (Actiwatch 64, Mini Mitter Company Inc, Bend, Ore). For 7 days before each test session, participants maintained the diary, self-selected their sleep schedules, and had activity levels continuously monitored. Sleep parameters were estimated using sleep analysis software. We adapted scoring procedures from Acebo et al.38 Nocturnal sleep (from 2100 to 0900 hours) was analyzed separately from daytime sleep (from 0901 to 2059).
On the test day, participants were not allowed to nap, ingest caffeine after noon, or to ingest food or drink other than water within 4 hours of testing, verified by self-report before each test session.39 Twenty-two residents were tested first during heavy call; the mean (SD) intersession interval was 106 (82) days.
Participants arrived at 1500 hours for the light call and heavy call sessions. After completing self-report sleepiness measures, participants performed a 60-minute battery that included the Psychomotor Vigilance Test (Ambulatory Monitoring Inc, Ardsley, NY),40 the Continuous Performance Test,41 and a simulated driving task (DriveSim, York Computer Technologies, Kingston, Ontario). The sleepiness measures were repeated every 30 minutes. After each test, participants completed self-assessments of performance and effort.
Following the light call condition testing, participants consumed alcohol (light call with alcohol condition). The alcohol dose was 0.6 g/kg for men and 0.55 g/kg for women, to produce equivalent peak BACs of 0.05 g%.42 The alcoholic beverage consisted of a commercial brand of chilled 80-proof vodka mixed with tonic water in a 1:5 ratio and one-fourth lime. Following the heavy call condition testing, participants consumed placebo (heavy call with placebo condition). The placebo was an equal volume of chilled tonic water and one-fourth lime. The total volume was distributed among three 12-oz cups and consumed at an equal rate over 30 minutes. To enhance the appearance that participants were receiving alcohol in both conditions, the drinks were mixed in plain view, alcohol and tonic were decanted from vodka bottles, and beverages were served with fresh lime. Participants repeated the battery 20 minutes postingestion. Breath samples were analyzed before and after tests using a handheld breathalyzer (AlcoSensor IV, Intoximeters Inc, St Louis, Mo).
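As a rough illustration of the dosing arithmetic described above, the following sketch converts a weight-based ethanol dose into beverage volumes. It is not the study's preparation protocol: the ethanol density (0.789 g/mL), the reading of the 1:5 ratio as 1 part vodka to 5 parts tonic by volume, and the function name `beverage_volumes` are assumptions made for the example. The doses (0.6 g/kg for men, 0.55 g/kg for women), the 80-proof vodka, and the three 12-oz cups come from the text.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789   # assumption: density of ethanol near room temperature
VODKA_ABV = 0.40                   # 80-proof (US) corresponds to 40% alcohol by volume
DOSE_G_PER_KG = {"male": 0.60, "female": 0.55}  # doses stated in the text
TONIC_PARTS_PER_VODKA_PART = 5     # assumption: "1:5" read as vodka:tonic by volume
CUP_ML = 12 * 29.5735              # capacity of one 12-oz (US fluid ounce) cup
N_CUPS = 3


def beverage_volumes(weight_kg, sex):
    """Return ethanol mass and beverage volumes for one participant (illustrative)."""
    ethanol_g = DOSE_G_PER_KG[sex] * weight_kg
    ethanol_ml = ethanol_g / ETHANOL_DENSITY_G_PER_ML
    vodka_ml = ethanol_ml / VODKA_ABV
    tonic_ml = vodka_ml * TONIC_PARTS_PER_VODKA_PART
    total_ml = vodka_ml + tonic_ml
    return {
        "ethanol_g": ethanol_g,
        "vodka_ml": vodka_ml,
        "tonic_ml": tonic_ml,
        "total_ml": total_ml,
        "per_cup_ml": total_ml / N_CUPS,
    }


if __name__ == "__main__":
    # Hypothetical 70-kg male participant
    for name, value in beverage_volumes(70, "male").items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, a hypothetical 70-kg man would receive 42 g of ethanol in roughly 133 mL of vodka, for a total beverage volume just under 800 mL, which fits within three 12-oz cups.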
After each session, participants rated their certainty of having received alcohol on a 0 (certain did not receive) to 100 (certain did receive) scale. For the light call with alcohol sessions, 63.6% of the participants were completely certain they had received alcohol; for the heavy call with placebo sessions, 67.6% of the participants were completely certain that they had received placebo. Four participants in the heavy call with placebo group were at least 25% certain of receiving alcohol.
Participants in the heavy call conditions were excused and driven home by a significant other or by taxi. Participants in the light call conditions either remained in the laboratory under supervision until their BAC dropped below 0.02 g% or they signed a release and were driven home by a significant other or by taxi.
Stanford Sleepiness Scale. The Stanford Sleepiness Scale43 is a 7-item scale that requires participants to rate their current sleepiness from 1 (feeling active and vital, wide awake) to 7 (almost in reverie, sleep onset soon, lost struggle to remain awake).
Visual Analog Scale. The visual analog scale comprised the questions, “How alert do you feel?”, “How sleepy do you feel?”, and “Overall, how do you feel?” Participants marked a 100-mm line, with anchors “very little/very bad” to “very much/very good.” The distance from the left edge of the scale to the participant mark was the score (range, 0-100). Higher scores indicated greater levels of alertness, sleepiness, and overall functioning. Scores on the alertness and overall functioning scales have been inverted for consistency.
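The scoring rule for the visual analog scale can be sketched as follows. The text states only that the alertness and overall functioning scores were inverted for consistency; computing the inversion as 100 minus the measured distance is an assumption, as is the function name `score_vas`.

```python
def score_vas(mark_mm, invert=False):
    """Score a 100-mm visual analog scale item (illustrative sketch).

    mark_mm: distance in mm from the left edge of the line to the mark.
    invert:  True for items where a higher raw score is better (alertness,
             overall), so that higher always means worse. The formula
             100 - mark_mm is an assumption, not stated in the text.
    """
    if not 0 <= mark_mm <= 100:
        raise ValueError("mark must lie on the 100-mm line")
    return 100 - mark_mm if invert else mark_mm


if __name__ == "__main__":
    print(score_vas(70))               # sleepiness item, used as measured
    print(score_vas(70, invert=True))  # alertness item, inverted
```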
Psychomotor Vigilance Task. The Psychomotor Vigilance Task40 is a 10-minute visual sustained-attention test that is sensitive to sleepiness44,45 and alcohol.34 Participants pressed a button on the handheld unit in response to numbers scrolling on the liquid crystal display screen with a 2- to 10-second interstimulus interval. Dependent variables were median reaction time and frequency of lapse (reaction time >500 milliseconds). Higher scores indicated worse performance.
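The two dependent variables for this task can be computed from a list of trial reaction times as in the sketch below. The function name `pvt_summary` and the input layout (one reaction time in milliseconds per trial) are hypothetical; only the 500-millisecond lapse threshold and the use of the median come from the text.

```python
from statistics import median

LAPSE_THRESHOLD_MS = 500  # a lapse is a reaction time greater than 500 ms


def pvt_summary(reaction_times_ms):
    """Return (median reaction time, number of lapses) for one test run."""
    if not reaction_times_ms:
        raise ValueError("at least one reaction time is required")
    lapses = sum(1 for rt in reaction_times_ms if rt > LAPSE_THRESHOLD_MS)
    return median(reaction_times_ms), lapses


if __name__ == "__main__":
    # Hypothetical reaction times from a short run
    print(pvt_summary([210, 230, 250, 520, 610]))
```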
Continuous Performance Test. The Continuous Performance Test41 is a 14-minute computer vigilance task, previously used with residents,46,47 which requires participants to respond to any alphabetic letter except “x/X.” Stimuli are displayed for 250 milliseconds with interstimulus intervals of 1, 2, and 4 seconds. Dependent variables included errors of commission (%) and omission (%). Higher scores indicated worse performance.
Simulated Driving Task. The simulated driving task is a 30-minute task, sensitive to sleepiness and alcohol,48,49 which runs on a computer with software (DriveSim 3.00, York Computer Technologies), peripheral steering wheel, accelerator, and brake. The task presents a driver’s orientation of a 2-lane highway with lane markings, speed signs, and small trees along the roadside. Other vehicles appear periodically but participants ignore them. Instructions are to stay in the center of the right lane and follow a fixed speed limit (60 miles per hour) while driving on the straight road. “Wind” periodically and randomly pushes the simulated vehicle right, left, or not at all. Traveling off the road elicits a beep and the car is automatically placed back on the road. Dependent variables were lane variability (SD of the vehicle center from the center of the right lane measured in feet), speed variability (SD of the difference in the vehicle speed from the posted speed measured in miles per hour), and “off-roads” (number of times the vehicle left the road). Higher scores indicated worse performance.
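The three driving measures defined above can be sketched from sampled simulator output. This is not the DriveSim software's own computation: the input layout (parallel per-sample lists), the use of the sample standard deviation, and the counting of an off-road event as each transition from on-road to off-road are assumptions for illustration, and `driving_summary` is a hypothetical name.

```python
from statistics import stdev

POSTED_SPEED_MPH = 60.0  # fixed speed limit stated in the text


def driving_summary(lane_offset_ft, speed_mph, off_road):
    """Summarize one simulated drive (illustrative sketch).

    lane_offset_ft: per-sample distance of the vehicle center from the
                    center of the right lane, in feet (signed).
    speed_mph:      per-sample vehicle speed, in miles per hour.
    off_road:       per-sample booleans, True while the vehicle is off the road.
    """
    lane_variability = stdev(lane_offset_ft)
    speed_variability = stdev([s - POSTED_SPEED_MPH for s in speed_mph])

    # Count each on-road to off-road transition as one off-road event.
    events = 0
    previously_off = False
    for flag in off_road:
        if flag and not previously_off:
            events += 1
        previously_off = flag

    return lane_variability, speed_variability, events


if __name__ == "__main__":
    # Hypothetical samples
    print(driving_summary([-1.0, 0.0, 1.0],
                          [58.0, 60.0, 62.0],
                          [False, True, True, False, True]))
```

Because the posted speed is constant, subtracting it does not change the standard deviation; the subtraction is kept only to mirror the definition in the text.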
Posttest Self-assessments. Dependent variables were performance and effort ratings. Performance was assessed on a 7-point Likert scale response to the statement “I feel my performance during this test was . . . ”, anchored with 1 = extremely good, 4 = fair, and 7 = extremely poor. Effort was assessed on a 4-point Likert scale response to the statement “The effort I had to expend to achieve this level of performance was . . . ”, anchored with 1 = very little effort and 4 = an extreme effort.
Variables that deviated significantly from normality were transformed for parametric analyses or dichotomized (lapses, omissions, off-roads) and analyzed using nonparametric McNemar tests. Data are reported as mean (SE) unless otherwise indicated, with significance level set at P=.05.
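For readers unfamiliar with the McNemar test on dichotomized paired data, a minimal sketch follows. It uses the exact (binomial) two-sided form of the test, which may differ from the variant SPSS applied in the study, and the dichotomization rule shown (any occurrence vs none) is an assumption; the text does not give the cut point. The function names are hypothetical.

```python
from math import comb


def dichotomize(counts, threshold=0):
    """Convert counts to booleans: True when the count exceeds the threshold."""
    return [c > threshold for c in counts]


def mcnemar_exact(cond_a, cond_b):
    """Exact two-sided McNemar test for paired boolean outcomes.

    Only discordant pairs contribute: b pairs are False in A and True in B,
    c pairs are True in A and False in B. Under the null hypothesis the
    discordant pairs split as Binomial(b + c, 0.5).
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("paired inputs must have equal length")
    b = sum(1 for x, y in zip(cond_a, cond_b) if not x and y)
    c = sum(1 for x, y in zip(cond_a, cond_b) if x and not y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical lapse counts for 6 participants under two conditions
    light = dichotomize([0, 0, 0, 0, 0, 2])
    heavy = dichotomize([1, 3, 1, 2, 1, 0])
    print(mcnemar_exact(light, heavy))
```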
Continuous performance and subjective measures were analyzed with training year (interns vs second-year residents and third-year residents) by condition (light call, light call with alcohol, heavy call, or heavy call with placebo) mixed repeated measures analysis of variance. Main effects were followed by pairwise comparisons between light call and each of the other 3 experimental conditions (light call with alcohol, heavy call, and heavy call with placebo) and between light call with alcohol and heavy call with placebo. Because of our nonrandomized design, we secondarily examined order effects by separately analyzing performance in participants whose first session occurred during light call (n = 12). Test-dependent variables for light call with alcohol, heavy call, and heavy call with placebo were compared with posttest self-assessments using Spearman rank order correlations. Analyses were conducted using SPSS version 12.0 for Windows (SPSS Inc, Chicago, Ill).
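The Spearman rank order correlation used to relate test variables to self-assessments can be sketched in plain Python as the Pearson correlation of the ranks, with tied values given their average rank. This is a generic illustration rather than the SPSS routine used in the study; the function names and example data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length inputs with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical self-ratings and lane variability values for 5 participants
    ratings = [2, 3, 3, 5, 6]
    lane_sd = [5.1, 5.9, 6.4, 7.2, 7.0]
    print(round(spearman_rho(ratings, lane_sd), 3))
```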
Residents were on-call more frequently during the heavy than light call rotation (mean [SE], 7.3 [0.3] vs 1.4 [0.3] nights; P<.001). One resident reported moonlighting 6 days before light call testing; no moonlighting occurred during heavy call.
Actigraphy results for the 7 days and the 24 hours before testing are shown in Table 1. For the 7 days preceding each test session, the mean nocturnal sleep period (the elapsed time between sleep onset and sleep offset as scored by the actigraphy software) was 7 hours 32 minutes during light call compared with 6 hours 17 minutes during heavy call (P<.001). There were similar results comparing light call and heavy call with respect to total sleep time and the cumulative sleep duration for the week.
For the 24 hours preceding each test session, there was significantly more sleep during light call than heavy call as measured by nocturnal sleep period (7 hours 24 minutes vs 3 hours 56 minutes, respectively; P<.001), total sleep time (6 hours 37 minutes vs 3 hours 2 minutes, respectively; P<.001), and the cumulative sleep duration (6 hours 48 minutes vs 3 hours 8 minutes, respectively; P<.001). Nocturnal sleep was also more efficient during light call than heavy call (89.4% vs 82.3%, P = .02), but there were no rotation differences in diurnal sleep.
Blood alcohol concentrations were 0.0 g% before all sessions and after light call, heavy call, and heavy call with placebo. Mean (SE) peak BACs in the light call with alcohol assessment were 0.046 g% (0.002 g%) before the Psychomotor Vigilance Test and 0.041 g% (0.002 g%) after the Psychomotor Vigilance Test; 0.041 g% (0.002 g%) before the Continuous Performance Test and 0.040 g% (0.001 g%) after the Continuous Performance Test; and 0.040 g% (0.001 g%) before the simulated driving task and 0.037 g% (0.002 g%) after the simulated driving task.
The Stanford Sleepiness Scale and visual analog scale ratings are summarized in Table 2. There were no main effects or interactions involving training year. The Stanford Sleepiness Scale ratings were higher in heavy call with placebo (mean, 4.5), light call with alcohol (mean, 3.3), and heavy call (mean, 4.6) vs light call (mean, 2.3; P<.001 for all comparisons). On the visual analog scale, the main effect of condition was significant for all subscales: alertness (light call: mean, 31.6; light call with alcohol: mean, 45.3; heavy call: mean, 68.4; heavy call with placebo: mean, 59.5; P<.001); sleepiness (light call: mean, 33.0; light call with alcohol: mean, 38.2; heavy call: mean, 77.9; heavy call with placebo: mean, 74.7; P<.001); and overall (light call: mean, 25.7; light call with alcohol: mean, 26.5; heavy call: mean, 53.5; heavy call with placebo: mean, 51.3; P<.001). Post hoc comparisons indicated that alertness, sleepiness, and overall ratings were higher (worse) in heavy call with placebo compared with light call with alcohol (P<.001) and in both heavy call and heavy call with placebo relative to light call (P<.001).
Results for actual and self-assessed performance are shown by condition in Table 3 and Table 4, respectively.
Psychomotor Vigilance Task. There were no main effects or interactions involving training year. Median reaction time for light call was 225.9 milliseconds, with reaction times 7% to 10% slower in light call with alcohol (248.4 milliseconds, P<.001), heavy call (242.5 milliseconds, P = .001), and heavy call with placebo (242.3 milliseconds, P<.001), and there was no difference between light call with alcohol and heavy call with placebo (P = .19). Lapses occurred more often in heavy call with placebo than with light call, but were not significantly different from light call with alcohol. There were no main effects or interactions involving condition in participants whose first session occurred during light call (n = 9).
More participants rated their performance as poor, very poor, or extremely poor in heavy call (26.8%, P = .03) and heavy call with placebo (34.5%, P = .008) than with light call (3.8%). Heavy call with placebo and light call with alcohol did not differ in self-assessments of poor or worse performance (26.9% for light call with alcohol) or in effort ratings of quite a lot or extreme (42.3% for light call with alcohol and 46.1% for heavy call with placebo).
Posttest performance ratings were associated with reaction time for light call with alcohol (r = −0.65, P<.001) but not for heavy call (r = −0.18, P = .36) or heavy call with placebo (r = −0.01, P = .95).
Continuous Performance Test. There were no performance differences by training year with this test. Compared with commission errors in light call (27.2%), there were 40% to 70% more commission errors in light call with alcohol (46.5%, P<.001), heavy call (38.2%, P<.001), and heavy call with placebo (40.6%, P<.001), and 15% more in light call with alcohol than heavy call with placebo (P = .02). Omission errors occurred more often with heavy call (median, 0.3%; P = .01) and heavy call with placebo (median, 0.7%; P = .01) than with light call (median, 0%), and heavy call with placebo did not differ from light call with alcohol (median, 0.3%; P = .18). For those residents who completed light call first (n = 11), commission errors did not differ between light call with alcohol (48.8%) and heavy call with placebo (44.5%, P = .37), but in both conditions they were worse than light call (27.5% [SE, 4.3%]; P<.001 vs light call with alcohol, P = .001 vs heavy call with placebo).
Self-ratings of poor, very poor, or extremely poor performance were more common in heavy call (51.6%) than light call (12.9%, P = .002). There was no difference in the frequency of these performance ratings between heavy call with placebo (54.9%) and light call with alcohol (35.5%), but quite a lot and extreme effort ratings were more frequent in heavy call with placebo (64.5%) than light call with alcohol (32.3%, P = .01).
Performance ratings were associated with commission errors for light call with alcohol (r = −0.61, P<.001) and heavy call (r = −0.45, P = .01), but not for heavy call with placebo (r = −0.25, P = .17).
Simulated Driving Task. Performance was not significantly different by training year with the simulated driving task. Relative to light call lane variability (5.5 ft), lane variability was 13% to 27% higher in light call with alcohol (6.2 ft, P = .002), heavy call (7.0 ft, P<.001), and heavy call with placebo (6.8 ft, P<.001); light call with alcohol and heavy call with placebo did not significantly differ (P = .06). Speed variability was 29% greater in heavy call with placebo than light call with alcohol (4.2 vs 3.2 mph, P = .01) and was 34% to 75% higher in light call with alcohol (P = .01), heavy call with placebo (P<.001), and heavy call (4.1 mph, P<.001) compared with light call (2.4 mph). Off-roads occurred more frequently in heavy call (median, 1; P = .02) and heavy call with placebo (median, 1; P = .049) than light call (median, 0), and were not different between heavy call with placebo and light call with alcohol (median, 1). For those participants whose first session followed light call, heavy call with placebo performance was worse than light call with alcohol for lane variability and speed variability.
More than half of heavy call residents (58.8%) rated their simulated driving performance as poor or worse compared with only 5.9% of participants in the light call group (P<.001). These ratings were also more common in heavy call with placebo (44.1%) than light call with alcohol (11.7%, P = .007). Almost three quarters of participants in the heavy call with placebo group (73.6%) rated their effort as quite a lot or extreme compared with 17.6% for light call with alcohol (P<.001).
Performance ratings were associated with lane variability in heavy call (r = −0.76, P<.001) and heavy call with placebo (r = −0.50, P = .003), but not in light call with alcohol (r = −0.32, P = .06). Speed variability was associated with self-ratings in heavy call (r = −0.71, P<.001) and heavy call with placebo (r = −0.51, P = .002), but not in light call with alcohol (r = −0.04, P = .85).
Our primary findings were post-call performance decrements in attention, vigilance, and simulated driving following 4 weeks of heavy call compared with a light call rotation, similar to impairments associated with 0.04 to 0.05 g% BAC. Compared with light call, heavy call performance was characterized by slower and more variable reaction times and more commission errors on validated tests of sustained attention and vigilance. Heavy call residents were also less able to maintain a consistent lane position and speed, and ran off the road more often on a simulated driving task. Compared with alcohol ingestion, heavy call simulated driving speed variability was 30% higher, and reaction time, attention lapses, omission errors, and crashes were similar. These results were independent of training year and occurred despite self-ratings of greater effort in the heavy call with placebo group on 2 of the 3 tasks.
This is the first study to our knowledge to directly compare impairment related to heavy night call with that related to alcohol ingestion, an accepted standard of functional impairment. We selected performance tasks with known sensitivity to sleep loss and alcohol. Previous studies have demonstrated that both conditions individually increase reaction time, errors of omission, and errors of commission on neurobehavioral assays,34,50,51 and that they induce simulated driving impairments, characterized by increased variability in driving performance and a greater tendency to drive off the road.48,49,52 Together, sleep loss and alcohol produce at least additive impairments in driving performance.48,53 The Continuous Performance Test and Psychomotor Vigilance Test findings from our study suggest that, consistent with nonresident45,54,55 and other resident16,56,57 studies, sustained attention and vigilance are particularly sensitive to training-related sleep loss. The observed post-call deficits likely result not only from acute sleep loss but also from the superimposed chronic partial sleep deprivation experienced during training.13
These laboratory tasks have not been validated against actual medical tasks. However, post-call deterioration has been found in simulated (laparoscopy)58,59 and actual (perioperative complications)60 medical procedures that require the skills inherent in these assessments. The driving simulator findings are particularly provocative. In the heavy call with placebo group, tracking and speed variability were, respectively, around 10% and 30% greater than the light call with alcohol group. These results must be interpreted cautiously because few controlled studies have compared simulated with actual driving,61 and the strength of the relationship is likely simulator-specific. However, taken together with resident-reported increased motor vehicle crash rates,23 it seems likely that resident driving skills are impaired post-call and contribute to increased injury risk.
Having demonstrated performance deficits, it is equally important to know whether residents recognize these deficits. We found significant associations between actual performance and self-assessed performance for the Continuous Performance Test and the simulated driving task but not the Psychomotor Vigilance Test. The associations on the 2 former tests (range, −0.45 to −0.76) are similar in magnitude to previous sleep deprivation studies,62,63 but indicated only a limited ability of the residents to judge their impairment. The associations may have been highest on the simulated driving task because participants were better able to judge good driving rather than good reaction times. We additionally did not find systematic adaptation to chronic sleep loss effects with increasing training year despite self-reports of such adaptation.4
We controlled for methodological confounding variables present in previous studies13 by requiring practice of the dependent measures, testing participants at the same time of day, objectively documenting sleep duration, including driving as a test with real-world relevance, and restricting medication, alcohol, and caffeine use. Robust call differences were found despite light call residents having a daily mean of only 6:38 and 6:37 of actigraphically defined sleep for the 7 days and the 24 hours before the test session. Greater differences might have been observed if participants maintained a consistent 8-hour sleep schedule during light call; however, our external validity is enhanced by having participants self-select sleep schedules. Additional studies are needed that include truly rested control conditions to determine if impairment is present even on light call rotations.
Our study had several limitations. First, the small sample size meant that our main comparisons of interest had low statistical power, and we did not perform an intention-to-treat analysis. However, we successfully detected simulated driving differences between the heavy call with placebo and light call with alcohol groups and we had a relatively large sample size for studies on residents using a within-subjects design. We did not randomize or counterbalance the order of test conditions and cannot discount the possibility of order effects on our findings. However, the tests we used have relatively small practice effects41,64,65 and all participants practiced the outcome measures before the first test session. In addition, secondary analyses, although underpowered, showed a similar pattern of results.
There may have been a self-selection bias, such that participants may have wanted to demonstrate worse impairment after heavy call than after alcohol ingestion, and our attempts to blind participants to the presence or absence of alcohol were frequently unsuccessful. However, we believe that these results are valid because we did not communicate our specific hypotheses to the participants, we found consistently worse light call with alcohol than light call performance, and effort ratings were higher in heavy call with placebo than light call with alcohol. It is unlikely that participants could have titrated their light call with alcohol performance to be systematically worse than light call but not worse than heavy call with placebo, or that they used greater effort in heavy call with placebo if the goal was to show worse heavy call with placebo than light call with alcohol impairment. Intentional poor performance in heavy call with placebo would have been achieved by exerting minimal effort on the tasks.
Although the tests selected for our study were carefully chosen surrogates for skills that we hypothesized would be impaired by sleep loss in medical residents, we are unable to draw firm conclusions about the degree of training-related impairment associated with actual medical tasks or medical decision making. Our findings do suggest, however, that some of the constituent skills necessary to perform medical tasks are likely to be impaired post-call during a typical heavy call rotation. Finally, our results may not generalize to subspecialties other than pediatrics or to other residency programs with different light and heavy call rotation schedules.
In conclusion, our study demonstrates that resident performance impairment post-call after 4 weeks of heavy call is equivalent to or worse than the impairment observed at 0.04 to 0.05 g% BAC on tests of sustained attention, vigilance, and simulated driving. Moreover, residents’ self-assessment of heavy call performance is limited and task-dependent. These findings have important clinical implications. Residents must be made aware of post-call performance impairment and the potential risk to personal and patient safety. Residency programs should include education on sleep loss, fatigue, and countermeasures. Because sleepy residents may have limited ability to recognize the degree to which they are impaired, residency programs should consider these risks when designing work schedules and develop risk management strategies for residents, such as considering alternative call schedules or providing post-call napping quarters. Additional studies should examine the impact of these operational and educational interventions on resident driving safety and on patient care and safety.
Corresponding Author: J. Todd Arnedt, PhD, Sleep and Chronophysiology Laboratory, Department of Psychiatry, University of Michigan, 2101 Commonwealth Blvd, Suite D, Ann Arbor, MI 48105 (firstname.lastname@example.org).
Author Contributions: Dr Arnedt had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Arnedt, Owens, Carskadon.
Acquisition of data: Arnedt, Owens, Crouch, Stahl.
Analysis and interpretation of data: Arnedt, Owens, Crouch, Carskadon.
Drafting of the manuscript: Arnedt, Owens.
Critical revision of the manuscript for important intellectual content: Arnedt, Owens, Crouch, Stahl, Carskadon.
Statistical analysis: Arnedt, Owens, Crouch.
Obtained funding: Arnedt, Owens.
Administrative, technical, or material support: Arnedt, Owens, Crouch, Stahl, Carskadon.
Study supervision: Arnedt, Owens.
Financial Disclosures: None reported.
Funding/Support: This study was supported by American Sleep Medicine Foundation (formerly the Sleep Medicine Education and Research Foundation) grant 01-03-01 from the American Academy of Sleep Medicine.
Role of the Sponsor: The American Academy of Sleep Medicine did not participate in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review, or approval of the manuscript.
Acknowledgment: We thank the pediatric residents who participated in the study and are grateful for the support of the Department of Pediatrics in the Brown Medical School. We are also grateful to Christine Gould, BA, for volunteering her time to help with data entry and management during the preparation of this article.