Figure. Percentage change in key variables before and after implementation of 2011 Accreditation Council for Graduate Medical Education (ACGME) reforms. Overall, work hours decreased by 4%, average number of hours of sleep per night over the past week increased by 3%, mean depressive symptoms decreased by 2%, and medical errors increased by 17% following the new ACGME guidelines. * P < .05; † P < .001.
Sen S, Kranzler HR, Didwania AK, et al. Effects of the 2011 duty hour reforms on interns and their patients: a prospective longitudinal cohort study. JAMA Intern Med. Published online March 25, 2013. doi:10.1001/jamainternmed.2013.351.
eAppendix. Assessment of workload satisfaction and learning environment and for secular trends in error rates
Sen S, Kranzler HR, Didwania AK, Schwartz AC, Amarnath S, Kolars JC, Dalack GW, Nichols B, Guille C. Effects of the 2011 Duty Hour Reforms on Interns and Their Patients: A Prospective Longitudinal Cohort Study. JAMA Intern Med. 2013;173(8):657–662. doi:10.1001/jamainternmed.2013.351
Author Affiliations: Departments of Psychiatry (Drs Sen and Dalack) and Internal Medicine (Dr Kolars), University of Michigan, Ann Arbor; Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, and Philadelphia VA Medical Center, Philadelphia (Dr Kranzler); Department of Internal Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois (Dr Didwania); Department of Psychiatry, Emory University School of Medicine, Atlanta, Georgia (Dr Schwartz); Department of Radiation Oncology, University of Washington, Seattle (Dr Amarnath); Departments of Pediatrics and Medicine, Keck University of Southern California School of Medicine, Los Angeles County + University of Southern California Medical Center, Los Angeles (Dr Nichols); and Department of Psychiatry, Medical University of South Carolina, Charleston (Dr Guille).
Importance In 2003, the first phase of duty hour requirements for US residency programs recommended by the Accreditation Council for Graduate Medical Education (ACGME) was implemented. Evidence suggests that this first phase of duty hour requirements resulted in a modest improvement in resident well-being and patient safety. To build on these initial changes, the ACGME recommended a new set of duty hour requirements that took effect in July 2011.
Objective To determine the effects of the 2011 duty hour reforms on first-year residents (interns) and their patients.
Design As part of the Intern Health Study, we conducted a longitudinal cohort study comparing interns serving before (2009 and 2010) and interns serving after (2011) the implementation of the new duty hour requirements.
Setting Fifty-one residency programs at 14 university and community-based GME institutions.
Participants A total of 2323 medical interns.
Main Outcome Measures Self-reported duty hours, hours of sleep, depressive symptoms, well-being, and medical errors at 3, 6, 9, and 12 months of the internship year.
Results Fifty-eight percent of invited interns chose to participate in the study. Reported duty hours decreased from an average of 67.0 hours per week before the new rules to 64.3 hours per week after the new rules were instituted (P < .001). Despite the decrease in duty hours, there were no significant changes in hours slept (6.8 → 7.0; P = .17), depressive symptoms (5.8 → 5.7; P = .55) or well-being score (48.5 → 48.4; P = .86) reported by interns. With the new duty hour rules, the percentage of interns who reported concern about making a serious medical error increased from 19.9% to 23.3% (P = .007).
Conclusions and Relevance Although interns report working fewer hours under the new duty hour restrictions, this decrease has not been accompanied by an increase in hours of sleep or an improvement in depressive symptoms or well-being but has been accompanied by an unanticipated increase in self-reported medical errors.
Over the past 25 years, there has been a growing concern that the long duty hours and sleep deprivation that have traditionally been common during residency training can lead to adverse consequences for both residents and the patients that they treat.1 How best to promote rigorous, high-quality medical training while maintaining the health of residents and maximizing the safety of patients is intensely debated.2
Using input from the Institute of Medicine (IOM), the Accreditation Council for Graduate Medical Education (ACGME) established a new set of duty hour recommendations effective July 2011.1,3 While the ACGME recommendations include numerous changes in supervision and in required oversight by programs, the most controversial change was to limit the maximum shift length for first-year residents (commonly referred to as interns) to 16 hours. This recommendation was based on studies showing that reducing the length of intern shifts in the intensive care unit reduced the incidence of serious medical errors4 and that long duty hours harm both patients5,6 and residents themselves.7-9 Limiting shift length is controversial because other research has shown a substantial increase in patient handoffs with shorter shifts and a higher rate of medical errors with more handoffs.10,11 Furthermore, a recent study found that duty hour restrictions increased the stress felt by residents, despite a reduction in extended shifts.12
To assess the impact of the 2011 changes we analyzed data from the Intern Health Study,13 a large, multispecialty and multi-institutional longitudinal study of interns. In particular, we compared the experiences of interns serving before and interns serving after the new duty hour standards in terms of duty hours, sleep duration, depressive symptoms, and medical errors.
Graduate medical education (GME) institutions and medical schools that expressed interest in taking part in the Intern Health Study were included in the sample for this study.13 Fifty-one residency programs at 10 university-based and 4 community-based GME institutions allowed us to invite their incoming residents to participate in the study. In addition, 4 medical schools agreed to invite their graduating students to participate in the study. In total, 4352 individuals entering internal medicine, general surgery, pediatrics, obstetrics and gynecology, emergency medicine, combined medicine and pediatrics, transitional year, and psychiatry residency programs during the 2009, 2010, and 2011 academic years were invited to participate via e-mail sent 2 months prior to commencing internship.
E-mail invitations for 348 individuals were returned as undeliverable, and we were unable to obtain a valid e-mail address. Fifty-eight percent of the remaining invited individuals (2323 of 4005) agreed to participate in the study. The institutional review boards at all of the participating institutions approved the study. Participants were compensated $50 for their time and effort. Because there was some variation from year to year in the set of residency programs assessed in the study, we performed a secondary analysis including only participants at residency programs that took part in the study in all 3 years of study.
The set of included residency programs varied slightly across the 2009, 2010, and 2011 cohorts for 2 reasons: (1) the participants invited through their graduating medical school enrolled in a different set of residency programs each year and (2) 9 residency programs took part in only 1 or 2 years of the study. To evaluate whether the different set of programs between cohorts was a confounder, we conducted a secondary analysis of the “common program subsample” including only residency programs that took part in the study during all 3 academic years.
The procedures used in the Intern Health Study have been detailed previously.13 Briefly, all surveys were conducted through a secure online website designed to maintain confidentiality, with participants identified only by numbers.
Participants completed a baseline survey 2 months prior to commencing internship that assessed general demographic factors (age, sex, marital status), and psychological factors (baseline Patient Health Questionnaire–9 [PHQ-9] depressive symptoms, history of depression and a measure of personality, the NEO-Five Factor Inventory Neuroticism).14,15 The PHQ-9 is a 9-item self-report component of the Primary Care Evaluation of Mental Disorders (PRIME-MD) inventory, designed to screen for depressive symptoms. A score of 10.0 or greater on the PHQ-9 has a sensitivity of 93% and a specificity of 88% for the diagnosis of major depressive disorder.16
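The scoring rule described above can be illustrated with a minimal sketch. It assumes the standard PHQ-9 format of 9 items each rated 0 to 3 and applies the cutoff of 10 cited in the text; the function names and the example responses are hypothetical and are not taken from the study's survey software.

```python
from typing import Sequence

PHQ9_ITEMS = 9    # number of items on the questionnaire
PHQ9_CUTOFF = 10  # threshold cited in the text (score of 10 or greater)


def phq9_total(item_scores: Sequence[int]) -> int:
    """Return the PHQ-9 total: the sum of nine items, each rated 0-3."""
    if len(item_scores) != PHQ9_ITEMS:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each PHQ-9 item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def meets_phq9_criteria(item_scores: Sequence[int]) -> bool:
    """True when the total reaches the screening cutoff used in the text."""
    return phq9_total(item_scores) >= PHQ9_CUTOFF


# Hypothetical responses for illustration only (not study data).
example = [1, 2, 1, 1, 2, 1, 1, 1, 0]
print(phq9_total(example), meets_phq9_criteria(example))
```

A total at or above the cutoff is a positive screen, not a diagnosis; the sensitivity and specificity quoted above describe how that screen performs against a diagnostic standard.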
Interns were contacted via e-mail on the first Tuesday of months 3, 6, 9, and 12 of the internship year and asked to complete a previously published survey addressing work hours, sleep, and medical error.13 Survey questions included “How many hours have you worked in the past week?” and “On average, how many hours have you slept per night over the past week?” To assess medical errors, the survey asked the question “Are you concerned that you have made any major medical errors in the last 3 months?”17 Interns' depressive symptoms were assessed with the PHQ-9. To minimize the time burden on interns, Subjective Well-Being, assessed through the 14-item Mental Health Continuum–Short Form, was administered only at month 6.18
Baseline factors were compared between the preimplementation and postimplementation cohorts with a 2-sample t test for continuous measures and a χ2 test for categorical measures. Generalized estimating equations (GEE) were used to assess the effects of the duty hour rule while accounting for the repeated measures within participants during internship. In the GEE analyses, cohort membership was entered as the predictor variable (preimplementation [2009 and 2010] cohort compared with postimplementation [2011] cohort), and quarterly reports of duty hours, sleep hours, depressive symptoms, and medical errors were assessed as outcomes. To evaluate differences in response to duty hour reforms among medical specialties, a cohort membership × specialty interaction term was included in these analyses. Two-sample t tests were used to compare subjective well-being between cohorts. Analyses were performed using SPSS software (version 19.0; SPSS Inc).
Compared with all individuals entering internship in 2009, 2010, and 2011 nationally, our sample was younger (27.5 years old vs 28.8 years old) and included a slightly higher percentage of women (50.9% vs 48.6%). There were no significant differences in specialty, institution, or demographic variables between individuals who participated and those who did not (P > .05 for all comparisons) (demographic information for full residency programs provided by the Association of American Medical Colleges; e-mail communication; September 17, 2012).
For the 2323 study participants, there were no significant differences in specialty, baseline depressive symptom score, neuroticism, sex, or age between the preimplementation (2009 [n = 714] and 2010 [n = 772]) and postimplementation (2011 [n = 837]) cohorts (Table 1; all P values > .05 [P = .23-.95]). Furthermore, there were no differences in these baseline characteristics between participants who completed 1, 2, 3, or 4 quarterly surveys (all P values > .05 [P = .18-.93]). On average, 65% of participants responded to each quarterly survey. In total, 8099 baseline and quarterly participant assessments were completed.
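As a sketch of the model structure described above, the marginal mean in a GEE analysis of this kind might be written as follows. The notation is illustrative and not taken from the article; the link function g and the working correlation structure are not stated in the text and are left generic here.

```latex
g\bigl(\mathrm{E}[Y_{ij}]\bigr)
  = \beta_0
  + \beta_1\,\mathrm{Post}_i
  + \boldsymbol{\beta}_2^{\top}\mathbf{S}_i
  + \boldsymbol{\beta}_3^{\top}\bigl(\mathrm{Post}_i \times \mathbf{S}_i\bigr)
```

Here Y_{ij} is the outcome (duty hours, sleep hours, depressive symptoms, or medical error) reported by intern i at quarterly assessment j, Post_i equals 1 for the postimplementation cohort and 0 otherwise, and S_i is a vector of specialty indicators. In this formulation, β_1 captures the cohort effect and β_3 captures the cohort membership × specialty interaction, with the repeated assessments for each intern treated as a correlated cluster.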
Across the 4 quarterly assessments, the mean number of duty hours reported per week decreased significantly from a mean (SD) of 67.0 (17.0) in the preimplementation cohort to 64.3 (21.7) in the postimplementation cohort (P < .001). There was no significant effect of specialty on the change in work hours (P = .50). The percentage of interns who reported working more than 80 hours per week decreased from 12.8% to 7.8% (P < .001). In contrast, the mean number of reported hours slept each day was not significantly different between cohorts, with 6.8 (3.7) hours of sleep for the preimplementation cohort and 7.0 (4.3) hours of sleep for the postimplementation cohort (P = .17). The mean PHQ depression score during the year also did not change significantly between cohorts, with a mean PHQ depressive symptom score of 5.8 (4.8) for the preimplementation cohort and 5.7 (4.6) for the postimplementation cohort (P = .55). Correspondingly, the percentage of interns meeting PHQ criteria for depression (PHQ score ≥ 10.0) did not change significantly between cohorts, with 20.0% of the preimplementation cohort and 18.7% of the postimplementation cohort meeting criteria for depression (P = .39). Consistent with the depressive symptom scores, the subjective well-being score did not change significantly under the new duty hour rules, with mean scores of 48.5 (13.9) for the preimplementation cohort and 48.4 (13.9) for the postimplementation cohort (P = .86). The percentage of interns who reported concern about making a serious medical error increased significantly from 19.9% in the preimplementation cohort to 23.3% in the postimplementation cohort (P = .007) (Table 2 and Figure). There was no significant effect of specialty on the change in error rates (P = .15). 
Interns in both the preimplementation and postimplementation cohort who met PHQ criteria for depression were significantly more likely to report a medical error compared with residents who did not meet criteria for depression (35.3% vs 17.8%; P < .001) (Table 2 and Figure).
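The percentage changes summarized in the Figure, and the depression comparison above, follow directly from the reported values. The short sketch below reproduces that arithmetic from the figures given in the text; the variable and function names are illustrative, and the calculation uses only the published summary statistics, not participant-level data.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from the pre- to the postimplementation value, in percent."""
    return (after - before) / before * 100.0


# (preimplementation, postimplementation) values as reported in the Results.
reported = {
    "duty hours per week": (67.0, 64.3),
    "hours of sleep per night": (6.8, 7.0),
    "PHQ-9 depressive symptom score": (5.8, 5.7),
    "interns reporting error concern (%)": (19.9, 23.3),
}

changes = {name: round(percent_change(pre, post))
           for name, (pre, post) in reported.items()}
for name, change in changes.items():
    print(f"{name}: {change:+d}%")

# Ratio of error reporting among interns meeting PHQ criteria for depression
# (35.3%) to that among interns not meeting criteria (17.8%).
error_rate_ratio = round(35.3 / 17.8, 2)
print(f"error-report ratio, depressed vs nondepressed: {error_rate_ratio}")
```

Note that the 17% figure is a relative change in the proportion of interns reporting concern about an error; the absolute change is 3.4 percentage points (19.9% to 23.3%).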
Forty-two of the 51 residency programs participated in all 3 years of the study (N = 939 participants). In this subsample, the mean number of hours worked per week decreased from 67.1 (19.2) preimplementation to 63.5 (18.4) postimplementation. The magnitude of the decrease (3.6 hours) was similar to that in the full sample (2.7 hours) (P < .001). The percentage of interns who reported committing a serious medical error increased from 19.1% in the preimplementation cohort to 22.3% in the postimplementation cohort (P = .10). Although the increase in the percentage of interns committing a serious error with the implementation of the duty hour rules in the subsample (3.2%) was similar to that in the full sample (3.4%), the difference in the subsample did not reach statistical significance. All cohort differences in baseline and outcome measures that were nonsignificant in the overall sample were also nonsignificant in the subsample analyses. (Assessments of workload satisfaction and learning environment and of secular trends in error rates are included in the eAppendix.)
In this prospective, longitudinal, multi-institutional study, we found that during the 2011-2012 academic year, interns reported working fewer hours but more frequently reported concerns about committing medical errors than interns serving before the ACGME duty hour reforms were implemented. Furthermore, we found that the reforms were not associated with any reported changes in interns' sleep duration or their symptoms of depression.
Investigations of the 2003 ACGME duty hour restrictions differed as to whether residents worked fewer hours19,20 but generally agreed that these restrictions did not result in increased sleep for residents.19,21,22 With the more restrictive 2011 ACGME requirements, we found clear evidence that interns were working fewer hours. Similar to the 2003 reforms, however, our findings indicate that the 2011 ACGME restrictions did not significantly increase sleep for interns. In response to the new restriction on maximum shift length, many residency programs instituted a night float system. While we did not assess which programs switched to a night float system, our results are consistent with those of earlier work suggesting that the implementation of a night float system does not increase sleep for residents.23 Some studies have suggested that changing away from traditional internship schedules can reduce the number of hours preceded by little or no sleep even without having a substantial effect on total sleep hours experienced by residents.12,24
Previous work has demonstrated that physicians experience a substantial increase in depressive symptoms during internship.13,25 In this study, we found that following implementation of the 2011 duty hour rules, the levels of depressive symptoms and well-being among interns were unchanged from those of previous cohorts. High levels of depression have been linked to more medical errors and poorer clinical performance.17,26 In this sample, we found further evidence supporting this link, with interns meeting PHQ criteria for depression reporting medical errors (35.3%) at almost twice the rate as nondepressed interns (17.8%).
Based on information from national mortality studies and single-institution longitudinal studies, the IOM concluded that the 2003 ACGME reforms did not harm patients and may have modestly improved outcomes for a subset of patients.1 However, the magnitude of the improvement in patient safety was not as substantial as some had predicted, suggesting that a decrease in continuity of care may have largely offset a reduction in fatigue-related errors resulting from the reforms.1 In contrast with the evidence suggesting a modest improvement in safety with the 2003 reforms, in this study of the 2011 reforms, we found unexpected evidence of an increase in self-reported medical errors among interns. Although our analysis of errors in the common program subsample was not statistically significant, the difference in the rate of medical errors in the subsample was comparable with that in the overall sample, suggesting that the lack of significance may be due to inadequate statistical power.
There are multiple possible reasons why the 2011 ACGME reforms may not have achieved their stated goals. Given that increased sleep was a key mechanism through which the new duty hour restrictions were intended to improve the health of residents,3 the lack of such an effect in the postimplementation cohort in our study is a cause for concern. Designing work schedules that account for circadian phase and explicitly training residents on practices to increase sleep time and improve sleep quality may be necessary.27
In addition, for many hospitals, the new duty hour restrictions were not accompanied by funding to hire additional clinical staff.28 As a result, the duty hour restrictions may have exacerbated the problem of work compression, with residents expected to complete the same amount of work as previous cohorts but in less total time.29 Increased work compression has been associated with poorer clinical performance and decreased satisfaction among residents.30 Consistent with increased work compression, the 2011 duty hour reforms were associated with a modest, nonsignificantly lower level of workload satisfaction, despite being associated with a reduction in duty hours. Residents have also noted that the new duty hour rules necessitated a shift toward longer daily shifts and eliminated the reprieve of a “postcall” day.31 Evidence that resource-intensive interventions have been effective in improving resident satisfaction and patient care suggests that it may be necessary to allocate substantially more resources to achieve the improvements targeted by the ACGME reforms.30
Finally, evidence strongly indicates that implementing the new ACGME reforms has resulted in an increased number of handoffs in most residency programs.31 The increase in handoffs may be a contributing factor to the increase in self-reported medical errors with the implementation of the new duty hour rules. While curricula in handoff training for residents have remained largely undeveloped, initial studies suggest that standardized handoff systems are effective in reducing errors.32,33
There are important features of our study that strengthen confidence in the results. First, we studied large cohorts of residents, working in over a dozen different hospital systems, community and academic hospitals, and hospitals in all major regions of the United States. The full sample provided statistical power to detect effects of small to moderate size. Second, by focusing on interns, the target of the most aggressive and controversial aspects of the 2011 ACGME reforms, the study provides important information on the core aspects of the reforms. Third, the prospective design allows us to differentiate cohort outcomes from outcomes owing to preexisting differences between the preimplementation and postimplementation cohorts. Fourth, by conducting quarterly assessments, we were able to obtain a picture of the ACGME reform effects throughout the year rather than a single cross-sectional snapshot.
In addition to these strengths there are limitations to our study. The most important limitation is the self-report nature of our assessment. Although self-report represents the only practical way to gather data on a large and varied sample of participants, this method is susceptible to participant bias, especially on topics as controversial as medical errors and duty hour reforms. We guarded against the influence of participant bias by embedding this study in a larger study with the stated aim of identifying genetic predictors of depression under stress, and, thus, the likelihood of participant bias in report of medical errors and duty hours was reduced.13 Furthermore, we did not ask interns about their opinions of duty hour reforms but queried them about more objective outcome measures of interest. Specifically related to medical errors, we did not find any evidence for trending in the self-reported error rates in the years preceding duty hour reforms or through the 4 quarters within years (eAppendix). This suggests that any long-term trends toward increasing patient safety awareness did not account for the increased rate of self-reported medical errors that we identified following the implementation of duty hour reforms. 
While the validity of self-report is questioned in skills assessment, self-report may be a more valid measure for variables assessed in this study.34 For example, self-reported work hours match well with swipe-in and swipe-out electronic recordings of resident work hours.35 Furthermore, some studies showed that most self-reported errors were confirmed by medical record review.36,37 Other work, however, has suggested that physician self-report and medical record review identified a similar number of errors, but that there was only partial overlap in the errors identified, with self-reported errors more likely to be preventable medical errors.38 Although it is difficult to obtain gold standard measures (eg, through medical record review, electronic recording of work hours) on large samples across multiple institutions, investigators were able to conduct single-program objective assessment studies of the effect of the 2003 duty hour reforms.19,39 Similar studies evaluating the 2011 duty hour reforms would provide an important complement to this study.
There are other important limitations to our study. First, because the preimplementation cohort served in 2009 to 2010 and 2010 to 2011 and the postimplementation cohort served in 2011 to 2012, the differences between cohorts could have resulted from stricter enforcement of 2003 duty hour reforms, anticipation of future reforms, or secular trends unrelated to duty hour reforms. Because our study spanned 3 sequential academic years, however, it is unlikely that long-term trends could explain the observed effects. Furthermore, because the identified national trend has been toward improvements in the quality of care, temporal trends would, if anything, have served to obscure effects of duty hour changes.40 Second, only 58% of invited individuals chose to participate in the study. Although this is a comparatively high participation rate for a multi-institutional study, any systematic differences between participants and nonparticipants could introduce bias. Finally, our study assessed only the effects of duty hour reforms during the first year of their implementation. Studies should assess changes in interns' sleep and rates of depression and medical errors in future years, after hospital systems have had time to adjust to the new duty hour restrictions.
In conclusion, this large prospective longitudinal study of the 2011 ACGME duty hour reforms found that, although the reforms reduced the total number of hours that interns are on duty, they did not affect interns' duration of sleep or mental health and increased the frequency of self-reported medical errors. Different strategies for improving resident education and patient care may be necessary to achieve the desired impact of ACGME reforms.
Correspondence: Srijan Sen, MD, PhD, Department of Psychiatry, University of Michigan, 5047 BSRB, 109 Zina Pitcher Pl, Ann Arbor, MI 48109 (email@example.com).
Accepted for Publication: January 1, 2013.
Published Online: March 25, 2013. doi:10.1001/jamainternmed.2013.351
Author Contributions: Dr Sen had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Sen, Schwartz, and Guille. Acquisition of data: Sen, Nichols, and Guille. Analysis and interpretation of data: Sen, Kranzler, Didwania, Amarnath, Kolars, Dalack, and Guille. Drafting of the manuscript: Sen and Kolars. Critical revision of the manuscript for important intellectual content: Sen, Kranzler, Didwania, Schwartz, Amarnath, Kolars, Dalack, Nichols, and Guille. Statistical analysis: Sen. Obtained funding: Sen and Guille. Administrative, technical, and material support: Sen, Kranzler, Didwania, Schwartz, Dalack, Nichols, and Guille. Study supervision: Sen.
Conflict of Interest Disclosures: Dr Kranzler has been a consultant or advisory board member for Alkermes, Lilly, Lundbeck, Pfizer, and Roche. He is also a member of the American Society of Clinical Psychopharmacology, which is supported by Lilly, Lundbeck, Abbott, and Pfizer.
Funding/Support: The project was supported by the following grants: a Young Investigator Grant from the American Foundation for Suicide Prevention (Dr Sen), UL1RR024986 from the National Center for Research Resources (Dr Sen), MH095109 from the National Institute of Mental Health (Dr Sen), and AA013736 from the National Institute on Alcohol Abuse and Alcoholism (Dr Kranzler).
Role of the Sponsors: The funding agencies played no role in the design and conduct of the study; collection, management, analysis, or interpretation of the data; and preparation, review, or approval of the manuscript.
Additional Contributions: Faren Grant, BA (University of Michigan), and Heather Bryant, MA (University of Michigan), served as study coordinators for the project and were compensated for their work on the Intern Health Study. We thank the participating interns and program directors for the time that they invested in this study.