Figure 2 legend. Basic 1 indicates a single straightforward diagnosis; basic 2, 2 or more straightforward diagnoses; Card, cardiologists in practice or cardiology fellows; I, ischemia/infarction; M, metabolism/inflammation (eg, pericarditis, hyperkalemia, or drug effect); MS, medical students; N, normal; O, other; PG, postgraduate physicians (residents); R, rhythm; S, structure (eg, hypertrophy or conduction block). Boxes indicate the mean accuracy score for each study. The diamond and dashed vertical line indicate the median score across studies.
Figures 3 and 4 legend. See Figure 2 for explanation of abbreviations and difficulty levels. Boxes indicate the mean accuracy score for each study, diamonds indicate pooled estimates across studies, and horizontal lines indicate 95% CIs.
eBox. Full Search Strategy
eTable. Features of Studies of Physician ECG Interpretation Accuracy
eFigure. Physician ECG Interpretation Accuracy After an Educational Intervention, All Training Levels Combined
Cook DA, Oh S, Pusic MV. Accuracy of Physicians’ Electrocardiogram Interpretations: A Systematic Review and Meta-analysis. JAMA Intern Med. 2020;180(11):1461–1471. doi:10.1001/jamainternmed.2020.3989
How accurate are physicians and medical students in interpreting electrocardiograms (ECGs)?
In this meta-analysis of 78 original studies, the accuracy of ECG interpretation was low in the absence of training and varied widely across studies. Accuracy was higher after training but still relatively low and was higher, as expected, with progressive training and specialization.
Physicians at all training levels may have deficiencies in ECG interpretation, even after educational interventions.
The electrocardiogram (ECG) is the most common cardiovascular diagnostic test. Physicians’ skill in ECG interpretation is incompletely understood.
To identify and summarize published research on the accuracy of physicians’ ECG interpretations.
A search of PubMed/MEDLINE, Embase, Cochrane CENTRAL (Central Register of Controlled Trials), PsycINFO, CINAHL (Cumulative Index to Nursing and Allied Health), ERIC (Education Resources Information Center), and Web of Science was conducted for articles published from database inception to February 21, 2020.
Of 1138 articles initially identified, 78 studies that assessed the accuracy of physicians’ or medical students’ ECG interpretations in a test setting were selected.
Data Extraction and Synthesis
Data on study purpose, participants, assessment features, and outcomes were abstracted, and methodological quality was appraised with the Medical Education Research Study Quality Instrument. Results were pooled using random-effects meta-analysis.
Main Outcomes and Measures
Accuracy of ECG interpretation.
Of 1138 articles initially identified, 78 studies assessed the accuracy of ECG interpretation. Across all training levels, the median accuracy was 54% (interquartile range [IQR], 40%-66%; n = 62 studies) on pretraining assessments and 67% (IQR, 55%-77%; n = 47 studies) on posttraining assessments. Accuracy varied widely across studies. The pooled accuracy for pretraining assessments was 42.0% (95% CI, 34.3%-49.6%; n = 24 studies; I2 = 99%) for medical students, 55.8% (95% CI, 48.1%-63.6%; n = 37 studies; I2 = 96%) for residents, 68.5% (95% CI, 57.6%-79.5%; n = 10 studies; I2 = 86%) for practicing physicians, and 74.9% (95% CI, 63.2%-86.7%; n = 8 studies; I2 = 22%) for cardiologists.
Conclusions and Relevance
Physicians at all training levels had deficiencies in ECG interpretation, even after educational interventions. Improved education across the practice continuum appears warranted. Wide variation in outcomes could reflect real differences in training or skill or differences in assessment design.
Electrocardiography is the most commonly performed cardiovascular diagnostic test,1 and electrocardiogram (ECG) interpretation is an essential skill for most physicians.2-4 Interpretation of an ECG is a complex task that requires integrating knowledge of anatomy, electrophysiology, and pathophysiology with visual pattern recognition and diagnostic reasoning.5 Despite the importance of this diagnostic test and several position statements regarding education in ECG interpretation,2-4,6-8 evidence regarding the optimal techniques for training, assessing, and maintaining this skill is lacking.9 As part of understanding the potential need for further educational reform, it would be helpful to know the accuracy of physicians and physician trainees in interpreting ECGs. A systematic review10 published in 2003 found frequent errors and disagreements in physicians’ ECG interpretations as reported in 32 studies. However, that review10 did not offer a quantitative synthesis of results and is now 17 years old. Other reviews9,11-14 of physicians’ ECG interpretations have focused on training interventions rather than accuracy. We believe an updated review of evidence regarding physicians’ ECG interpretation accuracy, together with a quantitative synthesis, would be useful to physicians in practice, medical school teachers, and administrators.
The purpose of the present study (part of a larger systematic review of ECG education) is to systematically identify and summarize published research that measured the accuracy of physicians’ ECG interpretations. We focused this review on studies that assessed interpretation accuracy in a controlled (educational test) setting; this approach permits adjudication relative to a single correct (accurate) response, which is challenging in real clinical practice.
We systematically searched the PubMed/MEDLINE, Embase, Cochrane CENTRAL (Central Register of Controlled Trials), PsycINFO, CINAHL (Cumulative Index to Nursing and Allied Health), ERIC (Education Resources Information Center), and Web of Science databases for articles published from database inception to February 21, 2020, using a search strategy developed with assistance from a research librarian. The full search strategy can be found in the eBox in the Supplement; key terms included the topic (ECG, EKG, and electrocardiogram), population (medical education, medical students, residents, and physicians), and outcomes (learning effectiveness, learning outcomes, learning efficiency, and impact). We also hand searched the references of reviews to identify omitted articles.9-14 This study followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline.15
We included all studies that assessed the accuracy of physicians’ or medical students’ ECG interpretations in a test setting. We made no exclusions based on the language or date of publication. Working independently, 2 authors (M.V.P. and S.Y.O.) screened each study for inclusion, reviewing first the title and abstract and then, if needed, the full text. Conflicts were resolved by consensus.
For each included article, all authors (D.A.C., M.V.P., and S.Y.O.) worked in pairs using software designed for systematic reviews (DistillerSR, Evidence Partners Inc) to abstract data on study purpose (training intervention, survey of competence, or assessment validation), study participants (number and training level), accuracy scores, assessment features (number and selection of items, response format, and scoring rubric), validity evidence, and study methodological quality. We recorded accuracy scores separately for assessments conducted before training (including survey studies without intervention) and those performed after a training intervention. Interrater agreement was substantial (κ>0.6) for all extracted elements. All disagreements were resolved by consensus.
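As an aside for readers less familiar with the statistic, Cohen κ corrects raw percentage agreement for the agreement 2 reviewers would reach by chance alone; values above 0.6 are conventionally interpreted as substantial. The minimal Python sketch below illustrates the computation on invented codings (these are not the review’s actual extraction records).

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between 2 raters on categorical items."""
    n = len(rater_a)
    # Observed agreement: proportion of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codings of 10 studies by 2 reviewers (eg, study purpose).
a = ["survey", "trial", "trial", "survey", "validation",
     "trial", "survey", "trial", "trial", "survey"]
b = ["survey", "trial", "trial", "trial", "validation",
     "trial", "survey", "trial", "survey", "survey"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # about 0.66 here; >0.6 is substantial
```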
We appraised general methodological quality using the Medical Education Research Study Quality Instrument,16 which was developed to appraise the methodological quality of any quantitative medical education research study. We appraised the quality of the outcome measure (ie, accuracy test) using the 5 sources validation framework,17 which identifies 5 potential sources of validity evidence: content, response process, internal structure, relations with other variables, and consequences.18,19
We used the I2 statistic20 to quantify inconsistency (heterogeneity) across studies. The I2 estimates the percentage of variability across studies not attributable to chance, and values greater than 50% indicate substantial inconsistency. Because we anticipated (and confirmed) substantial inconsistency across studies, we pooled accuracy scores within each physician subgroup and weighted by sample size using random-effects meta-analysis. Given that meta-analysis may be inappropriate in the setting of large differences in test content and difficulty, we also reported the median score. We planned subgroup analyses by timing (before vs after a training intervention), response format (free text vs selection from a predefined list), and author-reported examination difficulty. We found only 1 study21 that restricted items to a single empirically determined level of difficulty; we therefore used the number of correct diagnoses per ECG (1 or >1 [more difficult]) as a surrogate for item difficulty. We used the z test22 to evaluate the statistical significance of subgroup interactions. We planned sensitivity analyses restricted to survey studies and to studies using robust approaches to determine correct answers. We used SAS software, version 9.4 (SAS Institute Inc), for all analyses. Statistical significance was defined by a 2-sided α = .05.
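Analyses were run in SAS; no analysis code accompanies the article. As an illustration only, the Python sketch below shows one standard way to compute the quantities named above: DerSimonian-Laird random-effects pooling, the I2 statistic, and a z test comparing 2 pooled subgroup estimates. The per-study accuracies, sample sizes, and binomial variances are assumptions for the example, not the authors’ documented weighting scheme.

```python
import math

def random_effects_pool(means, variances):
    """DerSimonian-Laird random-effects pooling of per-study estimates.

    Returns the pooled estimate, its standard error, and I2 (the percentage
    of across-study variability not attributable to chance).
    """
    w = [1.0 / v for v in variances]                           # fixed-effect weights
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))  # Cochran Q
    df = len(means) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                              # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]               # random-effects weights
    pooled = sum(wi * m for wi, m in zip(w_re, means)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, i2

def subgroup_z_test(est1, se1, est2, se2):
    """z test for the difference between 2 pooled subgroup estimates."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # 2-sided P value
    return z, p

# Invented per-study accuracies (proportions) with binomial variances p(1-p)/n.
acc = [0.42, 0.55, 0.38, 0.61, 0.47]
n = [40, 120, 55, 80, 30]
var = [p * (1 - p) / ni for p, ni in zip(acc, n)]
pooled, se, i2 = random_effects_pool(acc, var)
print(f"pooled = {pooled:.1%} "
      f"(95% CI, {pooled - 1.96 * se:.1%} to {pooled + 1.96 * se:.1%}); I2 = {i2:.0f}%")

# Subgroup interaction, using standard errors back-calculated from reported
# 95% CIs (eg, the student vs resident pretraining estimates): SE = CI width / 3.92.
z, p = subgroup_z_test(0.420, 0.039, 0.558, 0.040)
print(f"z = {z:.2f}, P = {p:.3f}")
```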
In this systematic review and meta-analysis, of the 1138 articles initially identified, 78 studies (enrolling a total of 10 056 participants) reported accuracy data (Figure 1).12,21,23-98 For 2 studies,59,68 data were reported in several different publications; in each case, we abstracted 1 report (the most complete). The eTable in the Supplement summarizes the key features of study designs and tests.
Forty-one studies23-25,27,29,31,34-36,40,42,43,48,51,52,56,57,59-62,64,65,67,69,71-75,77,81,82,84-86,91,93,95,96,98 involved medical students (n = 4256 participants), 42 studies12,21,23,24,26,32,33,35-39,41,44-47,49-51,53-55,58,60-63,66,69,70,75,76,79,80,84,88-91,94,97 involved postgraduate physicians (n = 2379), 11 studies25,38,47,60,66,68,78,83,87,92,94 involved noncardiologist practicing physicians (n = 1074), and 10 studies25,28,30,36,47,68,83,87,91,92 involved cardiologists or cardiology fellows (n = 2094); 4 mixed-participant studies35,47,74,84 did not report training level–specific sample sizes (n = 253). Twenty-six studies12,27,29,31,35,36,39,56-59,62,64,65,71,72,74,75,77,80-82,85,86,96,98 were randomized trials, 11 studies21,23,24,34,42,43,48,52,53,73,97 were 2-group nonrandomized comparisons, 19 studies25,30,32,41,50,66,69,70,78,79,84,87-90,92-95 were single-group pre-post comparisons, and 22 studies26,28,33,37,38,40,44-47,49,51,54,55,60,61,63,67,68,76,83,91 were cross-sectional (single time point). Twenty-two studies were identified as surveys.26,30,33,36,38-41,44-46,49,51,53-55,61,63,66-68,83 Of the 47 studies reporting posttraining data,12,23-25,27-29,31,32,34,35,37,41-43,47,48,50,52,56,57,59,62,64-66,69,71-75,77-82,84-89,95-97 the training interventions included face-to-face lectures or seminars (22 studies12,25,27,29,31,32,42,43,52,57,59,62,72,74,75,77,80-82,88,95,97), computer tutorials (20 studies12,27,29,35,47,48,66,69,71,73-75,77-79,82,84,86,87,89), independent study materials (7 studies24,25,34,57,64,85,96), clinical training activities (4 studies28,37,41,50), and 7 other miscellaneous activities23,24,52,56,65,90,92 (some studies used >1 intervention type). On further quality appraisal, all studies reported an objective outcome of learning (ie, accuracy), 54 studies23-26,28,30-35,37-44,46-59,61,62,64,65,67,68,74-76,78,81,85,88-90,92-94,96-98 had high (≥75%) follow-up, 16 studies12,26,28,37,41,44,45,51,55,66,67,70,75,76,79,87 involved more than 1 institution, and 66 studies12,21,23,24,27,29,32-37,39-42,44-76,78-82,85-88,90,91,93-98 used appropriate statistical tests.
The number of test items ranged from 1 to 100 (median, 10; IQR, 9-20). The ECG diagnoses represented in the test were reported in 53 studies (68%)12,21,25,26,30,31,33,34,36-38,40,43-47,49-55,57,59,61,63,64,66-71,75,76,78,80,83-92,94-96,98; these studies included normal ECGs (26 studies31,33,34,37,38,40,45,47,49,51,52,54,63,66,68,69,75,76,78,84,87,90-92,94,96) and abnormalities of rhythm (45 studies12,25,26,31,33,34,36,37,40,43-47,49-52,54,55,57,59,61,63,64,66-68,70,71,75,76,78,80,83-90,95,96,98), ischemia (42 studies12,21,25,26,30,31,33,34,36-38,40,44,46,47,49,51,52,55,57,59,61,63,67-71,75,76,78,83-85,88-92,94-96), structure (41 studies12,21,25,26,31,33,34,36-38,40,44-47,49-52,54,55,57,59,61,63,66-68,70,75,76,78,80,83-85,87,89-91,96), and metabolism/inflammation (23 studies21,25,26,31,34,36,38,46,47,49,53-55,57,61,63,66,68,70,75,76,91,96). The ECG complexity or difficulty was intentionally set or empirically determined in 32 studies.21,24,26,28,30-33,36,38,43-45,49,51,53,54,56,59-61,63,66,67,69,75,90-92,94,95,98 Among these, the ECGs reflected a simple interpretation (single diagnosis) in 20 studies,26,31,33,38,43-45,49,51,53,54,59,61,66,67,75,92,94,95,98 several straightforward diagnoses in 2 studies,28,63 deliberately complex cases in 3 studies,21,30,36 and a mixture of simple and complex cases in 7 studies.24,32,56,60,69,90,91 Participants provided free-text responses in 26 studies21,23,30,32,33,36-40,42,44-46,49-51,55,59,61,68,70,80,90,94,97 and selected from a predefined list of answers in 24 studies.12,28,47,53,54,56,62,66,67,69,71,75-77,79,81,84,87,91-93,95,96,98
We grouped studies according to the method of confirming the correct answer: (1) clinical data (such as laboratory test or echocardiogram; n = 5 studies30,38,53,69,94), (2) robust expert panel (≥2 people with clearly defined expertise and independent initial review or explicit consensus on final answers; n = 18 studies26,36,39,41,44-46,49,55,60-63,67,68,79,91,98), or (3) less robust expert panel, single individual, or undefined (n = 55 studies12,21,23-25,27-29,31-35,37,40,42,43,47,48,50-52,54,56-59,64-66,70-78,80-90,92,93,95-97). Thirty-seven studies (47%)21,26,28,30,31,34,36,38,39,41,44-46,49,51,53-56,58-63,65,67-70,73,75,76,79,91,94,98 reported information about test development, content, and scoring (content validity evidence), 9 studies (12%)29,39,41,46,47,56,59,70,81 reported reliability or other internal structure validity evidence, 9 studies (12%)23,31,60,61,75,76,80,91,93 reported associations with scores from another instrument or with training status (relations with other variables validity evidence), 3 studies (4%)28,59,76 reported consequences validity evidence, and 1 study (1%)36 reported response process validity evidence.
Across all studies and all training levels, the median accuracy on the 62 pretraining assessments12,21,23-27,29-36,38-41,43-47,49-51,53-55,57-63,66-71,73,76-79,83,84,87-98 was 54% (range, 4%-95%; IQR, 40%-66%) (Figure 2). For the 47 studies12,23-25,27-29,31,32,34,35,37,41-43,47,48,50,52,56,57,59,62,64-66,69,71-75,77-82,84-89,95-97 that reported posttraining assessments, the median accuracy was 67% (range, 10%-88%; IQR, 55%-77%) (eFigure in the Supplement).
We conducted random-effects meta-analyses of pretraining assessment scores for each training group (Figure 3). For medical students, the pooled accuracy was 42.0% (95% CI, 34.3%-49.6%; median, 45%; IQR, 32%-58%; n = 24 studies25,27,29,31,34,36,40,43,51,57,59-62,67,69,71,73,77,91,93,95,96,98) with substantial inconsistency (I2 = 99%). For residents, the pooled accuracy was 55.8% (95% CI, 48.1%-63.6%; median, 57%; IQR, 44%-69%; n = 37 studies12,21,23,24,26,32,36,38,39,41,44-47,49-51,53-55,58,60-63,66,69,70,76,79,88-91,94,97; I2 = 96%). For practicing physicians, the pooled accuracy was 68.5% (95% CI, 57.6%-79.5%; median, 66%; IQR, 63%-78%; n = 10 studies25,38,60,66,68,78,83,87,92,94; I2 = 86%). For cardiologists and cardiology fellows, the pooled accuracy was 74.9% (95% CI, 63.2%-86.7%; median, 79%; IQR, 68%-86%; n = 8 studies25,30,36,68,83,87,91,92; I2 = 22%).
The pooled accuracies of posttraining assessment scores were higher than the pretraining scores. For medical students, the pooled accuracy after training was 61.5% (95% CI, 56.0%-66.9%; median, 61%; IQR, 55%-72%; n = 29 studies23-25,27,29,31,34,42,43,48,52,56,57,59,62,64,65,69,71-73,75,77,81,82,85,86,95,96; I2 = 91%; P < .001 for interaction comparing pretraining with posttraining scores). For residents, the pooled accuracy was 66.5% (95% CI, 57.3%-75.7%; median, 75%; IQR, 51%-79%; n = 15 studies12,32,37,41,47,50,62,66,69,75,79,80,88,89,97; I2 = 84%; P = .08 for interaction). For practicing physicians, the pooled accuracy was 80.1% (95% CI, 72.7%-87.5%; median, 81%; range, 72.9%-83.6%; n = 3 studies66,78,87; I2 = 37%; P = .09 for interaction). For cardiologists and cardiology fellows, the pooled accuracy was 87.5% (95% CI, 84.8%-90.2%; median, 88%; range, 83.0%-90.5%; n = 3 studies28,47,87; I2 = 0%; P = .04 for interaction).
We planned subgroup analyses according to item difficulty. Nineteen pretraining assessments26,31,33,38,43-45,49,51,53,54,59,61,66,67,92,94,95,98 included only ECGs with 1 diagnosis (less difficult), and 10 assessments21,24,30,32,36,60,63,69,90,91 included ECGs with multiple diagnoses or empirically determined high difficulty. In analyses limited to these 29 studies,21,24,26,30-33,36,38,43-45,49,51,53,54,59-61,63,66,67,69,90-92,94,95,98 the median pretraining accuracy (across all participants) for less difficult ECGs was 56% (IQR, 44%-66%; n = 19 studies26,31,33,38,43-45,49,51,53,54,59,61,66,67,92,94,95,98) and for difficult ECGs was 59% (IQR, 53%-67%; n = 10 studies21,24,30,32,36,60,63,69,90,91).
We also conducted subgroup analyses according to response format. Twenty-three pretraining assessments21,23,30,32,33,36,38-40,44-46,49-51,55,59,61,68,70,90,94,97 used free-text response, with a median accuracy of 54% (IQR, 43%-65%). Twenty assessments12,47,53,54,62,66,67,69,71,76,77,79,84,87,91-93,95,96,98 used a predefined list of answers, with a median accuracy of 60% (IQR, 48%-68%).
Acknowledging that assessments linked to training interventions might be enriched for ECG findings specific to that intervention (and thus less representative of real-life prevalence), we performed sensitivity analyses limited to the 22 survey studies26,30,33,36,38-41,44-46,49,51,53-55,61,63,66-68,83 (which would presumably be designed to ascertain performance in a more representative fashion). This analysis (across all participants) found a median accuracy of 55% (IQR, 44%-65%). Finally, in sensitivity analyses limited to assessments using clinical data or a robust panel to confirm the correct answer, the median pretraining accuracy was 58% (IQR, 44%-67%; n = 23 studies26,30,36,38,39,41,44-46,49,53,55,60-63,67-69,79,91,94,98).
This systematic review and meta-analysis identified 78 studies that assessed the accuracy of physicians’ ECG interpretations in a controlled (test) setting. Accuracy scores varied widely across studies, ranging from 4% to 95%. The median accuracy across all training levels was relatively low (54%), and scores increased as expected with progressive training and specialization (medical students, residents, physicians in noncardiology practice, and cardiologists). Scores assessed after a training intervention were modestly higher but remained low (median, 67%). These findings have implications for training, assessment, and setting standards in ECG interpretation.
Several reviews9,11-13 have examined training on ECG interpretation, documenting improved accuracy after training compared with no intervention and evaluating the comparative effectiveness of several instructional modalities and methods. The only previous review10 of the accuracy of physicians’ ECG interpretations found 32 studies of postgraduate trainees and physicians in practice; to that report, we have added 46 additional studies, an expanded population (the addition of medical students), a robust quantitative synthesis, and detailed data visualizations. We found limited validity evidence for the original outcome measures, as has been previously reported for educational assessments in other domains, including clinical skills,99 continuing medical education,100 and simulation-based assessments.19,101
This review has important implications for practitioners, educators, and researchers. First and foremost, according to these findings, physicians at all training levels could improve their ECG interpretation skills. Even cardiologists had performance gaps, with a pooled accuracy of 74.9%. Moreover, substantial deficiencies persisted after training. These findings suggest that novel training approaches that span the training continuum are needed. Recent guidelines endorsed by professional societies3,4 have identified developmentally appropriate competencies for ECG interpretation and reviewed several options for training and assessment. Other original studies highlight creative use of workshops,95 peer groups,102 online self-study,87 and social media.79 Large ECG databanks, properly indexed by diagnosis and difficulty (such as the NYU Emergency Care Electrocardiogram Database) could enable regular and repeated practice in these skills. Research also highlights the disparate impact of different cognitive strategies on learning and performing ECG interpretation.64,103 Adaptive computer instructional technologies might further facilitate efficient learning of ECG interpretation.104 Enhanced clinical decision support at the point of care may also be helpful; automated computer ECG interpretations have demonstrated variable accuracy,105,106 but novel artificial intelligence–driven algorithms may improve on past performance.107 Careful consideration should also be given to instructor qualifications.4
Interpretation accuracy varied widely across studies, ranging from 49% to 92% for cardiologists and even more for other groups (Figure 3, Figure 4). High variability persisted after training. This finding suggests a role for increased standardization in ECG interpretation education. National or international agreement on relevant competencies, development and dissemination of training resources and assessment tools that embody educational best practices, and adoption of a mastery learning (competency-based) paradigm might help remediate these performance gaps. Inasmuch as ECGs can be simulated perfectly in a digital environment, online educational resources may prove particularly useful and easily shared. These resources might include adaptive training that accounts for variation in baseline performance and learning rate and aims for achievement of a defined benchmark (mastery). In addition, given available information, it is difficult to disentangle true differences in physicians’ skill or training from differences in ECG selection (sampling across domains and difficulty) and test calibration. Robust test item selection procedures were infrequently reported, and we propose this as an area for improved assessment.
The data in the present study highlight at least 4 additional areas for improvement in the assessment of ECG interpretation. First, nearly all the tests were developed de novo for a given study and never used again; the adoption or adaptation of previously used tests would streamline test development and facilitate cross-study comparisons. Second, tests were generally short (median, 10 ECGs), which limits reliability and precision of estimates. In comparison with the more than 120 diagnostic statements cataloged by the American Heart Association108 or the 37 common and essential ECG patterns identified in recent guidelines,4 a 10-item test seems unlikely to fully represent the domain. Third, investigators rarely used robust procedures for confirming the correct answer, such as independent expert review and consensus or use of clinical data. Fourth, validity evidence in general was rarely reported. In addition to the above-mentioned steps regarding the content of ECG assessments, we suggest reporting evidence to support their internal structure (eg, reliability) and relations with other variables.
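As one concrete example of the internal structure evidence suggested here, score reliability for a dichotomously scored ECG test can be estimated with the Kuder-Richardson 20 coefficient (the special case of Cronbach α for right/wrong items). The Python sketch below uses an invented 0/1 item matrix purely for illustration.

```python
def kr20(items):
    """Kuder-Richardson 20 reliability for dichotomous (0/1) item scores.

    items[i][j] = 1 if examinee i answered item j correctly, else 0.
    """
    n_items = len(items[0])
    totals = [sum(row) for row in items]
    mean_total = sum(totals) / len(totals)
    var_total = sum((t - mean_total) ** 2 for t in totals) / (len(totals) - 1)
    # Sum of item variances p(1 - p), where p is each item's difficulty.
    pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in items) / len(items)  # proportion correct on item j
        pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq / var_total)

# Hypothetical 0/1 scores for 6 examinees on a 5-item ECG test.
scores = [
    [1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
]
print(f"KR-20 = {kr20(scores):.2f}")  # about 0.84 for this toy matrix
```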
This study has several strengths: inclusion of a broad range of study designs, a literature search supported by a librarian trained in systematic reviews, duplicate review at all stages, and a robust quantitative synthesis with planned subgroup and sensitivity analyses.
This study also has limitations. As with all reviews, the information obtained is limited by the methodological and reporting quality of the original studies. The tests used to assess interpretation accuracy varied widely and were often suboptimal; however, when analysis was limited to studies that used more robust approaches to select ECGs and confirm correct answers, the results were largely unchanged. We found limited validity evidence for the original outcome measures. We also note a paucity of outcomes for practicing physicians.
By design, this study focused its search and inclusion criteria on assessments conducted in a test setting in which the correct answers can be known. We believe this represents a best-case scenario for interpretation and expect that performance in a fast-paced clinical practice would typically be worse. Although tests created for educational purposes may not reflect the spectrum of difficulty or disease seen in clinical practice (eg, test ECGs might be enriched for challenging or rare cases), we did not find a difference in accuracy between more- and less-difficult tests. We restricted our study to physicians, acknowledging that a wide range of nonphysicians, including nurses, physician assistants, and paramedics, also interpret ECGs.
The meta-analysis results should not be interpreted as an estimate of or a suggestion that there is a single true level of accuracy in physicians’ ECG interpretation; indeed, the between-study variation in diagnoses and difficulty (with difficulty sometimes targeted deliberately by the investigators) indicates that such is not the case. Rather, these analyses help to succinctly represent the existing evidence and support the implications suggested above.
Physicians at all training levels had deficiencies in ECG interpretation, even after educational interventions. Improvement in both training in and assessment of ECG interpretation appears warranted, across the practice continuum. Standardized competencies, educational resources, and mastery benchmarks could address all these concerns.
Accepted for Publication: July 5, 2020.
Corresponding Author: David A. Cook, MD, MHPE, Division of General Internal Medicine, Mayo Clinic College of Medicine, Mayo 17-W, 200 First St SW, Rochester, MN 55905 (firstname.lastname@example.org).
Published Online: September 28, 2020. doi:10.1001/jamainternmed.2020.3989
Author Contributions: Dr Cook had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: All authors.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: All authors.
Critical revision of the manuscript for important intellectual content: Cook, Pusic.
Statistical analysis: Cook, Oh.
Obtained funding: Cook, Pusic.
Administrative, technical, or material support: Cook, Pusic.
Supervision: Cook, Pusic.
Conflict of Interest Disclosures: Drs Cook, Oh, and Pusic reported receiving grants from the US Department of Defense during the conduct of the study. No other disclosures were reported.
Funding/Support: This work was funded by grant W81XWH-16-1-0797 from the US Department of Defense Medical Simulation and Information Sciences Research Program.
Role of the Funder/Sponsor: The funding source had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: Joseph Nicholson, MLIS, MPH, NYU Grossman School of Medicine, NYU Langone Health, New York, New York, helped in the development of the literature search, and Hilary Fairbrother, MD, NYU Grossman School of Medicine, NYU Langone Health, New York, New York, helped with study selection. These activities were undertaken as part of the normal course of their employment. They received no additional compensation.