Can a machine learning model help medical dispatchers improve recognition of out-of-hospital cardiac arrest?
In this randomized clinical trial of 5242 emergency calls, a machine learning model listening to calls could alert the medical dispatchers in cases of suspected cardiac arrest. There was no significant improvement in recognition of out-of-hospital cardiac arrest during calls on which the model alerted dispatchers vs those on which it did not; however, the machine learning model had higher sensitivity than dispatchers alone.
These findings suggest that while a machine learning model recognized a significantly greater number of out-of-hospital cardiac arrests than dispatchers alone, this did not translate into improved cardiac arrest recognition by dispatchers.
Emergency medical dispatchers fail to identify approximately 25% of cases of out-of-hospital cardiac arrest (OHCA), resulting in lost opportunities to save lives by initiating cardiopulmonary resuscitation.
To examine how a machine learning model trained to identify OHCA and alert dispatchers during emergency calls affected OHCA recognition and response.
Design, Setting, and Participants
This double-masked, 2-group, randomized clinical trial analyzed all calls to emergency number 112 (equivalent to 911) in Denmark. Calls were processed by a machine learning model using speech recognition software. The machine learning model assessed ongoing calls, and calls in which the model identified OHCA were randomized. The trial was performed at Copenhagen Emergency Medical Services, Denmark, between September 1, 2018, and December 31, 2019.
Dispatchers in the intervention group were alerted when the machine learning model identified out-of-hospital cardiac arrest, and those in the control group followed normal protocols without alert.
Main Outcomes and Measures
The primary end point was the rate of dispatcher recognition of subsequently confirmed OHCA.
A total of 169 049 emergency calls were examined, of which the machine learning model identified 5242 as suspected OHCA. Calls were randomized to control (2661 [50.8%]) or intervention (2581 [49.2%]) groups. Of these, 336 (12.6%) and 318 (12.3%), respectively, had confirmed OHCA. The mean (SD) age among these 654 patients was 70 (16.1) years, and 419 of 627 patients (67.8%) with known gender were men. Dispatchers in the intervention group recognized 296 confirmed OHCA cases (93.1%) with machine learning assistance compared with 304 confirmed OHCA cases (90.5%) using standard protocols without machine learning assistance (P = .15). Machine learning alerts alone had a significantly higher sensitivity than dispatchers without alerts for confirmed OHCA (85.0% vs 77.5%; P < .001) but lower specificity (97.4% vs 99.6%; P < .001) and positive predictive value (17.8% vs 55.8%; P < .001).
Conclusions and Relevance
This randomized clinical trial did not find any significant improvement in dispatchers’ ability to recognize cardiac arrest when supported by machine learning even though artificial intelligence did surpass human recognition.
ClinicalTrials.gov Identifier: NCT04219306
Survival after out-of-hospital cardiac arrest (OHCA) has increased in several countries following improvements in bystander interventions, the response of emergency medical services (EMS), and postresuscitation care. Of these, early bystander interventions, particularly cardiopulmonary resuscitation (CPR) with early defibrillation, can have the greatest potential impact on outcome.1 Rapid recognition of cardiac arrest by the emergency medical dispatcher followed by instructing the caller to promptly perform CPR and retrieve an automated external defibrillator are essential steps.2-5 Such guidance is contingent on prompt recognition of OHCA by the dispatcher. In 2018, the overall rate of CPR by bystanders was 77% in Denmark,6 which corresponded to the rate of dispatcher-recognized OHCA at the Copenhagen Emergency Medical Services (EMS).3,7
A rate-limiting step to initiating bystander CPR and expediting EMS response lies in the recognition of OHCA by dispatchers. A promising strategy for improving OHCA recognition is the use of artificial intelligence systems based on deep neural networks to provide real-time information to the dispatcher. Such systems estimate the likelihood of OHCA based on the patterns of the words spoken that might be missed by the dispatcher.
A clinical decision support tool based on machine learning models was tested at the Copenhagen EMS and was able to identify OHCA with better sensitivity and only slightly lower specificity than medical dispatchers.7 Despite the potential for such clinical decision support tools to improve health care outcomes by offering data-driven insights, almost all decision support tools driven by artificial intelligence or machine learning have so far failed to do so in practice.8 Thus, while the translation of research techniques like machine learning into clinical practice presents a new frontier of opportunity, the real-world deployment of this modality remains rare and in need of further exploration.9
In this randomized clinical trial, we investigated the effect of real-time information on dispatchers’ ability to recognize OHCA during emergency calls. Our primary aim was to examine whether the machine learning model affected the clinical practice of medical dispatchers. We hypothesized that the machine learning model would increase recognition of OHCA when dispatchers were augmented with machine learning compared with standard call procedures. Secondary outcomes were differences between the 2 approaches in time-to-recognition of OHCA, initiation of dispatcher-assisted CPR (DA-CPR), and time to its initiation.
We followed the General Data Protection Regulation and registered the study at the Danish Data Protection Agency. The study was approved by the Danish Patient Safety Authority. The research ethics committee in the Capital Region of Denmark waived the need for ethical approval. The study followed the Consolidated Standards of Reporting Trials (CONSORT) reporting guideline. The regional ethical committee waived the need for preregistration of this study. However, we retrospectively registered this trial with ClinicalTrials.gov in August 2019. We obtained written informed consent from all study participants, ie, medical dispatchers.
The study was performed in Copenhagen, Denmark, with a population of 1.8 million persons and a size of 2563 km². Copenhagen EMS handles approximately 130 000 emergency calls annually. The emergency phone number, 112, connects to the Emergency Medical Dispatch Centre, which is staffed by medically trained dispatchers comprising nurses (70%) and paramedics (30%) who receive 6 weeks of training in communication, prioritization of emergency calls, and DA-CPR instructions. In cases of suspected OHCA, dispatchers instruct callers in CPR while simultaneously dispatching an ambulance and a physician-manned Mobile Critical Care Unit.
Emergency Call Processing
Emergency calls to Copenhagen EMS are analyzed by the previously described machine learning model,7 which listens to every call and can immediately alert the dispatcher when the model suspects OHCA. The machine learning model in this study is identical to that used previously7 and estimates OHCA with a 1-second resolution, meaning that for each second of conversation between caller and dispatcher, the machine learning model calculates the probability of whether there is an OHCA in the accumulated call information. If the probability of the machine learning model exceeds a prespecified threshold, the call is defined as a suspected OHCA and a warning can be issued to the dispatcher.
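The per-second alerting rule described above can be sketched as follows. This is a minimal illustration only: the function name `first_alert_second`, the example probabilities, and the threshold value of 0.5 are assumptions, because the text does not state the actual threshold or how the model computes its probabilities.

```python
from typing import Optional, Sequence


def first_alert_second(probabilities: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first second (1-based) at which the probability exceeds the threshold.

    probabilities[i] stands for a hypothetical model output after i + 1 seconds
    of accumulated call audio. Returns None if the threshold is never exceeded.
    """
    for second, probability in enumerate(probabilities, start=1):
        if probability > threshold:
            return second
    return None


# Hypothetical per-second outputs for one call; these are not trial data.
example_call = [0.02, 0.05, 0.11, 0.40, 0.73, 0.91]
ASSUMED_THRESHOLD = 0.5  # placeholder; the real threshold is not given in the text

alert_second = first_alert_second(example_call, ASSUMED_THRESHOLD)
print(alert_second)
```

Under this reading, a call becomes a suspected OHCA at the first second the threshold is crossed, and that second is the alert time used in later time-to-event comparisons.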
This double-masked, 2-group, randomized clinical trial (RCT) evaluated a machine learning model that analyzes calls in real time and delivers decision support to medically trained dispatchers compared with conventional call-handling without decision support from the machine learning model. A more extensive description of the study is available in the trial protocol in Supplement 1.
Eligibility and Randomization
All emergency calls received between September 1, 2018, and December 31, 2019, were considered for trial enrollment. However, only calls that the machine learning model suspected as OHCA were eligible for randomization to the intervention or control group. Calls placed on hold while the dispatcher conferred with EMS physicians were excluded if the machine learning alert only appeared after the hold. In addition, calls were excluded according to predefined postrandomization exclusions10: if the machine learning model erroneously identified an OHCA that was not subsequently confirmed by the Danish Cardiac Arrest Registry; calls with a broken chain of alert (eg, calls that were forwarded from out-of-hours nonemergency services); repeated calls on the same incident; calls from police requesting help from the EMS; and calls regarding EMS-witnessed OHCAs. Finally, calls for which CPR had been initiated prior to the call were excluded as an already confirmed OHCA.
Calls meeting the inclusion criteria were assigned to either the intervention or control group in a 1:1 ratio, using a random number generating program with no stratification factors.11,12 Medical dispatchers were masked to their group assignment. Unless the machine learning model generated and disclosed an OHCA alert, neither group knew whether the machine learning model was operative during the call. Investigators involved in data analysis and outcome assessment were also masked to group assignment.
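Unstratified 1:1 allocation of this kind can be sketched in a few lines. The sketch assumes simple (coin-flip) randomization per call; the actual program, its seed, and its implementation are not described in the text, so `assign_group` and the seed below are hypothetical.

```python
import random
from collections import Counter


def assign_group(rng: random.Random) -> str:
    """Simple, unstratified 1:1 allocation of one eligible call."""
    return "intervention" if rng.random() < 0.5 else "control"


# The seed is arbitrary and only makes this sketch reproducible.
rng = random.Random(2018)

# 5242 is the number of randomized calls reported in the trial.
allocations = [assign_group(rng) for _ in range(5242)]
counts = Counter(allocations)
print(counts)
```

Simple randomization does not force equal group sizes, which is compatible with the slightly unequal groups reported (2661 vs 2581).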
In a post hoc analysis, we examined compliance with the machine learning model and the consequences if the intervention group had heeded every machine learning alert. We compared the time of the machine learning alert in the intervention group with the dispatcher’s time of OHCA recognition (without such an alert) in the control group.
All calls to EMS Copenhagen are transferred from the police and start with a conference between the police and medical dispatcher. In case of OHCAs that are recognized by the dispatcher, time to recognition is measured from the start of the call, which includes the conference time. After recognition of OHCA, the next event in the call is initiation of DA-CPR.
All dispatchers participated in the trial and were instructed to follow a specified protocol when receiving an alert from the machine learning model.13 When the alert appeared, the dispatcher had the option to ignore the alert, or if heeded, they were instructed to immediately dispatch an ambulance and a physician-staffed vehicle on the suspicion of OHCA. The precise time of the machine learning alert, both those heeded and ignored, was recorded and stored in a central database. The dispatcher’s reaction to the alert, whether heeded or ignored, did not influence the enrollment or subsequent analysis. For the control group, the alerts generated by the machine learning model were suppressed and not shown to the dispatchers. However, the time when the machine learning alert occurred, although not directed to the dispatcher, was recorded. The alert was designed as a conspicuous display on the dispatcher’s main screen while not otherwise interfering in the dispatcher’s work.
All emergency dispatch calls were recorded. To verify confirmed OHCA, calls were linked to the Danish Cardiac Arrest Registry. Calls for which the machine learning model had identified a true OHCA were analyzed by a group of trained evaluators who were masked to the randomization assignment using a modified dispatch data survey catalog from the American Cardiac Arrest Registry to Enhance Survival.14,15 Case review allowed us to collect data for the secondary end points, ie, time to recognition and time to DA-CPR.
The primary outcome was the rate of dispatchers’ recognition of subsequently confirmed OHCA. Secondary outcomes were dispatchers’ time to recognition of OHCA and the rate of DA-CPR, defined as when the dispatcher instructed the caller to place hands on the patient’s chest or similar action directives.
The performance of the dispatchers in the control and intervention groups was compared using the Danish Cardiac Arrest Registry as the reference for a confirmed OHCA, as was the overall performance of the machine learning model. Results are reported with 95% CIs and P values when appropriate. Time-to-event analyses were conducted with Kaplan-Meier failure curves to estimate time to recognition in each group and for the machine learning model overall.
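For readers unfamiliar with Kaplan-Meier failure curves, the estimator can be sketched in plain Python. The trial analyses were run in SAS; the function below is an illustrative reimplementation, the example data are hypothetical, and treating unrecognized calls as censored is an assumption not stated in the text.

```python
from typing import List, Sequence, Tuple


def kaplan_meier_failure(
    durations: Sequence[float], recognized: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (time, cumulative failure probability) at each event time.

    durations[i] is the time to recognition, or to censoring, for call i.
    recognized[i] is True if OHCA was recognized at that time (an event)
    and False if the call was censored without recognition.
    """
    pairs = sorted(zip(durations, recognized))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        time = pairs[i][0]
        events = 0
        leaving = 0
        # Group all calls that share the same time.
        while i < len(pairs) and pairs[i][0] == time:
            events += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((time, 1 - survival))
        at_risk -= leaving
    return curve


# Hypothetical times in minutes; these are not trial data.
example_curve = kaplan_meier_failure([1, 2, 2, 3], [True, True, False, True])
print(example_curve)
```

The failure curve is one minus the Kaplan-Meier survival estimate, so it rises toward 1 as more calls are recognized.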
In assessing the machine learning performance, calls classified by the model as OHCA and confirmed by the registry reference standard were considered true-positive findings. The Mann-Whitney test was used to compare calls classified as OHCA and non-OHCA by the machine learning model. A 2-tailed P < .05 was considered significant for all analyses. Data management and statistical analyses were performed using SAS version 9.4 (SAS Institute).
Based on our previous study,7 the difference in recognition between machine learning model and dispatcher was 10%. Setting the significance level (α) at 5% and the power (1 − β) at 95%, 356 calls were needed in each group, resulting in a total study sample of 712 calls.
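A sample size of this kind is typically obtained from the normal-approximation formula for comparing 2 independent proportions. The sketch below shows that formula under stated assumptions: the baseline proportions (0.75 and 0.85) are hypothetical inputs chosen to give a 10% difference, and the exact formula and baseline used by the authors are not given in the text, so the output is not expected to reproduce the reported 356 calls per group.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Calls needed per group to detect p1 vs p2 with a 2-sided test.

    Uses the unpooled normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Assumed baseline proportions for illustration only.
illustrative_n = n_per_group(0.75, 0.85)
print(illustrative_n)
```

The required number falls as the baseline proportions move toward 0 or 1 and as the assumed difference grows, which is why the result depends strongly on inputs that are not stated here.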
During the intervention period, Copenhagen EMS received 226 130 calls, while the machine learning model processed only 169 049 calls (74.7%) due to downtime. The downtime was randomly spread across the day and across dispatchers, showing equal distribution between OHCA calls and non-OHCA calls and not affecting the randomization. Of the 169 049 calls processed by the machine learning model during the 16-month study period, the machine learning model identified 5847 calls as suspected OHCA. Of these, 5242 (89.7%) were eligible for randomization and were randomized to either control (2661 [50.8%]) or intervention (2581 [49.2%]) dispatcher groups; 4588 (87.5%) of the calls were excluded after randomization (Figure 1).
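The call flow reported above can be checked with simple arithmetic. The sketch uses only counts stated in the text; the variable names and the helper `pct` are illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


# Counts as reported in the text.
flagged_by_model = 5847
randomized = 5242
control, intervention = 2661, 2581
excluded_after_randomization = 4588

# Calls remaining for the primary analysis.
analyzed = randomized - excluded_after_randomization

print(analyzed)
print(pct(randomized, flagged_by_model))
print(pct(excluded_after_randomization, randomized))
print(pct(control, randomized), pct(intervention, randomized))
```

The remaining calls correspond to the 654 confirmed OHCA cases that enter the primary analysis.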
The characteristics of the included patients are shown in Table 1. A total of 654 patients were enrolled, of whom 318 (48.6%) were randomly assigned to the intervention group and 336 (51.4%) to the control group. The mean (SD) age was 70 (16.1) years across both groups, 419 of 627 patients (67.8%) with known gender were men, and 332 (50.8%) had OHCAs witnessed by bystanders; most OHCAs (539 [82.4%]) took place in a residential setting. Caller and patient characteristics were similar between groups.
Primary and Secondary Outcomes
In the analysis of primary outcomes, 654 calls were eligible for analysis. From the randomized calls with verified OHCA, the medical dispatchers recognized 296 of 318 (93.1%) in the intervention group and 304 of 336 (90.5%) in the control group, with no significant difference between the 2 groups (P = .15). In the 296 recognized calls in the intervention group, the mean (SD) time to dispatchers’ OHCA recognition was 1.72 (1.52) minutes vs 1.70 (1.63) minutes in the 304 recognized calls in the control group (P = .90) (Figure 2).
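To give a sense of the size of the difference in recognition rates, the following sketch computes the absolute difference with an approximate 95% Wald interval from the reported counts. The interval is an illustration and not a result reported by the trial, and the text does not state which test produced P = .15, so no attempt is made to reproduce that value.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def risk_difference_ci(
    x1: int, n1: int, x2: int, n2: int, level: float = 0.95
) -> Tuple[float, float, float]:
    """Difference in proportions (group 1 minus group 2) with a Wald interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * standard_error, diff + z * standard_error


# Reported counts: intervention 296 of 318, control 304 of 336.
diff, lower, upper = risk_difference_ci(296, 318, 304, 336)
print(round(100 * diff, 1), round(100 * lower, 1), round(100 * upper, 1))
```

An interval that spans 0 is consistent with the reported absence of a significant difference between groups.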
In all 318 calls in the intervention group, the machine learning model recognized the OHCA and generated an alert in a mean (SD) of 1.39 (1.32) minutes from the emergency call’s onset, a difference of only 0.05 minutes compared with the suppressed alert not displayed in the 336 calls in the control group (mean [SD] time, 1.33 [1.51] minutes; P = .60).
Dispatchers started CPR instructions in 206 calls (64.8%) in the intervention group and 208 calls (61.9%) in the control group (P = .47), with no significant difference in elapsed time from when the OHCA was recognized. Elapsed time without CPR instructions was 2 seconds longer in the intervention than control group (Table 2).
In the intervention group, dispatchers’ reaction time from receiving the alert to their recognition of an OHCA was 20 seconds. The corresponding interval was 22 seconds from when the suppressed alert might have first appeared to the control group.
Machine Learning Model Compliance
Figure 3 shows how the machine learning model continually outperformed the dispatchers, both in the intervention group and in the control group. In the intervention group, 22 additional OHCAs would have been recognized by the dispatcher had they heeded the alert, and the 296 OHCAs would have been recognized 20 seconds sooner than they were in the absence of such an alert (mean [SD] time, 1.38 [1.32] minutes vs 1.71 [1.63] minutes; P = .008).
Overall Accuracy of the Machine Learning Model
We examined the 169 049 calls processed by the machine learning model, of which 1110 (0.7%) were OHCAs unwitnessed by EMS and confirmed by the Danish Cardiac Arrest Registry. Compared with medical dispatchers, the machine learning model had a significantly higher sensitivity (85.0% vs 77.5%; P < .001) but lower specificity (97.4% vs 99.6%; P < .001). The machine learning model also had a significantly lower positive predictive value than dispatchers (17.8% vs 55.8%; P < .001). Put in pragmatic terms, if the dispatchers had heeded every machine learning model alert, 54 additional OHCAs that were not recognized by dispatchers would have been recognized.
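The low positive predictive value follows from the low prevalence of OHCA among all calls. The sketch below applies Bayes' rule to the reported sensitivity, specificity, and prevalence; the function name is illustrative, and because the inputs are rounded percentages, the dispatcher value comes out close to, but not exactly at, the reported 55.8%.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Share of positive classifications that are true OHCAs (Bayes' rule)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Prevalence from the reported counts: 1110 confirmed OHCAs in 169 049 calls.
prevalence = 1110 / 169049

model_ppv = positive_predictive_value(0.850, 0.974, prevalence)
dispatcher_ppv = positive_predictive_value(0.775, 0.996, prevalence)

print(round(100 * model_ppv, 1))
print(round(100 * dispatcher_ppv, 1))
```

With fewer than 1 in 100 calls being an OHCA, a drop in specificity of about 2 percentage points is enough to make most alerts false positives.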
The machine learning model resulted in a sizeable number of false alerts in which an OHCA had not actually occurred. Thus, complete compliance with both the true and false alerts from the model in the intervention group would have resulted in 2122 cases of a false-positive OHCA (2122 of 2581 [82.2%] of all suspected OHCA calls). Had all such alerts also been disclosed to and heeded by the control group, 2161 cases (81.2%) of false-positive OHCAs would have been produced. From the 4283 false alerts, 1519 (35.5%) received a response other than a full OHCA response, increasing the total number of ambulances dispatched from 56 449 to 57 968, a 2.7% increase.
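The false-alert figures above can be reproduced from the reported counts. Variable names in this sketch are illustrative; all numbers come from the text.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


# Reported counts of false alerts and suspected OHCA calls per group.
false_alerts_intervention, suspected_intervention = 2122, 2581
false_alerts_control, suspected_control = 2161, 2661

total_false_alerts = false_alerts_intervention + false_alerts_control

# False alerts that did not already receive a full OHCA response.
upgraded_responses = 1519
baseline_dispatches = 56449

print(total_false_alerts)
print(pct(false_alerts_intervention, suspected_intervention))
print(pct(false_alerts_control, suspected_control))
print(pct(upgraded_responses, total_false_alerts))
print(baseline_dispatches + upgraded_responses)
print(pct(upgraded_responses, baseline_dispatches))
```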
The current study was designed as an RCT that also addressed the paucity of RCTs investigating artificial intelligence.16 To our knowledge, this study is the only published study analyzing machine learning models processing speech in a real-time conversation between a medical dispatcher and patient or bystander, during which the system automatically extracts information from the conversation to be transformed into real-time decision support for the medical professional. There are only a few RCTs testing artificial intelligence and machine learning technologies, all of which report either negative or nonsignificant findings.16-18
This study tested a novel approach to telephone triage, in which a machine learning model automatically extracted information in real-time and transformed this into decision support for the medical dispatcher recognizing OHCA. The machine learning model was found to correctly recognize more OHCAs and do this significantly faster than the dispatchers. However, the use of a machine learning model in the current setting did not result in an increased number of correct recognitions of OHCA by dispatchers, and the time to recognition of OHCA was unchanged.
In 169 049 calls to Copenhagen EMS, including 1110 calls regarding OHCA, we found that the machine learning model correctly recognized 85.0% of OHCA calls, which is similar to a previous retrospective study, in which we found a sensitivity of 84.1%.7 The specificity was also similar to the retrospective study, with 97.4% compared with 97.3%. Thus, the developed machine learning decision support tool performs in a live, prospective setting with similar results to the retrospective studies. It is through prospective studies, preferably RCTs, that we can understand the utility of machine learning models, given that performance is likely to be worse when encountering real-world data that differ from those encountered in algorithm training.9 The very similar results of the prospective and retrospective studies demonstrate the robustness of the model, validating the basic principles of the machine learning model.
We further examined the consequences of compliance with all alerts from the machine learning model. This would have resulted in 54 additional OHCAs being recognized. Conversely, in a scenario with 100% compliance, 1519 additional ambulances would have been dispatched as part of a full OHCA response, increasing the total number of dispatches by 2.7%, from 56 449 to 57 968 ambulances dispatched with lights and sirens.
There is a paucity of literature about the clinical performance of machine learning models or artificial intelligence analyzing calls to support medical dispatchers. Experience with machine learning models has been limited to retrospective studies, whereas the current study showed higher sensitivity and less time to decision on suspected OHCA compared with medically trained dispatchers. However, as shown in this trial, being alerted to a suspected OHCA did not significantly affect the behavior of the dispatchers who responded to the OHCA call. The results underpin the need for discussing implementation when adapting new technology. These results are similar to the results of the Computerised Interpretation of Fetal Heart Rate During Labor trial and other RCTs testing machine learning models or artificial intelligence in clinical practice, which found no evidence of computerized decision support reducing the likelihood of poor outcomes.17,18
The human factor in the interaction with decision support tools is crucial, and several studies report hesitancy in the interaction with decision support tools.19,20 Our findings are concordant with a review focusing on compliance with decision support tools, which reported that actual use rates remain low despite clinicians acknowledging these technologies as useful.21
While the machine learning model delivers robust advice with high sensitivity, it appears that dispatchers did not comply with alerts. Further consideration is needed before implementing such a tool in clinical practice to determine whether further training of medical professionals can remedy the missing compliance.
We found that creating a machine learning model with high sensitivity was not sufficient to improve recognition of OHCA. It would also be necessary to establish confidence in the machine learning model. We found that extreme obedience to the machine learning model would have resulted in more OHCAs being recognized and reduced the time to recognition, although with more frequent high-priority dispatch of ambulances.
This study has limitations. First, medical dispatchers could learn from prior exposure to alerts when the machine learning model was active and, in turn, improve outcomes in the control group when encountering an OHCA call during which the model remained passive. Second, in hindsight, we should have provided more education to dispatchers to improve compliance with the machine learning model. Third, the servers analyzing the phone calls had downtime because they were undersized. To remedy this, the randomization period was prolonged.
In this RCT, we did not find any significant improvement in dispatcher recognition of OHCA when supported by machine learning, even though artificial intelligence did surpass human recognition. This suggests that machine learning has the potential to positively affect the recognition rate of OHCA while also improving the rate of DA-CPR, but in this trial that potential did not translate into improved recognition of OHCA or an improved rate of DA-CPR. Future studies are needed to improve human-computer interaction. In addition, efforts should be made to improve the specificity of the machine learning model to improve the relevance of alerts.
Accepted for Publication: November 12, 2020.
Published: January 6, 2021. doi:10.1001/jamanetworkopen.2020.32320
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2021 Blomberg SN et al. JAMA Network Open.
Corresponding Author: Stig Nikolaj Blomberg, MSc, Copenhagen Emergency Medical Services, Telegrafvej 5, 2750 Copenhagen, Denmark (firstname.lastname@example.org).
Author Contributions: Mr Blomberg had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Blomberg, Christensen, Lippert, Kudenchuk, Folke.
Acquisition, analysis, or interpretation of data: Blomberg, Lippert, Ersbøll, Torp-Petersen, Sayre, Folke.
Drafting of the manuscript: Blomberg.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Blomberg, Christensen, Ersbøll.
Obtained funding: Blomberg, Lippert, Folke.
Administrative, technical, or material support: Blomberg, Lippert, Kudenchuk.
Supervision: Christensen, Lippert, Ersbøll, Torp-Petersen, Folke.
Conflict of Interest Disclosures: Mr Blomberg reported receiving unrestricted research grants from TrygFoundation during the conduct of the study. Dr Lippert reported receiving unrestricted research grants from TrygFonden and Laerdal Foundation during the conduct of the study. Dr Torp-Petersen reported receiving grants from Bayer and Novo Nordisk outside the submitted work. Dr Sayre reported receiving a fellowship from Stryker Physio-Control outside the submitted work. Dr Folke reported receiving grants from Novo Nordisk Foundation and the Laerdal Foundation during the conduct of the study. No other disclosures were reported.
Data Sharing Statement: See Supplement 2.
J. Unremarkable AI: fitting intelligent decision support into critical, clinical decision-making processes. In: CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery; 2019:1-11. doi:10.1145/3290605.3300468
et al. Diagnostic efficacy and therapeutic decision-making capacity of an artificial intelligence platform for childhood cataracts in eye clinics: a multicentre randomized controlled trial. EClinicalMedicine. 2019;9:52-59. doi:10.1016/j.eclinm.2019.03.001
D. Use of a clinical decision support tool to improve guideline adherence for the treatment of methicillin-resistant Staphylococcus aureus: skin and soft tissue infections. Adv Emerg Nurs J. 2011;33(3):252-266. doi:10.1097/TME.0b013e31822610d1
H. Retrospective evaluation of a computerized physician order entry adaptation to prevent prescribing errors in a pediatric emergency department. Pediatrics. 2008;122(4):782-787. doi:10.1542/peds.2007-3064
AS. Point-of-care cognitive support technology in emergency departments: a scoping review of technology acceptance by clinicians. Acad Emerg Med. 2018;25(5):494-507. doi:10.1111/acem.13325