Effect of Machine Learning on Dispatcher Recognition of Out-of-Hospital Cardiac Arrest During Calls to Emergency Medical Services

Key Points Question Can a machine learning model help medical dispatchers improve recognition of out-of-hospital cardiac arrest? Findings In this randomized clinical trial of 5242 emergency calls, a machine learning model listening to calls could alert the medical dispatchers in cases of suspected cardiac arrest. There was no significant improvement in recognition of out-of-hospital cardiac arrest during calls on which the model alerted dispatchers vs those on which it did not; however, the machine learning model had higher sensitivity than dispatchers alone. Meaning These findings suggest that while a machine learning model recognized a significantly greater number of out-of-hospital cardiac arrests than dispatchers alone, this did not translate into improved cardiac arrest recognition by dispatchers.


Introduction
Survival after out-of-hospital cardiac arrest (OHCA) has increased in several countries following improvements in bystander interventions, the response of emergency medical services (EMS), and postresuscitation care. Of these, early bystander interventions, particularly cardiopulmonary resuscitation (CPR) with early defibrillation, can have the greatest potential impact on outcome. 1 Rapid recognition of cardiac arrest by the emergency medical dispatcher and prompt instructions to the caller to perform CPR and retrieve an automated external defibrillator are essential steps. [2][3][4][5] Such guidance is contingent on prompt recognition of OHCA by the dispatcher. In 2018, the overall rate of CPR by bystanders was 77% in Denmark, 6 which corresponded to the rate of dispatcher-recognized OHCA at the Copenhagen EMS. 3,7 A rate-limiting step to initiating bystander CPR and expediting EMS response lies in the recognition of OHCA by dispatchers. A promising strategy for improving OHCA recognition is the use of artificial intelligence systems based on deep neural networks to provide real-time information to the dispatcher. Such systems estimate the likelihood of OHCA based on patterns in the spoken words that the dispatcher might miss.
A clinical decision support tool based on machine learning models was tested at the Copenhagen EMS and was able to identify OHCA with better sensitivity and only slightly lower specificity than medical dispatchers. 7 Despite the potential for such clinical decision support tools to improve health care outcomes by offering data-driven insights, almost all decision support tools driven by artificial intelligence or machine learning have so far failed to do so in practice. 8 Thus, while the translation of research techniques like machine learning into clinical practice presents a new frontier of opportunity, the real-world deployment of this modality remains rare and in need of further exploration. 9 In this randomized clinical trial, we investigated the effect of real-time information on dispatchers' ability to recognize OHCA during emergency calls. Our primary aim was to examine whether the machine learning model affected the clinical practice of medical dispatchers. We hypothesized that the machine learning model would increase recognition of OHCA when dispatchers were augmented with machine learning compared with standard call procedures.
Secondary outcomes were differences between the 2 approaches in time-to-recognition of OHCA, initiation of dispatcher-assisted CPR (DA-CPR), and time to its initiation.

Ethical Approval
We followed the General Data Protection Regulation and registered the study at the Danish Data Protection Agency. The study was approved by the Danish Patient Safety Authority. The research ethics committee in the Capital Region of Denmark waived the need for ethical approval. The study followed the Consolidated Standards of Reporting Trials (CONSORT) reporting guideline. The regional ethical committee waived the need for preregistration of this study. However, we retrospectively registered this trial with ClinicalTrials.gov in August 2019. We obtained written informed consent from all study participants, ie, medical dispatchers.

Setting
The study was performed in Copenhagen, Denmark, with a population of 1.8 million persons and an area of 2563 km². Copenhagen EMS handles approximately 130 000 emergency calls annually. The emergency phone number, 112, connects to the Emergency Medical Dispatch Centre, which is staffed by medically trained dispatchers, composed of nurses (70%) and paramedics (30%), who receive 6 weeks of training in communication, prioritization of emergency calls, and DA-CPR instructions. In cases of suspected OHCA, dispatchers instruct callers in CPR while simultaneously dispatching an ambulance and a physician-manned Mobile Critical Care Unit.

Emergency Call Processing
Emergency calls to Copenhagen EMS are analyzed by the previously described machine learning model, 7 which listens to every call and can immediately alert the dispatcher when the model suspects OHCA. The machine learning model in this study is identical to that used previously 7 and estimates OHCA with a 1-second resolution, meaning that for each second of conversation between caller and dispatcher, the machine learning model calculates the probability of whether there is an OHCA in the accumulated call information. If the probability of the machine learning model exceeds a prespecified threshold, the call is defined as a suspected OHCA and a warning can be issued to the dispatcher.
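As a sketch, the per-second thresholding described above might look like the following; the threshold value and probability stream are hypothetical, since the prespecified threshold is not given in this section:

```python
# Hypothetical per-second alert logic; the real model's prespecified
# threshold is not stated here, so 0.85 is purely illustrative.
ALERT_THRESHOLD = 0.85

def first_alert_second(probabilities, threshold=ALERT_THRESHOLD):
    """Given one OHCA probability per second of accumulated call audio,
    return the first second at which an alert would fire, or None if
    the call never crosses the threshold."""
    for second, p in enumerate(probabilities, start=1):
        if p >= threshold:
            return second
    return None

# Example: the estimated probability rises as the caller describes
# an unresponsive patient; here an alert would fire at second 5.
call = [0.05, 0.10, 0.40, 0.70, 0.90, 0.95]
print(first_alert_second(call))  # → 5
```

The sketch models only the firing condition; in the trial, a fired alert was displayed for the dispatcher to heed or ignore.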

Trial Design
This double-masked, 2-group, randomized clinical trial (RCT) evaluated a machine learning model that analyzes calls in real time and delivers decision support to medically trained dispatchers compared with conventional call-handling without decision support from the machine learning model. A more extensive description of the study is available in the trial protocol in Supplement 1.

Eligibility and Randomization
All emergency calls received between September 1, 2018, and December 31, 2019, were considered for trial enrollment. However, only calls that the machine learning model suspected as OHCA were eligible for randomization to the intervention or control group. Calls placed on hold while the dispatcher conferred with EMS physicians were excluded if the machine learning alert only appeared after the hold. In addition, calls were excluded according to predefined postrandomization exclusions 10 : if the machine learning model erroneously identified an OHCA that was not subsequently confirmed by the Danish Cardiac Arrest Registry; calls with a broken chain of alert (eg, calls that were forwarded from out-of-hours nonemergency services); repeated calls on the same incident; calls from police requesting help from the EMS; and calls regarding EMS-witnessed OHCAs.
Finally, calls for which CPR had been initiated prior to the call were excluded as an already confirmed OHCA.
Calls meeting the inclusion criteria were assigned to either the intervention or control group in a 1:1 ratio, using a random number generating program with no stratification factors. 11,12 Medical dispatchers were masked to their group assignment. Unless the machine learning model generated and disclosed an OHCA alert, neither group knew whether the machine learning model was operative during the call. Investigators involved in data analysis and outcome assessment were also masked to group assignment.
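A minimal sketch of the unstratified 1:1 allocation, assuming a simple pseudorandom draw (the seed and structure are illustrative, not the trial's actual program):

```python
import random

def allocate(rng):
    """Assign an eligible call to intervention or control with equal
    probability (1:1 allocation, no stratification factors)."""
    return "intervention" if rng.random() < 0.5 else "control"

rng = random.Random(2018)  # fixed seed so the sketch is reproducible
groups = [allocate(rng) for _ in range(1000)]
# Each arm receives roughly half of the 1000 simulated calls.
```

Because allocation is unstratified, the two arms are balanced only in expectation, which is acceptable at this sample size.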
In a post hoc analysis, we examined compliance to the machine learning model and the consequences if the intervention group had heeded every machine learning alert. We compared the time of the machine learning alert in the intervention group with the dispatcher's time of OHCA recognition (without such an alert) in the control group.

Participant Procedures
All calls to EMS Copenhagen are transferred from the police and start with a conference between the police and the medical dispatcher. In cases of OHCA recognized by the dispatcher, time to recognition is measured from the start of the call, which includes the conference time. After recognition of OHCA, the next event in the call is initiation of DA-CPR.

JAMA Network Open. 2021;4(1):e2032320. doi:10.1001/jamanetworkopen.2020.32320. January 6, 2021.
All dispatchers participated in the trial and were instructed to follow a specified protocol when receiving an alert from the machine learning model. 13 When the alert appeared, the dispatcher had the option to ignore the alert, or if heeded, they were instructed to immediately dispatch an ambulance and a physician-staffed vehicle on the suspicion of OHCA. The precise time of the machine learning alert, both those heeded and ignored, was recorded and stored in a central database. The dispatcher's reaction to the alert, whether heeded or ignored, did not influence the enrollment or subsequent analysis. For the control group, the alerts generated by the machine learning model were suppressed and not shown to the dispatchers. However, the time when the machine learning alert occurred, although not directed to the dispatcher, was recorded. The alert was designed as a conspicuous display on the dispatcher's main screen while not otherwise interfering in the dispatcher's work.

Case Review
All emergency dispatch calls were recorded. To verify confirmed OHCA, calls were linked to the Danish Cardiac Arrest Registry. Calls for which the machine learning model had identified a true OHCA were analyzed by a group of trained evaluators, masked to the randomization assignment, using a modified dispatch data survey catalog from the American Cardiac Arrest Registry to Enhance Survival. 14,15 Case review allowed us to collect data for the secondary end points, ie, time to recognition and time to DA-CPR.

Outcomes
The primary outcome was the rate of dispatchers' recognition of subsequently confirmed OHCA.
Secondary outcomes were dispatchers' time to recognition of OHCA and the rate of DA-CPR, defined as when the dispatcher instructed the caller to place hands on the patient's chest or similar action directives.

Statistical Analysis
The performance of the dispatchers in the control and intervention groups was compared using the Danish Cardiac Arrest Registry as the reference for a confirmed OHCA as was the overall performance of the machine learning model. Results are reported with 95% CIs and P values when appropriate.
Time-to-event analyses were conducted with Kaplan-Meier failure curves to estimate time to recognition in each group and for the machine learning model overall.
In assessing the machine learning performance, calls classified by the model as OHCA and confirmed by the registry reference standard were considered true-positive findings. The Mann-Whitney test was used to compare calls classified as OHCA and non-OHCA by the machine learning model. A 2-tailed P < .05 was considered significant for all analyses. Data management and statistical analyses were performed using SAS version 9.4 (SAS Institute).
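To illustrate the time-to-event method, the sketch below computes a Kaplan-Meier failure curve (1 − survival) by hand on invented data; it is a simplified stand-in for the SAS procedures used in the trial and ignores tied event times.

```python
def km_failure_curve(times, recognized):
    """Product-limit (Kaplan-Meier) failure curve.
    times: seconds from call start to OHCA recognition or call end.
    recognized: True if the dispatcher recognized OHCA (event);
    False if the call ended without recognition (censored)."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t, event in sorted(zip(times, recognized)):
        if event:
            survival *= (at_risk - 1) / at_risk
            curve.append((t, 1.0 - survival))  # cumulative failure
        at_risk -= 1  # censored calls simply leave the risk set
    return curve

# Invented calls: one censored call (no recognition) at 60 seconds.
times = [40, 55, 60, 80, 120]
events = [True, True, False, True, True]
for t, failure in km_failure_curve(times, events):
    print(f"{t:>4} s  cumulative recognition {failure:.2f}")
```

The censored call at 60 seconds contributes to the risk set up to that point but never produces a step in the failure curve, which is exactly how calls that end without recognition enter the trial's analysis.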
Based on our previous study, 7 the difference in recognition between the machine learning model and dispatchers was 10%. Setting the significance level (α) at 5% and the power (1 − β) at 95%, 356 calls were needed in each group, resulting in a total study sample of 712 calls.
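The stated numbers can be approximately reproduced with the standard two-proportion formula; the baseline proportions below (75% vs 85% recognition) are assumptions chosen only to match the reported 10% difference, so this is an illustration, not a reconstruction of the trial's exact calculation.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.95):
    """Per-group sample size for detecting a difference between two
    proportions (normal approximation, two-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 5%
    z_b = NormalDist().inv_cdf(power)          # 1.645 for 95% power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Assumed proportions giving the reported 10% difference:
print(n_per_group(0.75, 0.85))  # → 413 per group with these assumptions
```

These assumed inputs give roughly 413 per group, somewhat above the reported 356, suggesting the trial's calculation used different baseline proportions or test form; the inputs here are illustrative only.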

Primary and Secondary Outcomes
In  Table 2).
In the intervention group, dispatchers' reaction time from receiving the alert to their recognition of an OHCA was 20 seconds. The corresponding interval was 22 seconds from when the suppressed alert might have first appeared to the control group.

Discussion
The current study was designed as an RCT that also addressed the paucity of RCTs investigating artificial intelligence. 16 To our knowledge, this study is the only published study analyzing machine learning models processing speech in a real-time conversation between a medical dispatcher and patient or bystander, during which the system automatically extracts information from the conversation to be transformed into real-time decision support for the medical professional. There are only a few RCTs testing artificial intelligence and machine learning technologies, all of which report negative or nonsignificant findings. [16][17][18] This study tested a novel approach to telephone triage, in which a machine learning model automatically extracted information in real time and transformed this into decision support for the medical dispatcher recognizing OHCA. The machine learning model was found to correctly recognize more OHCAs and to do so significantly faster than the dispatchers. However, the use of a machine learning model in the current setting did not result in an increased number of correct recognitions of OHCA by dispatchers, and the time to recognition of OHCA was unchanged.

[Figure caption: The shaded areas indicate 95% CIs; crosses, censored calls, ie, those that ended without the dispatcher recognizing the condition.]
In 169 049 calls to Copenhagen EMS, including 1110 calls regarding OHCA, we found that the machine learning model could correctly recognize more than 85.0% of OHCA calls, which is similar to a previous retrospective study, in which we found a sensitivity of 84.1%. 7 The specificity was also similar to the retrospective study, with 97.4% compared with 97.3%. Thus, the developed machine learning decision support tool performs in a live, prospective setting with results similar to those of the retrospective studies. It is through prospective studies, preferably RCTs, that we can understand the utility of machine learning models, given that performance is likely to be worse when encountering real-world data that differ from those encountered in algorithm training. 9 The very similar results of the prospective and retrospective studies demonstrate the robustness of the model, validating the basic principles of the machine learning model. There is a paucity of literature about the clinical performance of machine learning models or artificial intelligence analyzing calls to support medical dispatchers. Experience with machine learning models has been limited to retrospective studies, whereas the current study showed higher sensitivity and a shorter time to decision on suspected OHCA compared with medically trained dispatchers. However, as shown in this trial, being alerted to a suspected OHCA did not significantly affect the behavior of the dispatchers who responded to the OHCA call. The results underpin the need for discussing implementation when adapting new technology. These results are similar to those of the Computerised Interpretation of Fetal Heart Rate During Labor trial and other RCTs testing machine learning models or artificial intelligence in clinical practice, which found no evidence of computerized decision support reducing the likelihood of poor outcomes. 17,18
The human factor in the interaction with decision support tools is crucial, and several studies report hesitancy in interacting with them. 19,20 Our findings are concordant with a review focusing on compliance with decision support tools, which reported that actual use rates remain low despite clinicians acknowledging these technologies as useful. 21 While the machine learning model delivers robust advice with high sensitivity, it appears that dispatchers did not comply with alerts. Further consideration is needed before implementing such a tool in clinical practice to determine whether further training of medical professionals can remedy the lack of compliance.
We found that creating a machine learning model with high sensitivity was not sufficient to improve recognition of OHCA. It would also be necessary to establish confidence in the machine learning model. We found that strict adherence to the machine learning alerts would have resulted in more OHCAs being recognized and would have reduced the time to recognition, although at the cost of more frequent high-priority dispatch of ambulances.

Limitations
This study has limitations. First, medical dispatchers may have learned from prior exposure to the machine learning alerts and, in turn, improved their performance in the control group, in which the model remained passive during OHCA calls. Second, in hindsight, we should have provided more education to dispatchers to improve compliance with the machine learning model. Third, the servers analyzing the phone calls had downtime because they were underdimensioned; to remediate this, the randomization period was prolonged.