Invited Commentary
Emergency Medicine
January 11, 2019

Machine Learning in Clinical Medicine Still Finding Its Way

Author Affiliations
  • ¹Department of Emergency Medicine, University of Colorado, Aurora
  • ²Section of Emergency Medicine, Department of Pediatrics, University of Colorado, Aurora
  • ³Children’s Hospital Colorado, Aurora
JAMA Netw Open. 2019;2(1):e186926. doi:10.1001/jamanetworkopen.2018.6926

Clinical decision support systems powered by machine learning (ML) have long been pursued by future-oriented practitioners, patient safety experts, data scientists, and operational gurus alike. Machine learning algorithms, unencumbered by the biases and logic of human reasoning, can detect and learn associations that may not otherwise be evident. These tools hold great promise for creating new empirically derived archetypes that supplement or even supplant traditional evidence-based medicine paradigms. Showcasing these tools in real clinical applications, however, has been elusive.

Identifying the occult pediatric patient who may require hospital admission but does not otherwise appear obviously ill is difficult. An aid that could improve a clinician’s ability to find the proverbial needle in the haystack would certainly be welcome. The study by Goto and colleagues¹ investigated the potential of 4 ML approaches to help identify which pediatric patients presenting to an emergency department (ED) required hospital admission and critical care admission. The unique feature of this study was that it specifically examined the pediatric ED population in the largest publicly available data set, the National Hospital Ambulatory Medical Care Survey. The authors argued that the ML approaches were an improvement over a comparison model based on pediatric Emergency Severity Index levels, but unfortunately this demonstration has significant shortcomings in both design and results.

In the study by Goto and colleagues,¹ the usefulness of the ML models comes down to a trade-off between sensitivity and specificity, because the overall fit of the test models was only marginally better than that of the reference model. The ML models were more sensitive for predicting critical care admission and more specific for predicting hospitalization. In other words, the ML approach missed fewer critically ill children and better identified which patients could safely go home. The flip side is that the ML models would have placed more patients in the critical care unit who did not need to be there and dangerously discharged more patients who needed to be admitted. The authors contend that these gains outweigh the corresponding losses. But even if one subscribes to this view, is this incremental improvement significant enough to change practice? The ML-assisted models misidentified 25% to 29% of the patients who required hospitalization, an unacceptable miss rate for most ED clinicians.
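The trade-off described above can be made concrete with the standard confusion-matrix definitions. The sketch below is illustrative only: the names `ScreeningMetrics` and `screening_metrics` and the counts in the usage line are hypothetical and are not taken from the results of Goto and colleagues. It shows that a miss rate of 25% to 29% is simply 1 minus a sensitivity of 71% to 75%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningMetrics:
    sensitivity: float       # share of truly admitted patients the model flagged
    specificity: float       # share of truly discharged patients the model did not flag
    miss_rate: float         # false-negative rate = 1 - sensitivity
    false_alarm_rate: float  # false-positive rate = 1 - specificity


def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> ScreeningMetrics:
    """Compute basic screening metrics from a 2x2 confusion matrix.

    tp: needed admission and was flagged
    fn: needed admission but was not flagged (missed)
    tn: did not need admission and was not flagged
    fp: did not need admission but was flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return ScreeningMetrics(
        sensitivity=sensitivity,
        specificity=specificity,
        miss_rate=1 - sensitivity,
        false_alarm_rate=1 - specificity,
    )


# Hypothetical counts, for illustration only (not study data).
m = screening_metrics(tp=75, fn=25, tn=850, fp=150)
print(f"sensitivity={m.sensitivity:.2f}, specificity={m.specificity:.2f}, miss rate={m.miss_rate:.2f}")
```

Raising a model's threshold for flagging a patient moves some counts from `fp` to `tn` (specificity rises) and, usually, some from `tp` to `fn` (sensitivity falls), which is the sensitivity-specificity trade-off at issue.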

Clinically, further analysis including predictive values would be more helpful. If the positive predictive value of an ML model were better than that of the reference model, clinicians could increase vigilance and deploy more resources (ie, see the patient faster, increase testing). Conversely, if the negative predictive value were better, it would be reasonable to scale back the intensity and extent of the investigation. However, the prevalence of both critical care admissions (163 [0.3%]) and hospital admissions (2352 [4.5%]) in the cohort was so low that the positive predictive value will naturally be very low and the negative predictive value very high for almost any reasonable test. The more appropriate statistic in cases like this is the likelihood ratio, which is independent of prevalence. Likelihood ratios have their own drawbacks in ease of use, but they should be explored.
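The effect of low prevalence on predictive values follows directly from Bayes' theorem. The sketch below applies the two cohort prevalences cited above (0.3% and 4.5%) to a hypothetical test with 75% sensitivity and 85% specificity; those operating characteristics are assumed for illustration and are not the published values for any model in the study. The function name `predictive_values` is likewise hypothetical.

```python
from __future__ import annotations


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical operating point; prevalences are those reported for the cohort.
SENS, SPEC = 0.75, 0.85
for label, prevalence in [("critical care admission", 0.003), ("hospital admission", 0.045)]:
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"{label}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With these assumed inputs, the PPV is roughly 1.5% for critical care admission and 19% for hospital admission, while the NPV exceeds 98% for both, the pattern the paragraph anticipates for almost any reasonable test at such low prevalence.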

The most troubling aspect of this study is that the reference model, based on the pediatric Emergency Severity Index, was never intended to predict hospital admission but rather to stratify patients according to the immediacy of attention required and anticipated ED resource needs. A better comparison would be clinician gestalt regarding how sick the patient is and whether hospitalization is needed. Emergency department triage nurses in a 2010 study were able to forecast hospital admission with a sensitivity of 75.6% and a specificity of 84.5%.² This was replicated in a 2013 study in which ED triage nurses again forecast admissions with a sensitivity of 71.5% and discharges with an impressive specificity of 88.0%.³ Similarly, ED physicians in a simulated study were able to anticipate intensive care unit admission with an accuracy of 85% after seeing each patient for a mere 10 seconds.⁴ All were better than the 4 ML models in this study and required no advanced algorithms.
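Converting the triage nurse results into the prevalence-independent likelihood ratios recommended earlier puts them on a scale that transfers across settings with different admission rates. The sketch below uses only the sensitivity and specificity figures quoted above; the function name `likelihood_ratios` is hypothetical, and the 10-second physician study is omitted because it reports accuracy rather than sensitivity and specificity.

```python
from __future__ import annotations


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity): factor by which a positive prediction raises the odds of admission.
    LR- = (1 - sensitivity) / specificity: factor by which a negative prediction lowers those odds.
    """
    if not (0 < specificity < 1):
        raise ValueError("specificity must be strictly between 0 and 1")
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


# Sensitivity/specificity pairs quoted in the text for triage nurse predictions.
studies = {
    "2010 triage nurse study": (0.756, 0.845),
    "2013 triage nurse study": (0.715, 0.880),
}
for name, (sens, spec) in studies.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+={lr_pos:.2f}, LR-={lr_neg:.2f}")
```

Because posttest odds equal pretest odds multiplied by the likelihood ratio, values computed this way can be applied at any local admission rate, which is what makes them a fairer basis for comparing a model against clinician gestalt.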

Operationally, the key question is whether accurate predictions of hospital admission at triage are even useful. Would they change practice? Institutions that allow direct transfers from ED triage to the floor or intensive care unit are virtually nonexistent because of patient safety and efficiency concerns. Transfers from ED triage directly to a clinical decision unit are more common but usually occur only after sending and receiving teams have reached carefully vetted system-level agreements. Much transpires in an ED between the first look at triage and the eventual admission decision, including expedited diagnostic workups, stabilizing treatments, specialty consultations, and sometimes an observation period to better forecast the trajectory of a patient’s clinical course. Multiple data points almost always result in a better disposition decision and a safer transfer of care.

It is not surprising that this attempt by Goto and colleagues¹ to study the global ED pediatric population with the limited variables in this large generic data set did not meaningfully improve on the reference pediatric Emergency Severity Index model and underperformed the de facto default of clinician intuition. Future efforts should consider building on existing validated risk stratification tools such as the Pediatric Early Warning Score.⁵ Proof-of-concept demonstrations will likely be more impressive if they focus on specific clinical conditions with narrowly defined parameters.

The advent of ML in clinical medicine is upon us. One factor limiting the rate of progress is the availability of relevant data from which to derive algorithms. More coordinated effort needs to be devoted to improving access to, and incorporation of, data from electronic health records, registries, clinical portals, and other nontraditional repositories of information. By their very nature, ML tools will only be as robust as the data they learn from.

Article Information

Published: January 11, 2019. doi:10.1001/jamanetworkopen.2018.6926

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2019 Cheung DS et al. JAMA Network Open.

Corresponding Author: Dickson S. Cheung, MD, MBA, MPH, Department of Emergency Medicine, University of Colorado, 12401 E 17th Ave, Mail Stop B-215, Leprino Office Bldg, Seventh Floor, Room 708A, Aurora, CO 80045 (dickson.cheung@ucdenver.edu).

Conflict of Interest Disclosures: None reported.

1. Goto T, Camargo CA Jr, Faridi MK, Freishtat RJ, Hasegawa K. Machine learning–based prediction of clinical outcomes for children during emergency department triage. JAMA Netw Open. 2019;2(1):e186937. doi:10.1001/jamanetworkopen.2018.6937
2. Stover-Baker B, Stahlman B, Pollack M. Triage nurse prediction of hospital admission. J Emerg Nurs. 2012;38(3):306-310. doi:10.1016/j.jen.2011.10.003
3. Alexander D, Abbott L, Zhou Q, Staff I. Can triage nurses accurately predict patient dispositions in the emergency department? J Emerg Nurs. 2016;42(6):513-518. doi:10.1016/j.jen.2016.05.008
4. Sibbald M, Sherbino J, Preyra I, Coffin-Simpson T, Norman G, Monteiro S. Eyeballing: the use of visual appearance to diagnose ‘sick.’ Med Educ. 2017;51(11):1138-1145. doi:10.1111/medu.13396
5. Seiger N, Maconochie I, Oostenbrink R, Moll HA. Validity of different pediatric early warning scores in the emergency department. Pediatrics. 2013;132(4):e841-e850. doi:10.1542/peds.2012-3594