The study by Safavi et al1 presents a neural network that predicts clinically important outcomes: discharge within 24 hours and barriers to discharge. As hospitals and health care practitioners face overcrowding and the challenge of coordinating elective surgical caseload with bed availability, Safavi et al1 applied machine learning to provide clinical decision-making support for a problem with clinical, operational, and socioeconomic causes. When compared with a baseline model that used historical length of stay data, their algorithm demonstrated higher sensitivity and specificity. Importantly, their methods also revealed barriers to discharge. As one might expect, Safavi et al1 found that variation in clinical practice and nonclinical reasons accounted for approximately 70% of discharge delays. Certainly, such a model can assist a health care system in illuminating the causes of discharge delay and, eventually, improving timeliness of discharge. Now that the United States has digitized its medical records, gleaning meaningful insights from electronic health record data represents an important bridge toward achieving the promise of health information technology: actionable and potentially automated improvements in care.
Neural networks are one of the more elaborate machine learning methods, and they are useful for making predictions in which interactions between predictor variables are not necessarily known a priori. Specifically, such potential predictor interactions are difficult to discover and model with traditional techniques. Additionally, neural networks lend themselves to continuous improvement as new data become available, a useful feature for a large health care system that is continuously accruing more data for a common outcome, such as discharges from inpatient surgical care. However, neural networks are ordinarily opaque in terms of the underlying mathematics and the variables that matter most in making the final prediction. By removing predictor variables and evaluating their effect on the model’s accuracy, Safavi et al1 used a simple yet elegant method to reveal which predictors were most important in delaying discharge. In this way, Safavi et al1 overcame the black box–like quality of neural networks that usually limits their usefulness in clinical medicine. Understanding which variables most affect the final prediction makes the prediction more actionable and interpretable for intervention. The work by Safavi et al1 contributes to a growing body of literature examining methods to discover which predictor features contribute most to a machine learning model’s accuracy.2 Such methods could make black box approaches, like neural networks, more acceptable and useful for clinicians, who are more likely to trust and use machine learning when it is explainable.
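For readers less familiar with this ablation approach, a minimal sketch in Python illustrates the idea; the classifier, features, and data below are hypothetical stand-ins, not the model or electronic health record variables used by Safavi et al.1 Each predictor is removed in turn, the model is refit, and the resulting drop in discrimination serves as a rough measure of that predictor's importance.

```python
# A minimal sketch of ablation-style feature importance, assuming a generic
# scikit-learn classifier and synthetic data in place of the authors' neural
# network and electronic health record predictors.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for discharge data: 8 hypothetical predictors, binary outcome.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def auc_with_features(columns):
    """Fit the model on the given predictor columns and return test-set AUC."""
    model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    model.fit(X_train[:, columns], y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test[:, columns])[:, 1])

baseline_auc = auc_with_features(list(range(X.shape[1])))
for i, name in enumerate(feature_names):
    kept = [j for j in range(X.shape[1]) if j != i]      # ablate one predictor
    drop = baseline_auc - auc_with_features(kept)
    print(f"{name}: AUC drop when removed = {drop:.3f}")
```

Predictors whose removal produces the largest drop in performance are, by this logic, the ones the model leans on most heavily when flagging a delayed discharge.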
It is worth noting that Safavi et al1 did not simply report the neural network’s accuracy in predicting discharge compared with actual events. On the surface, a raw accuracy measure seems appropriate for this type of study; however, we must remember that although machine learning may be a new approach to solving a problem, it does not arise in a vacuum of predictive capability. When learning that any given machine learning method is accurate, the next question one should ask is “Compared with what?” Just because an approach is new does not mean its accuracy is any better than that of traditional or preexisting predictive models. Hence, Safavi et al1 compared the neural network’s performance with a baseline model that used historical length of stay data. When evaluating machine learning performance, one must give proper consideration to identifying the relevant comparator model. Clinical judgment is an important and relevant comparator. In clinical decision-making support, machine learning models should augment our own decision-making capabilities. If a new model cannot outperform expert judgment, then one must question its utility. As new machine learning methods are evaluated for clinical implementation, methods for comparison against expert human prediction must be developed. Of course, experts do not always agree, but methods to combine and adjudicate expert opinions to develop a criterion standard for comparison with machine performance can ensure that machine learning is truly building on clinicians’ own capabilities.
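To make such a head-to-head comparison concrete, the sketch below (with entirely hypothetical labels and predictions, not data from the study) computes sensitivity and specificity for a candidate model and a simple baseline on the same held-out patients; the same scaffolding could accommodate adjudicated expert predictions as a third comparator.

```python
# Hypothetical head-to-head comparison of a candidate model against a baseline;
# the labels and predictions below are simulated, not drawn from Safavi et al.
import numpy as np
from sklearn.metrics import confusion_matrix

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary predictions."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp / (tp + fn), tn / (tn + fp)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)         # actual discharge within 24 hours
baseline_pred = rng.integers(0, 2, size=500)  # e.g., a historical length-of-stay rule
model_pred = np.where(rng.random(500) < 0.8, y_true, 1 - y_true)  # candidate model

for name, pred in [("baseline", baseline_pred), ("candidate model", model_pred)]:
    sens, spec = sensitivity_specificity(y_true, pred)
    print(f"{name}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```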
The work presented by Safavi et al1 is just the beginning. Implementation is the next and arguably more difficult phase of their work. What does that mean? Perhaps it means that the model should obey the “5 rights” of clinical decision-making support as first described by Osheroff et al3 in 2007: clinical decision support must provide the right information, delivered to the right people at the right time in the workflow, through the right channel, and in the right intervention format. Beyond the “5 rights,” one must also consider and study usability, human-computer interaction, and the burden placed on the clinician. I trust that Safavi and colleagues are undertaking that work, and I look forward to seeing their pathway to implementation reported. As use of machine learning in medicine expands and the field matures, I expect the next wave of studies in the literature will focus on the implementation science behind clinical decision-making support.
Published: December 11, 2019. doi:10.1001/jamanetworkopen.2019.17362
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2019 Schneider DF. JAMA Network Open.
Corresponding Author: David F. Schneider, MD, MS, Department of Surgery, Division of Endocrine Surgery, University of Wisconsin, 600 Highland Ave, K4/728 CSC, Madison, WI 53792 (schneiderd@surgery.wisc.edu).
Conflict of Interest Disclosures: None reported.
2. Kyubin L, Sood A, Craven M. Understanding learned models by identifying important features at the right resolution. Paper presented at: 33rd AAAI Conference on Artificial Intelligence; July 23, 2019; Honolulu, HI.