Invited Commentary
Critical Care Medicine
December 21, 2018

Can Big Data Deliver on Its Promises?—Leaps but Not Bounds

Author Affiliations
  • 1Division of Pulmonary and Critical Care Medicine, Department of Medicine, Intermountain Medical Center, Murray, Utah
  • 2Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Utah School of Medicine, Salt Lake City
JAMA Netw Open. 2018;1(8):e185694. doi:10.1001/jamanetworkopen.2018.5694

The gold-standard approach to creating clinical prediction models no longer uses expert consensus to choose and weight parameters for simplistic models that can be calculated by hand.1 Three factors have driven a revolution in predictive modeling during the last 2 decades. First, broad adoption of electronic medical records (EMRs) eased access to a wide array and large volume of clinical and nonclinical data. Second, increasingly powerful computational techniques have become available to every researcher with a personal computer. Finally, and most recently, the application of machine learning techniques has exploded: PubMed citations of machine learning increased from 358 in 2008 to 3543 in 2017 and to more than 3700 during the first 3 quarters of 2018 alone. Like artificial intelligence, however, machine learning is more a marketing term than a scientific method, encompassing a disparate array of model-building techniques that generally share the ability to operate without parameter prespecification.2

Marafino and colleagues3 undertook important work to build mortality prediction models by feeding data from more than 100 000 intensive care unit patients at 3 hospitals to a simple machine learning algorithm, penalized regression. To a basic model using only patients’ highest and lowest laboratory and vital sign values within 24 hours of intensive care unit admission, they first added more complex, manually curated measures of patients’ laboratory and vital sign trajectories before finally incorporating common words drawn piecewise from physician and nurse progress notes. The models’ discrimination improved as the number of predictors expanded from 48 to 192 to 1192, achieving an area under the receiver operating characteristic curve (AUC) of 0.92 in the most complex model. While there was no true external validation of the primary model (built with data from all 3 hospitals),4 the fact that alternate models built at 1 hospital and tested at the other 2 hospitals consistently performed better when they incorporated more data suggests that the measured improvements in model accuracy were probably real rather than artifacts of overfitting.
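The general strategy of fitting a penalized regression to progressively richer predictor sets can be sketched as follows. This is a minimal illustration on synthetic data, not the authors’ pipeline: the sample size, feature counts, and regularization strength are arbitrary assumptions.

```python
# Minimal sketch of penalized (L1-regularized) logistic regression with an
# expanding feature set, loosely mirroring the stepwise design described
# above. All data are synthetic; this is not the authors' pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_patients, n_features = 5000, 192
X = rng.normal(size=(n_patients, n_features))  # stand-ins for labs/vitals
beta = rng.normal(scale=0.2, size=n_features)  # signal spread across features
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta))))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

aucs = {}
for k in (48, 192):  # basic vs richer predictor set
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    model.fit(X_tr[:, :k], y_tr)
    aucs[k] = roc_auc_score(y_te, model.predict_proba(X_te[:, :k])[:, 1])
    print(f"{k} predictors: AUC = {aucs[k]:.3f}")
```

Because the synthetic signal is spread across all features, the richer model discriminates better, which is the pattern the authors observed as their models grew; real gains, of course, depend on the added predictors carrying genuine signal.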

As any clinician will report, and the authors’ trajectory-based models easily distinguish, a patient whose tachycardia resolves and does not recur is likely better off than the patient whose tachycardia is sustained, recurrent, or treatment emergent. Using automated abstraction of clinician notes, the authors developed models with somewhat better discrimination than is reported for Acute Physiology and Chronic Health Evaluation IV (AUC, 0.88)5 without the costly and tedious manual medical record review traditionally required to incorporate patient history, diagnosis, and treatment data (a direct comparison with Acute Physiology and Chronic Health Evaluation IV would have strengthened confidence in this finding). However, the authors’ models used less information and probably achieved lower AUCs than neural network–based models, which can handle unstructured data and clinical documentation without manual curation, as well as nonlinear trajectories and multidimensional interactions among predictors.6 A formal natural language processing approach that interpreted the relationships among words and phrases might also have improved prediction accuracy.7 The work of Marafino and colleagues3 nevertheless points toward the potential of prediction models that exploit the complexity of EMR data rather than mandating its simplification for ease of manual calculation. Furthermore, the relative simplicity of the authors’ approach may enhance its portability, which is often a crucial weakness of more complex machine learning methods.
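The word-based approach described above, which draws common words from progress notes without a formal natural language processing pipeline, amounts to a bag-of-words representation. The sketch below illustrates the idea; the notes, outcome labels, and vocabulary cap are invented for illustration.

```python
# Sketch of converting free-text progress notes into sparse word-count
# features for a penalized regression, in the spirit of the word-based
# approach described above (not a formal NLP pipeline). Notes and outcome
# labels here are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

notes = [
    "patient intubated sedated on pressors overnight",
    "extubated today tolerating diet ambulating in hallway",
    "worsening hypoxemia escalating ventilator support",
    "stable overnight plan transfer to floor tomorrow",
]
died = [1, 0, 1, 0]  # toy outcome labels

vectorizer = CountVectorizer(max_features=1000)  # cap vocabulary size
X_text = vectorizer.fit_transform(notes)         # sparse document-term matrix
model = LogisticRegression(penalty="l1", solver="liblinear")
model.fit(X_text, died)
```

In practice, word-count columns like these would simply be appended to the structured laboratory and vital sign predictors before fitting the penalized model, leaving the penalty to discard uninformative words.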

A major challenge to interpretation is how much the incremental improvement in discrimination observed here actually matters. Numerous performance metrics for predictive models exist beyond the AUC, with usefulness that depends on the tested model’s intended application.8 If the models are used to trigger a clinical action or mobilize resources, for instance, the area under the precision-recall curve may be more useful and informative than the AUC because this metric quantifies a model’s ability to simultaneously maximize the fraction of cases captured (sensitivity, or recall) and the fraction of alerts that are true positives (positive predictive value, or precision) at a given outcome prevalence. In fact, the authors’ reported area under the precision-recall curve values suggest the more complex models’ predictive advantage is more substantial than might be appreciated from the AUC alone. For risk adjustment, conversely, the relationship between predicted mortality and observed mortality—model calibration—is the more important metric. It is notable, therefore, that review of the calibration curves suggests the authors’ more complex model may actually exhibit slightly worse calibration.
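The divergence between ROC and precision-recall summaries for a rare outcome can be made concrete with synthetic scores. The prevalence and score distributions below are arbitrary choices for illustration, not values from the study.

```python
# Illustration of why the area under the precision-recall curve can be more
# informative than the ROC AUC when the outcome (e.g., ICU mortality) is
# rare. Data are synthetic; only the metric behavior matters here.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(1)
n = 20000
y = rng.binomial(1, 0.05, size=n)  # ~5% outcome prevalence
# Higher scores for true cases, with substantial overlap.
scores = np.where(y == 1, rng.normal(1.5, 1.0, n), rng.normal(0.0, 1.0, n))

roc_auc = roc_auc_score(y, scores)
pr_auc = average_precision_score(y, scores)  # chance level equals prevalence
print(f"ROC AUC = {roc_auc:.3f}, PR AUC = {pr_auc:.3f}")
```

A model can post a reassuring ROC AUC while the precision-recall summary reveals that, at this prevalence, many alerts would still be false positives, which is why the choice of metric should follow the intended clinical use.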

Machine learning techniques, even the more complex ones, do not magically solve the critical challenge confronting clinical prediction models: even reliable predictions may not alter decisions made by physicians and patients or improve patient-centered outcomes.9 Clinicians and researchers must also still consider the correct balance of prediction model accuracy, bedside usability, generalizability, and relevance for each particular application. Marafino and colleagues3 used relatively simplistic machine learning methods to develop a probably portable and efficient intensive care unit prediction model with performance possibly superior to the best available traditional models. Their study highlights on one hand the importance of rigor and skepticism in evaluating and deploying machine learning models and on the other hand the persistent potential for highly accurate predictions from machine learning methods that use even more of the available EMR data, require less manual data manipulation, and can accommodate complex predictor and outcome relationships.10 This study therefore represents another modest step toward harnessing the full potential for advanced predictive analytics from the complicated, in-depth patient data made available by EMRs.

Article Information

Published: December 21, 2018. doi:10.1001/jamanetworkopen.2018.5694

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2018 Peltan ID et al. JAMA Network Open.

Corresponding Author: Samuel M. Brown, MD, MS, Division of Pulmonary and Critical Care Medicine, Department of Medicine, Intermountain Medical Center (T4/STICU), 5121 S Cottonwood St, Salt Lake City, UT 84107 (samuel.brown@hsc.utah.edu).

Conflict of Interest Disclosures: The authors’ primary institution (Intermountain Healthcare) collaborates with Google on projects related to healthcare information technology. This editorial is unrelated to that collaboration. Independently, Dr Brown has received reimbursement for travel expenses (no honorarium) from Google to speak at a Google Healthcare Faculty Workshop. Intermountain received funding from Faron Pharmaceuticals for Dr Brown’s service on a trial steering committee unrelated to this editorial. Dr Brown and Intermountain own a patent for an airway device unrelated to this editorial. Dr Brown receives author royalties from Oxford University and Brigham Young University for books unrelated to this editorial. Intermountain received capitation payments for patient recruitment on a study unrelated to this editorial that was sponsored by Asahi Kasei Pharma and on which Dr Peltan acted as site principal investigator. No other disclosures were reported.

References

1. Vincent J-L, Moreno R, Takala J, et al. The SOFA (Sepsis-Related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Med. 1996;22(7):707-710. doi:10.1007/BF01709751
2. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;319(13):1317-1318. doi:10.1001/jama.2017.18391
3. Marafino BJ, Park M, Davies JM, et al. Validation of prediction models for critical care outcomes using natural language processing of electronic health record data. JAMA Netw Open. 2018;1(8):e185097. doi:10.1001/jamanetworkopen.2018.5097
4. Bleeker SE, Moll HA, Steyerberg EW, et al. External validation is necessary in prediction research: a clinical example. J Clin Epidemiol. 2003;56(9):826-832. doi:10.1016/S0895-4356(03)00207-5
5. Zimmerman JE, Kramer AA, McNair DS, Malila FM. Acute Physiology and Chronic Health Evaluation (APACHE) IV: hospital mortality assessment for today’s critically ill patients. Crit Care Med. 2006;34(5):1297-1310. doi:10.1097/01.CCM.0000215112.84523.F0
6. Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018;1(1):1609. doi:10.1038/s41746-018-0029-1
7. Hirschberg J, Manning CD. Advances in natural language processing. Science. 2015;349(6245):261-266. doi:10.1126/science.aaa8685
8. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128-138. doi:10.1097/EDE.0b013e3181c30fb2
9. SUPPORT Principal Investigators. A controlled trial to improve care for seriously ill hospitalized patients: the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT). JAMA. 1995;274(20):1591-1598. doi:10.1001/jama.1995.03530200027032
10. Wachter RM, Howell MD. Resolving the productivity paradox of health information technology: a time for optimism. JAMA. 2018;320(1):25-26. doi:10.1001/jama.2018.5605