Figure 1. Threshold Performance Plots for the Epic Sepsis Model at the Hospitalization Level

The distribution of predictions is displayed at the bottom. NPV indicates negative predictive value; PPV, positive predictive value. In the PPV plot, the blue-shaded region refers to the percentage of patients classified as positive. In the NPV plot, the blue-shaded region refers to the percentage of patients classified as negative.

Figure 2. Distribution of Alert Times Based on an Epic Sepsis Model Score Threshold of 6 or Higher

A, All alerts. B, Alerts in the 24 hours prior to the outcome. The first alert is highlighted in orange. Each point represents a hypothetical alert; no actual alerts were generated. Forty randomly selected patients who experienced sepsis and met the alerting threshold of 6 are shown here.

Table 1. Characteristics of Patients
Table 2. ESM Performance
References
1. Rivers E, Nguyen B, Havstad S, et al; Early Goal-Directed Therapy Collaborative Group. Early goal-directed therapy in the treatment of severe sepsis and septic shock. N Engl J Med. 2001;345(19):1368-1377. doi:10.1056/NEJMoa010307
2. Yealy DM, Kellum JA, Huang DT, et al; ProCESS Investigators. A randomized trial of protocol-based care for early septic shock. N Engl J Med. 2014;370(18):1683-1693. doi:10.1056/NEJMoa1401602
3. Gao F, Melody T, Daniels DF, Giles S, Fox S. The impact of compliance with 6-hour and 24-hour sepsis bundles on hospital mortality in patients with severe sepsis: a prospective observational study. Crit Care. 2005;9(6):R764-R770. doi:10.1186/cc3909
4. Sawyer AM, Deal EN, Labelle AJ, et al. Implementation of a real-time computerized sepsis alert in nonintensive care unit patients. Crit Care Med. 2011;39(3):469-473. doi:10.1097/CCM.0b013e318205df85
5. Semler MW, Weavind L, Hooper MH, et al. An electronic tool for the evaluation and treatment of sepsis in the ICU: a randomized controlled trial. Crit Care Med. 2015;43(8):1595-1602. doi:10.1097/CCM.0000000000001020
6. Giannini HM, Ginestra JC, Chivers C, et al. A machine learning algorithm to predict severe sepsis and septic shock: development, implementation, and impact on clinical practice. Crit Care Med. 2019;47(11):1485-1492. doi:10.1097/CCM.0000000000003891
7. Downing NL, Rolnick J, Poole SF, et al. Electronic health record–based clinical decision support alert for severe sepsis: a randomised evaluation. BMJ Qual Saf. 2019;28(9):762-768. doi:10.1136/bmjqs-2018-008765
8. Delahanty RJ, Alvarez J, Flynn LM, Sherwin RL, Jones SS. Development and evaluation of a machine learning model for the early identification of patients at risk for sepsis. Ann Emerg Med. 2019;73(4):334-344. doi:10.1016/j.annemergmed.2018.11.036
9. Afshar M, Arain E, Ye C, et al. Patient outcomes and cost-effectiveness of a sepsis care quality improvement program in a health system. Crit Care Med. 2019;47(10):1371-1379. doi:10.1097/CCM.0000000000003919
10. Guidi JL, Clark K, Upton MT, et al. Clinician perception of the effectiveness of an automated early warning and response system for sepsis in an academic medical center. Ann Am Thorac Soc. 2015;12(10):1514-1519. doi:10.1513/AnnalsATS.201503-129OC
11. Ginestra JC, Giannini HM, Schweickert WD, et al. Clinician perception of a machine learning-based early warning system designed to predict severe sepsis and septic shock. Crit Care Med. 2019;47(11):1477-1484. doi:10.1097/CCM.0000000000003803
12. Rolnick JA, Weissman GE. Early warning systems: the neglected importance of timing. J Hosp Med. 2019;14(7):445-447. doi:10.12788/jhm.3229
13. Makam AN, Nguyen OK, Auerbach AD. Diagnostic accuracy and effectiveness of automated electronic sepsis alert systems: a systematic review. J Hosp Med. 2015;10(6):396-402. doi:10.1002/jhm.2347
14. Benthin C, Pannu S, Khan A, Gong M; NHLBI Prevention and Early Treatment of Acute Lung Injury (PETAL) Network. The nature and variability of automated practice alerts derived from electronic health records in a U.S. nationwide critical care research network. Ann Am Thorac Soc. 2016;13(10):1784-1788. doi:10.1513/AnnalsATS.201603-172BC
15. Van Calster B, Wynants L, Timmerman D, Steyerberg EW, Collins GS. Predictive analytics in health care: how can we know it works? J Am Med Inform Assoc. 2019;26(12):1651-1654. doi:10.1093/jamia/ocz130
16. Davis SE, Lasko TA, Chen G, Siew ED, Matheny ME. Calibration drift in regression and machine learning models for acute kidney injury. J Am Med Inform Assoc. 2017;24(6):1052-1061. doi:10.1093/jamia/ocx030
17. Caldwell P. We’ve spent billions to fix our medical records, and they’re still a mess: here’s why. Mother Jones. Published October 21, 2015. Accessed April 24, 2020. https://www.motherjones.com/politics/2015/10/epic-systems-judith-faulkner-hitech-ehr-interoperability/
18. Rhee C, Dantes RB, Epstein L, Klompas M. Using objective clinical data to track progress on preventing and treating sepsis: CDC’s new “Adult Sepsis Event” surveillance strategy. BMJ Qual Saf. 2019;28(4):305-309. doi:10.1136/bmjqs-2018-008331
19. Rhee C, Dantes R, Epstein L, et al; CDC Prevention Epicenter Program. Incidence and trends of sepsis in US hospitals using clinical vs claims data, 2009-2014. JAMA. 2017;318(13):1241-1249. doi:10.1001/jama.2017.13836
20. Centers for Disease Control and Prevention. Hospital toolkit for adult sepsis surveillance. Published March 2018. Accessed February 11, 2021. https://www.cdc.gov/sepsis/pdfs/Sepsis-Surveillance-Toolkit-Mar-2018_508.pdf
21. Henry KE, Hager DN, Pronovost PJ, Saria S. A targeted real-time early warning score (TREWScore) for septic shock. Sci Transl Med. 2015;7(299):299ra122. doi:10.1126/scitranslmed.aab3719
22. Oh J, Makar M, Fusco C, et al. A generalizable, data-driven approach to predict daily risk of Clostridium difficile infection at two large academic health centers. Infect Control Hosp Epidemiol. 2018;39(4):425-433. doi:10.1017/ice.2018.16
23. Singh K, Valley TS, Tang S, et al. Evaluating a widely implemented proprietary deterioration index model among hospitalized COVID-19 patients. Ann Am Thorac Soc. 2020. Published online December 24, 2020. doi:10.1513/AnnalsATS.202006-698OC
24. R Core Team. R: a language and environment for statistical computing. Published online 2020. Accessed May 4, 2021. http://www.r-project.org/
25. pROC: Display and analyze ROC curves [R package pROC version 1.16.2]. Accessed April 23, 2020. https://CRAN.R-project.org/package=pROC
26. Singh K. The runway package for R. Accessed October 21, 2020. https://github.com/ML4LHS/runway
27. Bennett T, Russell S, King J, et al. Accuracy of the Epic Sepsis Prediction Model in a regional health system. arXiv. Preprint posted online February 19, 2019. https://arxiv.org/abs/1902.07276
28. Healthcare Cost and Utilization Project. HCUP weighted summary statistics report: NIS 2018 core file means of continuous data elements. Accessed March 8, 2021. https://www.hcup-us.ahrq.gov/db/nation/nis/tools/stats/MaskedStats_NIS_2018_Core_Weighted.PDF
    1 Comment for this article
    The Epic Sepsis Model: More Evaluation Needed
    Prem Thomas, MD | Yale New Haven Health
    The external validation of the proprietary Epic Sepsis Model undertaken by Wong et al[1] is commendable for its systematic approach to model performance and benefit. Further articles and vigorous discussion on the benefits of this and other models are needed. The article reports the model's diagnostic performance and measures of potential benefit and burden. Let us consider each category.

    In evaluating diagnostic models, establishing a rigorous gold standard defining the disease cohort lays the cornerstone for inference. Rhee et al[2,3] wrestled with the problem of accurate measurement of sepsis incidence and prevalence in hospitalizations. They provided the most exhaustive evaluation to date, assessing strategies based on International Classification of Diseases codes (explicit and explicit/implicit methods) versus a clinical surveillance definition using electronic health record data. Wong et al use the Rhee clinical surveillance criteria but then introduce a second unevaluated criterion, similar to the explicit/implicit code strategy. The clinical surveillance definition had a 70% sensitivity, 98% specificity, and a 70% positive predictive value. The code strategy had lower specificity and a positive predictive value of just 31%. This second criterion weakens the assessment by Wong et al, because it introduces the possibility of a significant number of false positives in the gold standard sepsis cohort. This may skew the report of model performance in unpredictable ways.

    Potential benefit was reported as the percent of patients who received timely antibiotics. Of the 2552 hospitalizations with sepsis, a full 34% did not receive timely antibiotics under usual care. The authors did not construct an actual alert but measured a theoretical benefit based on patients identified by a score ≥ 6, a threshold arrived at by committee. Thresholds involve a tug-of-war between sensitivity and specificity. According to the article's plots, a threshold of 3 would dramatically increase the number of patients identified, which in turn would increase the number who stand to benefit because they would otherwise not receive timely antibiotics. An analysis of antibiotic benefit at a few thresholds would have been helpful.

    Theoretical alert burden was measured based only on a score trigger. Yet the implementation of actual alerts involves additional trigger criteria, such as user roles, chart actions, and time lockouts based on response to alerts. For example, an appropriate provider response such as "patient already under treatment" could lock further firing for a few days. While the hospitalization level burden provides a good baseline, the other horizons may not reflect reality.

    May the discussion continue!


    References

    1. Wong A, Otles E, Donnelly JP, et al. External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Intern Med. Published online June 21, 2021. doi:10.1001/jamainternmed.2021.2626

    2. Rhee C, Dantes R, Epstein L, et al. Incidence and Trends of Sepsis in US Hospitals Using Clinical vs Claims Data, 2009-2014. JAMA. 2017;318(13):1241-1249. doi:10.1001/jama.2017.13836

    3. Rudd KE, Delaney A, Finfer S. Counting Sepsis, an Imprecise but Improving Science. JAMA. 2017;318(13):1228. doi:10.1001/jama.2017.13697
    CONFLICT OF INTEREST: None Reported
    Original Investigation
    June 21, 2021

    External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients

    Author Affiliations
    • 1 Department of Internal Medicine, University of Michigan Medical School, Ann Arbor
    • 2 Medical Scientist Training Program, University of Michigan Medical School, Ann Arbor
    • 3 Department of Industrial and Operations Engineering, University of Michigan College of Engineering, Ann Arbor
    • 4 Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor
    • 5 School of Public Health, University of Michigan, Ann Arbor
    • 6 Department of Quality, Michigan Medicine, Ann Arbor
    • 7 Health Information Technology and Services, Michigan Medicine, Ann Arbor
    • 8 Nursing Informatics, Michigan Medicine, Ann Arbor
    JAMA Intern Med. 2021;181(8):1065-1070. doi:10.1001/jamainternmed.2021.2626
    Key Points

    Question  How accurately does the Epic Sepsis Model, a proprietary sepsis prediction model implemented at hundreds of US hospitals, predict the onset of sepsis?

    Findings  In this cohort study of 27 697 patients undergoing 38 455 hospitalizations, sepsis occurred in 7% of the hospitalizations. The Epic Sepsis Model predicted the onset of sepsis with an area under the curve of 0.63, which is substantially worse than the performance reported by its developer.

    Meaning  This study suggests that the Epic Sepsis Model poorly predicts sepsis; its widespread adoption despite poor performance raises fundamental concerns about sepsis management on a national level.

    Abstract

    Importance  The Epic Sepsis Model (ESM), a proprietary sepsis prediction model, is implemented at hundreds of US hospitals. The ESM’s ability to identify patients with sepsis has not been adequately evaluated despite widespread use.

    Objective  To externally validate the ESM in the prediction of sepsis and evaluate its potential clinical value compared with usual care.

    Design, Setting, and Participants  This retrospective cohort study was conducted among 27 697 patients aged 18 years or older admitted to Michigan Medicine, the academic health system of the University of Michigan, Ann Arbor, with 38 455 hospitalizations between December 6, 2018, and October 20, 2019.

    Exposure  The ESM score, calculated every 15 minutes.

    Main Outcomes and Measures  Sepsis, as defined by a composite of (1) the Centers for Disease Control and Prevention surveillance criteria and (2) International Statistical Classification of Diseases and Related Health Problems, Tenth Revision diagnostic codes accompanied by 2 systemic inflammatory response syndrome criteria and 1 organ dysfunction criterion within 6 hours of one another. Model discrimination was assessed using the area under the receiver operating characteristic curve at the hospitalization level and with prediction horizons of 4, 8, 12, and 24 hours. Model calibration was evaluated with calibration plots. The potential clinical benefit associated with the ESM was assessed by evaluating the added benefit of the ESM score compared with contemporary clinical practice (based on timely administration of antibiotics). Alert fatigue was evaluated by comparing the clinical value of different alerting strategies.

    Results  We identified 27 697 patients who had 38 455 hospitalizations (21 904 women [57%]; median age, 56 years [interquartile range, 35-69 years]) meeting inclusion criteria; sepsis occurred in 2552 of these hospitalizations (7%). The ESM had a hospitalization-level area under the receiver operating characteristic curve of 0.63 (95% CI, 0.62-0.64). The ESM identified 183 of 2552 patients with sepsis (7%) who did not receive timely administration of antibiotics, highlighting the low sensitivity of the ESM in comparison with contemporary clinical practice. The ESM also did not identify 1709 patients with sepsis (67%) despite generating alerts for an ESM score of 6 or higher for 6971 of all 38 455 hospitalized patients (18%), thus creating a large burden of alert fatigue.

    Conclusions and Relevance  This external validation cohort study suggests that the ESM has poor discrimination and calibration in predicting the onset of sepsis. The widespread adoption of the ESM despite its poor performance raises fundamental concerns about sepsis management on a national level.

    Introduction

    Early detection and appropriate treatment of sepsis have been associated with a significant mortality benefit in hospitalized patients.1-3 Many models have been developed to improve timely identification of sepsis,4-9 but their lack of adoption has led to an implementation gap in early warning systems for sepsis.10,11 This gap has largely been filled by commercial electronic health record (EHR) vendors, who have integrated early warning systems into the EHR where they can be readily accessed by clinicians and linked to clinical interventions.12,13 More than half of surveyed US health systems report using electronic alerts, with nearly all using an alert system for sepsis.14

    One of the most widely implemented early warning systems for sepsis in US hospitals is the Epic Sepsis Model (ESM), which is a penalized logistic regression model included as part of Epic’s EHR and currently in use at hundreds of hospitals throughout the country. This model was developed and validated by Epic Systems Corporation based on data from 405 000 patient encounters across 3 health systems from 2013 to 2015. However, owing to the proprietary nature of the ESM, only limited information is publicly available about the model’s performance, and no independent validations have been published to date, to our knowledge. This limited information is of concern because proprietary models are difficult to assess owing to their opaque nature and have been shown to decline in performance over time.15,16

    The widespread adoption of the ESM despite the lack of independent validation raises a fundamental concern about sepsis management on a national level. An improved understanding of how well the ESM performs has the potential to inform care for the several hundred thousand patients hospitalized for sepsis in the US each year. We present an independently conducted external validation of the ESM using data from a large academic medical center.

    Methods
    Study Cohort

    Our retrospective study included all patients aged 18 years or older admitted to Michigan Medicine (ie, the health system of the University of Michigan, Ann Arbor) between December 6, 2018, and October 20, 2019. Epic Sepsis Model scores were calculated for all adult hospitalizations. The ESM was used to generate alerts on 2 hospital units starting on March 11, 2019, and expanded to a third unit on August 12, 2019; alert-eligible hospitalizations were excluded from our analysis to prevent bias in our evaluation. The study was approved by the institutional review board of the University of Michigan Medical School, and the need for consent was waived because the research involved no more than minimal risk to participants, the research could not be carried out practicably without the waiver, and the waiver would not adversely affect the rights and welfare of the participants.

    The Epic Sepsis Model

    The ESM is a proprietary sepsis prediction model developed by Epic Systems Corporation using data routinely recorded within the EHR. Epic Systems Corporation is one of the largest health care software vendors in the world and reportedly includes medical records for nearly 180 million individuals in the US (or 56% of the US population).17 The eMethods in the Supplement includes more details.

    Definition of Sepsis and Timing of Onset

    Sepsis was defined based on meeting 1 of 2 criteria: (1) the Centers for Disease Control and Prevention clinical surveillance definition18-20 or (2) an International Statistical Classification of Diseases and Related Health Problems, Tenth Revision diagnosis of sepsis accompanied by meeting 2 criteria for systemic inflammatory response syndrome and 1 Centers for Medicare & Medicaid Services criterion for organ dysfunction within 6 hours of one another (eMethods in the Supplement).

    External Validation of the ESM Scores

    We used scores from the ESM prospectively calculated every 15 minutes, beginning on arrival at the emergency department and throughout the hospitalization, to predict the onset of sepsis. For patients experiencing sepsis, we excluded any scores calculated after the outcome had occurred. We evaluated model discrimination using the area under the receiver operating characteristic curve (AUC), which represents the probability of correctly ranking 2 randomly chosen individuals (one who experienced the event and one who did not). We calculated a hospitalization-level AUC based on the entire trajectory of predictions21-23 and calculated model performance across the spectrum of ESM thresholds. We also calculated time horizon–based AUCs (eMethods in the Supplement).
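
The ranking interpretation of the AUC can be illustrated with a short sketch. The Python below is not the authors' code (the study used R with the pROC package). It assumes, as a simplification that the article does not confirm, that each hospitalization's trajectory is summarized by its maximum score, and the toy scores are invented for illustration.

```python
from itertools import product

def pairwise_auc(pos_scores, neg_scores):
    """Probability that a randomly chosen positive case outranks a randomly
    chosen negative case; ties count as half a win."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def hospitalization_level_auc(trajectories, had_sepsis):
    """Summarize each hospitalization by its maximum score (an assumption),
    then compute the pairwise AUC across hospitalizations."""
    summary = {h: max(scores) for h, scores in trajectories.items()}
    pos = [s for h, s in summary.items() if had_sepsis[h]]
    neg = [s for h, s in summary.items() if not had_sepsis[h]]
    return pairwise_auc(pos, neg)

# Hypothetical toy data: one score trajectory per hospitalization.
trajectories = {
    "A": [2, 5, 9],  # sepsis, max 9
    "B": [1, 3, 4],  # sepsis, max 4
    "C": [2, 2, 3],  # no sepsis, max 3
    "D": [4, 7, 6],  # no sepsis, max 7
}
had_sepsis = {"A": True, "B": True, "C": False, "D": False}

auc = hospitalization_level_auc(trajectories, had_sepsis)
print(auc)  # 3 of the 4 positive-negative pairs are ranked correctly
```

In this toy example only the pair (B, D) is ranked incorrectly, which is what pulls the value below 1.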

    Using the entire trajectory of predictions, we calculated a median lead time by comparing when patients were first deemed high risk during their hospitalization (based on our implemented ESM score threshold of ≥6 described below) with when they experienced sepsis. Model calibration was assessed using a calibration plot by comparing predicted risk with the observed risk.
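
A minimal sketch of the lead-time calculation described above, written in Python for illustration (the study used R). The function name, data layout, and timestamps are hypothetical; only the threshold of 6 and the idea of measuring from the first high-risk score to sepsis onset come from the text.

```python
from datetime import datetime
from statistics import median

THRESHOLD = 6  # alerting threshold evaluated in the article

def lead_time_hours(scores, onset):
    """scores: list of (timestamp, score) pairs sorted by time.
    Returns hours from the first score >= THRESHOLD to sepsis onset,
    or None if the threshold is never met at or before onset."""
    for timestamp, score in scores:
        if timestamp > onset:
            break  # scores after the outcome are excluded
        if score >= THRESHOLD:
            return (onset - timestamp).total_seconds() / 3600.0
    return None

# Hypothetical patients with sepsis: (score trajectory, onset time).
patients = {
    "P1": ([(datetime(2019, 1, 1, 9), 4), (datetime(2019, 1, 1, 11), 7)],
           datetime(2019, 1, 1, 12)),
    "P2": ([(datetime(2019, 1, 1, 9), 6), (datetime(2019, 1, 1, 10), 8)],
           datetime(2019, 1, 1, 12)),
    "P3": ([(datetime(2019, 1, 1, 12), 9)],
           datetime(2019, 1, 2, 0)),
    "P4": ([(datetime(2019, 1, 1, 9), 3), (datetime(2019, 1, 1, 10), 5)],
           datetime(2019, 1, 1, 12)),  # never reaches the threshold
}

lead_times = []
for scores, onset in patients.values():
    hours = lead_time_hours(scores, onset)
    if hours is not None:
        lead_times.append(hours)

median_lead = median(lead_times)
print(lead_times, median_lead)
```

Patients who never reach the threshold before onset contribute no lead time, so the median describes only those who would have been flagged.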

    Selection of High-risk Threshold

    We evaluated the ESM’s performance at a score threshold of 6 or higher because this threshold was selected by our hospital operations committee to generate pages to clinicians and is currently in clinical use at Michigan Medicine (although patients eligible for alerts during the study period were excluded from our evaluation). This threshold is within the recommended score range (5-8) suggested by its developer.

    Evaluation of Potential Clinical Benefit and Alert Fatigue

    To evaluate potential benefit associated with the ESM, we compared the timing of patients exceeding the ESM score threshold of 6 or higher with their receipt of antibiotics to evaluate the potential added value of the ESM vs current clinical practice. We evaluated the potential impact of alert fatigue by comparing the number of patients who would need to be evaluated using different alerting strategies.

    Sensitivity Analysis

    To enhance the comparability of our results with other evaluations, we recalculated the hospitalization-level AUC after including ESM scores up to 3 hours after sepsis onset (eMethods in the Supplement). We used R, version 3.6.0 (R Group for Statistical Computing) for all analyses, as well as the pROC and runway packages.24-26 Statistical tests were 2-sided.

    Results

    We identified 27 697 patients who had 38 455 hospitalizations (21 904 women [57%]; median age, 56 years [interquartile range, 35-69 years]) who met inclusion criteria for our study cohort (Table 1). Sepsis occurred in 2552 of the hospitalizations (7%).

    ESM Performance

    The ESM had a hospitalization-level AUC of 0.63 (95% CI, 0.62-0.64) (Table 2). The AUC was between 0.72 (95% CI, 0.72-0.72) and 0.76 (95% CI, 0.75-0.76) when calculated at varying time horizons. At our selected score threshold of 6, the ESM had a hospitalization-level sensitivity of 33%, specificity of 83%, positive predictive value of 12%, and negative predictive value of 95% (Figure 1). The median lead time between when a patient first exceeded an ESM score of 6 and the onset of sepsis was 2.5 hours (interquartile range, 0.5-15.6 hours) (Figure 2). The calibration was poor at all time horizons possibly considered by the developer (eFigures 1, 2, and 3 in the Supplement).
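
The threshold metrics above can be approximately reproduced from counts reported elsewhere in the article. The Python sketch below is a reader's reconstruction, not the authors' analysis: it assumes that the 6971 hospitalizations with a score of 6 or higher include every sepsis hospitalization the ESM identified, and that all counts refer to the same hospitalization-level unit.

```python
# Counts reported in the article
total = 38455     # hospitalizations
sepsis = 2552     # hospitalizations with sepsis
missed = 1709     # sepsis hospitalizations not identified by the ESM
flagged = 6971    # hospitalizations with an ESM score of 6 or higher

# Derived confusion-matrix cells (assumption: flagged = TP + FP)
tp = sepsis - missed       # 843
fn = missed                # 1709
fp = flagged - tp          # 6128
tn = total - sepsis - fp   # 29775

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.1%}")
```

Under these assumptions the values round to 33%, 83%, 12%, and 95%, in line with the figures reported for the threshold of 6.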

    Evaluation of Potential Clinical Benefit and Alert Fatigue

    Of the 2552 hospitalizations with sepsis, 183 (7%) were identified by an ESM score of 6 or higher, but the patient did not receive timely antibiotics (ie, prior to or within 3 hours after sepsis). The ESM did not identify 1709 patients with sepsis (67%), of whom 1030 (60%) still received timely antibiotics.
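
The counts in this paragraph can be arranged as a 2-by-2 breakdown of sepsis hospitalizations by ESM identification and timely antibiotics. The Python sketch below is a reader's arithmetic, assuming every sepsis hospitalization falls into exactly one of the four cells; the derived cells are not reported directly in the text.

```python
# Counts reported in the article
sepsis = 2552
missed_by_esm = 1709
missed_with_timely_abx = 1030
flagged_without_timely_abx = 183

# Derived cells (assumption: the four cells partition all sepsis cases)
flagged = sepsis - missed_by_esm                                   # 843
flagged_with_timely_abx = flagged - flagged_without_timely_abx     # 660
missed_without_timely_abx = missed_by_esm - missed_with_timely_abx # 679
no_timely_abx = flagged_without_timely_abx + missed_without_timely_abx  # 862

share_no_timely_abx = no_timely_abx / sepsis
share_of_untreated_flagged = flagged_without_timely_abx / no_timely_abx

print(f"No timely antibiotics: {no_timely_abx} ({share_no_timely_abx:.0%})")
print(f"Of those, flagged by the ESM: {share_of_untreated_flagged:.0%}")
```

On these assumptions, about 34% of sepsis hospitalizations did not receive timely antibiotics, and the ESM flagged roughly one in five of that group.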

    An ESM score of 6 or higher occurred in 18% of hospitalizations (6971 of 38 455) even when not considering repeated alerts. If the ESM were to generate an alert only once per patient when the score threshold first exceeded 6 (a strategy to minimize alerts), then clinicians would still need to evaluate 8 patients to identify a single patient with eventual sepsis (Table 2). If clinicians were willing to reevaluate patients each time the ESM score exceeded 6 to find patients developing sepsis in the next 4 hours, they would need to evaluate 109 patients to find a single patient with sepsis.
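
The number needed to evaluate is conventionally the reciprocal of the positive predictive value. The Python sketch below assumes that convention applies here (the article does not spell out its formula) and reuses counts reported in the text; the function name is hypothetical.

```python
def number_needed_to_evaluate(ppv):
    """Patients evaluated per true case found; reciprocal of the PPV."""
    return 1.0 / ppv

# First-alert-only strategy, from reported hospitalization-level counts
flagged = 6971                  # hospitalizations with a score of 6 or higher
true_positives = 2552 - 1709    # sepsis hospitalizations identified (843)
nne_first_alert = number_needed_to_evaluate(true_positives / flagged)

# Repeated-alert strategy with a 4-hour horizon: the PPV implied by the
# reported number needed to evaluate of 109
implied_ppv_4h = 1.0 / 109

print(round(nne_first_alert, 2))   # about 8.27, reported as 8
print(f"{implied_ppv_4h:.2%}")     # just under 1%
```

Read this way, fewer than 1 in 100 evaluations triggered under the repeated-alert strategy would find a patient developing sepsis within 4 hours.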

    Sensitivity Analysis

    When ESM scores up to 3 hours after the onset of sepsis were included, the hospitalization-level AUC improved to 0.80 (95% CI, 0.79-0.81).

    Discussion

    In this external validation study, we found the ESM to have poor discrimination and calibration in predicting the onset of sepsis at the hospitalization level. When used for alerting at a score threshold of 6 or higher (within Epic’s recommended range), it identifies only 7% of patients with sepsis who were missed by a clinician (based on timely administration of antibiotics), highlighting the low sensitivity of the ESM in comparison with contemporary clinical practice. The ESM also did not identify 67% of patients with sepsis despite generating alerts on 18% of all hospitalized patients, thus creating a large burden of alert fatigue.

    Our observed hospitalization-level model performance (AUC, 0.63) was substantially worse than that reported by Epic Systems (AUC, 0.76-0.83) in internal documentation (shared with permission) and in a prior conference proceeding coauthored with Epic Systems (AUC, 0.73).27 Although our time horizon–based AUCs were higher (0.72-0.76), they are misleading because they treat each prediction as independent. Even a small number of bad predictions (ie, high scores that result in alerts in patients without sepsis) can cause alert fatigue, but these bad predictions only minimally affect time horizon–based AUCs (eMethods in the Supplement). The large difference in reported AUCs is likely due to our consideration of sepsis timing. A prior study that did not exclude predictions made after development of sepsis found that the ESM produced an alert at a median of 7 hours (interquartile range, 4-22 hours) after the first lactate level was measured, suggesting that ESM-driven alerts reflect the presence of sepsis already apparent to clinicians.27 Our sensitivity analysis including predictions made up to 3 hours after the sepsis event found an improved AUC of 0.80, highlighting the importance of considering sepsis timing in the evaluation.

    Limitations

    Our study has some limitations. Our external validation was performed at a single academic medical center, although the cohort was large and relatively diverse.28 We used a composite definition to account for the 2 most common reasons why health care organizations track sepsis, namely, surveillance and quality assessment, although sepsis definitions are still debated.

    Conclusions

    Our study has important national implications. The growth in deployment of proprietary models has led to an underbelly of confidential, non–peer-reviewed model performance documents that may not accurately reflect real-world model performance. Owing to the ease of integration within the EHR and loose federal regulations, hundreds of US hospitals have begun using these algorithms. Medical professional organizations constructing national guidelines should be cognizant of the broad use of these algorithms and make formal recommendations about their use.

    Article Information

    Accepted for Publication: April 18, 2021.

    Published Online: June 21, 2021. doi:10.1001/jamainternmed.2021.2626

    Correction: This article was corrected on August 2, 2021, to fix an error in the number needed to evaluate presented in Table 2 and the Results.

    Corresponding Author: Karandeep Singh, MD, MMSc, Department of Learning Health Sciences, University of Michigan Medical School, 1161H NIB, 300 N Ingalls St, Ann Arbor, MI 48109 (kdpsingh@umich.edu).

    Author Contributions: Dr Singh had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

    Concept and design: Wong, Otles, Pestrue, Phillips, Penoza, Singh.

    Acquisition, analysis, or interpretation of data: All authors.

    Drafting of the manuscript: Wong, Otles, Ghous, Singh.

    Critical revision of the manuscript for important intellectual content: Wong, Otles, Donnelly, Krumm, McCullough, DeTroyer-Cooley, Pestrue, Phillips, Konye, Penoza, Singh.

    Statistical analysis: Wong, Otles, Donnelly, Krumm, McCullough, Singh.

    Administrative, technical, or material support: Wong, Otles, Pestrue, Phillips, Konye, Penoza.

    Supervision: Singh.

    Conflict of Interest Disclosures: Dr Donnelly reported receiving grants from the National Institutes of Health, National Heart, Lung, and Blood Institute K12 Scholar during the conduct of the study; and personal fees from the American College of Emergency Physicians as an editor of Annals of Emergency Medicine outside the submitted work. No other disclosures were reported.

    Funding/Support: Mr Otles was supported by grant T32GM007863 from the National Institutes of Health. Dr Donnelly was supported by grant K12HL138039 from the National Heart, Lung, and Blood Institute.

    Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
