Development and Validation of a Machine Learning Model to Estimate Bacterial Sepsis Among Immunocompromised Recipients of Stem Cell Transplant

Key Points
Question: Can machine learning be used with electronic medical record data to improve bacterial sepsis prediction among recipients of allogeneic hematopoietic cell transplant (allo-HCT)?
Findings: In this prognostic study including 1943 recipients of allo-HCT, the population-specific full predictor bacterial sepsis decision support tool (SHBSL) had superior prognostic performance regardless of outcome or patient location compared with the clinical factor–specific SHBSL and existing tools. Additionally, SHBSL had higher positive predictive values relative to sensitivities than existing tools.
Meaning: These findings suggest that, if used at the time of blood culture collection, the SHBSL could provide relevant information regarding bacterial sepsis risk and antibiotic needs of recipients of allo-HCT.

Given this data limitation, we employed a two-step missing data approach. First, we restricted to factors that had measurements for at least 65% of the examined PBIs. Second, we filled the remaining missing measurement observations using a single, normal value imputation approach. Specifically, for each missing value, we randomly selected a value from the normal biological range of that factor. The specific normal value ranges used can be found in eTable 2.

... actually fare (in relation to a specific outcome such as high-risk bacteremia). The NRI conducts this comparison by measuring the net improvement of appropriate patient placement (i.e., events in higher prediction categories) of one model (say SHBSL) over the other (say SIRS).
To calculate this metric, we considered SHBSL to be the "updated" model and the existing tool or C-SHBSL to be the "base" model. In order to get risk probability categories for existing tools, we estimated the risk probabilities associated with each score for SIRS, NEWS, and qSOFA using single variable logistic regressions. Using the resulting risk probability categories, we estimated the reclassification of SHBSL in reference to the existing tools. In order to calculate the NRI between SHBSL and C-SHBSL, we defined our categories as the quartiles of C-SHBSL's risk probabilities.

Patient location was defined using full calendar days and patients were considered to be inpatient for the full day of hospital admission and discharge. Culture location was defined based on the patient location for each day.
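The net reclassification calculation described above can be illustrated with a minimal sketch. It assumes the risk probabilities of the "base" and "updated" models are already available and that risk categories are defined by a list of cut-points; the function names, cut-points, and toy data are hypothetical and are not taken from the study.

```python
from bisect import bisect_right


def risk_category(prob, cutpoints):
    """Index of the risk category containing `prob` (0 = lowest category)."""
    return bisect_right(cutpoints, prob)


def categorical_nri(outcomes, p_base, p_updated, cutpoints):
    """Categorical NRI of the updated model relative to the base model.

    Returns (nri_total, nri_events, nri_nonevents).
    """
    up = {0: 0, 1: 0}    # moves to a higher category, keyed by outcome
    down = {0: 0, 1: 0}  # moves to a lower category, keyed by outcome
    for y, pb, pu in zip(outcomes, p_base, p_updated):
        cat_base = risk_category(pb, cutpoints)
        cat_updated = risk_category(pu, cutpoints)
        if cat_updated > cat_base:
            up[y] += 1
        elif cat_updated < cat_base:
            down[y] += 1
    n_events = sum(outcomes)
    n_nonevents = len(outcomes) - n_events
    # Events should move up; non-events should move down.
    nri_events = (up[1] - down[1]) / n_events
    nri_nonevents = (down[0] - up[0]) / n_nonevents
    return nri_events + nri_nonevents, nri_events, nri_nonevents


# Hypothetical toy data: 1 = outcome present, 0 = outcome absent.
outcomes = [1, 1, 1, 0, 0, 0]
p_base = [0.04, 0.08, 0.30, 0.04, 0.12, 0.25]
p_updated = [0.12, 0.08, 0.30, 0.02, 0.06, 0.30]
cutpoints = [0.05, 0.10, 0.20]  # illustrative category boundaries

nri_total, nri_events, nri_nonevents = categorical_nri(
    outcomes, p_base, p_updated, cutpoints
)
print(round(nri_total, 3), round(nri_events, 3), round(nri_nonevents, 3))
```

For the comparison against C-SHBSL, the cut-points would instead be the quartiles of the base model's risk probabilities; in Python these could be obtained with `statistics.quantiles(p_base, n=4)`, although the quantile convention used by the authors is not stated.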
a 95% Confidence Intervals estimated using Clopper-Pearson methods
b No PBIs had these scores in the validation dataset

There were multiple NEWS scores that did not appear in our validation dataset, namely higher ones (15-2). This is likely due to the fact that we observed a large amount of missingness in predictors included in this tool and imputed missing values using a single normal value. For example, we observed >90% missingness within GCS and >60% missingness within oxygen saturation (Spo2).

The cross-validated area under the curve (cvAUC) estimates for each user-supplied algorithm and the resulting Super Learner for the full predictor (SHBSL) and clinical factor-specific predictor (C-SHBSL) prediction tools. The name structure for each algorithm is as follows: SL refers to the fact that the algorithms are running within the Super Learner framework, followed by the algorithm name (detailed list above) and the screen (screen.corP_strick or screen.glmnet) or absence of a screen (All).
Despite the Super Learner ensembles having slightly lower cvAUCs than the random forest with balanced groups of 100 samples each (screened for SHBSL and base for C-SHBSL), we chose to present the ensembles as our final models. We made this decision because the ensembles performed essentially as well as the random forest (<1 AUC percentage point difference) and they provide the least biased option for future updates. Because extensive testing has shown that, in small data settings, the Super Learner performs essentially as well as any provided algorithm [4] (as we saw in our data), we feel comfortable assuming that if, upon future parameter updating, the best-fit provided algorithm is no longer the group-matched random forest, the Super Learner will still perform essentially as well as (if not better than) the best provided algorithm.

Our tools were developed using data from a single center and may be biased towards the data collection practices of that center. In the absence of external (another center's) data for validation at the time of manuscript development, we performed numerous sensitivity analyses to better understand the generalizability of our developed tools. Each sensitivity analysis was conducted among patients randomized to the held-out validation dataset. We estimated each tool's AUROC, sensitivity, and specificity at the following cut-points: 2+ for SIRS and qSOFA, 7+ for NEWS, and the Upper-Left method selected optimal cut-point for our tools (selected in the primary analysis held-out validation data; SHBSL: 5.0%, C-SHBSL: 9.3%).
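As an illustration of cut-point selection, the sketch below assumes that the Upper-Left method picks the threshold whose ROC point lies closest to the upper-left corner (sensitivity = 1, specificity = 1), which is the usual meaning of the term; the exact implementation used in the study is not described here. Function names and toy data are hypothetical.

```python
import math


def sens_spec(outcomes, scores, cut):
    """Sensitivity and specificity when scores >= cut are classified positive."""
    tp = sum(1 for y, s in zip(outcomes, scores) if y == 1 and s >= cut)
    fn = sum(1 for y, s in zip(outcomes, scores) if y == 1 and s < cut)
    tn = sum(1 for y, s in zip(outcomes, scores) if y == 0 and s < cut)
    fp = sum(1 for y, s in zip(outcomes, scores) if y == 0 and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def upper_left_cutpoint(outcomes, scores):
    """Threshold minimizing the distance to the upper-left ROC corner.

    Returns (cut, sensitivity, specificity).
    """
    best = None
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(outcomes, scores, cut)
        dist = math.hypot(1 - sens, 1 - spec)
        if best is None or dist < best[0]:
            best = (dist, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical predicted risk probabilities and outcomes (1 = outcome present).
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.09, 0.20]

cut, sens, spec = upper_left_cutpoint(outcomes, scores)
print(cut, sens, spec)
```

The same `sens_spec` helper applies to fixed cut-points such as 2+ for SIRS and qSOFA or 7+ for NEWS, by passing the integer score and the fixed threshold.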

Missing Data
Because vitals and laboratory measure missingness may differ between clinical staff, our tool may be biased towards the missingness patterns of our center. We performed numerous sensitivity analyses to examine the robustness of our tool's superiority over existing tools under varying missingness scenarios. Along with testing the robustness of our results to missing data patterns, these analyses also provide insight into the predictor collection requirements of each tool (i.e., does each tool require complete data to run optimally or can it produce similar estimations under varying amounts of missingness). Specifically, we examined the predictive value of each tool under the following scenarios:
1. Complete Case (restricted to only samples with full data present)
2. Partial Missingness (restricted to samples with at least one missing measurement)
3. Average Missingness (restricted to samples with at least three missing measurements, the average observed number of missing measurements)
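The three restriction scenarios listed above, together with the single normal value imputation used in the primary analysis, can be sketched as follows. This is a minimal illustration under stated assumptions: missing measurements are represented as `None`, the random draw from the normal range is taken to be uniform (the distribution is not specified in the text), and the predictor names and normal ranges are placeholders rather than the values in eTable 2.

```python
import random

# Hypothetical normal ranges for illustration only; not the values in eTable 2.
NORMAL_RANGES = {
    "heart_rate": (60.0, 100.0),
    "temperature_c": (36.1, 37.2),
    "resp_rate": (12.0, 20.0),
    "systolic_bp": (90.0, 120.0),
}


def n_missing(sample):
    """Number of predictors with no recorded measurement."""
    return sum(value is None for value in sample.values())


def missingness_scenarios(samples, average_missing=3):
    """Subsets of samples used in each missing-data sensitivity scenario."""
    return {
        "complete_case": [s for s in samples if n_missing(s) == 0],
        "partial_missingness": [s for s in samples if n_missing(s) >= 1],
        "average_missingness": [
            s for s in samples if n_missing(s) >= average_missing
        ],
    }


def impute_normal_value(sample, normal_ranges, rng):
    """Fill each missing value with a random draw from its normal range.

    Assumption: the draw is uniform over the range.
    """
    return {
        name: rng.uniform(*normal_ranges[name]) if value is None else value
        for name, value in sample.items()
    }


samples = [
    {"heart_rate": 88.0, "temperature_c": 38.4, "resp_rate": 18.0, "systolic_bp": 110.0},
    {"heart_rate": 120.0, "temperature_c": None, "resp_rate": 22.0, "systolic_bp": 95.0},
    {"heart_rate": None, "temperature_c": 37.9, "resp_rate": None, "systolic_bp": None},
    {"heart_rate": None, "temperature_c": None, "resp_rate": None, "systolic_bp": None},
]

scenarios = missingness_scenarios(samples)
rng = random.Random(0)  # fixed seed so the illustration is reproducible
imputed = [impute_normal_value(s, NORMAL_RANGES, rng) for s in samples]
print({name: len(subset) for name, subset in scenarios.items()})
```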

Race Stratified
Despite the Fred Hutchinson Cancer Research Center / Seattle Cancer Care Alliance (FHCRC/SCCA) having a large catchment area in the Northwest region of the United States, its transplant patient population is largely non-Hispanic white and may not reflect (racially/ethnically) the national transplant population [6][7][8]. To examine the generalizability of our tool to a more racially/ethnically diverse population, we estimated the predictive value of our tool stratified by race/ethnicity. Due to limited data for non-white racial/ethnic groups, we examined the following stratification:

Window of Measurement Collection
We developed tools to provide decision support at the time of culture collection and, based on our predictor measurement inclusion, defined the run time of the tools as 2 hours following culture collection. This time period was selected because predictor measurements were commonly collected within the first few hours following culture collection and we considered these measures to be associated with the collection of the culture. However, in practice, it may be desirable to run these tools closer in time to the culture. For this reason, we estimated the AUC, sensitivity, and specificity of each examined tool using data from the 24 hours and: 1. 1 hour prior 2. 0.

Culture Collection Practices
Our tools are designed to supply decision support at the time of blood culture collection and are likely biased towards the culture collection practices at our center. While we cannot retrospectively change the culture collection practices of our center to examine the generalizability of our tools, we were able to adjust the cultures we included and include a proportion of non-culture time to simulate a "what if cultures had been collected" scenario.
1. Excluding Surveillance Cultures: Surveillance status was not perfectly recorded, and this sensitivity analysis was performed among cultures with recorded reasons other than "surveillance." While this reduces the number of surveillance cultures included in our analysis, it may not fully remove them.
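The "what if cultures had been collected" simulation adds randomly selected non-culture calendar days to the observed cultures. A minimal sketch of that selection is given below, assuming days are indexed as whole days since transplant and that a day is ineligible when it falls within 3 days of any culture collection. The exclusion of follow-up culture periods is not modeled because its definition is not given in this section, and all names and values are hypothetical.

```python
import random


def eligible_non_culture_days(last_follow_up_day, culture_days, buffer_days=3):
    """Calendar days (0 = transplant day) not within `buffer_days` of a culture."""
    return [
        day
        for day in range(last_follow_up_day + 1)
        if all(abs(day - c) > buffer_days for c in culture_days)
    ]


def sample_non_culture_days(last_follow_up_day, culture_days, n_days, rng,
                            buffer_days=3):
    """Randomly select up to `n_days` eligible 24-hour periods, without replacement."""
    eligible = eligible_non_culture_days(
        last_follow_up_day, culture_days, buffer_days
    )
    return sorted(rng.sample(eligible, min(n_days, len(eligible))))


# Hypothetical patient: followed to day 20, cultures drawn on days 5 and 12.
culture_days = [5, 12]
eligible = eligible_non_culture_days(20, culture_days)
rng = random.Random(0)  # fixed seed so the illustration is reproducible
selected = sample_non_culture_days(20, culture_days, 3, rng)
print(eligible, selected)
```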

Antibiotic Influence on Culture Results
We carefully selected our definition of bacterial sepsis for our tool development but understand that culture confirmation of a bloodborne infection comes with limitations. A large limitation is that recent antibiotic use may impact culture results. We addressed this in tool development and the primary evaluation by excluding cultures collected within 3 days of antibiotic use, not counting our center's primary prophylaxis, levofloxacin. To further account for the potential impact of recent antibiotic use, we performed additional sensitivity analyses varying the recent antibiotic use definition.
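The recent antibiotic use restriction can be expressed as a simple filter. The sketch below assumes that "within 3 days of antibiotic use" means a dose administered in the 3 days before culture collection, and that prophylactic levofloxacin is ignored under the primary definition. The window length and the set of ignored drugs are parameters so that alternative definitions can be expressed. The record layout, drug names other than levofloxacin, and dates are hypothetical.

```python
from datetime import datetime, timedelta


def recent_antibiotic_use(culture_time, doses, window_days=3,
                          ignored=("levofloxacin",)):
    """True if any non-ignored antibiotic was given in the window before the culture.

    `doses` is a list of (drug_name, administration_time) pairs.
    """
    window_start = culture_time - timedelta(days=window_days)
    return any(
        window_start <= given_at <= culture_time and name.lower() not in ignored
        for name, given_at in doses
    )


# Hypothetical culture and dosing history.
culture_time = datetime(2020, 1, 10, 12, 0)
doses = [
    ("levofloxacin", datetime(2020, 1, 9, 8, 0)),  # prophylaxis, 1 day before
    ("cefepime", datetime(2020, 1, 5, 8, 0)),      # 5 days before
]

# Primary definition: 3-day window, levofloxacin not counted.
primary = recent_antibiotic_use(culture_time, doses)
# Variant: 3-day window, levofloxacin counted.
with_levofloxacin = recent_antibiotic_use(culture_time, doses, ignored=())
# Variant: 7-day window, levofloxacin not counted.
seven_day = recent_antibiotic_use(culture_time, doses, window_days=7)
print(primary, with_levofloxacin, seven_day)
```

Cultures for which the function returns `True` would be excluded under the corresponding definition.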

eFigure 3. Flowchart of Hematopoietic Cell Transplant Recipient (HCT) Study Population and Potential Bloodstream Infection Cohort
eFigure 4. Histograms of Location of Patient and Collected Cultures by Day Since Transplant

eFigure 6. Cross-Validated Area Under the Curve Estimates of Super Learner and User-Supplied Algorithms
A. SHBSL: Full Predictor Tool
B. C-SHBSL: Clinical Factor-Specific Tool

eFigure 1. Bacterial Sepsis Prognosis Tool Schematic
eAppendix 1. Outcome Justification
eTable 1. Frequency of Potential Factors Screened for Model Inclusion
eFigure 2. Missingness Frequency and Pattern for Potential Model Factors
eAppendix 2. Imputation Approach
eTable 2. Reference Value Ranges for Single Reference Value Imputation
eAppendix 3. Detailed Description of Super Learner Library
eAppendix 4. Net Reclassification
eFigure 3. Flowchart of Hematopoietic Cell Transplant Recipient (HCT) Study Population and Potential Bloodstream Infection Cohort
eFigure 4. Histograms of Location of Patient and Collected Cultures by Day Since Transplant
eTable 3. Population Demographic and Transplant Factors by Randomly Assigned Modeling and Validation Dataset
eTable 4. Discrimination and Predictive Accuracy of Sepsis Prognosis Tools for High-Sepsis Risk Bacteremia
eTable 5. Categorical Reclassification Index Comparing Risk Classification of Full Predictor Tool (SHBSL) With Other Examined Tools
eFigure 5. Summary of Prediction Scores by Patient Location
eAppendix 5. Ensemble Selection Rationale
eFigure 6. Cross-Validated Area Under the Curve Estimates of Super Learner and User-Supplied Algorithms
eTable 6. Cross-Validated and Bootstrapped Area Under the Curve Estimates for High-Sepsis Risk Bacteremia
eTable 7. Calibration and Observed vs Estimated High-Sepsis Risk Bacteremia Probabilities
eAppendix 6. Sensitivity Analyses
eTable 8. Predictive Ability of Examined Prognostic Tools for High-Sepsis Risk Bacteremia Among Allogeneic HCT Recipients with PBIs Under Varying Missing Data Assumptions
eTable 9. Predictive Ability of Sepsis Prognosis Tools Among Varying Racial/Ethnic Groups
eTable 10. Predictive Ability of Sepsis Prognosis Tools Under Varying Factor Measurement Collection Time Windows
eTable 11. Predictive Ability of Sepsis Prognosis Tools Under Varying Culture Collection and Potential Infection Restriction Definitions

a GVHD: Graft vs. Host Disease

eAppendix 2. Imputation Approach

eTable 2. Reference Value Ranges for Single Reference Value Imputation

a Experiences during follow-up
b Non-Hispanic

eTable 4. Discrimination and Predictive Accuracy of Sepsis Prognosis Tools for High-Sepsis Risk Bacteremia

eTable 6. Cross-Validated and Bootstrapped Area Under the Curve Estimates for High-Sepsis Risk Bacteremia

4. Single Value from Observed Range Imputation (values within the observed predictor range were selected at random for each observation)

eTable 8. Predictive Ability of Examined Prognostic Tools for High-Sepsis Risk Bacteremia Among Allogeneic HCT Recipients with PBIs Under Varying Missing Data Assumptions

2. Percent non-Culture 24 Hours: To the observed PBIs in the validation data (2286), we added additional periods of 24-hour follow-up. The 24-hour (one full calendar day) data collection periods were randomly selected among calendar days between transplant and end of post-transplant follow-up. 24-hour periods within 3 days of a culture collection and during follow-up culture periods were excluded.

a Based on Upper-Left Selected Cut-points from primary validation analysis (C-SHBSL: 9.3%; SHBSL: 5.0%)
b AUROC: area under the receiver operating characteristic curve, PR-AUC: area under the precision recall curve
c 95% Confidence Interval estimated using 2000 stratified bootstrapped replicates
d 95% Confidence Intervals estimated using generalized estimating equations with robust standard errors

1. All Cultures Regardless of Recent Antibiotic Use
2. No Antibiotic Use in Last 3 Days (Including Levofloxacin)
3. No Use in Last 7 Days (Excluding Levofloxacin)

eTable 12. Predictive Ability of Sepsis Prognosis Tools Under Varying Recent Antibiotic