Assessment of Machine Learning to Estimate the Individual Treatment Effect of Corticosteroids in Septic Shock

Key Points
Question: Can machine learning–derived estimated individual corticosteroid therapy effect yield better results than treat all or treat no one strategies in adults with septic shock?
Findings: In this cohort study using individual patient data from 2548 patients in 4 multicenter trials, the individual estimation-based treatment strategy always yielded a positive net benefit. Compared with the individual estimation-based treatment rule, strategies to treat all patients or to treat no one were associated with a worse outcome.
Meaning: These findings suggest that the decision to treat patients with septic shock with hydrocortisone or hydrocortisone and fludrocortisone should be based on the estimated individual treatment effect as derived from machine learning.


Introduction
Sepsis continues to place a burden on the health care system worldwide, accounting for approximately 11 million deaths per year. 1 Apart from eradicating the infection and restoring cell metabolism with oxygen therapy, fluid replacement, and vasopressors, there is no specific treatment for sepsis. 2 In septic shock, there is moderate evidence from randomized clinical trials (RCTs) that corticosteroids may improve short-term survival. 3,4 However, in practice, clinicians remain uncertain about the benefit of corticosteroids at the individual level.
Clinical trials are usually performed to estimate the average treatment effect (ATE). The translation of findings into clinical practice commonly follows a binary model: if a trial yields positive results, all patients are treated; if it yields negative results, no one is treated. This approach assumes that the treatment effect for every patient will be similar to the ATE observed in the original trial. [10] Identifying the sources of treatment response heterogeneity (eg, gene variations in oncology) is central to the development of individualized treatment rules and personalized medicine. 8 [13] In the case of heterogeneity of treatment effect (HTE), accessing data from RCTs offers the opportunity to train models for the estimation of the individual treatment effect (ITE) for a particular intervention. 5,8,14,15 An estimation model for the ITE could assist in identifying patients likely to respond to treatment vs those unlikely to respond, with the aim of guiding clinician decision-making and improving treatment efficiency. 7,14 Therefore, the primary objectives of this study were to estimate the ITE of corticosteroids in adults with septic shock in intensive care units (ICUs) using machine learning and to evaluate the net benefit of corticosteroids when the decision to treat is based on the individual estimated absolute treatment effect.

Data Studies
[17][18] Study characteristics are presented in eTable 1 in the Supplement. The study by Annane et al 16 found that in corticotropin nonresponders (ie, patients whose cortisol did not increase by 9 μg/dL or more [to convert to nanomoles per liter, multiply by 27.588] in response to a 250-μg corticotropin stimulation test), steroid supplementation improved survival. 16 In the CORTICUS study, 17 no effect of hydrocortisone on survival was found, regardless of patients' response to a corticotrophin test. The COIITSS study 18 reported a 3% absolute reduction in in-hospital mortality in patients receiving hydrocortisone plus fludrocortisone compared with those receiving hydrocortisone alone. In the CRICS-TRIGGERSEP study, 19 90-day all-cause mortality was lower among patients who received hydrocortisone plus fludrocortisone than among those receiving placebo. A fifth study by Arabi et al 20 was used as an external validation cohort. That study was stopped for futility at interim analysis after 75 patients were enrolled; although patients in the hydrocortisone group had a higher rate of shock reversal, hydrocortisone was not associated with a reduction in 90-day mortality.

Population
We considered adults with septic shock as defined in individual trials. A summary of the inclusion and exclusion criteria for each individual trial is provided in eTable 2 in the Supplement. Missing data were handled by creating a binary missingness indicator for each affected variable. Because data may sometimes be missing not at random, the indicators were included in the models to account for potentially informative missingness.
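The missingness-indicator approach can be sketched as follows. This is a minimal illustration, not the authors' code: the variable names (`lactate`, `cortisol`) and the fill value of 0.0 used for missing entries are assumptions, since the text does not state how the missing values themselves were filled.

```python
from typing import Dict, List, Optional


def add_missingness_indicators(
    rows: List[Dict[str, Optional[float]]],
    variables: List[str],
    fill_value: float = 0.0,  # assumption: placeholder stored in place of a missing value
) -> List[Dict[str, Optional[float]]]:
    """Return copies of the rows with a 0/1 '<var>_missing' column per variable."""
    out = []
    for row in rows:
        new_row = dict(row)  # keep all other columns unchanged
        for var in variables:
            value = row.get(var)
            missing = value is None
            new_row[var] = fill_value if missing else float(value)
            new_row[f"{var}_missing"] = 1.0 if missing else 0.0
        out.append(new_row)
    return out


# Hypothetical example data (not taken from the trials)
patients = [
    {"lactate": 3.2, "cortisol": None},
    {"lactate": None, "cortisol": 18.0},
]
augmented = add_missingness_indicators(patients, ["lactate", "cortisol"])
```

Passing both the filled variable and its indicator to the model lets the fit learn a separate effect for "value not recorded", which is the stated purpose of including the indicators.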

Interventions
The experimental interventions considered for this analysis were hydrocortisone 50 mg as an intravenous bolus every 6 hours for 5 to 7 days with or without tapering and hydrocortisone 50 mg as an intravenous bolus given every 6 hours plus enteral fludrocortisone 50 μg daily, given for 7 days without tapering. The control was either placebo or usual care.

Outcomes
We considered 90-day mortality the primary outcome. The secondary outcome was 28-day mortality.

ATE and ITE
We defined the ATE as the difference in 90-day mortality should everyone be treated with corticosteroids vs no one being treated. The ITE was defined as the difference in outcome at the individual level, should this patient receive or not receive the treatment.
The ATE was estimated separately for each study included in the analyses based on machine learning fits of the outcome given baseline factors and treatment using the targeted maximum likelihood estimator (TMLE) 21 adjusting for study, age, sex, admission category (ie, medical, elective surgery, or emergent surgery), severity of illness scores (measured using Simplified Acute Physiology Score [SAPS II] 22 and Sepsis-related Organ Failure Assessment [SOFA] score 23 ), characteristics of infection (hospital- vs community-acquired), infection site and pathogens, adrenal status (ie, baseline cortisol level and cortisol increment after 250 μg of corticotrophin), arterial lactate level, blood glucose levels, maximal dose of norepinephrine equivalent during the first 24 hours, and initial need for mechanical ventilation. The TMLE is a broad estimation framework for data-adaptive estimation methods that facilitates the construction of asymptotically efficient estimators with desirable finite-sample properties. 24 The ATE was expressed as a relative risk (RR) and an absolute risk reduction (ARR) with their 95% CIs.
The ITE was estimated using 2 different approaches and expressed as an ARR: the baseline severity of illness model and the optimal individual model. First, as previously proposed, 14 we assumed that the ITE, $\mathrm{ITE}_{\mathrm{SAPS\,II}}$, could be estimated as the baseline risk, evaluated from the SAPS II score, 14 minus the baseline risk multiplied by the RR:

$$\mathrm{ITE}_{\mathrm{SAPS\,II}} = P(Y = 1 \mid A = 0, W) - P(Y = 1 \mid A = 0, W) \times \mathrm{RR}$$

in which Y is the outcome, A is the binary treatment indicator, W is the score indicative of the baseline severity, RR is the relative risk of A = 1 (treatment) vs A = 0 (control), and P(Y = 1 | A = 0, W) is the baseline risk. In this particular case, W is the SAPS II 14 and the baseline risk is obtained using the equation

$$\operatorname{logit} P(Y = 1 \mid A = 0, \mathrm{SAPS\,II}) = -7.7631 + 0.0737 \times \mathrm{SAPS\,II} + 0.9971 \times \ln(\mathrm{SAPS\,II} + 1)$$
We used empirical RR derived from the original trial results and assumed that treatment effect increases linearly with baseline risk.
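The baseline severity of illness model can be sketched in a few lines. The logit coefficients are those quoted in the text; the SAPS II value of 40 and the RR of 0.90 are hypothetical inputs chosen only for illustration, and the function names are not from the source.

```python
import math


def saps2_baseline_risk(saps2: float) -> float:
    """Baseline probability of death from the SAPS II logit equation."""
    logit = -7.7631 + 0.0737 * saps2 + 0.9971 * math.log(saps2 + 1.0)
    return 1.0 / (1.0 + math.exp(-logit))


def ite_saps2(saps2: float, rr: float) -> float:
    """ARR under this model: baseline risk minus baseline risk times RR."""
    baseline = saps2_baseline_risk(saps2)
    return baseline - baseline * rr


# Hypothetical inputs: a SAPS II of 40 and a relative risk of 0.90
risk_40 = saps2_baseline_risk(40)
ite_40 = ite_saps2(40, rr=0.90)
```

Because the estimate is the baseline risk scaled by the constant factor (1 − RR), it has the same sign for every patient and grows with baseline risk, which is the linearity assumption stated above.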
Alternatively, because treatment effect may not correlate linearly with baseline severity of illness, we developed an optimal individual model, that is, an estimation model for the probability of dying during the first 90 days following ICU admission using individual data from Annane et al, 16 CORTICUS, 17 COIITSS, 18 and CRICS-TRIGGERSEP. 19 The optimal individual estimation model is a model for P(Y = 1 | A, W), that is, it includes patients' characteristics as well as the treatment actually received (ie, hydrocortisone, hydrocortisone + fludrocortisone, or control). Thus, estimating the ITE based on this model does not require the assumption that treatment effect increases linearly with baseline risk. Specifically, the variables included as factors in the estimation models were treatment received (ie, hydrocortisone, hydrocortisone + fludrocortisone, or control), study, age, sex, admission category (ie, medical, elective surgery, or emergent surgery), severity of illness scores (ie, SAPS II 22 and SOFA score 23 ), characteristics of infection (hospital- vs community-acquired), infection site and pathogens, adrenal status (ie, baseline cortisol level and cortisol increment after 250 μg of corticotrophin), arterial lactate level, blood glucose levels, maximal dose of norepinephrine equivalent during the first 24 hours, and initial need for mechanical ventilation.
As an alternative to standard regression approaches, we used an ensemble machine learning algorithm called Super Learner. 25 Within this algorithm, we used 10-fold cross-validation and the cross-validated area under the receiver operating characteristic curve (AUC) as the measure of fit to derive the final model. The library of algorithms in the Super Learner included parametric learners (ie, logistic regression with and without interaction terms, stepwise regression models based on the Akaike information criterion, and Bayesian generalized linear model) and nonparametric learners (ie, generalized additive models, multivariate adaptive regression splines, gradient boosting, random forest, kernel support vector machine, and support vector machine). This wide range was chosen so that the resulting algorithm could flexibly fit virtually any functional form.
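Once a model for P(Y = 1 | A, W) is available, the ITE for a patient can be obtained by predicting that patient's risk under each treatment option and taking the difference from control. The sketch below illustrates only this counterfactual prediction step. The function `toy_risk_model` is a hypothetical stand-in with made-up coefficients; it is not the Super Learner fit, and the covariate names are assumptions.

```python
import math
from typing import Callable, Dict, Tuple

TREATMENTS = ("hydrocortisone", "hydrocortisone+fludrocortisone")


def toy_risk_model(treatment: str, covariates: Dict[str, float]) -> float:
    """Stand-in for a fitted model of P(Y = 1 | A, W); coefficients are made up."""
    effect = {
        "control": 0.0,
        "hydrocortisone": -0.10,
        "hydrocortisone+fludrocortisone": -0.25,
    }[treatment]
    logit = -3.0 + 0.05 * covariates["saps2"] + 0.15 * covariates["lactate"] + effect
    return 1.0 / (1.0 + math.exp(-logit))


def estimate_ite(
    risk_model: Callable[[str, Dict[str, float]], float],
    covariates: Dict[str, float],
) -> Tuple[str, float]:
    """Return the regimen with the largest estimated ARR vs control, and that ARR."""
    risk_control = risk_model("control", covariates)
    arr_by_treatment = {
        t: risk_control - risk_model(t, covariates) for t in TREATMENTS
    }
    best = max(arr_by_treatment, key=arr_by_treatment.get)
    return best, arr_by_treatment[best]


# Hypothetical patient
best_regimen, best_arr = estimate_ite(toy_risk_model, {"saps2": 55.0, "lactate": 4.0})
```

With a flexible fitted model in place of the toy function, the estimated ARR can differ in size and sign between patients, which a single RR applied to baseline risk cannot represent.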

Net Benefit
We used the net benefit to quantify the impact of treatment initiation strategies, accounting for both the reduction in the event rate and the risk associated with the treatment. 14,15 Let $D_i$ be the individualized estimation of treatment effect for patient i. Let T be the threshold for D, such that treatment is initiated in patient i if $T < D_i$ and treatment is avoided if $T > D_i$. If $T = D_i$, it is uncertain whether the treatment should be prescribed. Hence, the threshold T is used to represent the risk associated with the treatment. The net benefit is defined as 15

$$\text{Net benefit} = \text{decrease in event rate} - \text{treatment rate} \times T$$

More specifically, it is calculated from $Y_{(0,1),i}$, the individual outcome under each treatment option; $n_1$, the number of patients treated; and $n_0$, the number of patients not treated.
Based on this definition, the net benefit was calculated for the treat everybody strategy, for the baseline severity of illness strategy, and for the optimal individual model strategy. For the baseline severity of illness strategy, $P[\mathrm{ITE}_{\mathrm{SAPS\,II}} > T]$ is the proportion of patients with an expected reduction in event rate greater than T (ie, the treatment rate according to this treatment rule). The net benefit of treating no one serves as the reference and is equal to zero. The net benefit as described by Vickers et al 15 represents the decrease in the proportion of events associated with treatment minus the proportion of patients treated multiplied by the cost of treatment.
Thus, a negative net benefit means that treating no one is preferable to treating based on a particular strategy (eg, everyone, based on an estimation model, or based on a scoring system) for this particular threshold. In this study, we compared the following treatment strategies: treat all patients, treat no one, treat based on the severity score, or treat based on the Super Learner-derived estimated ITE.
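The definition net benefit = decrease in event rate − treatment rate × T can be illustrated with a small sketch. Note the assumption: the decrease in event rate is approximated here by plugging in the estimated ITEs of the treated patients, which is a simplification for illustration and not necessarily the outcome-based calculation used in the study. The ITE values are hypothetical.

```python
from typing import List


def net_benefit_treat_by_ite(ites: List[float], threshold: float) -> float:
    """Net benefit when only patients with estimated ITE > threshold are treated."""
    n = len(ites)
    treated = [d for d in ites if d > threshold]
    decrease_in_event_rate = sum(treated) / n  # plug-in estimate (assumption)
    treatment_rate = len(treated) / n
    return decrease_in_event_rate - treatment_rate * threshold


def net_benefit_treat_all(ites: List[float], threshold: float) -> float:
    """Net benefit when everyone is treated (treatment rate equal to 1)."""
    return sum(ites) / len(ites) - threshold


# Hypothetical estimated ITEs on the ARR scale for 4 patients
ites = [0.10, 0.02, -0.03, 0.06]
T = 0.04
nb_rule = net_benefit_treat_by_ite(ites, T)
nb_all = net_benefit_treat_all(ites, T)
nb_none = 0.0  # treating no one is the reference strategy
```

In this plug-in form, each treated patient contributes (D_i − T)/n with D_i > T, so the rule-based value cannot fall below zero, whereas the treat-all value becomes negative as soon as T exceeds the mean effect.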

Number Willing to Treat
Ideally, the decision threshold takes into account the potential harms secondary to receiving the treatment. For instance, if the harm associated with experiencing the outcome is considered to be 10-fold worse than that of treatment adverse effects, the appropriate decision threshold is 10%. In this case, the treatment should be initiated only in individuals whose estimated absolute treatment effect exceeds 10%. Usually, however, clinicians do not make a decision based on a decision threshold but rather evaluate the maximum acceptable number of patients needed to treat to avoid 1 outcome event. In this context, Dorresteijn et al 14 proposed using the number willing to treat (NWT), defined as the inverse of the decision threshold. If treating 10 patients is assumed to generate as much harm as 1 outcome event, clinicians would be willing to treat up to 10 patients to prevent 1 event. In such a case, the NWT is 10, which is equal to 1 / T, in which T is 10%.
[18][19] Therefore, as suggested by Dorresteijn et al, 14 we calculated the net benefit for a range of possible values of NWT.
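The relation NWT = 1 / T and the sweep over NWT values can be sketched as follows. The mean ARR of 0.04 is a hypothetical value used only to show how the treat-all net benefit changes sign across the range; it is not an estimate from the study, and the function names are illustrative.

```python
from typing import List, Tuple


def threshold_from_nwt(nwt: float) -> float:
    """Decision threshold T corresponding to a number willing to treat."""
    if nwt <= 0:
        raise ValueError("NWT must be positive")
    return 1.0 / nwt


def treat_all_net_benefit_by_nwt(
    mean_arr: float, nwt_values: List[float]
) -> List[Tuple[float, float]]:
    """Pairs of (NWT, net benefit of treating everyone) over a range of NWT values."""
    return [(nwt, mean_arr - threshold_from_nwt(nwt)) for nwt in nwt_values]


# Hypothetical mean ARR of 4% swept over a range of NWT values
sweep = treat_all_net_benefit_by_nwt(0.04, [5, 10, 25, 50, 100])
```

A low NWT corresponds to a high threshold (treatment regarded as harmful), so the treat-all net benefit is negative there and turns positive only once the NWT exceeds 1 / mean ARR.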

Performance of the Estimation Models
The performance of the estimation model was evaluated both internally and externally using the data from a different trial. 20 To evaluate the discrimination performance of the model, we computed the cross-validated AUC together with its 95% CI. Model calibration was evaluated by plotting the estimated probability vs observed prevalence of the outcome and by computing the Brier score. 26 The same metrics were estimated in the external validation cohort. 20
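The two performance metrics can be written out directly. This is a minimal sketch on hypothetical predictions: it computes a plain (not cross-validated) AUC as the proportion of concordant case-control pairs and omits the confidence intervals reported in the study.

```python
from typing import List


def brier_score(y_true: List[int], y_prob: List[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: List[int], y_prob: List[float]) -> float:
    """Probability that a random case is ranked above a random non-case (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died) and predicted probabilities
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(y, p)
example_brier = brier_score(y, p)
```

The AUC measures discrimination only, while the Brier score also penalizes poorly calibrated probabilities, which is why both are reported.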

Decision Trees
To help clinicians decide whether a given patient should receive steroids, we complemented the analysis by generating a decision tree based on age, sex, admission category (ie, medical, elective surgery, or emergent surgery), SAPS II, SOFA score, characteristics of infection (ie, hospital- vs community-acquired), infection site, adrenal status (baseline cortisol level and cortisol increment after 250 μg of corticotrophin), arterial lactate level, and maximal dose of norepinephrine equivalent during the first 24 hours. The decision tree was generated using a pruned recursive partitioning algorithm. The complexity parameter was optimized using 20-fold cross-validation.
All statistical analyses were performed using R statistical software version 3.5.1 (R Project for Statistical Computing) running on the macOS (Apple) platform. P values were 2-sided, and statistical significance was set at .05. Data were analyzed from September 2019 to February 2020.

Individual Studies and Pooled ATE
The

Baseline Severity of Illness and Optimal Individual Model
The observed mortality rate at 90 days was 47.7% (95% CI, 45.7% to 49.6%) (eTable 3 in the Supplement). Based on the SAPS II, the mean estimated probability of death was 55.0% (95% CI, 53.8% to 56.1%) in the overall sample (eTable 4 in the Supplement). The AUC of the SAPS II was 0.64 (DeLong 95% CI, 0.62 to 0.67) (Figure 1).
Based on the optimal individual model, the mean estimated probability of death was 47.7% (95% CI, 46.8% to 47.8%) in the overall sample (eFigure 1 in the Supplement). The optimal individual model discrimination is illustrated in Figure 1. The cross-validated AUC was 0.74 (95% CI, 0.72 to 0.76). Figure 2 illustrates the good calibration of the optimal model (Brier score = 0.21). The estimation performance was similar when using 28-day mortality as the outcome (cross-validated AUC, 0.74; 95% CI, 0.72 to 0.76; Brier score = 0.20). In the external validation cohort, the AUC of the optimal individual model was 0.77 (DeLong 95% CI, 0.59 to 0.92), and the Brier score was 0.28.
The distribution of the ITE for each corticosteroid regimen is illustrated in eFigure 2 in the Supplement. Using the baseline severity of illness model to decide which treatment individual patients should receive, the estimated mean ARR was 5.85% (95% CI, 5.73% to 5.97%) (eFigure 3 in the Supplement). Using the optimal individual model, the estimated mean ARR was 2.90% (95% CI, 2.79% to 3.01%).

Net Benefit and NWT
As illustrated in Figure 3, the expected net benefit appeared to depend strongly on the treatment strategy. The net benefit of treating all patients with hydrocortisone or hydrocortisone with fludrocortisone was positive for any NWT greater than 25, meaning that this strategy was superior to treating no one if the NWT was high (ie, very little harm associated with treatment) but not if the NWT was low (ie, considerable harm associated with treatment). For an NWT of approximately 25, the benefits of treating all patients and treating no one were equivalent (net benefit close to zero).
When the NWT decreased to less than 25, the net benefit of treating all patients with hydrocortisone or with hydrocortisone and fludrocortisone was found to be negative, meaning that treating all patients with either regimen was inferior to treating no one.
The estimation-based treatment strategies (ie, based on the severity of illness model or on the optimal individual model) were consistently associated with greater net benefit than treating all patients, regardless of the NWT (Figure 3). While both estimation-based net benefit curves converged to zero for very low NWT values, a treatment strategy based on the optimal individual model was significantly more beneficial than treating based only on the SAPS II. When the NWT was 25, the net benefit was 0.01 for the treat all with hydrocortisone strategy and −0.01 for the treat all with hydrocortisone and fludrocortisone strategy at the cost of treating 100% of patients; the net benefit was 0.06 for the treat by SAPS II strategy at the cost of treating 13.3% of patients and 0.31 for the treat by optimal individual model strategy at the cost of treating 14.9% of patients. eFigure 4 in the Supplement illustrates the net benefit according to the proportion of patients who received the treatment for each estimation-based strategy. None of these results were substantially altered when using 28-day mortality as the outcome. The net benefit associated with the optimal individual model in the external validation cohort is illustrated in eFigure 5 in the Supplement.

Interpretation for Clinical Practice
eTable 4 in the Supplement illustrates the difference in characteristics between the patients with a low estimated ITE (first quartile of the ITE distribution) vs a high estimated ITE (last quartile of the ITE distribution). eFigure 6 in the Supplement proposes a decision tree for an NWT of 50, corresponding to a decision threshold of 0.02.

Discussion
This cohort study found that a personalized approach based on the estimated ITE to decide if a patient with septic shock should be treated with corticosteroids was never harmful to the patients, regardless of potential corticosteroid-related adverse effects. Conversely, a treatment policy based on the ATE (ie, treat all patients or treat no one) identified from RCTs and meta-analyses may generate more harm than benefit at the individual level.
Figure 3 legend. The y-axis is the net benefit for each treatment strategy compared with treating no one. Treating no one served as a reference and is equal to zero. For treat all patients and treat based on the Simplified Acute Physiology Score (SAPS II), the treatment considered is either hydrocortisone alone or hydrocortisone with fludrocortisone. For the optimal individual model, the treatment is the one expected to produce the maximal effect at the individual level. The x-axis is the NWT, which is equal to 1 / decision threshold. Shading indicates 95% CI.
RCTs are usually used to estimate the ATE, which is then interpreted in a binary manner, whereby for a positive result, the recommendation is to treat all patients and for a negative result, the recommendation is to treat no one. RCTs are considered the criterion standard for evidence-based medicine. However, as first reported by Hill et al in 1966, 27 the ATE is probably not the most informative measure of treatment effect for a clinician who is seeking the best treatment strategy for a particular patient. This is owing to HTE, which describes how a treatment's effect varies across individuals 8 and can be defined as nonrandom variability in a treatment effect, which explains why the individual response to treatment may vary substantially from the ATE. Different approaches have been proposed to deal with HTE, including subgroup analysis to identify clusters of patients characterized by a more homogeneous response to treatment. 8 An alternative and arguably superior approach is to develop accurate, multivariable models to estimate which treatment option is likely to be best at the individual level. 8,14 This estimation approach can rely on risk modeling, whereby treatment effect is reported across risk strata for the primary outcome. In this study, we used a baseline risk modeling approach based on the SAPS II and showed that a treatment decision based on risk modeling yielded superior net benefit than a decision based purely on the ATE, which ignores the heterogeneity among patients in the impact of treatment.
The estimation approach can be more complex and rely on treatment effect modeling. 8 [32][33] Dorresteijn et al 14 used the data from RCTs on the benefit of statins to estimate treatment effect for individual patients and showed that the treatment effect modeling approach was associated with more net benefit than treating everyone or no one. Likewise, for corticosteroids in septic shock, we found that a treatment effect modeling approach is superior to a treatment decision strategy based on the ATE and to a baseline risk modeling approach. The outcome of treating patients with septic shock with corticosteroids was evaluated based on the net benefit to account for potentially severe adverse effects associated with this class of medication. For each treatment strategy, the net benefit was estimated using a range of adverse effect severity. This range was expressed using the NWT, in which the higher the NWT, the fewer adverse effects associated with the drug and vice versa. Using this approach, we found that a treatment strategy based on the optimal individual model was consistently superior to other approaches. Moreover, for any NWT less than 25, the net benefit of treating all patients with corticosteroids (hydrocortisone or hydrocortisone + fludrocortisone) was found to be negative.
The Super Learner was used to model the probability of death during the first 90 days following admission. This ensemble machine learning approach was shown to be mathematically optimal, that is, to have oracle properties. 25 The Super Learner was previously shown to perform better than a number of alternative modeling approaches in the context of mortality estimation in the ICU. 34 It was also shown to be a method of choice when dealing with heterogeneity in treatment effect. 35 Consistently, we found that the Super Learner-based estimation model used to estimate the ITE was associated with better discrimination properties than the SAPS II and, more importantly, with excellent calibration. Hence, the net benefit associated with the Super Learner-derived optimal individual model was superior to the treat all strategy and the strategy based on SAPS II. Of note, since the severity of illness model approach relies on multiplying the baseline risk by 1 minus the RR, it does not allow for the ITE to go in opposite directions based on patients' characteristics. This important difference between the 2 approaches is illustrated in eFigure 3 in the Supplement. Finally, the SAPS II is well known to overestimate the probability of death, 34,36 thereby resulting in an inflation of the estimated net benefit. Interestingly, Luedtke et al 37 have shown that the Super Learner can be used not only to estimate the ATE without bias but also to learn a treatment rule based on covariates and estimate the impact of using this optimal rule.

Limitations
This study has some limitations. First, the results may not be generalizable to all patients since the data used to derive the individual estimations are constrained by the inclusion and exclusion criteria used in the RCTs [16][17][18][19] used to train the models. However, these 4 studies [16][17][18][19] were selected because they yielded conflicting results regarding the overall benefit of treating patients with steroids; they represent a typical situation in which the estimation of ITE can be used to identify those who will benefit from treatment. To challenge the performance of our algorithm, we tested it using data from an external trial 20 and, despite a limited size and a substantially higher death rate, found overall good performance. In the future, making the ITE estimation model available to all would be a way to address this limitation by prospectively collecting additional observational data and further recalibrating the models to improve their performance in a particular environment. Second, we used the SAPS II as an alternative to the optimal individual model to identify patients who may benefit from receiving corticosteroids. The SAPS II was developed to estimate hospital mortality, while we used 90-day mortality as the primary outcome measure. Nevertheless, we found consistent results for 28-day mortality and 90-day mortality. The SOFA score 23 is often preferred by clinicians to evaluate illness severity in the ICU. However, since this score was not intended to estimate mortality, there is no direct way to use it to estimate the ITE. The validation cohort included adults with cirrhosis and septic shock. 20 While this is a very specific group of patients, it helped to challenge even further the performance of the algorithm. Third, to refine the individualized treatment strategy, one would need to choose the appropriate NWT accounting for the frequency and the severity of adverse effects. Fourth, the decision tree generated to illustrate the use of the ITE in clinical practice should be considered with caution, and the net benefit of such an estimation-based treatment strategy would have to be confirmed prospectively.

Conclusions
This cohort study found that an individualized estimation-based treatment strategy to decide which patients with septic shock to treat with corticosteroids and which corticosteroid regimen to administer yielded positive net benefit regardless of potential corticosteroid-associated adverse effects. This promising result will need to be validated in a prospective manner.
was used to externally validate the results. As a study exclusively based on the analysis of fully deidentified data, this research was considered nonhuman participants research and deemed exempt from informed consent by the Comité des Protection des Personnes Ile de France III. This study was part of the Rapid Recognition of Corticosteroid Resistant or Sensitive Sepsis (RECORDS) program approved by the Comité des Protection des Personnes Ile de France III. This report follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) and Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines.
Of these, 515 patients received hydrocortisone alone, 1009 patients received hydrocortisone plus fludrocortisone, and 1024 patients received a placebo or no treatment.
The Table provides the point estimates for the ATE of corticosteroids on 90-day mortality as reported in each individual study as well as the ATE pooled across studies. Compared with the control,

Table. Estimated Treatment Effect on 90-Day and 28-Day Mortality
a Compares hydrocortisone with fludrocortisone vs placebo.
b Compares hydrocortisone vs placebo.
c Compares hydrocortisone with fludrocortisone vs hydrocortisone.
eTable 1. Description of Studies Included in the Analysis
eTable 2. Inclusion and Exclusion Criteria for Each Trial
eTable 3. Characteristics of the Population
eFigure 1. Estimated Probability of 90-day Mortality Based on SAPS II and on the Optimal Individual Model
eFigure 2. Distribution of the Estimated Individual Treatment Effect by Steroid Regimen
eFigure 3. Distribution of Maximal Absolute Risk Difference by Treatment Strategy
eFigure 4. Net Benefit According to the Proportion of Patients Receiving Treatment
eFigure 5. Estimated Net Benefit Based Number Willing to Treat in the External Validation Cohort
eFigure 6. Decision Tree for a Number Willing to Treat of 50 Patients
eTable 4. Characteristics of the Patients With a Estimated Individual Treatment Effect Within the First vs Fourth Quartiles