In A, the ribbons connect from an individual phenotype to an organ system if the group mean is greater or lesser than the overall mean for the entire cohort. For example, the δ phenotype (light blue) is more likely to have members with cardiovascular and hepatic dysfunction (ribbons connect with these portions of the circle), whereas β phenotype members (light purple) are more likely to have kidney dysfunction and other abnormal variables (eg, increased age, comorbidity). In B-E, each phenotype is highlighted separately and the ribbons connect to the different patterns of clinical variables and organ system dysfunctions on the top of the circle.
In all panels, the variables are standardized such that all means are scaled to 0 and SDs to 1. A value of 1 for the standardized variable value (x-axis) signifies that the mean value for the phenotype was 1 SD higher than the mean value for both phenotypes shown in the graph as a whole. ALT indicates alanine transaminase; AST, aspartate transaminase; Bands, also known as premature neutrophil count; BUN, blood urea nitrogen; CRP, C-reactive protein; ESR, erythrocyte sedimentation rate; GCS, Glasgow Coma Scale; INR, international normalized ratio; Pao2, partial pressure of oxygen; SENECA, Sepsis Endotyping in Emergency Care; SBP, systolic blood pressure; WBC, white blood cell.
Ratio of IL-6 was calculated as the cytokine value standardized by the median value for the α phenotype in each study (referent) illustrated on a log scale. All comparisons within data sets across phenotypes were significant (P < .001). Error bars indicate the upper bound of the interquartile range of the biomarker standardized by the median value for the α phenotype. Across multiple cohorts and randomized trials, inflammatory cytokines IL-6, IL-10, and TNF measured at baseline were greater in the γ phenotype (pink) and δ phenotype (blue) compared with the α phenotype (green), suggesting a predominantly hyperinflammatory response. TNF indicates tumor necrosis factor.
Heatmap shows the ratio of the median biomarker value for various markers of the sepsis host response grouped by those reflecting coagulation, endothelium, inflammation, and renal injury. Orange represents a greater median biomarker value for that phenotype compared with the median for the entire study, whereas colors in the tan to brown range represent lower median biomarker values compared with the median for the entire study. Empty cells are those for which the biomarker was not measured. The factor V, factor IX, plasminogen, protein C, and protein S biomarkers were reversed on the scale to coordinate the color map. The IL-1b and IL-12 biomarkers are not shown due to having less than 0.5-fold changes. ACCESS indicates A Controlled Comparison of Eritoran in Severe Sepsis; COL-4, collagen type 4; GenIMS, Genetic and Inflammatory Markers of Sepsis; ICAM, intercellular adhesion molecule 1; IGFBP-7, insulin-like growth factor–binding protein 7; KIM-1, kidney injury molecule 1; PAI-1, plasminogen activator inhibitor 1; ProCESS, Protocol-Based Care for Early Septic Shock; PROWESS, Activated Protein C Worldwide Evaluation in Severe Sepsis; TAT, thrombin-antithrombin; TIMP-2, tissue inhibitor of metalloproteinase 2; TNF, tumor necrosis factor; VCAM, vascular cell adhesion molecule.
All panels show significant differences in mortality by phenotype (log-rank P < .001). In the SENECA derivation and validation cohorts, in the GenIMS cohort, and in the 3 randomized clinical trials, clinical phenotypes are associated with short-term mortality. This suggests that phenotypes are generalizable and prognostic across data sets with different severity, temporality, and definitions of sepsis and septic shock. ACCESS indicates A Controlled Comparison of Eritoran in Severe Sepsis; GenIMS, Genetic and Inflammatory Markers of Sepsis; ProCESS, Protocol-Based Care for Early Septic Shock; PROWESS, Activated Protein C Worldwide Evaluation in Severe Sepsis; SENECA, Sepsis Endotyping in Emergency Care.
The cumulative mortality data are only for unique patients in the SENECA derivation cohort (16 652 of 20 189 total patients) and in the SENECA validation cohort (31 160 of 43 086 total patients).
For each trial (ACCESS, PROWESS, and ProCESS), panel A shows the actual distribution of the 4 phenotypes in that trial (horizontal bar graph) and the observed proportion of trials concluding no difference (neutral), harm, or benefit in simulation (vertical stacked bar graph). Each simulation represents 10 000 iterations using sampling with replacement. Panels B and C show how simulated trial results vary when the case mix is changed to the distributions shown in the top set of graphs by varying the frequency of the α phenotype (panel B) and the δ phenotype (panel C). ACCESS indicates A Controlled Comparison of Eritoran in Severe Sepsis; EGDT, early goal-directed therapy; HBN, harm, benefit, or neutral; ProCESS, Protocol-Based Care for Early Septic Shock; PROWESS, Activated Protein C Worldwide Evaluation in Severe Sepsis; SENECA, Sepsis Endotyping in Emergency Care.
Collaborators and acknowledgements
eFigure 1. Study schematic
eFigure 2. Heatmap of correlation of phenotype variables
eFigure 3. Patient accrual in SENECA derivation cohort
eFigure 4. OPTICS plot for SENECA derivation data
eFigure 5. Consensus k clustering in SENECA derivation data
eFigure 6. Descriptive plots of phenotyping variables, age through chloride
eFigure 7. Descriptive plots of phenotyping variables, c-reactive protein through INR
eFigure 8. Descriptive plots of phenotyping variables, serum lactate through troponin
eFigure 9. Rank order of variables importance
eFigure 10. Frequency of phenotypes across 12 hospitals in SENECA derivation data and GenIMS
eFigure 11. Proportion of phenotypes without parenteral antibiotics or blood cultures as first intervention
eFigure 12. Histogram of latent class probabilities by phenotype
eFigure 13. t-SNE plots of phenotype assignments in SENECA derivation cohort
eFigure 14. Probability of assignment for phenotype members and non-members
eFigure 15. Consensus k means clustering results from SENECA validation cohort
eFigure 16. Mean standardized differences between variables across phenotype pairs in SENECA derivation and validation cohorts
eFigure 17. Euclidean distances by phenotype for GenIMS cohort study
eFigure 18. t-SNE plots of phenotype assignments in RCTs
eFigure 19. Euclidean distances by phenotype for ACCESS trial
eFigure 20. Euclidean distances by phenotype for PROWESS trial
eFigure 21. Euclidean distances by phenotype for ProCESS trial
eFigure 22. 365-day mortality by phenotype in ACCESS, PROWESS, and ProCESS trials
eFigure 23. Cumulative 28-day survival by treatment arm within phenotypes in the ACCESS trial
eFigure 24. Cumulative 365-day survival by treatment arm within phenotypes in the ACCESS trial
eFigure 25. Cumulative 28-day survival by treatment arm within phenotypes in the PROWESS trial
eFigure 26. Cumulative 365-day survival by treatment arm within phenotypes in the PROWESS trial
eFigure 27. Cumulative 28-day survival by treatment arm within phenotypes in the ProCESS trial
eFigure 28. Cumulative 365-day survival by treatment arm within phenotypes in the ProCESS trial
eFigure 29. Simulation of phenotype enrichment in the ProCESS trial
eFigure 30. Simulation of phenotype enrichment in the ACCESS trial
eFigure 31. Simulation of phenotype enrichment in the PROWESS trial
eFigure 32. Control group mortality rates in simulation compared to contemporary RCTs
eFigure 33. Alluvial plot of phenotypes by baseline SOFA score
eFigure 34. Distribution of phenotypes across APACHE quartiles in 3 RCTs
eFigure 35. Alluvial plot of phenotypes by infection site in the ACCESS trial
eFigure 36. Distribution of phenotypes among patients with bacteremia in SENECA derivation cohort
eFigure 37. Comparison of clinical variables between phenotypes and APACHE3 quartiles
eFigure 38. Comparison of biomarkers between phenotypes and APACHE3 quartiles
eFigure 39. Sensitivity analysis of enrichment by APACHE3 quartile in ProCESS trial
eTable 1. Clinical variables used in models to derive phenotypes
eTable 2. Biomarkers available in cohort and trial data
eTable 3. Range, direction, and transformation of variables for model in SENECA cohorts
eTable 4. Missing data
eTable 5. Characteristics of cohort studies
eTable 6. Characteristics of infection and organ dysfunction screening in SENECA derivation and validation cohorts
eTable 7. Characteristics of 3 randomized trials
eTable 8. Characteristics in derivation and validation data after multiple imputation
eTable 9. Blood culture rate and parenteral antibiotic administration by phenotype
eTable 10. Statistical measures of fit for latent class models
eTable 11. Clinical characteristics of phenotypes derived using latent class analysis
eTable 12. Clinical characteristics of phenotypes derived in SENECA validation cohort
eTable 13. Clinical characteristics of phenotypes after excluding variables with missing data
eTable 14. Clinical characteristics of phenotypes after excluding variables with missing data and high correlation
eTable 15. Clinical characteristics of phenotypes using 12-hour window of EHR data
eTable 16. Clinical characteristics of phenotypes predicted in the GenIMS cohort study
eTable 17. Clinical characteristics of phenotypes predicted in the ACCESS trial
eTable 18. Clinical characteristics of phenotypes predicted in the PROWESS trial
eTable 19. Clinical characteristics of phenotypes predicted in the ProCESS trial
eTable 20. Biomarkers by phenotypes in the GenIMS cohort study
eTable 21. Biomarkers by phenotypes in the ACCESS randomized trial
eTable 22. Biomarkers by phenotypes in the PROWESS randomized trial
eTable 23. Biomarkers by phenotypes in the ProCESS randomized trial
eTable 24. Primary and secondary outcomes by study
eTable 25. Primary and secondary outcomes by phenotype
eTable 26. Baseline characteristics of ACCESS trial in simulation scenarios
eTable 27. Baseline characteristics of PROWESS trial in simulation scenarios
eTable 28. Baseline characteristics of ProCESS trial in simulation scenarios
eTable 29. Control group mortality rate in phenotype simulations in 3 RCTs
eTable 30. Site of infection by phenotype in the ACCESS trial
eTable 31. Clinical characteristics by APACHE3 quartile in ProCESS
eTable 32. Biomarkers by APACHE3 quartile in ProCESS
Seymour CW, Kennedy JN, Wang S, et al. Derivation, Validation, and Potential Treatment Implications of Novel Clinical Phenotypes for Sepsis. JAMA. 2019;321(20):2003–2017. doi:10.1001/jama.2019.5791
Are clinical sepsis phenotypes identifiable at hospital presentation correlated with the biomarkers of host response and clinical outcomes and relevant for understanding the heterogeneity of treatment effects?
In this retrospective analysis using data from 63 858 patients in 3 observational cohorts, 4 novel sepsis phenotypes (α, β, γ, and δ) with different demographics, laboratory values, and patterns of organ dysfunction were derived, validated, and shown to correlate with biomarkers and mortality. In the simulations using data from 3 randomized clinical trials involving 4737 patients, the outcomes related to the treatments were sensitive to changes in the distribution of these phenotypes.
Four novel clinical phenotypes of sepsis were identified that correlated with host-response patterns and clinical outcomes and may help inform the design and interpretation of clinical trials.
Sepsis is a heterogeneous syndrome. Identification of distinct clinical phenotypes may allow more precise therapy and improve care.
To derive sepsis phenotypes from clinical data, determine their reproducibility and correlation with host-response biomarkers and clinical outcomes, and assess the potential causal relationship with results from randomized clinical trials (RCTs).
Design, Settings, and Participants
Retrospective analysis of data sets using statistical, machine learning, and simulation tools. Phenotypes were derived among 20 189 total patients (16 652 unique patients) who met Sepsis-3 criteria within 6 hours of hospital presentation at 12 Pennsylvania hospitals (2010-2012) using consensus k means clustering applied to 29 variables. Reproducibility and correlation with biological parameters and clinical outcomes were assessed in a second database (2013-2014; n = 43 086 total patients and n = 31 160 unique patients), in a prospective cohort study of sepsis due to pneumonia (n = 583), and in 3 sepsis RCTs (n = 4737).
All clinical and laboratory variables in the electronic health record.
Main Outcomes and Measures
Derived phenotype (α, β, γ, and δ) frequency, host-response biomarkers, 28-day and 365-day mortality, and RCT simulation outputs.
The derivation cohort included 20 189 patients with sepsis (mean age, 64 [SD, 17] years; 10 022 [50%] male; mean maximum 24-hour Sequential Organ Failure Assessment [SOFA] score, 3.9 [SD, 2.4]). The validation cohort included 43 086 patients (mean age, 67 [SD, 17] years; 21 993 [51%] male; mean maximum 24-hour SOFA score, 3.6 [SD, 2.0]). Of the 4 derived phenotypes, the α phenotype was the most common (n = 6625; 33%) and included patients with the lowest administration of a vasopressor; in the β phenotype (n = 5512; 27%), patients were older and had more chronic illness and renal dysfunction; in the γ phenotype (n = 5385; 27%), patients had more inflammation and pulmonary dysfunction; and in the δ phenotype (n = 2667; 13%), patients had more liver dysfunction and septic shock. Phenotype distributions were similar in the validation cohort. There were consistent differences in biomarker patterns by phenotype. In the derivation cohort, cumulative 28-day mortality was 287 deaths of 5691 unique patients (5%) for the α phenotype; 561 of 4420 (13%) for the β phenotype; 1031 of 4318 (24%) for the γ phenotype; and 897 of 2223 (40%) for the δ phenotype. Across all cohorts and trials, 28-day and 365-day mortality were highest among the δ phenotype vs the other 3 phenotypes (P < .001). In simulation models, the proportion of RCTs reporting benefit, harm, or no effect changed considerably (eg, varying the phenotype frequencies within an RCT of early goal-directed therapy changed the results from >33% chance of benefit to >60% chance of harm).
Conclusions and Relevance
In this retrospective analysis of data sets from patients with sepsis, 4 clinical phenotypes were identified that correlated with host-response patterns and clinical outcomes, and simulations suggested these phenotypes may help in understanding heterogeneity of treatment effects. Further research is needed to determine the utility of these phenotypes in clinical care and for informing trial design and interpretation.
Sepsis, defined as a dysregulated immune response to infection that leads to acute organ dysfunction, affects millions of individuals per year, and carries a high risk of death even when care is provided promptly.1,2 Although the understanding of the host immune response has advanced considerably, it has not translated into new therapies. A major barrier to progress is the overly broad definition of the syndrome, which encompasses a vast, multidimensional array of clinical and biological features. Different combinations of these features may naturally cluster into previously undescribed subsets or phenotypes that may have different risks for a poor outcome and may respond differently to treatments. However, efforts to determine such phenotypes have remained limited and have focused primarily on patients in the intensive care unit.3-5 In addition, these phenotypes must be identifiable at or soon after hospital presentation to guide treatment.
The objectives of this investigation, the National Institutes of Health–funded Sepsis Endotyping in Emergency Care (SENECA) project, were to develop and evaluate sepsis phenotypes. The first goal was to determine whether routine clinical information available at hospital presentation could be mathematically reduced to discrete, reproducible sepsis phenotypes. The second goal was to understand whether the different clinical phenotypes were associated both with patterns among biomarkers of the host immune response and with clinical outcomes. The third goal was to explore the heterogeneity of the treatment effects and the sensitivity of clinical trial results to the frequency distributions of these phenotypes. These mathematically derived phenotypes also were compared with traditional subgrouping strategies.
The project was approved by the University of Pittsburgh institutional review board and conducted under several data use agreements (PRO15110441, PRO19030218, PRO20061050, PRO010744, PRO12110516, PRO12020657, and PRO17120315). The data for the SENECA project were obtained under a waiver of informed consent and with authorization under the Health Insurance Portability and Accountability Act. Written informed consent was obtained for clinical trial data per published trial procedures.6-8
The study approach involved several data sets and statistical approaches. For the first goal (determining phenotypes), we derived the clinical phenotypes using unsupervised clustering methods that were applied to the data available at hospital presentation in a large database of hospital encounters. We then assessed phenotype reproducibility both by comparing phenotype derivation using alternative clustering methods in the initial data set and by exploring phenotype frequency distributions in several other cohort and clinical trial data sets (eFigure 1 in the Supplement). For the second goal (understanding the correlation of clinical phenotypes and biological markers of the host response with clinical outcome), we first examined correlations in several data sets between the clinical phenotypes and the concurrent patterns of biomarkers, reflecting different elements of the sepsis host response. We then assessed the association of phenotypes with mortality and other clinical outcomes. For the third goal (assessing the influence of phenotypes on clinical trial results), we explored traditional analyses of heterogeneity for treatment effects on observed clinical trial data and performed simulations on 3 trial data sets to understand the potential consequences of different phenotype frequency distributions on estimation of the treatment effects.
We used data from 3 observational cohorts and 3 randomized clinical trials (RCTs)6-9 (Table 1). The first 2 cohorts (the SENECA derivation and validation cohorts) were drawn from electronic health record data on encounters at 12 community and academic hospitals within the UPMC health care system. We identified all adults (aged ≥18 years) who met sepsis criteria within the first 6 hours of presentation to the emergency department at the 12 hospitals during 2010 to 2012 for the derivation cohort and during 2013 to 2014 for the validation cohort.
The third cohort was the Genetic and Inflammatory Markers of Sepsis (GenIMS) study. The GenIMS study was a multicenter, prospective cohort of patients with severe community-acquired pneumonia recruited from 4 regions in the United States (western Pennsylvania, Connecticut, Tennessee, and Michigan) within 1 hour of emergency department presentation, and for whom we had rich clinical information and a variety of biomarkers for the host immune response. The GenIMS study enrolled patients hospitalized at 28 sites from 2001 to 2003.9,10
All 3 RCTs were multicenter studies that involved patients with sepsis or septic shock and had rich clinical and biomarker data. The first trial, ACCESS (A Controlled Comparison of Eritoran in Severe Sepsis), compared eritoran (a highly specific myeloid differentiation protein 2 antagonist that inhibits toll-like receptor 4) vs placebo in patients with severe sepsis at 197 sites on 6 continents from 2006 to 2010 and reported no benefit for 28-day mortality.6 The second trial, PROWESS (Activated Protein C Worldwide Evaluation in Severe Sepsis), compared activated protein C (a commonly activated pleiotropic acute phase protein) vs placebo in patients with severe sepsis at 164 sites in 11 countries from 1998 to 2000 and reported improved 28-day survival but increased bleeding adverse effects.8 The third trial, ProCESS (Protocol-Based Care for Early Septic Shock), compared early goal-directed therapy (a multicomponent resuscitation strategy) vs alternative resuscitation approaches in patients with septic shock at 31 sites in the United States from 2008 to 2013 and reported no benefit for 60-day inpatient mortality.7 The RCTs represent a range of RCT types from different clinical settings, testing different types of interventions, and reporting benefit, harm, or no effect (neutral).
To identify patients with sepsis in the SENECA derivation cohort, the electronic health record was used to determine if a patient met the following Sepsis-3 criteria2 within the first 6 hours of hospital presentation: (1) evidence of a suspected infection and (2) presence of organ dysfunction. Evidence of a suspected infection was defined as the combination of administration of antibiotics (oral or parenteral) and a body fluid culture specimen obtained (blood, urine, or cerebrospinal fluid), the first of which was required within the first 6 hours of hospital presentation. The presence of organ dysfunction was defined as 2 or more Sequential Organ Failure Assessment (SOFA) points11 within the first 6 hours of hospital presentation. In the GenIMS cohort, the Sepsis-2 definition10 was used because it was available at the time. All patients in the 3 RCTs met variations of the Sepsis-2 criteria, and were therefore eligible for the current study (eMethods in the Supplement).
We selected 29 candidate variables based on their association with sepsis onset or outcome, their incorporation in conceptual models of sepsis pathophysiology and host tolerance, and their availability in the electronic health record at hospital presentation.12-14 These included demographic variables (eg, age, sex, Elixhauser comorbidities), vital signs (eg, heart rate, respiratory rate, Glasgow Coma Scale score, systolic blood pressure, temperature, and oxygen saturation), markers of inflammation (eg, white blood cell count, premature neutrophil count [also called bands], erythrocyte sedimentation rate, and C-reactive protein), markers of organ dysfunction or injury (eg, alanine aminotransferase, aspartate aminotransferase, total bilirubin, blood urea nitrogen, creatinine, international normalized ratio, partial pressure of oxygen, platelets, and troponin), and serum levels of glucose, sodium, hemoglobin, chloride, bicarbonate, lactate, and albumin (eTable 1 in the Supplement). For each variable, we extracted the most abnormal value recorded within the first 6 hours of hospital presentation. In the SENECA derivation and validation cohorts, patient-reported race was derived from the UPMC registration system data using fixed categories consistent with the Centers for Medicare & Medicaid Services electronic health record meaningful use data set.
We studied 27 serum biomarkers measured at baseline in GenIMS, ACCESS, PROWESS, and ProCESS. All of the biomarkers are considered reflective of the host response for sepsis and are included broadly under the domains of inflammatory, endothelial, coagulation, and vital organ function (eTable 2 in the Supplement). The primary clinical outcome was 28-day mortality in the SENECA project derivation and validation cohorts, in the GenIMS cohort, and in the ACCESS and PROWESS trials. The primary clinical outcome was hospital mortality truncated at 60 days in the ProCESS trial. One-year mortality was studied in the ACCESS, PROWESS, and ProCESS trials. Other outcomes studied in exploratory analyses included intensive care unit admission during hospitalization, total days of administration of a vasopressor, and total days of mechanical ventilation during hospitalization.
To derive the phenotypes, we first assessed the candidate variable distributions, missingness, and correlation (eTable 3 in the Supplement). Multiple imputation with chained equations was used to account for missing data (eTable 4 and eMethods in the Supplement)15 and log transformation was used for nonnormal data. After evaluating correlation, we excluded highly correlated variables using rank-order statistics in the sensitivity analyses (eFigure 2 in the Supplement). Ordering points to identify the clustering structure (OPTICS) plots were used to determine the optimal clustering strategy.16 Based on these plots, we applied consensus k means clustering to 29 variables using a partitioning approach.17 To determine the optimal number of phenotypes with consensus k means clustering, we evaluated a combination of phenotype size, clear separation of the consensus matrix heatmaps, characteristics of the consensus cumulative distribution function plots, and adequate pairwise consensus values between cluster members (>0.8). Once optimal phenotype number was determined, patterns of clinical variables were visualized in 4 ways: (1) t-distributed stochastic neighbor embedding plots (which show multidimensional data in 2 dimensions), (2) alluvial plots (which show the proportional distribution of phenotype members across specific variables), (3) chord diagrams (which show how phenotypes differ by major variable groups; eMethods in the Supplement), and (4) ranked plots of variables by the mean standardized difference between the phenotype pairs.18
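The consensus clustering step described above can be illustrated with a minimal sketch. This is not the study's implementation: the plain Lloyd's k means routine, the 80% subsampling fraction, the number of runs, and the function names `kmeans` and `consensus_matrix` are all assumptions chosen for illustration. The idea shown is that each pair of patients receives a consensus value equal to the fraction of subsampled runs in which the pair was clustered together, which is the quantity compared against the 0.8 threshold.

```python
import random

def kmeans(points, k, rng, iters=25):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    centers = rng.sample(points, k)  # initialize at k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def consensus_matrix(points, k, runs=50, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair shares a cluster.

    `runs` and `frac` are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Toy data: two well-separated groups of hypothetical standardized values.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
          [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]]
matrix = consensus_matrix(points, k=2)
print(round(matrix[0][1], 2), round(matrix[0][5], 2))
```

In this toy case, pairs from the same group should have consensus near 1 and pairs from different groups near 0. With real data, repeating the procedure for several values of k and comparing the resulting matrices is what allows the choice of phenotype number to be examined.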
To assess the reproducibility of the phenotypes, we first used a latent class analysis to derive the groups (eMethods in the Supplement).19 In the latent class analysis, the optimal phenotype number was confirmed using a combination of Bayesian information criteria, adequate size, high median probabilities of group membership within each phenotype, maximum entropy (a measure between 0 and 1, with higher values indicating better classification), and clinical features of potential groups. We also determined the proportion of patients with a probability of phenotype assignment on the margin, which was defined as between 45% and 55%. We assessed how robust the phenotypes were to sensitivity analyses of the derivation method, including (1) excluding variables with high missingness (eg, erythrocyte sedimentation rate, C-reactive protein, premature neutrophil count [bands]); (2) excluding both highly missing and highly correlated variables (sodium, hemoglobin, blood urea nitrogen, and alanine aminotransferase); and (3) using a 12-hour window of electronic health record data after hospital presentation (eMethods in the Supplement).
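As a small illustration of the marginal-assignment measure, the sketch below computes the share of patients whose probability of phenotype assignment falls between 45% and 55%. The function name `proportion_on_margin`, the inclusive bounds, and the example probabilities are assumptions made for illustration; the text states only the 45% to 55% definition.

```python
def proportion_on_margin(assignment_probabilities, low=0.45, high=0.55):
    """Fraction of patients whose assignment probability lies in [low, high].

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    if not assignment_probabilities:
        raise ValueError("need at least one probability")
    on_margin = sum(1 for p in assignment_probabilities if low <= p <= high)
    return on_margin / len(assignment_probabilities)

# Hypothetical assignment probabilities for 5 patients.
example = [0.97, 0.52, 0.88, 0.46, 0.71]
print(proportion_on_margin(example))  # 2 of 5 patients lie on the margin
```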
To determine the reproducibility in the external data, we used the SENECA validation cohort and rederived groups using consensus k means clustering. Then, in the GenIMS study and in the 3 RCTs, we predicted phenotype based on the clinical characteristics of typical cluster members in the SENECA derivation cohort. Predictions arose from the Euclidean distance from each patient to the centroid of each SENECA phenotype (eMethods in the Supplement). We studied the frequency and clinical characteristics of the predicted phenotype groups in the GenIMS study and in the 3 RCTs.
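The distance-based prediction described above can be sketched as a nearest-centroid rule. The centroid values, the 3-variable layout, and the function name `assign_phenotype` below are hypothetical; the study's centroids come from the SENECA derivation cohort and its full set of variables.

```python
import math

def assign_phenotype(patient, centroids):
    """Return the label of the centroid closest to `patient` (Euclidean distance)."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: distance(patient, centroids[label]))

# Hypothetical standardized centroids over 3 variables (illustration only).
centroids = {
    "alpha": [-0.5, -0.4, -0.3],
    "beta": [0.8, -0.2, 0.1],
    "gamma": [0.0, 0.9, -0.1],
    "delta": [0.3, 0.2, 1.5],
}
patient = [0.2, 0.1, 1.2]
print(assign_phenotype(patient, centroids))  # delta
```

A rule of this kind assumes the external data are placed on the same scale as the centroids before distances are computed, because Euclidean distance is sensitive to the units of each variable.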
We determined the correlation of the phenotypes with 27 biomarkers of the host immune response and compared the mean (SD), the median (interquartile range [IQR]), and the ratio of biomarker distributions across phenotypes as appropriate. The χ2 test was used to compare in-hospital, 28-day, and 365-day mortality. The cumulative mortality was illustrated using probability plots and the differences were tested using the log-rank test.
To understand the implications of the phenotypes on the RCT estimates of the treatment effects, we conducted Monte Carlo simulations (10 000 iterations per simulation) in which the only variable modified was the proportion of phenotypes enrolled in the existing trial data set using random sampling with replacement. Six scenarios were created for each of the 3 trials (eMethods in the Supplement), in which the range of phenotypes was varied. The range of phenotype frequencies in the simulated trials was informed by the frequencies observed across the hospitals in the SENECA derivation and validation cohorts, with upper and lower bounds set at up to twice the observed values. We also tested logistic regression models for 28-day and 365-day mortality using phenotype, treatment assignment, and their interaction as covariates (eMethods in the Supplement).
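The resampling idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's simulation code: the record layout `(phenotype, treated, died)`, the function names, the toy data, and in particular the rule used to label a simulated trial (a 2-sided 2-proportion z test at 1.96) are all assumptions introduced here.

```python
import math
import random

def classify_trial(sample, z_crit=1.96):
    """Label one simulated trial as 'benefit', 'harm', or 'neutral'.

    The 2-proportion z test used here is an assumed decision rule.
    """
    treated = [died for _, arm, died in sample if arm == 1]
    control = [died for _, arm, died in sample if arm == 0]
    if not treated or not control:
        return "neutral"
    p_t, p_c = sum(treated) / len(treated), sum(control) / len(control)
    pooled = (sum(treated) + sum(control)) / (len(treated) + len(control))
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(treated) + 1 / len(control)))
    if se == 0:
        return "neutral"
    z = (p_t - p_c) / se
    if z <= -z_crit:
        return "benefit"  # significantly lower mortality with treatment
    if z >= z_crit:
        return "harm"     # significantly higher mortality with treatment
    return "neutral"

def simulate(patients, mix, n_trial, iterations, seed=0):
    """Resample patients with replacement to a target phenotype mix."""
    rng = random.Random(seed)
    by_phenotype = {}
    for record in patients:
        by_phenotype.setdefault(record[0], []).append(record)
    counts = {"benefit": 0, "harm": 0, "neutral": 0}
    for _ in range(iterations):
        sample = []
        for phenotype, share in mix.items():
            k = round(n_trial * share)
            sample.extend(rng.choices(by_phenotype[phenotype], k=k))
        counts[classify_trial(sample)] += 1
    return {label: c / iterations for label, c in counts.items()}

# Toy records (phenotype, treated, died): treatment helps alpha, harms delta.
toy = (
    [("alpha", 1, 0)] * 90 + [("alpha", 1, 1)] * 10
    + [("alpha", 0, 0)] * 70 + [("alpha", 0, 1)] * 30
    + [("delta", 1, 0)] * 40 + [("delta", 1, 1)] * 60
    + [("delta", 0, 0)] * 60 + [("delta", 0, 1)] * 40
)
print(simulate(toy, {"alpha": 0.9, "delta": 0.1}, n_trial=1000, iterations=200))
print(simulate(toy, {"alpha": 0.1, "delta": 0.9}, n_trial=1000, iterations=200))
```

With these toy records, a trial enriched for the α phenotype should mostly conclude benefit and a trial enriched for the δ phenotype should mostly conclude harm, which mirrors the kind of sensitivity to case mix that the simulations were designed to explore.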
Several analyses were conducted to ensure the phenotypes were not simply recapitulations of more traditional clinical groups. First, we tested whether the phenotypes were explained by traditional measures of illness severity, such as the SOFA score or the Acute Physiology and Chronic Health Evaluation (APACHE) score. For the SENECA derivation cohort, alluvial plots were used to inspect whether the phenotypes overlapped with the SOFA score.11 We also determined the overlap of the phenotypes with the quartiles of APACHE and SOFA scores in the 3 RCTs.20 We further inspected the biomarker profiles and mortality by APACHE quartile in the ProCESS trial. Simulations in the ProCESS trial were also repeated, varying the proportions of the 4 severity-of-illness quartiles instead of the phenotypes, and comparing the potential causal relationship with the estimates of treatment benefit or harm (eMethods in the Supplement). Second, we explored whether the phenotypes were explained by the site of the infection. In the ACCESS trial, which includes independent adjudication of the source of infection, we generated alluvial plots and the proportions for infection sites across phenotypes. We measured the frequency of the phenotypes in a subset of patients with sepsis from a single source (bacteremia) among patients in the SENECA derivation cohort.
Data are presented as mean (SD) or median (IQR). For comparisons, we used analysis of variance and the Kruskal-Wallis test for continuous data and the χ2 test for categorical data. The threshold for statistical significance was P < .05 for 2-sided tests. There was no adjustment for the type I error rate due to multiple comparisons; therefore, the findings from these analyses should be considered exploratory. Analyses were performed with Stata version 14.2 (StataCorp) and R versions 3.4.1 and 3.5.0 (R Foundation for Statistical Computing).
Among 1 309 025 patient encounters in the SENECA derivation cohort (eFigure 3 in the Supplement), 87 844 patients (6.7%) had suspected infection within 6 hours of hospital presentation and 20 189 met Sepsis-3 criteria (eTable 5 in the Supplement). The mean SOFA score was 3.9 (SD, 2.4) and the mean serum lactate level was 3.2 mmol/L (SD, 3.2 mmol/L). Among 1 119 388 encounters in the SENECA validation cohort, more patients had suspected infection (n = 103 259; 9.2%) and met Sepsis-3 criteria within 6 hours (n = 43 086); however, the demographic characteristics and SOFA scores were similar (eTables 5 and 6 in the Supplement). The SENECA validation cohort included 43 086 patients (mean age, 67 [SD, 17] years; 21 993 [51%] male; mean maximum 24-hour SOFA score, 3.6 [SD, 2.0]). Patients in the GenIMS cohort (total of 2320 enrolled, including 583 patients with sepsis) had more comorbidities and respiratory symptoms (eg, elevated respiratory rate, lower oxygen saturation). Across the SENECA derivation and validation cohorts and the GenIMS cohort, the in-hospital mortality ranged from 6% to 14% and from 16% to 23% among patients who required intensive care. In the 3 RCTs (eTable 7 in the Supplement), a total of 4737 patients (1706 in the trial on eritoran,6 1690 in the trial on activated protein C,8 and 1341 in the trial on early goal-directed therapy7) participated at 392 sites and short-term mortality (ie, at 28 days in 2 trials and at 60 days in 1 trial) ranged from 19% to 28%.
In the SENECA derivation cohort, the consensus k-means clustering models found that a 4-class model was the optimal fit with the 4 phenotypes of α, β, γ, and δ (eFigures 4 and 5 and eTable 8 in the Supplement). Consensus matrix plots and the relative change in the area under the cumulative distribution function curve implied little statistical gain by increasing to a 5- or 6-class model. The size and characteristics of the phenotypes in the 4-class model appear in Table 2 and Figure 1. Phenotypes ranged in size (from 13% to 33% of the cohort) and differed broadly in clinical characteristics and organ dysfunction patterns. When ranking continuous variables by the standardized mean difference between phenotypes (Figure 2), patients with the α phenotype had fewer abnormal laboratory values and less organ dysfunction; those with the β phenotype were older, had greater chronic illness, and were more likely to present with renal dysfunction; those with the γ phenotype were more likely to have elevated measures of inflammation (eg, white blood cell count, premature neutrophil count [bands], erythrocyte sedimentation rate, or C-reactive protein), lower albumin level, and higher temperature; and those with the δ phenotype had elevated serum lactate levels, elevated levels of transaminases, and hypotension (eFigures 6-8 in the Supplement).
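As an illustration of the cluster-number criterion described above, the following minimal Python sketch computes the area under the empirical cumulative distribution function of pairwise consensus values and its relative change as the number of classes increases. It is not the study code. The function names and the toy area values are hypothetical, and the area is taken here as the exact integral of the step function over [0, 1], whereas clustering software may use a Riemann-sum approximation.

```python
def cdf_area(consensus_values):
    """Area under the empirical CDF of pairwise consensus values in [0, 1]."""
    xs = sorted(consensus_values)
    n = len(xs)
    area = 0.0
    prev = 0.0
    for i, x in enumerate(xs):
        # On the interval [prev, x), exactly i of the n values are <= c.
        area += (x - prev) * (i / n)
        prev = x
    # From the largest value up to 1, the CDF equals 1.
    area += 1.0 - prev
    return area


def relative_change(area_by_k):
    """Relative change in CDF area for each k versus the next smaller k.

    For the smallest k there is no predecessor, so the area itself is reported.
    """
    ks = sorted(area_by_k)
    result = {ks[0]: area_by_k[ks[0]]}
    for prev_k, k in zip(ks, ks[1:]):
        result[k] = (area_by_k[k] - area_by_k[prev_k]) / area_by_k[prev_k]
    return result


# Hypothetical areas for k = 2..5 (illustrative values, not study results).
toy_areas = {2: 0.40, 3: 0.60, 4: 0.75, 5: 0.78}
for k, change in relative_change(toy_areas).items():
    print(k, round(change, 3))
```

In the toy values, the relative change shrinks sharply when moving from 4 to 5 classes, which is the kind of pattern that would be read as little statistical gain from adding a class.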
Variables such as sex, sodium level, glucose level, and white blood cell count contributed least to phenotype differences (eFigure 9 in the Supplement). Phenotypes also varied across the 12 SENECA hospitals as follows: α phenotype ranged from 24% to 42%; β phenotype ranged from 19% to 30%; γ phenotype ranged from 23% to 50%; and δ phenotype ranged from 5% to 23% (eFigure 10 in the Supplement). There was no difference across phenotypes in the rate of peripheral blood culture as the first body fluid culture after hospital presentation, whereas the rate of intravenous antibiotics (vs other routes of administration) ranged from 76% to 93% (eTable 9 and eFigure 11 in the Supplement).
Latent class analysis confirmed the statistical fit of the 4-class model (Figure 2 and eFigure 12 and eTable 10 in the Supplement). Bayesian information criteria decreased as class number increased from 2 to 4 while entropy was preserved (>0.8). The clinical characteristics of the phenotypes were similar when derived using this method as well as by visualization with t-distributed stochastic neighbor embedding plots (eFigure 13 and eTable 11 in the Supplement). There was strong separation in the likelihood of membership for patients assigned to a given phenotype compared with those assigned to other phenotypes (eFigure 14 in the Supplement).
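The 2 fit measures named above can be sketched as follows, assuming the standard definitions of the Bayesian information criterion and of relative entropy for class-membership probabilities. The function names and inputs are hypothetical, and the latent class software used in the study may compute these quantities with different conventions.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate better fit."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def relative_entropy(posteriors):
    """Relative entropy of class assignment (1 = perfect separation, 0 = none).

    posteriors: one list of class-membership probabilities per patient,
    each list summing to 1 across the K classes.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # p * ln(p) tends to 0 as p tends to 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))


# Hypothetical posterior probabilities for 3 patients in a 4-class model.
example = [
    [0.97, 0.01, 0.01, 0.01],
    [0.02, 0.95, 0.02, 0.01],
    [0.05, 0.05, 0.10, 0.80],
]
print(round(relative_entropy(example), 2))
```

Under these definitions, an entropy value above 0.8 indicates that most patients have a high probability of membership in a single class.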
Phenotypes also were derived in the SENECA validation cohort and showed similar optimal phenotype numbers, frequency of phenotypes, and clinical characteristics as observed in the primary analysis (Figure 2; eFigures 15 and 16 and eTable 12 in the Supplement). No substantial changes were evident after excluding variables with high missingness (eTable 13 in the Supplement), after excluding variables with both high missingness and correlation (eTable 14 in the Supplement), and when the window for capturing data was expanded to 12 hours after hospital presentation (eTable 15 in the Supplement).
In the GenIMS cohort in which patients had sepsis due only to pneumonia (eMethods and eFigure 17 in the Supplement), all 4 phenotypes were present, albeit with slightly different frequencies compared with the SENECA derivation cohort. The clinical characteristics of the phenotypes were largely the same (eTable 16 in the Supplement). When the phenotypes were predicted in the 3 RCTs (eFigures 18-21 in the Supplement), the frequency distributions and clinical characteristics were also similar to the SENECA derivation cohort (eTable 17 for ACCESS, eTable 18 for PROWESS, and eTable 19 for ProCESS in the Supplement).
Broad differences were observed in the distributions of the host-response biomarkers across phenotypes (Figure 3). Of the 27 biomarkers measured in 4 studies, 23 were significantly different across phenotypes in at least 1 study (P < .05). In general, there was an increase in the markers of inflammation and in abnormal coagulation in both the γ and δ phenotypes compared with the α or β phenotypes (Figure 4). For example, in the GenIMS study, the ratio of the δ phenotype to the α phenotype for the median level of IL-6 was 5.0 (IQR, 1.6-13.2); in the ACCESS trial, it was 7.7 (IQR, 1.4-16.6); in the PROWESS trial, it was 3.0 (IQR, 0.7-24.6); and in the ProCESS trial, it was 8.3 (IQR, 1.4-67.7). Similar findings comparing the δ phenotype vs the α phenotype were present for IL-10 level (ranges of ratios across the studies for median level of IL-10, 1.3-6.2), but were less prominent for tumor necrosis factor (range of ratios across the studies for tumor necrosis factor, 1.0-4.6; Figure 3).
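The biomarker ratios reported above are medians (with interquartile bounds) expressed relative to the median value in the α phenotype. A minimal sketch of that standardization follows. The function name and the toy values are hypothetical, and the quartile method used here (Python's "inclusive" interpolation) is an assumption that may differ from the method used in the study.

```python
from statistics import median, quantiles


def ratio_to_referent(values, referent_values):
    """Return (median, lower quartile, upper quartile) of `values`,
    each divided by the median of the referent group."""
    ref = median(referent_values)
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q2 / ref, q1 / ref, q3 / ref


# Toy cytokine values (arbitrary units, not study data).
delta_values = [10, 20, 30, 40, 50]
alpha_values = [2, 4, 6]
print(ratio_to_referent(delta_values, alpha_values))
```

A ratio of 1.0 means the phenotype's median equals the referent median, which is why the α phenotype sits at 1 in this presentation.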
Coagulation markers such as thrombin-antithrombin complex, plasminogen activator inhibitor 1, and D-dimer were significantly greater in the δ phenotype compared with the other phenotypes (P < .001; Figure 4 and eTables 20-23 in the Supplement). The levels of some markers of endothelial dysfunction (eg, intercellular adhesion molecule 1, E-selectin) were highest in the γ phenotype (P < .01), other markers were highest in the δ phenotype (eg, vascular cell adhesion molecule 1), and other markers were not different across groups (eg, P-selectin, P = .37). Markers of renal injury (eg, insulin-like growth factor–binding protein 7, collagen type 4, tissue inhibitor of metalloproteinase 2) were highest in both the β and δ phenotypes (P < .01).
Phenotypes were associated with short- and long-term outcomes (eTables 24 and 25 in the Supplement). In the SENECA derivation cohort, the fewest in-hospital deaths occurred in the α phenotype (n = 126; 2%) compared with the β phenotype (n = 286; 5%), the γ phenotype (n = 818; 15%), and the δ phenotype (n = 852; 32%) (P < .001). Across all cohorts and trials, the 28-day mortality (Figure 5) and the 365-day mortality (eFigure 22 in the Supplement) were highest in the δ phenotype compared with the other phenotypes (P < .001). In the SENECA derivation cohort (n = 16 552 unique patients), cumulative 28-day mortality was 287 of 5691 (5%) for the α phenotype, 561 of 4420 (13%) for the β phenotype, 1031 of 4318 (24%) for the γ phenotype, and 897 of 2223 (40%) for the δ phenotype. In the SENECA validation cohort (n = 31 160 unique patients), cumulative 28-day mortality was 837 (9%) for the α phenotype, 923 (11%) for the β phenotype, 854 (9%) for the γ phenotype, and 1278 (29%) for the δ phenotype. Intensive care unit admission rates were higher in the δ phenotype compared with the other phenotypes (P < .01), whereas days of mechanical ventilation and administration of a vasopressor were variable across studies.
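The cumulative 28-day mortality percentages for the SENECA derivation cohort can be reproduced from the counts given above, and the comparison across phenotypes can be illustrated with a Pearson χ2 test on the 4 × 2 table of deaths and survivors. The sketch below is an illustrative check only. It assumes a simple χ2 test with 3 degrees of freedom, which may not be the exact procedure behind the reported P value, and the function names are hypothetical.

```python
import math

# 28-day deaths and patients per phenotype, SENECA derivation cohort (from the text).
counts = {
    "alpha": (287, 5691),
    "beta": (561, 4420),
    "gamma": (1031, 4318),
    "delta": (897, 2223),
}


def mortality_percent(deaths, total):
    """Mortality as a percentage of patients in the phenotype."""
    return 100.0 * deaths / total


def chi_square_statistic(table):
    """Pearson chi-square statistic for a phenotype x (died, survived) table."""
    rows = [(d, n - d) for d, n in table.values()]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


for name, (deaths, total) in counts.items():
    print(name, round(mortality_percent(deaths, total)))

stat = chi_square_statistic(counts)
print(chi_square_sf_df3(stat) < 0.001)
```

With 4 phenotypes and 2 outcomes, the table has (4 − 1) × (2 − 1) = 3 degrees of freedom, which is why the closed-form tail probability for 3 degrees of freedom is sufficient here.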
The estimated treatment effects by phenotype were variable in the observed data in the ACCESS, PROWESS, and ProCESS trials (eFigures 23-28 in the Supplement). Standard treatment × phenotype interactions were significant only in the ProCESS trial and not in the other 2 trials, based on the P < .05 criterion. The primary findings of the trial simulations appear in Figure 6 (more detailed examples appear in eFigures 29-31 in the Supplement). In general, the trials had similar baseline characteristics between simulation scenarios and original trial populations. For example, a doubling of the δ phenotype did not change the demographics and increased the mean baseline SOFA score from 7.2 (SD, 3.6) points to only 8.6 (SD, 3.6) points in the ProCESS trial (eTables 26-28 in the Supplement). The mortality rates for the control group were also stable across the simulations and were within the typically reported ranges (eTable 29 and eFigure 32 in the Supplement). For example, a doubling of the highly morbid δ phenotype was only associated with an increase in the mortality rate for usual care from 26% to 31% in the ACCESS trial, from 31% to 39% in the PROWESS trial, and from 19% to 26% in the ProCESS trial.
The trial conclusions about the treatment effects were relatively robust to large changes in the proportion of patients with the β and γ phenotypes. Despite modest changes to the baseline characteristics in the trial populations, the changes to the distributions for the α and δ phenotypes had substantial effects (Figure 6). For example, in the ProCESS trial, which under the baseline phenotype distribution had a 0% chance of finding benefit with early goal-directed therapy for 60-day inpatient mortality (and an 85% and 15% chance of finding no difference or harm, respectively), the chance of finding benefit increased to 35% when the α phenotype represented the majority of the population (eFigure 29 in the Supplement).
In contrast, when the δ phenotype was increased to 50% of the ProCESS trial population, there was a greater than 60% chance of finding that early goal-directed therapy was harmful. In the ACCESS trial (eFigure 30 in the Supplement), which under the baseline phenotype distribution had a 0% chance of finding benefit, a 91% chance of finding no difference, and a 9% chance of finding harm for 28-day mortality, an increase in the δ phenotype from 14% to 44% of the trial population resulted in 29% of simulated trials concluding eritoran caused harm. In the PROWESS trial, which had an 82% chance of finding a positive effect with the baseline phenotype distribution, 50% of the simulated trials showed no difference when the frequency of the α phenotype was increased to represent the majority of the trial population (eFigure 31 in the Supplement).
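The general logic of such simulations, in which the phenotype mix of a trial population is varied and each simulated trial is classified as showing benefit, no difference, or harm, can be sketched as follows. This is a simplified Monte Carlo illustration and not the simulation procedure used in the study. All mortality risks, the sample size, the classification rule (a 95% confidence interval for the risk difference that excludes 0), and all names are hypothetical assumptions.

```python
import math
import random

# Hypothetical mortality risks by phenotype (not study estimates).
# The treatment is assumed to be harmful only in the delta phenotype.
CONTROL = {"alpha": 0.05, "beta": 0.13, "gamma": 0.24, "delta": 0.40}
TREATED = {"alpha": 0.05, "beta": 0.13, "gamma": 0.24, "delta": 0.55}


def simulate_trials(mix, control_risk, treated_risk,
                    n_per_arm=500, n_sims=200, seed=0):
    """Return the fraction of simulated trials concluding benefit,
    no difference, or harm for a given phenotype mix."""
    rng = random.Random(seed)
    phenotypes = list(mix)
    weights = [mix[p] for p in phenotypes]
    counts = {"benefit": 0, "no difference": 0, "harm": 0}
    for _ in range(n_sims):
        rates = []
        for risk in (control_risk, treated_risk):
            # Draw each patient's phenotype, then the patient's outcome.
            arm = rng.choices(phenotypes, weights=weights, k=n_per_arm)
            deaths = sum(rng.random() < risk[p] for p in arm)
            rates.append(deaths / n_per_arm)
        p_c, p_t = rates
        diff = p_t - p_c
        se = math.sqrt(p_c * (1 - p_c) / n_per_arm
                       + p_t * (1 - p_t) / n_per_arm)
        if se > 0 and diff - 1.96 * se > 0:
            counts["harm"] += 1  # treated mortality significantly higher
        elif se > 0 and diff + 1.96 * se < 0:
            counts["benefit"] += 1  # treated mortality significantly lower
        else:
            counts["no difference"] += 1
    return {label: c / n_sims for label, c in counts.items()}


baseline_mix = {"alpha": 0.4, "beta": 0.3, "gamma": 0.2, "delta": 0.1}
enriched_mix = {"alpha": 0.2, "beta": 0.2, "gamma": 0.1, "delta": 0.5}
print(simulate_trials(baseline_mix, CONTROL, TREATED))
print(simulate_trials(enriched_mix, CONTROL, TREATED))
```

Under these assumed risks, enriching the population for the phenotype in which treatment is harmful shifts the share of simulated trials from no difference toward harm, even though the per-phenotype treatment effect is unchanged.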
The 4 phenotypes could not be described by severity of illness or site of infection alone. In the SENECA derivation cohort, all 4 phenotypes included both patients with and without organ dysfunction in all SOFA categories (Figure 1). The mean SOFA scores at hospital presentation were lower in patients with the α phenotype (3.0 [SD, 1.4]) and higher in patients with the δ phenotype (6.6 [SD, 3.7]), but overlapped in patients with the β phenotype (3.5 [SD, 1.7]) and in those with the γ phenotype (4.0 [SD, 2.3]) (Table 2 and eFigures 33 and 34 in the Supplement). In the ACCESS trial, although the δ phenotype had a greater proportion of patients with intraabdominal infections, there was a broad distribution for site of infection in each phenotype (eFigure 35 and eTable 30 in the Supplement). There was a similarly broad distribution for phenotypes among patients with sepsis due to bacteremia alone (n = 1714; eFigure 36 in the Supplement).
In the analyses to further explore whether the derived phenotypes were proxies for severity of illness, the pattern of baseline clinical variables and host-response biomarkers differed across the APACHE quartiles from the pattern for the 4 phenotypes (eTables 31 and 32 and eFigures 37 and 38 in the Supplement). The range of short-term mortality rates across the APACHE III quartiles was similar to the range across the 4 phenotypes (eFigure 38 in the Supplement). However, enrichment of the ProCESS trial using APACHE III quartiles was associated with smaller changes in the trial conclusions compared with phenotype enrichment (eFigure 39 in the Supplement).
In this retrospective analysis of data sets from patients with sepsis, 4 clinical phenotypes of sepsis were derived using routinely available clinical data at the time of hospital presentation. The phenotypes were multidimensional, differed in their demographics, laboratory abnormalities, and patterns of organ dysfunction, and were not homologous with traditional patient groupings such as by site of infection, organ dysfunction patterns, or severity of illness. The frequency and characteristics of the phenotypes were reproducible in additional cohorts and using different machine learning methods. The 4 sepsis phenotypes were strongly correlated with patterns of the host immune response, mortality, and other clinical outcomes. In simulations of 3 large, multicenter trials, conclusions about the estimated treatment benefit or harm were sensitive to phenotype distributions, especially the α and δ phenotypes.
These sepsis phenotypes can be identified at the time of patient presentation to the emergency department, and thus could be useful with regard to early treatment and enrollment in clinical trials. Only routinely available data were used in the clustering models, and the phenotypes were derived from a large observational cohort to ensure generalizability. Phenotype frequency distributions and characteristics were similar in studies with different definitions for sepsis. For example, the SENECA derivation and validation cohorts used electronic health record criteria for Sepsis-3,2 the GenIMS study used Sepsis-2,10 the ProCESS trial enrolled patients with early septic shock and used broad sepsis criteria, and both the ACCESS and PROWESS trials enrolled patients later in their clinical course and patients with more organ failure.
Of the 4 phenotypes identified, the δ phenotype was most strongly correlated with abnormal values of host-response biomarkers as well as clinical features of cardiovascular and liver dysfunction. These characteristics are similar to previously reported subclasses, including the hyperinflammatory subphenotype reported in acute respiratory distress syndrome, a condition most commonly caused by sepsis.18 The δ phenotype also resembles sepsis endotypes derived using transcriptomic analyses of circulating immune cells (such as the inflammopathic cluster, sepsis response signature 1, or the Molecular Diagnosis and Risk Stratification of Sepsis [MARS] 2 cluster) described in patients with sepsis in the intensive care unit.3-5 In contrast, the α phenotype had fewer laboratory abnormalities and less septic shock, which resembles the MARS 3 and sepsis response signature 2 endotypes reported in the same series, and which were found to have predominant expression of adaptive immune and B-cell development pathways.3-5 This concordance between clinical phenotypes and more computationally intensive transcriptomic endotypes could help identify subsets of patients most likely to benefit from particular immunomodulation strategies.
In RCT simulations, variations in phenotype frequencies produced only small changes in the distribution of average baseline characteristics, yet resulted in unstable trial conclusions. For example, the ACCESS trial found no benefit from eritoran on 28-day mortality. Yet, when the δ phenotype (the phenotype with the greatest proportion of intraabdominal infections) was increased to nearly half of the trial population, more than one-third of the simulated trials suggested harm from eritoran. This finding is consistent with animal models that suggest toll-like receptor 4 signaling aids bacterial clearance from the peritoneum in patients with intraabdominal sepsis.21 The high proportion of simulated PROWESS trials that found no benefit from activated protein C when the proportion of patients with the α phenotype was increased raises the possibility that such patients were also more common in the subsequent negative trials of activated protein C.22,23
The largest changes were seen in the ProCESS trial, which found no benefit from early goal-directed therapy compared with usual care. In simulations, when the δ phenotype was increased, early goal-directed therapy was harmful in more than half of the simulated trials. This finding supports data from 2 RCTs conducted in low- to middle-income countries that found harm from early goal-directed therapy in select populations.24,25 Increases in the α phenotype suggested benefit from early goal-directed therapy, similar to the initial report by Rivers et al.26 These data highlight the importance of characterizing the heterogeneity of sepsis when comparing across trials with different conclusions.
These findings have additional implications. First, completed trials may have unrecognized heterogeneity in the treatment effects by clinical phenotype that was not apparent when analyzing (1) the entire cohort, (2) subgroups based on individual variables, or (3) stratification based on risk of death.27 However, a secondary analysis of treatment × phenotype interactions may be limited by small sample sizes. Second, these proof-of-concept clinical phenotypes could be incorporated prospectively in future study designs that test new biologically active therapeutics. Novel designs could enrich for a priori phenotypes as well as confirm the boundaries around predictive phenotypes during the trial.28
This study has several limitations. First, only routinely available clinical data in the electronic health record were used to identify phenotypes, and the inclusion of other data such as clinicians’ impression, protein biomarkers, immune cell gene expression, or pathogen variables during derivation could change phenotype assignments. However, there appears to be some similarity between the clinical phenotypes derived in this study and those described in other series using such data.
Second, the statistical approach involved a variety of supervised decisions such as (1) the choice of a 6-hour time window for capturing data at hospital presentation, (2) the selection of candidate variables, and (3) the handling of variable distributions. Changes to the initial assumptions, the time window for data capture, or the choice of optimal cluster number could alter the results. However, the findings were consistent when an electronic health record window of 12 hours was used.
Third, because missing data were common for some variables included in the clustering models, multiple imputation was used in the primary analysis. However, variables with high missingness were excluded from the sensitivity analyses and similar results were still found.
Fourth, differences in short- and long-term prognosis were present across phenotypes, perhaps due to different features of the validation cohorts, such as the definition of sepsis, demographics, or burden of organ dysfunction.
Fifth, characteristics of clinical phenotypes were derived initially from a single integrated health system in the United States at a single moment in clinical care. Although phenotypes were found to be generalizable in the other data sets examined, further exploration is necessary, especially using data from low- and middle-income countries, more recent clinical trials, and longitudinal cohorts.
In this retrospective analysis of data sets from patients with sepsis, 4 clinical phenotypes were identified that correlated with host-response patterns and clinical outcomes, and simulations suggested these phenotypes may help in understanding heterogeneity of treatment effects. Further research is needed to determine the utility of these phenotypes in clinical care and for informing trial design and interpretation.
Corresponding Author: Christopher W. Seymour, MD, MSc, University of Pittsburgh School of Medicine, Keystone Bldg, 3520 Fifth Ave, Ste 100, Pittsburgh, PA 15261 (firstname.lastname@example.org).
Accepted for Publication: April 24, 2019.
Published Online: May 19, 2019. doi:10.1001/jama.2019.5791
Author Contributions: Dr Seymour had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Seymour, Kennedy, Chang, Clermont, Cooper, Gomez, Opal, van der Poll, Vodovotz, Yealy, Angus.
Acquisition, analysis, or interpretation of data: Seymour, Kennedy, Wang, Chang, Elliott, Xu, Berry, Huang, Kellum, Mi, Talisa, Visweswaran, Vodovotz, Weiss, Yende.
Drafting of the manuscript: Seymour, Kennedy, Wang, Berry, van der Poll, Vodovotz, Yealy, Angus.
Critical revision of the manuscript for important intellectual content: Seymour, Kennedy, Chang, Elliott, Xu, Berry, Clermont, Cooper, Gomez, Huang, Kellum, Mi, Opal, Talisa, Visweswaran, Vodovotz, Weiss, Yende, Angus.
Statistical analysis: Seymour, Kennedy, Wang, Chang, Elliott, Xu, Berry, Mi, Talisa, Weiss, Angus.
Obtained funding: Seymour.
Administrative, technical, or material support: Seymour, Kennedy, van der Poll, Weiss, Yealy, Angus.
Supervision: Seymour, Opal, Vodovotz, Yealy, Angus.
Conflict of Interest Disclosures: Dr Seymour reported receiving personal fees from Edwards Inc and Beckman Coulter Inc. Dr Gomez reported receiving grants from TES Pharma. Dr Huang reported receiving nonfinancial support (procalcitonin assays) from Biomerieux and grants from Thermofisher for microbiome research. Dr Vodovotz reported being the cofounder and a stakeholder in Immunetrics Inc and having a provisional patent application pending. Dr Yende reported receiving personal fees from Atox Bio and grants from Bristol-Myers Squibb. Dr Angus reported receiving personal fees from and serving as a consultant to Ferring Pharmaceuticals, Bristol-Myers Squibb, Bayer AG, and Beckman Coulter Inc; owning stock in Alung Technologies; and having patent applications pending for selepressin (compounds, compositions, and methods for treating sepsis) and proteomic biomarkers of sepsis in elderly patients. No other disclosures were reported.
Funding/Support: Drs Seymour, Gomez, Huang, Kellum, Visweswaran, Vodovotz, and Angus were supported in part by grants R35GM119519, P50GM076659, R34GM102696, R01GM101197, GM107231, R01LM012095, K08GM117310-01A1, and GM61992 from the National Institutes of Health. The GenIMS Study was funded by grant R01 GM61992 from the National Institute of General Medical Sciences with additional support from GlaxoSmithKline for enrollment and clinical data collection.
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: Dr Angus is Associate Editor of JAMA, but he was not involved in any of the decisions regarding review of the manuscript or its acceptance.
Meeting Presentation: Presented in part at the international conference of the American Thoracic Society; May 19, 2019; Dallas, Texas.
Additional Contributions: We acknowledge the significant contribution of the patients, families, researchers, clinical staff, and sponsors for the cohort and randomized trial data included in this study. We acknowledge the Biostatistics and Data Management Core at the Clinical Research, Investigation, and Systems Modeling of Acute Illness Center in the Department of Critical Care Medicine at the University of Pittsburgh for preparing the SENECA, GenIMS, ACCESS, ProCESS, and PROWESS trial datasets. We acknowledge Eisai Medical Research Inc for providing the ACCESS trial dataset, and Eli Lilly Inc for providing the PROWESS trial dataset. We acknowledge Gordon Bernard, MD (Vanderbilt University, Nashville, Tennessee) and Anthony C. Gordon, MD (Imperial College, London, England) for their detailed review of the manuscript.