The vertical lines indicate a risk threshold at the cohort event rate. The horizontal lines indicate the net benefit with each model at a respective risk threshold. For example, the machine learning (ML) model detected an additional 6 and 9 mortality events per 1000 patients in Black and non-Black patient cohorts, respectively, compared with the Get With The Guidelines–Heart Failure (GWTG) risk score. LR indicates logistic regression.
BNP indicates B-type natriuretic peptide; BUN, blood urea nitrogen; COPD, chronic obstructive pulmonary disease; ECG, electrocardiogram; FPG, fasting plasma glucose; HS, high school; NT-proBNP, N-terminal pro-B-type natriuretic peptide; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol.
SI conversion factors: To convert BNP to nanograms per liter, multiply by 1; cholesterol to millimoles per liter, multiply by 0.0259; creatinine to micromoles per liter, multiply by 88.4; glucose to millimoles per liter, multiply by 0.0555; hemoglobin to grams per liter, multiply by 10; potassium to millimoles per liter, multiply by 1; sodium to millimoles per liter, multiply by 1.
a Indicates social determinants of health parameters.
b Body mass index was calculated as weight in kilograms divided by height in meters squared.
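The conversion factors and the body mass index definition in the footnotes above amount to simple arithmetic. The following is a minimal sketch of that arithmetic; the function and dictionary names are hypothetical, and the conventional source units noted in the comments are assumptions because the footnote does not state them.

```python
# Multipliers copied from the SI conversion footnote. The conventional
# (source) units are NOT stated in the footnote; those in the comments
# below are assumptions made for illustration only.
SI_FACTORS = {
    "bnp": 1.0,             # assumed pg/mL -> ng/L
    "cholesterol": 0.0259,  # assumed mg/dL -> mmol/L
    "creatinine": 88.4,     # assumed mg/dL -> umol/L
    "glucose": 0.0555,      # assumed mg/dL -> mmol/L
    "hemoglobin": 10.0,     # assumed g/dL  -> g/L
    "potassium": 1.0,       # assumed mEq/L -> mmol/L
    "sodium": 1.0,          # assumed mEq/L -> mmol/L
}


def to_si(analyte, value):
    """Multiply a conventional-unit value by the footnote's conversion factor."""
    return value * SI_FACTORS[analyte]


def body_mass_index(weight_kg, height_m):
    """Weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2
```

For instance, `to_si("creatinine", 1.5)` multiplies 1.5 by 88.4, and `body_mass_index(81, 1.8)` divides 81 by 1.8 squared.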
eTable 1. Candidate covariates and their respective domain
eTable 2. ZIP-code level social determinants of health parameters that were considered for predicting in-hospital mortality following HF hospitalization
eTable 3. Baseline characteristics of participants in the internal and external validation cohorts by race
eTable 4. Discrimination and calibration performance of the race-specific models for predicting in-hospital mortality among patients with heart failure in the internal GWTG testing cohort with complete data available and with up to 50% missingness in the covariate data
eTable 5. Discrimination and calibration performance of the models for predicting in-hospital mortality among patients with heart failure in the internal GWTG validation cohort across age, sex, and socioeconomic status-based subgroups
eTable 6. Discrimination and calibration performance of the non-Black race-specific and race-agnostic models for predicting in-hospital mortality among patients with heart failure with different self-identified race/ethnicities
eTable 7. Reclassification metrics in the ARIC external validation between the ML model and the original GWTG risk score
eTable 8. Discrimination and calibration performance of the models for predicting in-hospital mortality among patients with heart failure in the internal GWTG validation cohort across disproportional share hospital-based subgroups
eTable 9. Comparison of models to predict risk of in-hospital mortality among patients with hospitalization for heart failure
eFigure 1. CONSORT diagram
eFigure 2. Variable importance of Black and non-Black patients determined by the VIMP metric of a race-specific random forest model with 20 bootstrap replicates
eFigure 3. Area under the receiver operating characteristics and precision-recall curve for increasing number of variables in a random forest model to predict in-hospital mortality in the overall cohort
eFigure 4. Observed vs. predicted probability of in-hospital mortality for the race-specific ML models
eFigure 5. Observed vs. predicted probability of in-hospital mortality for the race-agnostic ML models
eFigure 6. Observed vs. predicted probability of in-hospital mortality for the GWTG-HF risk score
eFigure 7. Among Black participants in the ARIC external validation cohort, percentage of participants with a predicted risk above the specific risk thresholds between the original GWTG risk score and the race-specific ML model
eFigure 8. Observed vs. predicted probability of in-hospital mortality for the race-specific ML + social determinants of health models
Segar MW, Hall JL, Jhund PS, et al. Machine Learning–Based Models Incorporating Social Determinants of Health vs Traditional Models for Predicting In-Hospital Mortality in Patients With Heart Failure. JAMA Cardiol. 2022;7(8):844–854. doi:10.1001/jamacardio.2022.1900
Do machine learning (ML)–based models that incorporate social determinants of health (SDOH) improve the prediction of in-hospital mortality among patients with heart failure (HF)?
In this cohort study, ML models developed in the Get With The Guidelines–Heart Failure (GWTG-HF) registry using race-specific and race-agnostic approaches were associated with an improvement in the prediction of in-hospital mortality after hospitalization for HF compared with the existing and rederived logistic regression models. The addition of SDOH was associated with an improvement in the performance and prognostic utility of the ML models in Black patients but not in non-Black patients.
The findings indicate that ML models incorporating SDOH may improve risk prediction of in-hospital mortality after hospitalization for HF, particularly in Black adults.
Traditional models for predicting in-hospital mortality for patients with heart failure (HF) have used logistic regression and do not account for social determinants of health (SDOH).
To develop and validate novel machine learning (ML) models for HF mortality that incorporate SDOH.
Design, Setting, and Participants
This retrospective study used the data from the Get With The Guidelines–Heart Failure (GWTG-HF) registry to identify HF hospitalizations between January 1, 2010, and December 31, 2020. The study included patients with acute decompensated HF who were hospitalized at the GWTG-HF participating centers during the study period. Data analysis was performed January 6, 2021, to April 26, 2022. External validation was performed in the hospitalization cohort from the Atherosclerosis Risk in Communities (ARIC) study between 2005 and 2014.
Main Outcomes and Measures
Random forest-based ML approaches were used to develop race-specific and race-agnostic models for predicting in-hospital mortality. Performance was assessed using C index (discrimination), regression slopes for observed vs predicted mortality rates (calibration), and decision curves for prognostic utility.
The training data set included 123 634 hospitalized patients with HF who were enrolled in the GWTG-HF registry (mean [SD] age, 71 years; 58 356 [47.2%] female individuals; 65 278 [52.8%] male individuals). Patients were analyzed in 2 categories: Black (23 453 [19.0%]) and non-Black (2121 [2.1%] Asian, 91 154 [91.0%] White, and 6906 [6.9%] other race and ethnicity). The ML models demonstrated excellent performance in the internal testing subset (n = 82 420) (C statistic, 0.81 for Black patients and 0.82 for non-Black patients) and in the real-world–like cohort with less than 50% missingness on covariates (n = 553 506; C statistic, 0.74 for Black patients and 0.75 for non-Black patients). In the external validation cohort (ARIC registry; n = 1205 Black patients and 2264 non-Black patients), ML models demonstrated high discrimination and adequate calibration (C statistic, 0.79 and 0.80, respectively). Furthermore, the performance of the ML models was superior to the traditional GWTG-HF risk score model (C index, 0.69 for both race groups) and other rederived logistic regression models using race as a covariate. The performance of the ML models was identical using the race-specific and race-agnostic approaches in the GWTG-HF and external validation cohorts. In the GWTG-HF cohort, the addition of zip code–level SDOH parameters to the ML model with clinical covariates only was associated with better discrimination, prognostic utility (assessed using decision curves), and model reclassification metrics in Black patients (net reclassification improvement, 0.22 [95% CI, 0.14-0.30]; P < .001) but not in non-Black patients.
Conclusions and Relevance
ML models for HF mortality demonstrated superior performance to the traditional and rederived logistic regression models using race as a covariate. The addition of SDOH parameters improved the prognostic utility of prediction models in Black patients but not non-Black patients in the GWTG-HF registry.
Heart failure (HF) hospitalization confers a high mortality risk, with in-hospital mortality rates approaching 5%.1 In-hospital mortality rates vary substantially by race and ethnicity, and there is a growing need to develop risk-prediction tools to better identify high-risk individuals across races and ethnicities.2,3
Multiple clinical risk prediction tools are available to estimate in-hospital mortality risk among individuals hospitalized with HF, including Get With The Guidelines–Heart Failure (GWTG-HF), Acute Decompensated Heart Failure National Registry (ADHERE), and Organized Program to Initiate Lifesaving Treatment in Hospitalized Patients with Heart Failure (OPTIMIZE-HF) risk scores.4-6 However, most commonly implemented tools for predicting mortality risk incorporate race as a covariate, assigning a lower risk to Black individuals compared with individuals of other races. Concerns have been raised about this race-based approach that assigns lower risk to Black patients and thus potentially raises the threshold required for risk-based allocation of clinical therapies and adds to the existing disparities in HF care.7 Moreover, including race solely as a covariate in such risk models may not completely capture the societal factors contributing to racial disparities in outcomes among patients with HF. Thus, novel approaches to risk prediction are needed that do not use race as a biological risk factor and better account for social determinants of health (SDOH) in the risk assessment.
Race-specific risk prediction is one such approach used previously to predict the risk of atherosclerotic cardiovascular disease and incident HF in community-dwelling individuals.8-10 Race-specific risk-prediction models acknowledge that outcomes are different between races and look at risk gradients within each race stratum, thus allowing for better capture of unique race-specific risk predictors. The race-agnostic approach to predicting risk is another strategy that does not consider race as a covariate while developing the risk model in the overall cohort. Race-agnostic approaches have been recently evaluated to assess biological parameters, such as kidney function.11,12 In this study, we developed and evaluated race-specific and race-agnostic models incorporating clinical and SDOH parameters to predict in-hospital mortality risk among patients hospitalized with HF. We hypothesized that improved and more equitable risk prediction can be achieved when risk assessment omits race as a biological covariate and accounts for SDOH. Consistent with our prior approaches, we used the machine learning (ML)–based random forest technique to develop the race-specific risk-prediction models.10,13
The present study used data from the American Heart Association (AHA) GWTG-HF registry. Details of the GWTG-HF program have been reported previously14,15 and are summarized in the eMethods and eFigure 1 in the Supplement. For the present analysis, 677 140 patients from 634 hospitals between January 1, 2010, and December 31, 2020, were considered for model development and validation. A total of 206 054 participants had less than 15% missing data on relevant covariates, of whom 123 634 (60%) were included in the model training subset and 82 420 (40%) in the internal testing subset. An additional cohort of 471 086 participants with less than 50% missingness was added to the internal validation cohort to test the performance of the derived model in a real-world–like data set where patients often have multiple missing model covariates (n = 553 506). Participating GWTG-HF centers obtained institutional review board approval and were granted a waiver for informed consent under the Common Rule. IQVIA (Parsippany, New Jersey) served as the data collection and coordination center. The American Heart Association Precision Medicine Platform was used for data analysis.
Recorded data in the GWTG-HF registry encompass a range of domains, including patient demographics, vital signs, socioeconomic status, medical history, laboratory values, cardiac biomarkers, electrocardiography, and ejection fraction. Details about the candidate variables used for developing the risk-prediction model are provided in the eMethods and eTable 1 in the Supplement.
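The cohort construction described above (a less than 15% missingness group split 60%/40% into training and testing subsets, plus a less than 50% missingness group appended to the testing subset) can be sketched as follows. This is only an illustration under stated assumptions: the record layout (a dictionary per patient with `None` for a missing covariate), the function names, and the use of a simple random shuffle are hypothetical and are not taken from the study's code.

```python
import random


def missing_fraction(record, covariates):
    """Share of the listed covariates that are missing (None) for one patient."""
    return sum(1 for c in covariates if record.get(c) is None) / len(covariates)


def build_cohorts(records, covariates, train_share=0.6, seed=0):
    """Return (training, testing, real_world_like) lists of records.

    Assumption: a plain random 60/40 split; the source does not say how
    the split was drawn.
    """
    complete = [r for r in records if missing_fraction(r, covariates) < 0.15]
    sparse = [r for r in records
              if 0.15 <= missing_fraction(r, covariates) < 0.50]

    shuffled = complete[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_share * len(shuffled))
    training, testing = shuffled[:cut], shuffled[cut:]

    # The real-world-like validation set is the testing subset plus the
    # patients with 15% to <50% missing covariates.
    return training, testing, testing + sparse
```

With the counts reported in the text, the same bookkeeping gives 123 634 + 82 420 = 206 054 and 82 420 + 471 086 = 553 506.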
Among 123 634 participants in the derivation cohort, 64 573 (52.2%) had an admission year of 2015 or later and recorded residential zip codes available to link with publicly available zip code–level measures of SDOH detailed in eTable 2 and the eMethods in the Supplement. All zip code–level data on SDOH were for the patients’ residence. Additionally, hospital-level measures of geography, sole community hospital, essential hospital membership, and disproportionate share hospital metrics were included as described previously16,17 and in the eMethods in the Supplement. Race was self-reported in questionnaires with standardized answer choices: Asian, Black, White, and other.
Our primary outcome of interest was in-hospital mortality. Mortality events were captured as documented on the case report form for participants in GWTG-HF.
The performance of derived risk models was assessed in an external validation cohort of participants from the Atherosclerosis Risk in Communities (ARIC) study obtained from the National Heart, Lung, and Blood Institute BioLINCC data repository. Details of the ARIC study have been previously reported18,19 and are described in the eMethods in the Supplement. Among 3612 candidate hospitalizations, 3469 patients with HF (1205 among Black patients and 2264 among non-Black patients) were included in the final cohort after excluding participants who were discharged to hospice (n = 69), left against medical advice (n = 4), or were discharged on comfort care (n = 70).
Race-specific and race-agnostic models were developed using random forest machine learning (ML) techniques described previously and detailed in the eMethods in the Supplement. The race-specific models were developed separately for Black participants and non-Black participants (subsequently referred to as the race-specific ML model). Variable selection was performed independently for Black participants and non-Black participants in the training data set of the GWTG-HF registry as described in the eMethods in the Supplement. The race-agnostic model was developed in the entire training cohort, excluding race as a candidate covariate from the variable selection. Multiple metrics were used to assess model performance in the testing data sets of the GWTG-HF registry and the external validation cohort. Discrimination was evaluated using the C index with 95% CIs determined using bootstrapping with 2000 replicates.20 Differences in C indices were compared across different models using the DeLong test.21 Consistent with the recent literature on risk prediction,22,23 calibration was assessed using the Brier score, representing the mean squared error between the observed and predicted risk, and calibration slopes.24,25 Additionally, a regression slope of the observed mortality rates was calculated across deciles of predicted mortality rates. A lower Brier score, a calibration intercept closer to 0, and a calibration slope closer to 1 indicate better performance. Observed and predicted risks across deciles of predicted risk were also reported in the validation cohorts as additional measures of model calibration.
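The performance metrics named above can be illustrated with a short, standard-library sketch. It computes a C index for a binary outcome, a percentile bootstrap interval for it, the Brier score, and the slope of observed vs predicted rates across risk groups. The function names are hypothetical, the percentile method is an assumption (the text specifies only 2000 bootstrap replicates), and the study itself was analyzed in R, so this is not the authors' code.

```python
import random


def c_index(y_true, y_prob):
    """Share of event/non-event pairs in which the event has the higher
    predicted risk; tied predictions count as half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            score += 1.0 if pe > pn else 0.5 if pe == pn else 0.0
    return score / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_c_index_ci(y_true, y_prob, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C index (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking events or non-events
            stats.append(c_index(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper


def grouped_calibration_slope(y_true, y_prob, n_groups=10):
    """Least-squares slope of observed event rate on mean predicted risk
    across equal-sized groups ordered by predicted risk (deciles by default)."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    xs, ys = [], []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if chunk:
            xs.append(sum(p for p, _ in chunk) / len(chunk))
            ys.append(sum(y for _, y in chunk) / len(chunk))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

The pairwise C index is quadratic in cohort size, so a rank-based implementation would be preferable for cohorts as large as those in this study; the version here favors readability.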
Reclassification was reported using categorical net reclassification improvement (at race-specific event rate risk threshold) and integrated discrimination index.26,27 Decision curve analysis, a measure of the true-positive cases identified without an increase in the false-positive rate, was used to assess the clinical net benefit with the model across thresholds of risk.28
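As a rough illustration of the two quantities in this paragraph, the sketch below computes net benefit at a single risk threshold (the quantity plotted in a decision curve) and a two-category net reclassification improvement at one threshold. The formulas are the standard textbook definitions rather than anything specific to this study, and the function names are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one risk threshold:
    TP/n - (FP/n) * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def additional_events_per_1000(y_true, p_new, p_old, threshold):
    """Difference in net benefit between two models, scaled per 1000 patients."""
    return 1000 * (net_benefit(y_true, p_new, threshold)
                   - net_benefit(y_true, p_old, threshold))


def categorical_nri(y_true, p_old, p_new, threshold):
    """Two-category NRI at one threshold: net upward movement among events
    plus net downward movement among non-events."""
    up_e = down_e = up_n = down_n = 0
    for y, po, pn in zip(y_true, p_old, p_new):
        old_high, new_high = po >= threshold, pn >= threshold
        if new_high and not old_high:      # reclassified upward
            if y == 1:
                up_e += 1
            else:
                up_n += 1
        elif old_high and not new_high:    # reclassified downward
            if y == 1:
                down_e += 1
            else:
                down_n += 1
    n_events = sum(y_true)
    n_non_events = len(y_true) - n_events
    return (up_e - down_e) / n_events + (down_n - up_n) / n_non_events
```

In this framing, a statement such as "an additional 6 events per 1000 patients" corresponds to a net benefit difference of 0.006 between two models at the chosen threshold.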
The generalizability of the ML models was assessed in cohorts of participants with less than 15% missing data and less than 50% missing data in model covariates. Subgroup analyses were performed to evaluate the performance of the ML models in age-based (≤70 years or >70 years), sex-based (male and female), race-based (Asian, White, and other), ethnicity-based (Hispanic and non-Hispanic), ejection fraction–based (HF with reduced and preserved ejection fraction, using the 50% ejection fraction cutoff), and socioeconomic status–based (median income of ≥$54 471 vs <$54 471) subgroups.
We compared the performance of the race-specific and race-agnostic ML models with the models that used race as a covariate. This included the original GWTG-HF risk score and a logistic regression model with race as a covariate that was rederived in the GWTG-HF registry data used in the present study. We also compared the performance of the ML models vs a race-specific logistic regression model. Details of the logistic regression model are described in the eMethods in the Supplement. Sensitivity analyses were also performed to evaluate the performance of an additional ML model with race as a covariate. The importance of race in risk prediction using the ML model with race as a covariate was assessed using the minimum depth metric.
To evaluate whether incorporating SDOH might improve risk prediction of race-specific or race-agnostic ML models, the random forest ML model was rederived using an expanded pool of covariates that included patient-level clinical data, patient-level insurance status, and zip code–based SDOH parameters (65 covariates: 38 clinical and 27 SDOH) for patients admitted in 2015 and later and with available zip code–level data. Because participant zip codes were not available in the ARIC external validation cohort, the clinical and socioeconomic models were validated only in the GWTG-HF internal validation cohort with less than 50% missingness. Subgroup analyses were performed by disproportionate share hospital status. Finally, to determine the proportion of in-hospital mortality associated with specific clinical and socioeconomic risk factors across races, we used the Greenland-Drescher method for calculating population-attributable risk percentage as detailed in the eMethods in the Supplement.29 Analyses were performed using R version 4.0.2 (R Foundation) with a 2-tailed P value <.05 indicating significance.
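The exact attributable-risk procedure is given in the eMethods and is not reproduced here. As a rough illustration of the general idea behind model-based attributable fractions, the sketch below compares the events a model predicts with a risk factor as observed against the events it predicts with that factor set to its reference level. The logistic model, its coefficients, and all names are made up for illustration; the study's models were random forests.

```python
import math


def expit(z):
    """Inverse logit: convert a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def model_based_attributable_fraction(rows, intercept, coefs, exposure):
    """1 - (predicted events with `exposure` set to 0) / (predicted events).

    rows      : list of dicts mapping covariate name -> value (hypothetical)
    coefs     : dict of made-up logistic coefficients, keyed like `rows`
    exposure  : name of the covariate whose contribution is being removed
    """
    observed = 0.0
    counterfactual = 0.0
    for row in rows:
        z = intercept + sum(coefs[k] * row[k] for k in coefs)
        observed += expit(z)
        # Same patient with the exposure set to its reference level (0).
        counterfactual += expit(z - coefs[exposure] * row[exposure])
    return 1.0 - counterfactual / observed
```

Multiplying the result by 100 gives a population-attributable risk percentage under the assumed model.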
The training cohort consisted of 123 634 participants (mean [SD] age, 71 years; 58 356 [47.2%] female individuals and 65 278 [52.8%] male individuals), of whom 2121 (2.1%) were Asian; 23 453 (19.0%), Black; 91 154 (91.0%), White; and 6906 (6.9%), other race and ethnicity.
Table 1 shows the baseline characteristics of Black participants and non-Black participants in the GWTG-HF training data set. More Black patients were female; Black patients were also generally younger; had a higher prevalence of hypertension, obesity, and kidney dysfunction; and had higher levels of cardiac biomarkers, including natriuretic peptide levels and troponin (Table 1). Non-Black patients were more likely to have a history of coronary artery disease and diabetes at presentation. Black patients were more likely to lack health insurance coverage and had lower median zip code household income.
The ranked variables for the race-specific and race-agnostic models according to variable importance are displayed in eFigure 2 in the Supplement. No improvement in C index was observed with more than 20 covariates in an ML model (eFigure 3 in the Supplement). The testing subset included 15 634 Black patients and 66 786 non-Black patients (eTable 3 in the Supplement) with in-hospital event rates of 1.7% (n = 269) and 3.1% (n = 2082), respectively. Across participants in the Black and non-Black groups, the race-specific ML model demonstrated excellent discrimination performance (C index among Black patients, 0.81 [95% CI, 0.79-0.83] and non-Black patients, 0.82 [95% CI, 0.81-0.83]) and adequate calibration (eTable 4 and eFigure 4 in the Supplement). The performance of race-agnostic ML models among Black patients and non-Black patients was comparable with that observed for race-specific models (eTable 4 and eFigure 4 in the Supplement).
In the validation data set with up to 50% missingness in covariates (107 508 Black patients [19.4%] and 445 998 non-Black patients [80.6%]), the Black and non-Black race-specific ML models demonstrated high discrimination (C index, 0.74 [95% CI, 0.71-0.77] and 0.75 [95% CI, 0.73-0.78], respectively) and adequate calibration (Brier score, 16 and 31 ×10−3, respectively), comparable with that noted in the testing subset (eTable 4 and eFigure 5 in the Supplement). Similar results were also observed with the race-agnostic ML model (eTable 4 in the Supplement). In subgroup analysis, the race-specific and race-agnostic ML models also demonstrated good and comparable discrimination and calibration performance across age-based (≤70 years or >70 years), sex-based, ethnicity-based, ejection fraction, and socioeconomic status–based subgroups (eTables 5 and 6 in the Supplement).
We externally validated the ML models in a cohort of participants with hospitalization for HF from the ARIC study (n = 3469; 1205 Black patients [34.7%] and 2264 White patients [65.3%]) (eTable 3 in the Supplement). All non-Black patients self-identified as White. In-hospital mortality rates were 2.0% (n = 24) and 3.1% (n = 70) for Black patients and White patients, respectively. Compared with the GWTG-HF cohort, more participants in the ARIC cohort were female; they were also generally older and had higher rates of cardiovascular disease risk factors and higher levels of abnormal cardiac biomarkers. Among Black patients, the race-specific ML models demonstrated superior discrimination (C index, 0.79 [95% CI, 0.77-0.81]) and calibration (Brier score, 19 × 10−3) compared with the GWTG-HF risk score (C index, 0.69 [95% CI, 0.67-0.71]; difference, 0.10 [95% CI, 0.07-0.13]), rederived logistic regression with race as a covariate (C index, 0.71 [95% CI, 0.69-0.72]; difference, 0.09 [95% CI, 0.06-0.11]), and race-specific logistic regression models (C index, 0.74 [95% CI, 0.72-0.76]; difference, 0.05 [95% CI, 0.02-0.08]) (Table 2; Figure 1; eFigure 6 in the Supplement). A similar pattern of results was observed among non-Black patients, with consistently high and superior performance of the race-specific ML models (C index, 0.80 [95% CI, 0.79-0.81]) compared with other models using race as a covariate and the race-specific logistic regression model (Table 2; Figure 1).
In reclassification analysis, the race-specific ML model demonstrated better net reclassification improvement and integrated discrimination index than the original GWTG-HF score in Black individuals and non-Black individuals (eTable 7 in the Supplement). In decision curve analyses, the race-specific ML model detected an additional 3 to 6 mortality events per 1000 Black patients (Figure 2A) and 2 to 9 events per 1000 non-Black patients compared with other models using race as a covariate (Figure 2B). Performance of the race-agnostic ML model in the external validation cohort was comparable with that of the race-specific models across both race groups (Table 2; Figure 2).
Notably, race-specific logistic regression had superior discrimination and reclassification compared with the logistic regression model with race as a covariate in Black patients and non-Black patients (C index difference, 0.04 [95% CI, 0.01-0.06] and 0.04 [95% CI, 0.01-0.08]; net reclassification improvement, 0.18 [95% CI, 0.02-0.29] and 0.22 [95% CI, −0.04 to 0.42], respectively) (Table 2; eTable 7 in the Supplement). Sensitivity analysis comparing the performance of ML models using a race-specific approach vs race as a covariate demonstrated comparable C indices among Black patients and non-Black patients. In the ML models with race as a covariate, race featured in the top 5 predictor variables based on the minimum depth.
Among Black patients in the ARIC validation cohort, the GWTG risk score classified 5.7% of patients as having an estimated risk above the 5% threshold (eFigure 7 in the Supplement). Conversely, the race-specific and race-agnostic ML models identified a significantly higher proportion of patients above the different risk thresholds (5% risk threshold: GWTG score = 5.7%; race-specific ML model = 12.4%; race-agnostic ML model = 15.3%; χ2 P value <.001).
In the GWTG-HF validation cohort, among 13 088 Black patients, the race-specific ML model that included SDOH demonstrated better discrimination and calibration (C index, 0.77 [95% CI, 0.75-0.79]; intercept, −0.07; slope, 0.93) than the models with clinical covariates only (C index, 0.73 [95% CI, 0.71-0.75]; intercept, −0.18; slope, 0.85; difference, 0.04 [95% CI, 0.01-0.07]) (eFigure 8 in the Supplement). Among reclassification metrics, the addition of SDOH parameters to the race-specific ML model with clinical covariates was associated with a significant improvement in upwards reclassification (net reclassification improvement, 0.22 [95% CI, 0.14-0.30]; integrated discrimination index, 0.007 [95% CI, 0.005-0.01]). In the decision curve analysis, the race-specific ML model with clinical and SDOH covariates (vs clinical covariates only) detected an additional 3 events per 1000 patients (Figure 2C). Similar results were observed with the race-agnostic ML model (C index of 0.76 [95% CI, 0.75-0.78]; difference, 0.01 [95% CI, −0.02 to 0.03]). In subgroup analysis, the race-specific and race-agnostic ML models demonstrated good and comparable discrimination and calibration performance across disproportionate share hospital–based subgroups (eTable 8 in the Supplement).
Conversely, among 51 485 non-Black patients, the addition of SDOH to the race-specific ML models was not associated with a significant improvement in risk prediction performance, with comparable discrimination (C index, 0.75 [95% CI, 0.73-0.77]; difference, 0.01 [95% CI, −0.03 to 0.04]), calibration (intercept, −0.39; slope, 0.93), prognostic utility (no additional mortality events detected by decision curve analysis), and reclassification (net reclassification improvement, −0.01 [95% CI, −0.05 to 0.03]; integrated discrimination index, 0.003 [95% CI, −0.002 to 0.006]) (Figure 2D; eFigure 8 in the Supplement). Similar results were observed with the race-agnostic ML model (C index, 0.75 [95% CI, 0.73-0.76]; difference, 0.005 [95% CI, −0.02 to 0.03]).
Using the race-specific ML model among Black patients, multiple SDOH parameters were identified as strong predictors of in-hospital mortality, with 5 such parameters among the top 20 predictors (Figure 3). Overall, the population-attributable risk percentage for in-hospital mortality associated with all SDOH parameters was 11.6% among Black patients. In contrast, among non-Black patients, only 1 SDOH parameter featured in the top 20 risk predictors with a total population-attributable risk percentage of 0.5% for in-hospital mortality using the race-specific ML model. Among clinical risk factors, measures of kidney function, blood pressure, natriuretic peptide, troponin, and age were among the top predictors of in-hospital mortality across both race groups (Figure 3).
In this cohort study, we developed and validated ML-based race-specific and race-agnostic risk models to predict in-hospital mortality among individuals with hospitalization for HF. We observed that the race-specific and race-agnostic ML-based models demonstrated excellent performance in the testing data sets, including those with substantial missingness in model covariates. Furthermore, the ML-based models had superior discrimination, calibration, and clinical utility in the external validation cohort than the original GWTG-HF risk scores and other rederived logistic regression models using race as a covariate. The addition of zip code–level SDOH to the ML model was associated with an improvement in risk reclassification and prognostic utility of the model in Black patients. We also observed significant race-specific differences in the population-attributable risk of in-hospital mortality associated with the SDOH with a significantly greater contribution of these parameters to the overall in-hospital mortality risk in Black patients vs non-Black patients. Overall, the present study demonstrates the potential utility of ML models for better and more equitable prediction of in-hospital mortality risk among Black patients and non-Black patients hospitalized for HF.
The most significant advancement with our risk models is the use of an ML-based approach to risk prediction. Several models exist for predicting the risk of adverse outcomes among patients with HF hospitalization. Established models, such as the GWTG-HF, OPTIMIZE-HF, ADHERE, and AHFI (Acute Heart Failure Index) risk scores, use traditional statistical modeling techniques, provide acceptable risk stratification, and have been well validated in external cohorts.4,5,30 A summary of prior risk-prediction models is provided in eTable 9 in the Supplement. Besides traditional risk-prediction models, some previous studies have also developed ML-based models to predict in-hospital mortality. However, these models have been mainly developed in non-US–based, ethnically homogenous cohorts.31 The ML models developed in the present study offer several advantages. First, they incorporate well-established prognostic biomarkers for HF that were not included in previous risk models and are thus better at predicting individual-level risk. Second, the ML-based approach allows for greater generalizability, better tolerance to missing data, and more accurate risk prediction in external cohorts. This is evidenced by the ML model demonstrating adequate performance in a cohort with up to 50% missingness. With the addition of an application programming interface to improve implementation,10 the ML models offer an opportunity for real-world, electronic health record–based risk prediction.
In addition to using the ML-based approach to risk prediction, other aspects of our risk models are noteworthy. In the present study, we developed race-specific models for in-hospital mortality. We observed superior performance of the race-specific logistic regression model compared with models using race as a covariate, highlighting the potential utility of race-specific risk-prediction models. However, when using an ML-based approach, the discrimination and calibration metrics for the race-specific ML model were comparable with the ML model using race as a covariate. This is related to the modeling approach used by random forest learning methods, which assigned race a lower minimal depth (higher importance). Thus, the random forest model takes a race-specific approach, even when race is included as a covariate, creating a decision tree after the first few nodes, comparable with a race-specific model. Furthermore, even with the use of a race-agnostic approach (without race as a covariate), the performance of the ML model was comparable with the race-specific model. Taken together, the ML-based approach represents the most novel aspect of our risk model that is associated with improved and more equitable prediction of individual-level risk across race groups. Thus, even though the risk of mortality among Black patients was lower than non-Black patients in our study cohorts, the proportion of Black patients identified to be above specific risk thresholds was higher with the ML models than with the traditional GWTG-HF model. Future studies are needed to determine if similar ML models may facilitate more equitable risk-based allocation of care. This is particularly relevant considering the concerns raised about the potential unintended effects of assigning lower risk to Black patients using the existing risk models that use race as a covariate, which may add to existing disparities in HF care.7
SDOH are stronger predictors of in-hospital mortality in Black patients vs non-Black patients with HF.32-34 Previous studies have observed improved model performance incorporating SDOH data in the risk prediction equations.35-37 In the present study, we observed that zip code–level SDOH contributed more than 11% of the total in-hospital mortality risk in Black patients compared with 0.5% in non-Black patients. Consistent with the greater relative importance of SDOH in predicting the risk of in-hospital mortality in Black patients, we observed a significant improvement in risk reclassification and calibration with the addition of these parameters in Black patients but not non-Black patients. Furthermore, the model performance for the clinical and SDOH model was comparable among patients hospitalized at disproportionate share vs non–disproportionate share hospitals, highlighting the generalizability of the risk models among patients hospitalized in low- and high-resource hospitals.
While the improvement in risk prediction with the incorporation of zip code–level SDOH parameters is encouraging, future studies are needed to better understand how SDOH factors can be incorporated for risk prediction in patients with HF. First, it would be more informative to include individual-level data on SDOH rather than zip code–level data alone. Second, the ML-based approach used in the present study allowed us to evaluate the clinical and SDOH model in a data set with up to 50% missingness in clinical parameters. It is plausible that with better capture of clinical parameters in the model, the relative improvement in model performance with the incorporation of zip code–level SDOH may be attenuated.
Our study has some notable limitations. First, we only included variables that were regularly captured in the GWTG registry. Data on certain laboratory measures, such as hemoglobin A1c and lipid profiles, which are associated with increased mortality risk,38 had significant missingness and were excluded from the candidate covariates. Second, the race-specific models were developed using self-reported race. Individuals who may not identify with a specific race on the GWTG-HF data form may not be accurately represented. However, sensitivity analysis across available races and ethnicities showed similar performance. Third, only zip code–level SDOH data were available for the present analysis. Incorporating participant-level SDOH parameters in risk models may further improve their predictive performance. Fourth, race was self-reported, and data on genetic ancestry were not available. Fifth, we could not externally validate the clinical and SDOH models given the lack of zip code data in the ARIC cohort.
The race-specific and race-agnostic ML models to predict in-hospital mortality among patients with HF demonstrated superior discrimination and calibration in Black patients and non-Black patients and outperformed traditional logistic regression models with race as a covariate. Furthermore, incorporating zip code–level SDOH parameters into the risk prediction ML models improved their performance among Black patients but not non-Black patients. Future studies are needed to determine whether race-specific and race-agnostic ML models may improve risk prediction, resource allocation, and care outcomes among Black patients with HF.
Accepted for Publication: April 26, 2022.
Published Online: July 6, 2022. doi:10.1001/jamacardio.2022.1900
Corresponding Author: Ambarish Pandey, MD, MSCS, Department of Internal Medicine, Division of Cardiology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390 (firstname.lastname@example.org).
Author Contributions: Drs Segar and Pandey had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Segar, Jhund, Kao, Fonarow, Hernandez, Ibrahim, Pandey.
Acquisition, analysis, or interpretation of data: Segar, Hall, Jhund, Powell-Wiley, Morris, Fonarow, Rutan, Navar, Stevens.
Drafting of the manuscript: Segar, Powell-Wiley, Hernandez, Rutan, Pandey.
Critical revision of the manuscript for important intellectual content: Hall, Jhund, Powell-Wiley, Morris, Kao, Fonarow, Ibrahim, Navar, Stevens, Pandey.
Statistical analysis: Segar, Jhund, Kao, Pandey.
Obtained funding: Pandey.
Administrative, technical, or material support: Powell-Wiley, Ibrahim, Rutan, Stevens, Pandey.
Supervision: Hall, Powell-Wiley, Morris, Fonarow, Pandey.
Conflict of Interest Disclosures: Dr Segar has received nonfinancial support from Pfizer and Merck. Dr Hall is an employee of the American Heart Association and a member of the Innovation Advisory Board for Change Healthcare. Dr Jhund’s employer, the University of Glasgow, has been remunerated by AstraZeneca, Novartis, and Novo Nordisk for his work on clinical trials and reports speaker and advisory board fees from Novartis, AstraZeneca, and Boehringer Ingelheim and grants from Boehringer Ingelheim. Dr Powell-Wiley is funded by the Division of Intramural Research at the National Heart, Lung, and Blood Institute and the Intramural Research Program of the National Institute on Minority Health and Health Disparities at the National Institutes of Health. Dr Morris reported grants from National Heart, Lung, and Blood Institute, Woodruff Foundation, and the Association of Black Cardiologists outside the submitted work. Dr Kao is medical advisor for Codex Health, Inc. Dr Fonarow reports consulting for Abbott, Amgen, AstraZeneca, Bayer, Cytokinetics, Edwards, Janssen, Medtronic, Merck, and Novartis and serves as an Associate Section Editor for JAMA Cardiology. Ms Rutan is an employee of the American Heart Association. Dr Navar has received funding for research to her institution from Bristol Myers Squibb, Esperion, Amgen, and Janssen and honoraria and consulting fees from Amarin, Amgen, AstraZeneca, Boehringer Ingelheim, Esperion, Janssen, Lilly, Sanofi, Regeneron, Novo Nordisk, Novartis, New Amsterdam Pharma, and Pfizer and serves as Associate Editor at JAMA Cardiology. Dr Stevens is an employee of the American Heart Association.
Dr Pandey received grant funding from Applied Therapeutics and Gilead Sciences outside the submitted work; has received honoraria as an advisor or consultant for Tricog Health, Inc, Eli Lilly, Rivus, and Roche Diagnostics; has received nonfinancial support from Pfizer and Merck; and has received research support from the Texas Health Resources Clinical Scholarship, the Gilead Sciences Research Scholar Program, the National Institute on Aging GEMSSTAR Grant, and Applied Therapeutics. No other disclosures were reported.
Funding/Support: The Get With The Guidelines–Heart Failure (GWTG-HF) program is provided by the American Heart Association. GWTG-HF is sponsored in part by Novartis, the Boehringer Ingelheim and Eli Lilly Diabetes Alliance, Novo Nordisk, Sanofi, AstraZeneca, and Bayer. Dr Pandey has received research support from the Texas Health Resources Clinical Scholarship, the Gilead Sciences Research Scholar Program, and the National Institute on Aging GEMSSTAR Grant (1R03AG067960-01).
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institute on Minority Health and Health Disparities; the National Institutes of Health; or the US Department of Health and Human Services. Dr Fonarow is the Associate Editor for Health Care Quality and Guidelines of JAMA Cardiology and Dr Navar is Deputy Editor, Diversity, Equity and Inclusion of JAMA Cardiology, but they were not involved in any of the decisions regarding review of the manuscript or its acceptance.
Additional Contributions: We thank all the members of the American Heart Association volunteer Get With The Guidelines–Heart Failure committee and participating hospitals and clinicians. IQVIA serves as the data collection and coordination center. Data analysis was conducted on the American Heart Association Precision Medicine Platform, powered by Amazon Web Services, and is supported by Hitachi Vantara.