eAppendix. Model Development.
eTable 1. A List of All Variables in the Get With The Guidelines Registry Data Entry Form
eTable 2. List of Candidate Variables Derived From the Get With The Guidelines-Heart Failure Registry and Baseline Characteristics of Patients From the GWTG-HF Registry Linked With Medicare Data Included in the Study Sample
eTable 3. Demographic and Clinical Characteristics of the Study Sample, Listed by Overall and Training vs Validation Cohorts
eTable 4. Demographics of the Study Sample, Listed by Overall and Readmitted Within 30 Days (Yes vs No)
eTable 5. Final Variables in Logistic Regression Model
eTable 6. Net Reclassification Improvement Comparing TAN Model With LR
eFigure. Flowchart Illustrating Acquisition of the Study Sample
Frizzell JD, Liang L, Schulte PJ, et al. Prediction of 30-Day All-Cause Readmissions in Patients Hospitalized for Heart Failure: Comparison of Machine Learning and Other Statistical Approaches. JAMA Cardiol. 2017;2(2):204–209. doi:10.1001/jamacardio.2016.3956
Can a machine-learning approach improve the accuracy of predicting the risk of readmission at 30 days in hospitalized patients with heart failure?
In this registry-based modeling study, the accuracy and discrimination of 3 machine-learning approaches (least absolute shrinkage and selection operator, random forest, and gradient-boosted models) for predicting the risk of 30-day readmission in patients discharged after hospitalization for heart failure were compared with those of a traditional logistic regression model. All models performed comparably, albeit modestly, with C statistics ranging from 0.59 to 0.62.
The findings are consistent with the existing literature, which, using traditional statistical methods, has shown limited ability to predict heart failure readmission.
Several attempts have been made at developing models to predict 30-day readmissions in patients with heart failure, but none have sufficient discriminatory capacity for clinical use. Machine-learning (ML) algorithms represent a novel approach and may have potential advantages over traditional statistical modeling.
To develop models using a ML approach to predict all-cause readmissions 30 days after discharge from a heart failure hospitalization and to compare ML model performance with models developed using “conventional” statistically based methods.
Design, Setting, and Participants
Models were developed using ML algorithms, specifically, a tree-augmented naive Bayesian network, a random forest algorithm, and a gradient-boosted model, and compared with traditional statistical methods using 2 independently derived logistic regression models (a de novo model and an a priori model developed using electronic health records) and a least absolute shrinkage and selection operator method. The study sample was randomly divided into training (70%) and validation (30%) sets to develop and test model performance. This was a registry-based study, and the study sample was obtained by linking patients from the Get With the Guidelines Heart Failure registry with Medicare data. After applying appropriate inclusion and exclusion criteria, 56 477 patients were included in our analysis. The study was conducted between January 4, 2005, and December 1, 2011, and analysis of the data was conducted between November 25, 2014, and June 30, 2016.
Main Outcomes and Measures
C statistics were used for comparison of discriminatory capacity across models in the validation sample.
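For intuition, the C statistic is the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen patient who was not readmitted. A minimal pure-Python sketch, using hypothetical risk scores rather than study data:

```python
from itertools import product

def c_statistic(risks, events):
    """Concordance probability: fraction of (event, nonevent) pairs in
    which the readmitted patient received the higher predicted risk.
    Ties count as half-concordant."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    noncases = [r for r, e in zip(risks, events) if e == 0]
    concordant = sum(
        1.0 if rc > rn else 0.5 if rc == rn else 0.0
        for rc, rn in product(cases, noncases)
    )
    return concordant / (len(cases) * len(noncases))

# Hypothetical predicted risks and observed 30-day readmissions (1 = yes)
risks = [0.30, 0.10, 0.20, 0.25, 0.15]
events = [1, 0, 1, 0, 0]
print(round(c_statistic(risks, events), 3))  # → 0.833
```

A C statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect discrimination, which is why values near 0.62, as reported here, are considered modest.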
The overall 30-day rehospitalization rate was 21.2% (11 959 of 56 477 patients). For the tree-augmented naive Bayesian network, random forest, gradient-boosted, logistic regression, and least absolute shrinkage and selection operator models, C statistics for the validation sets were similar: 0.618, 0.607, 0.614, 0.624, and 0.618, respectively. Applying the previously validated electronic health records model to our study sample yielded a C statistic of 0.589 for the validation set.
Conclusions and Relevance
Use of a number of ML algorithms did not improve prediction of 30-day heart failure readmissions compared with more traditional prediction models. Although there will likely be further applications of ML approaches in prognostic modeling, our study fits within the literature of limited predictive ability for heart failure readmissions.
Efforts to reduce heart failure (HF) readmissions have yielded mixed results,1,2 leaving guidelines unable to offer specific recommendations.3 Numerous studies have created models to predict HF readmissions,4-10 none of which have demonstrated sufficient discriminative properties, with C statistics or areas under the curve ranging from 0.54 to 0.72. Machine learning (ML) was identified by the Institute of Medicine as an approach with potential for analyzing the predictive capabilities of large clinical data sets.11
The specific aims of this study were to develop models using ML to predict all-cause readmission 30 days after discharge from an index HF hospitalization and to compare ML model performance with models developed using “conventional” statistics-based methods.
Data for this analysis were obtained from the American Heart Association Get With the Guidelines Heart Failure (GWTG-HF) registry linked with Medicare inpatient data, a method previously described12 and validated. The Medicare data included Part A (inpatient) claims and the associated denominator file. Medicare inpatient claims data from January 1, 2005, through December 31, 2011, were linked with GWTG-HF registry data for the same period. All participating hospitals were required to submit the GWTG-HF protocol to their institutional review board for approval. Because the data collected were used for hospital quality improvement, sites were granted a waiver of informed consent under the common rule.
The analysis sample included patients in the GWTG-HF registry who were (1) admitted to hospitals in the GWTG-HF registry having at least 75% complete data on medical history; (2) older than 65 years with a GWTG-HF registry hospitalization linked to Medicare files; (3) discharged alive between January 1, 2005, and December 1, 2011 (allowing all patients at least 30 days of follow-up); and (4) enrolled in Medicare Fee-For-Service A and B at discharge. Patients were excluded if they left against medical advice, were discharged or transferred to another short-term hospital or hospice, or if the discharge destination was missing. Patients were also excluded if they were without Medicare Fee-For-Service A and B eligibility within 30 days, except those who died within 30 days (n = 289). For patients with multiple hospitalizations, the first hospitalization meeting these criteria was kept as the index hospitalization for analysis.
The primary outcome was readmission within 30 days following discharge from an index hospitalization for HF. Readmission was defined as any new inpatient claim excluding the index hospitalization claim, transfers from another hospital, admissions for rehabilitation (facility Medicare ID was used to identify admissions in independent inpatient rehabilitation facilities and rehabilitation units of hospitals), or elective or unknown types of admissions. Heart failure readmissions were those with a primary diagnosis of heart failure (International Classification of Diseases, Ninth Revision codes 428.x, 402.x1, 404.x1, and 404.x3).
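The ICD-9 definition above can be expressed as a small predicate. The sketch below assumes codes arrive as strings, with or without the decimal point; the function name is ours, not from the study:

```python
def is_hf_primary_dx(icd9: str) -> bool:
    """True if an ICD-9 code matches the study's heart failure
    definition: 428.x, 402.x1, 404.x1, or 404.x3."""
    code = icd9.strip().replace(".", "")
    if code.startswith("428"):   # 428.x: heart failure
        return True
    if code.startswith("402"):   # 402.x1: hypertensive heart disease with HF
        return len(code) == 5 and code[4] == "1"
    if code.startswith("404"):   # 404.x1 / 404.x3: hypertensive heart and
        # chronic kidney disease with HF
        return len(code) == 5 and code[4] in ("1", "3")
    return False
```

For example, `is_hf_primary_dx("40291")` is true, while `is_hf_primary_dx("40290")` (hypertensive heart disease without HF) is false.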
From all available variables recorded in the GWTG-HF registry data collection form, the candidate variables for our model included demographics (age, sex, and race/ethnicity), socioeconomic status (defined according to zip code by median household income, percentage with ≥4 years of college, and percentage of high school graduates), medical history, characterization of HF (including admission symptoms), admission and discharge medications, vital signs, weights, selected laboratory values, treatment, and discharge interventions.
The study sample was randomly divided into training (70% of sample) and validation (30% of sample) cohorts. We compared baseline characteristics of the training vs validation cohorts and by readmission status (yes vs no) using proportions for categorical variables and median with 25th and 75th percentiles for continuous variables. For clinically important candidate variables with missing data, multiple imputation was used, with 25 imputed data sets generated using fully conditional specification methods to generate final estimates.
We built prediction models using a tree-augmented naive Bayesian network (TAN), logistic regression (LR) with backward stepwise selection, LR with the least absolute shrinkage and selection operator (LASSO model), a gradient-boosted model, and a random forest model. Details of model development are available in the eAppendix in the Supplement.
Each model’s performance was evaluated and compared using the validation data set. The C statistic was used to evaluate each model’s ability to discriminate between patients who were and were not readmitted. Model calibration was evaluated using plots of predicted vs observed 30-day HF readmission rates. Because of the novelty of the TAN prediction model, it was further compared with the standard LR method via net reclassification improvement. The 2-category net reclassification improvement used the observed event rate as the decision boundary, and 95% confidence intervals for the net reclassification improvement were created using a percentile bootstrap with 50 bootstrap samples. Last, using a previously validated model derived from electronic health records,10 we assessed its discriminatory capacity in our study sample.
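The 2-category net reclassification improvement described above can be sketched as follows; the decision boundary is a single threshold (the study used the observed event rate), and the bootstrap confidence-interval step is omitted. The risk scores in the usage example are hypothetical:

```python
def two_category_nri(risk_ref, risk_new, events, threshold):
    """Two-category net reclassification improvement of a new model
    over a reference model, with a single decision boundary:

    NRI = [P(up | event) - P(down | event)]
        + [P(down | nonevent) - P(up | nonevent)]

    where 'up' means the new model moves a patient from below to
    at-or-above the boundary, and 'down' the reverse."""
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for r_ref, r_new, event in zip(risk_ref, risk_new, events):
        hi_ref, hi_new = r_ref >= threshold, r_new >= threshold
        if event:
            n_e += 1
            if hi_new and not hi_ref:
                up_e += 1
            elif hi_ref and not hi_new:
                down_e += 1
        else:
            n_n += 1
            if hi_new and not hi_ref:
                up_n += 1
            elif hi_ref and not hi_new:
                down_n += 1
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n

# Hypothetical example: the new model moves one of two events above the
# boundary and one of two nonevents below it, so NRI = 0.5 + 0.5 = 1.0.
nri = two_category_nri([0.1, 0.3, 0.1, 0.3], [0.3, 0.3, 0.1, 0.1],
                       [1, 1, 0, 0], threshold=0.2)
print(nri)  # → 1.0
```

A positive NRI means the new model reclassifies patients in the correct direction on net; a value near zero, as observed for TAN vs LR in this study, indicates no meaningful improvement.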
From the GWTG-HF data set, 238 581 patients from 650 hospitals were linked to Medicare data. Of these, 87 503 patients from 292 hospitals had at least 75% complete data recorded in their medical history. Once further exclusion criteria were applied, 56 477 patients remained in the data analysis, covering January 4, 2005, to December 1, 2011 (eFigure in the Supplement).
From all variables recorded in the GWTG-HF data collection form (eTable 1 in the Supplement), candidate variables and selected characteristics of our analysis sample are shown in eTable 2 in the Supplement. The overall 30-day readmission rate in the study population was 21.2% (11 959 of 56 477 patients). Comparisons between those readmitted and those not readmitted, as well as between training and validation groups, are found in eTables 3 and 4 in the Supplement.
Final variables for the LR model are shown in eTable 5 in the Supplement. Calibration plots for training and validation sets, along with respective C statistics for the TAN, LR, and LASSO models, are shown in Figure 1. The TAN model used all class variables in construction. C statistics for derivation and validation sets were modest (0.622 and 0.618, respectively). C statistics for the training and validation sets were 0.629 and 0.624, respectively, for the LR model and 0.624 and 0.618, respectively, for the LASSO model. The electronic health records model applied to our study population had poor discriminatory capacity, with training and validation set C statistics of 0.592 and 0.589, respectively. The calibration curves show poorer alignment (accuracy) for the TAN model than for the LR and LASSO models. Calibration plots for training and validation sets, along with C statistics for the random forest and gradient-boosted models, are shown in Figure 2. Neither model provided improved discrimination compared with the TAN, LR, or LASSO models.
Direct comparison of the LR model with TAN showed no significant reclassification improvement (eTable 6 in the Supplement).
Use of ML algorithms did not lead to improved prediction of 30-day HF readmissions compared with traditional statistical models. All models developed in this study were concordant and showed modest discrimination, with C statistics consistently around 0.62. Our conclusions are consistent with other studies that attempted to predict HF readmissions.
The Centers for Medicare and Medicaid Services model developed by Keenan et al “for the purpose of public reporting of hospital-level readmission rates by [Centers for Medicare and Medicaid Services]”13 forms the basis for penalizing excessive readmissions.14 Since the Centers for Medicare and Medicaid Services began constructing its predictive model, subsequent data on hospital readmissions have yielded insights that substantially complicate efforts at readmission prediction and prevention. For example, after a discharge for HF, a patient may be readmitted for pneumonia. Indeed, nearly two-thirds of readmissions after HF hospitalization are for reasons other than HF.15 Furthermore, several factors associated with readmissions reflect social determinants of health. Comprehensively addressing these factors is typically beyond the scope of a single hospitalization, may stretch the capability of a single hospital or hospital system, and ultimately may require policy change at the societal level.
Our study found no model among those tested with sufficient discriminatory capacity to warrant clinical use, at least as judged by the C statistic. Although pervasive in the literature as a means of arbitrating among predictive models, the C statistic may not be the best measure for doing so because it describes only how well a model ranks cases above noncases. Supplementing the C statistic with model calibration plots showed that no model predicted readmissions more accurately than logistic regression.
Nearly all published models for prediction of HF readmissions were generated by multivariable LR, which has several limitations, resting primarily on assumptions of linearity (on a log-odds scale). Although regression models have the distinct advantages of being simple to implement and easy to interpret, they are of limited use in prediction. Predictive capability, an essential aspect of ML methodology, is fundamentally based on classification, or grouping, of characteristics (attributes) in the form of a neural network, decision tree, or rule-driven algorithm. Importantly, for any ML technique to be useful, a balance must be struck between the minimization of training errors (the first step in any ML approach) seen with more complex models and transportability (generalizability).
Although we included approximately 250 variables from the GWTG-HF registry, there may be significant unrecognized covariates contributing to HF readmissions that we are currently unable to account for or measure. We attempted to include all clinical measures available to us as candidate variables, including several measures of social determinants of health. Our failure to elicit significant contributing variables among the latter is similar to the findings of Krumholz et al16: these variables, although important, may simply not add significantly to a model’s predictive properties.
The GWTG-HF registry depends on voluntary participation by centers; thus, there may be bias in site and patient enrollment. Nearly two-thirds of patients were excluded because of insufficient data, and after further exclusions, less than 25% of the original study population remained. Heart failure diagnosis can be challenging, and there may be variation in reporting of this clinical syndrome. Postdischarge data were not directly tracked or recorded. Longitudinal outcomes were assessed based on deterministic matching to Centers for Medicare and Medicaid Services data and, despite validation in other settings, this matching may introduce error. We attempted to adjust for potential confounders but cannot exclude residual confounding. Our observations may not apply to populations younger than 65 years.
In conclusion, ML methods had only modest success in predicting 30-day HF readmissions and were similar to previous attempts at prediction in the Medicare population. The failure to create a predictive model with adequate discriminatory capacity places our effort well within the literature, with multiple attempts at prediction all within a general range of insufficient clinical usefulness (Table). This failure across multiple attempts and methods raises concern that the issue is perhaps less with methodology than with using a subjectively driven outcome or possibly the presence of additional unrecognized covariates of importance.
Corresponding Author: Warren K. Laskey, MD, MPH, Division of Cardiology, Department of Internal Medicine, University of New Mexico, MSC10-5550, One University of New Mexico, Albuquerque, NM 87131 (email@example.com).
Accepted for Publication: August 30, 2016.
Published Online: October 26, 2016. doi:10.1001/jamacardio.2016.3956
Author Contributions: Drs Laskey and Frizzell had full access to the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Frizzell, Yancy, Hernandez, Fonarow, Laskey.
Acquisition, analysis, or interpretation of data: Frizzell, Liang, Schulte, Heidenreich, Hernandez, Bhatt, Fonarow.
Drafting of the manuscript: Frizzell, Laskey.
Critical revision of the manuscript for important intellectual content: Frizzell, Liang, Schulte, Yancy, Heidenreich, Hernandez, Bhatt, Fonarow.
Statistical analysis: Liang, Schulte.
Administrative, technical, or material support: Hernandez, Laskey.
Study supervision: Frizzell, Fonarow, Laskey.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Bhatt is on the advisory board for Cardax, Elsevier Practice Update Cardiology, Medscape Cardiology, and Regado Biosciences; the board of directors for Boston VA Research Institute and the Society of Cardiovascular Patient Care; chair of the American Heart Association Quality Oversight Committee; the data monitoring committees for Duke Clinical Research Institute, Harvard Clinical Research Institute, Mayo Clinic, and Population Health Research Institute; receives honoraria from the American College of Cardiology (Senior Associate Editor, Clinical Trials and News, ACC.org), Belvoir Publications (Editor in Chief, Harvard Heart Letter), Duke Clinical Research Institute (clinical trial steering committees), Harvard Clinical Research Institute (clinical trial steering committee), HMP Communications (Editor in Chief, Journal of Invasive Cardiology), Journal of the American College of Cardiology (Guest Editor; Associate Editor), Population Health Research Institute (clinical trial steering committee), Slack Publications (Chief Medical Editor, Cardiology Today’s Intervention), Society of Cardiovascular Patient Care (Secretary/Treasurer), and WebMD (continuing medical education steering committees); is the deputy editor of Clinical Cardiology; is the Vice-Chair of National Cardiovascular Data Registry Steering Committee; is the Chair of the Veterans Affairs Cardiovascular Assessment Reporting and Tracking program Research and Publications Committee; receives royalties from Elsevier (Editor, Cardiovascular Intervention: A Companion to Braunwald’s Heart Disease); is site coinvestigator for Biotronik, Boston Scientific, St. Jude Medical; is a trustee of the American College of Cardiology; and conducts unfunded research for FlowCo, PLx Pharma, and Takeda. 
Dr Fonarow conducts research for the National Institutes of Health and the Patient-Centered Outcomes Research Institute and is a consultant for Amgen, Janssen, Novartis, and Medtronic. No other disclosures are reported.
Funding/Support: This research was funded by the American Heart Association Get With The Guidelines Young Investigator Database Award, 2013. The GWTG-HF program is sponsored in part by Amgen Cardiovascular and has been funded in the past through support from Medtronic, GlaxoSmithKline, Ortho-McNeil, and the American Heart Association Pharmaceutical Roundtable. Dr Bhatt receives research funding from Amarin, AstraZeneca, Bristol-Myers Squibb, Eisai, Ethicon, Forest Laboratories, Ischemix, Medtronic, Pfizer, Roche, Sanofi Aventis, and the Medicines Company.
Role of the Funder/Sponsor: The American Heart Association and industry sponsors had no role in the design and conduct of the study; collection, management, analysis and interpretation of the data; preparation of the manuscript and decision to submit the manuscript for publication. Representatives from the American Heart Association reviewed and approved a draft of the manuscript prior to submission.
Disclaimer: Dr Yancy is a Deputy Editor, JAMA Cardiology, Dr Hernandez is an Associate Editor, JAMA Cardiology, and Dr Fonarow is the Associate Editor for Health Care Quality and Guidelines, JAMA Cardiology, but they were not involved in the review process or decision to accept the manuscript for publication.