    Original Investigation
    January 10, 2020

    Comparison of Machine Learning Methods With Traditional Models for Use of Administrative Claims With Electronic Medical Records to Predict Heart Failure Outcomes

    Author Affiliations
    • 1Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
    • 2Heart and Vascular Center, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts
    • 3Market Access, Bayer AG, Wuppertal, Germany
    JAMA Netw Open. 2020;3(1):e1918962. doi:10.1001/jamanetworkopen.2019.18962
    Key Points

    Question  Can prediction of patient outcomes in heart failure based on routinely collected claims data be improved with machine learning methods and incorporating linked electronic medical records?

    Findings  In this prognostic study including records on 9502 patients, machine learning methods offered only limited improvement over logistic regression in predicting key outcomes in heart failure based on administrative claims. Inclusion of additional predictors from electronic medical records improved prediction for mortality, heart failure hospitalization, and loss in home days but not for high cost.

    Meaning  Models based on claims-only predictors may achieve modest discrimination and accuracy in prediction of key patient outcomes in heart failure, and machine learning approaches and incorporation of additional predictors from electronic medical records may offer some improvement in risk prediction of select outcomes.


    Importance  Accurate risk stratification of patients with heart failure (HF) is critical to deploy targeted interventions aimed at improving patients’ quality of life and outcomes.

    Objectives  To compare machine learning approaches with traditional logistic regression in predicting key outcomes in patients with HF and evaluate the added value of augmenting claims-based predictive models with electronic medical record (EMR)–derived information.

    Design, Setting, and Participants  A prognostic study with a 1-year follow-up period was conducted including 9502 Medicare-enrolled patients with HF from 2 health care provider networks in Boston, Massachusetts (“providers” includes physicians, clinicians, other health care professionals, and their institutions that comprise the networks). The study was performed from January 1, 2007, to December 31, 2014; data were analyzed from January 1 to December 31, 2018.

    Main Outcomes and Measures  All-cause mortality, HF hospitalization, top cost decile, and home days loss greater than 25% were modeled using logistic regression, least absolute shrinkage and selection operator (LASSO) regression, classification and regression trees, random forests, and gradient-boosted modeling (GBM). All models were trained using data from network 1 and tested in network 2. After selecting the most efficient modeling approach based on discrimination, Brier score, and calibration, area under precision-recall curves (AUPRCs) and net benefit estimates from decision curves were calculated to focus on the differences when using claims-only vs claims + EMR predictors.
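
As an illustration of two of the measures named above, the following minimal Python sketch computes a C statistic (discrimination) and a Brier score for a binary outcome. It is not the study's code: the function names `c_statistic` and `brier_score` and the toy data are assumptions made for this example, and the pairwise loop is meant for clarity rather than for data sets of the study's size.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event case has the
    higher predicted risk; tied predictions count as half a pair."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome
    (lower is better; 0 is a perfect prediction)."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = outcome occurred within follow-up, 0 = it did not.
outcomes = [0, 0, 1, 1]
risks = [0.10, 0.40, 0.35, 0.80]
print(c_statistic(outcomes, risks))
print(brier_score(outcomes, risks))
```

In this toy example, 3 of the 4 event/non-event pairs are ranked correctly, so the C statistic is 0.75. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.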

    Results  A total of 9502 patients with HF with a mean (SD) age of 78 (8) years were included: 6113 from network 1 (training set) and 3389 from network 2 (testing set). Gradient-boosted modeling consistently provided the highest discrimination, lowest Brier scores, and good calibration across all 4 outcomes; however, logistic regression had generally similar performance (C statistics for logistic regression based on claims-only predictors: mortality, 0.724; 95% CI, 0.705-0.744; HF hospitalization, 0.707; 95% CI, 0.676-0.737; high cost, 0.734; 95% CI, 0.703-0.764; and home days loss, 0.781; 95% CI, 0.764-0.798; C statistics for GBM: mortality, 0.727; 95% CI, 0.708-0.747; HF hospitalization, 0.745; 95% CI, 0.718-0.772; high cost, 0.733; 95% CI, 0.703-0.763; and home days loss, 0.790; 95% CI, 0.773-0.807). Higher AUPRCs were obtained for claims + EMR vs claims-only GBMs predicting mortality (0.484 vs 0.423), HF hospitalization (0.413 vs 0.403), and home time loss (0.575 vs 0.521) but not cost (0.249 vs 0.252). The net benefit for claims + EMR vs claims-only GBMs was higher at various threshold probabilities for mortality and home time loss outcomes but similar for the other 2 outcomes.
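
The net benefit comparison reported above can be illustrated with a short Python sketch that uses the standard decision curve definition: true positives per patient minus false positives per patient, weighted by the odds of the threshold probability. This is a sketch under stated assumptions, not the authors' implementation; the function names `net_benefit` and `net_benefit_treat_all` and the toy data are hypothetical.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk is at or above
    `threshold`: TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    if n == 0 or n != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy for a decision curve: flag every patient."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
outcomes = [0, 0, 1, 1, 0]
risks = [0.10, 0.40, 0.35, 0.80, 0.60]
for pt in (0.2, 0.5):
    print(pt, net_benefit(outcomes, risks, pt),
          net_benefit_treat_all(outcomes, pt))
```

A decision curve plots these values over a range of threshold probabilities. One model offers more clinical value than another at a given threshold when its net benefit is higher there, and it is useful at all only where it exceeds both the flag-everyone strategy and the flag-no-one strategy (net benefit 0).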

    Conclusions and Relevance  Machine learning methods offered only limited improvement over traditional logistic regression in predicting key HF outcomes. Inclusion of additional predictors from EMRs to claims-based models appeared to improve prediction for some, but not all, outcomes.