    Original Investigation
    Health Informatics
    July 17, 2020

    Development and Validation of a Risk Stratification Model Using Disease Severity Hierarchy for Mortality or Major Cardiovascular Event

    Author Affiliations
    • 1 Division of Digital Health Science, Department of Health Science Research, Mayo Clinic, Rochester, Minnesota
    • 2 The Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota
    • 3 Division of General Internal Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota
    • 4 Division of Healthcare Policy and Research, Department of Health Science Research, Mayo Clinic, Rochester, Minnesota
    • 5 University of Minnesota School of Nursing, Minneapolis
    • 6 Department of Computer Science and Engineering, University of Minnesota, Minneapolis
    • 7 Institute for Health Informatics, University of Minnesota, Minneapolis
    • 8 Department of Medicine, University of Minnesota, Minneapolis
    JAMA Netw Open. 2020;3(7):e208270. doi:10.1001/jamanetworkopen.2020.8270
    Key Points

    Question  Does incorporating clinical domain knowledge regarding diseases, disease severity, and treatment pathways into machine learning improve risk stratification?

    Findings  In this retrospective cohort study involving 51 969 patients, a new representation of patient data was developed and used to train machine learning models to predict mortality and major cardiovascular events. Results showed substantial improvement in prediction performance compared with traditional patient data representation methods.

    Meaning  The findings of this study suggest that methods that can extract and represent the clinical knowledge contained in electronic medical records should be incorporated into machine learning models for use in clinical decision support systems.


    Importance  Clinical domain knowledge about diseases and their comorbidities, severity, treatment pathways, and outcomes can facilitate diagnosis, enhance preventive strategies, and help create smart evidence-based practice guidelines.

    Objective  To introduce a new representation of patient data called disease severity hierarchy that leverages domain knowledge in a nested fashion to create subpopulations that share increasing amounts of clinical details suitable for risk prediction.
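
    The nesting idea can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes, purely for illustration, a single hypothetical disease whose hierarchy levels add one clinical detail at a time (diagnosis, then a complication, then a treatment). The codes "diabetes", "nephropathy", and "insulin" and the helpers encode and subpopulation are hypothetical names. Because each level requires every detail of the level above it, the patient subpopulations shrink and nest as clinical detail increases.

```python
# Toy sketch of a nested severity hierarchy for ONE hypothetical disease.
# The detail codes are illustrative placeholders, not codes used in the study.
DETAILS = ["diabetes", "nephropathy", "insulin"]  # diagnosis -> complication -> treatment

# Level k requires the first k + 1 details, so each level's requirement is a
# superset of the previous one and the matching patient groups are nested.
LEVELS = [frozenset(DETAILS[: k + 1]) for k in range(len(DETAILS))]


def encode(codes: set[str]) -> list[int]:
    """One binary feature per level: 1 if the patient has every detail of that level."""
    return [int(level <= codes) for level in LEVELS]


def subpopulation(patients: dict[str, set[str]], k: int) -> set[str]:
    """IDs of patients who satisfy hierarchy level k."""
    return {pid for pid, codes in patients.items() if LEVELS[k] <= codes}


if __name__ == "__main__":
    patients = {
        "p1": {"diabetes"},
        "p2": {"diabetes", "nephropathy"},
        "p3": {"diabetes", "nephropathy", "insulin"},
        "p4": {"hypertension"},
    }
    for k in range(len(LEVELS)):
        print(k, sorted(subpopulation(patients, k)))
    print(encode(patients["p2"]))  # [1, 1, 0]
```

    In this toy run, level 0 contains p1, p2, and p3, level 1 contains p2 and p3, and level 2 contains only p3, so each deeper level is a subset of the one above it.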

    Design, Setting, and Participants  This retrospective cohort study included 51 969 patients aged 45 to 85 years: a training cohort of 10 674 patients who received primary care at the Mayo Clinic between January 2004 and December 2015 and a validation cohort of 41 295 patients who received primary care at Fairview Health Services between January 2010 and December 2017. Data were analyzed from May 2018 to December 2019.

    Main Outcomes and Measures  Several binary classification measures, including the area under the receiver operating characteristic curve (AUC), Gini score, sensitivity, and positive predictive value, were used to evaluate models predicting all-cause mortality and major cardiovascular events at ages 60, 65, 75, and 80 years.
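
    For readers less familiar with these measures, the sketch below shows how each one is computed from model scores and binary outcome labels. It is a minimal illustration, not the study's evaluation code. It assumes the Gini score is defined as 2 × AUC − 1, a common convention that the abstract does not spell out. The 0.5 decision threshold and the function names are arbitrary choices made for this example.

```python
# Minimal sketch of AUC, Gini score, sensitivity, and PPV in plain Python.
# ASSUMPTION: Gini score = 2 * AUC - 1 (common convention, not stated in the abstract).


def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(auc_value: float) -> float:
    """Gini score under the assumed 2 * AUC - 1 definition."""
    return 2.0 * auc_value - 1.0


def sensitivity_ppv(scores: list[float], labels: list[int], threshold: float) -> tuple[float, float]:
    """Flag scores >= threshold as positive; sensitivity = TP/(TP+FN), PPV = TP/(TP+FP)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.3, 0.2]  # toy model outputs
    labels = [1, 0, 1, 0, 0]            # 1 = died / had a major cardiovascular event
    a = auc(scores, labels)
    print(round(a, 3))        # 0.833
    print(round(gini(a), 3))  # 0.667
    print(sensitivity_ppv(scores, labels, 0.5))  # (0.5, 0.5)
```

    The pairwise count is quadratic in cohort size and is meant only to make the definition concrete. For cohorts of tens of thousands of patients, a library routine such as scikit-learn's roc_auc_score would be the practical choice.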

    Results  In the training cohort, the mean (SD) age was 59.4 (10.8) years, 6324 patients (59.3%) were women, and 9804 (91.9%) were white; in the validation cohort, the mean (SD) age was 57.4 (7.9) years, 21 975 (53.1%) were women, and 37 653 (91.2%) were white. During follow-up, 945 patients (8.9%) in the training cohort died, while 787 (7.4%) had major cardiovascular events. Models using the new representation achieved AUCs for predicting death in the training cohort at ages 60, 65, 75, and 80 years of 0.96 (95% CI, 0.94-0.97), 0.96 (95% CI, 0.95-0.98), 0.97 (95% CI, 0.96-0.98), and 0.98 (95% CI, 0.98-0.99), respectively, while standard methods achieved modest AUCs of 0.67 (95% CI, 0.55-0.80), 0.66 (95% CI, 0.56-0.79), 0.64 (95% CI, 0.57-0.71), and 0.63 (95% CI, 0.54-0.70), respectively.

    Conclusions and Relevance  In this study, the proposed patient data representation predicted the age at which a patient was at risk of dying or developing a major cardiovascular event substantially more accurately than standard methods. The representation uses known relationships contained in electronic health records to capture disease severity in a natural and clinically meaningful way, and it is expressive and interpretable. This novel patient representation can help support critical decision-making, develop smart guidelines, and enhance health care and disease management by identifying high-risk patients.