    Original Investigation
    Health Informatics
    January 17, 2020

    Assessment of a Machine Learning Model Applied to Harmonized Electronic Health Record Data for the Prediction of Incident Atrial Fibrillation

    Author Affiliations
    • 1Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora
    • 2Colorado School of Public Health, Department of Biostatistics and Informatics, University of Colorado Denver, Aurora
    • 3Children’s Hospital Colorado, Cancer Center Biostatistics Core, Department of Pediatrics, University of Colorado, Aurora
    • 4Division of Cardiology and Cardiac Electrophysiology, University of Colorado School of Medicine, Aurora
    JAMA Netw Open. 2020;3(1):e1919396. doi:10.1001/jamanetworkopen.2019.19396
    Key Points

    Question  Can machine learning approaches applied to harmonized electronic health record data identify patients at risk of 6-month incident atrial fibrillation with greater accuracy than standard risk factors?

    Findings  This diagnostic study used electronic health record data from more than 2 million individuals to classify patients diagnosed with incident atrial fibrillation within a 6-month period, comparing several approaches to data management. A strategy that included the use of the 200 most common electronic health record features, random oversampling, and a single-layer neural network provided optimal classification of 6-month incident atrial fibrillation; however, this model was only marginally better than a logistic regression model with age, sex, and known risk factors for atrial fibrillation.

    Meaning  Machine learning approaches applied to electronic health record data hold promise for predicting clinical outcomes, such as incident atrial fibrillation, but this model was not substantially more accurate than a logistic regression model with standard risk factors.

    Abstract

    Importance  Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia, and its early detection could lead to significant improvements in outcomes through the appropriate prescription of anticoagulation medication. Although a variety of methods exist for screening for AF, a targeted approach, which requires an efficient method for identifying patients at risk, would be preferred.

    Objective  To examine machine learning approaches applied to electronic health record data that have been harmonized to the Observational Medical Outcomes Partnership Common Data Model for identifying risk of AF.

    Design, Setting, and Participants  This diagnostic study used data from 2 252 219 individuals cared for in the UCHealth hospital system, which comprises 3 large hospitals in Colorado, from January 1, 2011, to October 1, 2018. Initial analysis was performed in December 2018; follow-up analysis was performed in July 2019.

    Exposures  All Observational Medical Outcomes Partnership Common Data Model–harmonized electronic health record features, including diagnoses, procedures, medications, age, and sex.

    Main Outcomes and Measures  Classification of incident AF in designated 6-month intervals, adjudicated retrospectively, based on area under the receiver operating characteristic curve and F1 statistic.
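For readers less familiar with the two measures named above, the following is a minimal sketch of how each can be computed from scratch. It is illustrative only and is not the study's code: the function names `f1_score` and `roc_auc` and the toy labels are assumptions introduced here. The area under the receiver operating characteristic curve is computed through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen non-case), and the F1 statistic is the harmonic mean of precision and recall on thresholded predictions.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    This O(n_pos * n_neg) loop is fine for a toy example, not for
    millions of records.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
labels = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
predicted = [1 if r >= 0.4 else 0 for r in risk]  # threshold 0.4 is arbitrary

print(roc_auc(labels, risk))        # 0.75
print(f1_score(labels, predicted))  # 0.5
```

Note that the area under the curve does not depend on a decision threshold, whereas the F1 statistic does. The abstract does not state the threshold used, so the value 0.4 above is purely a placeholder.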

    Results  Of 2 252 219 individuals (1 225 533 [54.4%] women; mean [SD] age, 42.9 [22.3] years), 28 036 (1.2%) developed incident AF during a designated 6-month interval. The machine learning model that used the 200 most common electronic health record features, including age and sex, and random oversampling with a single-layer, fully connected neural network provided the optimal prediction of 6-month incident AF, with an area under the receiver operating characteristic curve of 0.800 and an F1 score of 0.110. This model performed only slightly better than a more basic logistic regression model composed of known clinical risk factors for AF, which had an area under the receiver operating characteristic curve of 0.794 and an F1 score of 0.079.
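Because only about 1.2% of individuals developed incident AF, the classes are heavily imbalanced, which is the setting random oversampling is meant to address. The sketch below illustrates the general technique under stated assumptions; it is not the authors' implementation. The helper name `random_oversample`, the 1:1 target ratio, and the fixed seed are all assumptions introduced here, since the abstract does not give those details.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced.

    Assumes binary 0/1 labels and a 1:1 target ratio (an assumption; the
    abstract does not specify the ratio). Returns shuffled copies and leaves
    the inputs unchanged.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("cannot oversample: one class has no examples")
    # Sample with replacement from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = list(range(len(labels))) + extra
    rng.shuffle(indices)
    return [rows[i] for i in indices], [labels[i] for i in indices]


# Toy training set (hypothetical): 2 cases among 10 individuals.
features = [[age] for age in (34, 51, 47, 72, 29, 66, 58, 80, 41, 39)]
outcome = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

balanced_x, balanced_y = random_oversample(features, outcome)
print(len(balanced_y), sum(balanced_y))  # 16 8
```

Oversampling is normally applied to the training split only, so that held-out performance is still measured at the original event rate.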

    Conclusions and Relevance  Machine learning approaches to electronic health record data offer a promising method for improving risk prediction for incident AF, but more work is needed to show improvement beyond standard risk factors.
