Comparison of Machine Learning Methods With National Cardiovascular Data Registry Models for Prediction of Risk of Bleeding After Percutaneous Coronary Intervention | Acute Coronary Syndromes | JAMA Network Open | JAMA Network
Figure.  Plots for the Existing Full Model and the Blended Model

The blended model demonstrated a closer calibration than the existing full model. A, Decile-based calibration plots are calculated from the 5-fold cross-validation showing stable model calibration. B, Continuous calibration plots with 95% CIs (shaded areas) are shown for the 2 models.

Table 1.  Patient Characteristics
Table 2.  Names of Variables, Descriptions, and Use in Prior Models
Table 3.  C Statistics of 5-Fold Cross-validation Results for the Existing Simplified Risk Score and the Blended Model
Table 4.  Prospective Predictions and Changes in 5-Fold Cross-validation for the Existing Simplified Risk Score and the Blended Model
1. Mehta SK, Frutkin AD, Lindsey JB, et al; National Cardiovascular Data Registry. Bleeding in patients undergoing percutaneous coronary intervention: the development of a clinical risk algorithm from the National Cardiovascular Data Registry. Circ Cardiovasc Interv. 2009;2(3):222-229. doi:10.1161/CIRCINTERVENTIONS.108.846741
2. Rao SV, Kaul PR, Liao L, et al. Association between bleeding, blood transfusion, and costs among patients with non–ST-segment elevation acute coronary syndromes. Am Heart J. 2008;155(2):369-374. doi:10.1016/j.ahj.2007.10.014
3. Rao SV, McCoy LA, Spertus JA, et al. An updated bleeding model to predict the risk of post-procedure bleeding among patients undergoing percutaneous coronary intervention: a report using an expanded bleeding definition from the National Cardiovascular Data Registry CathPCI Registry. JACC Cardiovasc Interv. 2013;6(9):897-904. doi:10.1016/j.jcin.2013.04.016
4. Baklanov DV, Kaltenbach LA, Marso SP, et al. The prevalence and outcomes of transradial percutaneous coronary intervention for ST-segment elevation myocardial infarction: analysis from the National Cardiovascular Data Registry (2007 to 2011). J Am Coll Cardiol. 2013;61(4):420-426. doi:10.1016/j.jacc.2012.10.032
5. Montalescot G, Salette G, Steg G, et al. Development and validation of a bleeding risk model for patients undergoing elective percutaneous coronary intervention. Int J Cardiol. 2011;150(1):79-83. doi:10.1016/j.ijcard.2010.02.077
6. Mortazavi BJ, Downing NS, Bucholz EM, et al. Analysis of machine learning techniques for heart failure readmissions. Circ Cardiovasc Qual Outcomes. 2016;9(6):629-640. doi:10.1161/CIRCOUTCOMES.116.003039
7. Brindis RG, Fitzgerald S, Anderson HV, Shaw RE, Weintraub WS, Williams JF. The American College of Cardiology–National Cardiovascular Data Registry (ACC-NCDR): building a national clinical data repository. J Am Coll Cardiol. 2001;37(8):2240-2245. doi:10.1016/S0735-1097(01)01372-9
8. Messenger JC, Ho KK, Young CH, et al; NCDR Science and Quality Oversight Committee Data Quality Workgroup. The National Cardiovascular Data Registry (NCDR) Data Quality Brief: the NCDR Data Quality Program in 2012. J Am Coll Cardiol. 2012;60(16):1484-1488. doi:10.1016/j.jacc.2012.07.020
9. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. Vol 1. Berlin, Germany: Springer; 2001. Springer Series in Statistics.
10. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 13-17, 2016; San Francisco, CA.
11. Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manage. 2009;45(4):427-437. doi:10.1016/j.ipm.2009.03.002
12. Wood SN. Generalized Additive Models: An Introduction With R. Boca Raton, FL: Chapman & Hall/CRC; 2006. doi:10.1201/9781420010404
13. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1-22. doi:10.18637/jss.v033.i01
14. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(1):77. doi:10.1186/1471-2105-12-77
15. Wood SN. Generalized Additive Models: An Introduction With R. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC; 2017. doi:10.1201/9781315370279
16. Zeileis A. Econometric computing with HC and HAC covariance matrix estimators. J Stat Softw. 2004;11(10). doi:10.18637/jss.v011.i10
17. Siegert S, Bhend J, Kroener I, De Felice M. Package “SpecsVerification.” https://cran.r-project.org/web/packages/SpecsVerification/SpecsVerification.pdf. Published March 7, 2017. Accessed June 7, 2017.
18. GitHub. Source code for work on modeling major bleeding risk in post-PCI patients from the NCDR CathPCI Registry. https://github.com/bobakm/NCDR_CathPCI_MajorBleed_Public. Accessed May 28, 2019.
19. Zhang T. On the consistency of feature selection using greedy least squares regression. J Mach Learn Res. 2009;10:555-568. http://www.jmlr.org/papers/volume10/zhang09a/zhang09a.pdf. Accessed November 11, 2017.
20. Elenberg ER, Khanna R, Dimakis AG, Negahban S. Restricted strong convexity implies weak submodularity. https://arxiv.org/abs/1612.00804. Last revised October 12, 2017. Accessed November 11, 2017.
21. Spertus JA, Bach R, Bethea C, et al. Improving the process of informed consent for percutaneous coronary intervention: patient outcomes from the Patient Risk Information Services Manager (ePRISM) study. Am Heart J. 2015;169(2):234-241.e1.
    1 Comment for this article
    Better models, application needed
    Steven Bradley, MD, MPH | JAMA Network Open Associate Editor
    Wonderful use of the robust clinical data of NCDR to develop more accurate methods of risk prediction for a common complication of PCI. Now to apply this in clinical care to improve patient outcomes.
    CONFLICT OF INTEREST: None Reported
    Original Investigation
    Cardiology
    July 10, 2019

    Comparison of Machine Learning Methods With National Cardiovascular Data Registry Models for Prediction of Risk of Bleeding After Percutaneous Coronary Intervention

    Author Affiliations
    • 1. Department of Computer Science and Engineering, Texas A&M University, College Station
    • 2. Center for Remote Health Technologies and Systems, Texas A&M University, College Station
    • 3. Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
    • 4. Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, Connecticut
    • 5. Division of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
    • 6. Now with the Department of Pediatrics, Boston Children’s Hospital, Boston, Massachusetts
    • 7. Division of Cardiology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora
    • 8. Division of Cardiology, Department of Medicine, California Pacific Medical Center, Sutter Health, San Francisco
    • 9. Department of Statistics, Yale University, New Haven, Connecticut
    • 10. Department of Health Policy and Management, Yale School of Public Health, New Haven, Connecticut
    JAMA Netw Open. 2019;2(7):e196835. doi:10.1001/jamanetworkopen.2019.6835
    Key Points

    Question  Can machine learning techniques, bolstered by better selection of variables, improve prediction of major bleeding after percutaneous coronary intervention (PCI)?

    Findings  In this comparative effectiveness study that modeled more than 3 million PCI procedures, machine learning techniques improved the prediction of post-PCI major bleeding to a C statistic of 0.82 compared with a C statistic of 0.78 from the existing model. Machine learning techniques improved the identification of an additional 3.7% of bleeding cases and 1.0% of nonbleeding cases.

    Meaning  By leveraging more complex, raw variables, machine learning techniques are better able to identify patients at risk for major bleeding and who can benefit from bleeding avoidance therapies.

    Abstract

    Importance  Better prediction of major bleeding after percutaneous coronary intervention (PCI) may improve clinical decisions aimed to reduce bleeding risk. Machine learning techniques, bolstered by better selection of variables, hold promise for enhancing prediction.

    Objective  To determine whether machine learning techniques better predict post-PCI major bleeding compared with the existing National Cardiovascular Data Registry (NCDR) models.

    Design, Setting, and Participants  This comparative effectiveness study used NCDR CathPCI Registry data, version 4.4 (July 1, 2009, to April 1, 2015). Machine learning techniques (logistic regression with lasso regularization and gradient descent boosting [XGBoost, version 0.71.2]) were applied, and their output was compared with the existing simplified risk score and full NCDR models. The existing models were recreated, and performance with additional techniques and variables was then evaluated in a 5-fold cross-validation; the analysis was conducted from October 1, 2015, to October 27, 2017. The setting was retrospective modeling of a nationwide clinical registry of PCI. Participants were all patients undergoing PCI. Percutaneous coronary intervention procedures were excluded if they were not the index PCI of admission, if the hospital site had missing outcomes measures, or if the patient underwent subsequent coronary artery bypass grafting.

    Exposures  Clinical variables available at admission and diagnostic coronary angiography data were used to determine the severity and complexity of presentation.

    Main Outcomes and Measures  The main outcome was in-hospital major bleeding within 72 hours after PCI. Results were evaluated by comparing C statistics, calibration, and decision threshold–based metrics, including the F score (harmonic mean of positive predictive value and sensitivity) and the false discovery rate.

    Results  The post-PCI major bleeding rate among 3 316 465 procedures (patients’ median age, 65 years; interquartile range, 56-73 years; 68.1% male) was 4.5%. The existing full model achieved a mean C statistic of 0.78 (95% CI, 0.78-0.78). The use of XGBoost and the full range of selected variables achieved a C statistic of 0.82 (95% CI, 0.82-0.82), with an F score of 0.31 (95% CI, 0.30-0.31). Compared with the existing full model, XGBoost correctly identified as high risk an additional 3.7% of cases that experienced a bleeding event and correctly identified as low risk an additional 1.0% of cases that did not experience a bleeding event. The data-driven decision threshold helped improve the false discovery rate of the existing techniques. The existing simplified risk score model improved the false discovery rate from more than 90% to 78.7%. Modifying the model and the data decision threshold improved this rate from 78.7% to 73.4%.

    Conclusions and Relevance  Machine learning techniques improved the prediction of major bleeding after PCI. These techniques may help to better identify patients who would benefit most from strategies to reduce bleeding risk.

    Introduction

    Major bleeding, a common complication after percutaneous coronary intervention (PCI), is associated with increased mortality risk, other periprocedural complications, and greater cost.1-5 Several risk prediction models and bleeding prevention strategies have been developed to identify patients who are at highest risk for post-PCI bleeding and may benefit from bleeding avoidance therapies. In particular, 2 models from the National Cardiovascular Data Registry (NCDR) have been widely used for risk stratification and quality improvement initiatives.1,3 Rao et al3 developed and then updated the following 2 NCDR bleeding risk models: (1) a 31-variable existing full PCI model that uses 23 patient characteristic variables (available at presentation) and 8 procedural characteristics focused on the coronary anatomy and culprit lesion to aid in clinical decisions about bleeding avoidance strategies, and (2) an existing simplified risk score model using 10 pre-PCI variables selected from the 31 variables chosen for the existing full model. Many of the 31 variables were presented as dichotomous variables extracted from continuous variables. The existing full model was derived using logistic regression with backward elimination of the available NCDR CathPCI Registry data.3

    Although these NCDR bleeding risk models have performed well in validation cohorts and in clinical practice, the current model discrimination of 0.77 leaves room for an improvement in the definition of high-risk patients. While decision thresholds can be varied to determine treatment decision paradigms, the existing models use low thresholds for high risk, resulting in classifying an abundant number of nonbleeding cases as high risk. Therefore, we examined the improvement in model discrimination.

    Machine learning techniques may enhance risk prediction because they allow nonlinear associations and are better suited to extracting additional information from continuous variables (eg, preprocedural hemoglobin continuous value rather than 2 variables of preprocedural hemoglobin [≤13 and >13 g/dL]) (to convert hemoglobin level to grams per liter, multiply by 10.0). Prior work shows that machine learning techniques can boost clinical prediction if the appropriate data are available.6 If machine learning techniques substantially improve risk prediction, they may be able to improve clinical decision making before and during PCI. The American College of Cardiology uses the existing model,3 indicating support for improvements of such a model that can be used for addressing clinical decision making and be used for quality improvement by improving risk adjustment.

    This work investigates a direct comparison of modeling techniques with the existing full PCI model and the existing simplified risk score. We conducted 4 data experiments to determine whether machine learning algorithms can meaningfully improve risk prediction and whether preparing the data as dichotomous variables altered the ability to improve the models. These experiments included an analysis of (1) the association machine learning techniques have with the existing simplified risk score, (2) the association the machine learning techniques have with the existing full model with the same selected variables, (3) the association the machine learning techniques have with the existing full model when we allow the model to examine the full dynamic range of the variables (blended model), and (4) our understanding of the importance of variables in the best-performing (blended) model. We purposely constrained this study to variables in the current model and the underlying data that were used to produce those variables.

    Methods
    Study Population

    The analysis used data from version 4.4 of the NCDR CathPCI Registry,3,7 which includes PCIs performed from July 1, 2009, to April 1, 2015. This registry, cosponsored by the American College of Cardiology and the Society for Cardiovascular Angiography and Interventions, includes data on patient characteristics, clinical features, angiographic and procedural details, and in-hospital outcomes on interventional cases at 1561 participating institutions. Data quality is monitored through extensive data abstraction training, site feedback reports, independent auditing, and data validation, reducing the likelihood of large outliers or high rates of missing data.4,8 We conducted this analysis from October 1, 2015, to October 27, 2017, on a deidentified extract in an institutional review board–approved study (Yale Human Research Protection Program); requirement for informed consent was waived owing to the use of deidentified data. The setting was retrospective modeling of a nationwide clinical registry of PCI. Participants were all patients undergoing PCI. Percutaneous coronary intervention procedures were excluded if they were not the index PCI of admission, if the hospital site had missing outcomes measures, or if the patient underwent subsequent coronary artery bypass grafting. This work follows the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines.

    Our initial sample used inclusion and exclusion criteria for the existing NCDR bleeding risk model (eFigure 1 in the Supplement).3 This study population excluded patient admissions that represented readmissions, patients who died in the hospital the same day as the procedure (even if they had a bleeding event before death, as was constrained in the prior work as well), and patients who had missing bleeding information.3 We only included the first PCI procedure within the same episode because we have unique coded identifiers per admission and procedure identifiers linked to this. If a patient had a second PCI in a different admission, we treated this as an independent procedure because we did not have patient identifiers. We added an exclusion for patients who underwent coronary artery bypass grafting (CABG) because the high risk of bleeding after CABG may obscure the bleeding risk attributable to PCI alone; these cases were not excluded in the primary prior model.5 Rao et al3 also evaluated a cohort that excluded patients who underwent CABG, increasing discrimination to 0.78 (from 0.77) for the existing full model and to 0.76 (from 0.75) for the existing simplified risk score.

    Variable Set Creation

    We recreated the existing full NCDR bleeding risk model variable set (31 variables updated with the additional participant data samples) and the 10-variable existing simplified risk score variable set. In addition, we created 2 extensions to evaluate changes in performance with each variable. First, the blended model variable set includes the 31 variables from the full NCDR bleeding risk model, plus 28 additional variables that were used to derive the dichotomous variables used in the existing full model. While this would create collinear variables, the modeling techniques used are able to select variables while building models, countering this occurrence. Second is the top features variable set, in which the top-ranked variables in the best-performing model are incrementally added to measure the discrimination of the variables.

    Derivation and Validation

    We created derivation and validation cohorts using stratified 5-fold cross-validation. Each variable set was divided randomly into 5 equal subsets, preserving the same event rate in each subset, by first randomly dividing bleeding cases and then nonbleeding cases. Each bleeding subset was then paired with 1 nonbleeding subset. The derivation cohort combined 4 (80%) of the subsets; the remaining subset (20%) was reserved as a validation set. This process was repeated 5 times, such that each of the subsets served as the validation set. Each model was thus trained on 80% of the data and tested on the unseen 20% 5 separate times, which allows a fair comparison between models, and each case received a single risk estimate from the fold in which it was held out.
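
    The fold construction described above can be sketched as follows. This is a minimal illustration on toy data, not the study's R code; the function names, the seed, and the round-robin dealing of shuffled cases are assumptions made for the example.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign each case index to one of k folds while preserving the event rate."""
    rng = random.Random(seed)
    events = [i for i, y in enumerate(labels) if y == 1]
    non_events = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(events)
    rng.shuffle(non_events)
    folds = [[] for _ in range(k)]
    # Deal bleeding cases first, then nonbleeding cases, round-robin into the folds.
    for group in (events, non_events):
        for position, index in enumerate(group):
            folds[position % k].append(index)
    return folds

def derivation_validation_splits(folds):
    """Yield (derivation, validation) index lists, holding out one fold at a time."""
    for held_out, validation in enumerate(folds):
        derivation = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield derivation, validation

# Toy data (hypothetical): 1000 cases with 45 events, mirroring the 4.5% event rate.
labels = [1] * 45 + [0] * 955
folds = stratified_folds(labels)
splits = list(derivation_validation_splits(folds))
```

    Because every case is held out exactly once, each case ends up with a single out-of-sample risk estimate.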

    Variable Selection and Imputation

    All 59 variables considered in the blended model data set were collected from the NCDR CathPCI Registry. For binary variables, missing values were redefined as “no” values (eg, history of hypertension was considered “yes” only if it was explicitly recorded). Categorical variables, such as the New York Heart Association class I through IV, which lacked a “no” category, were coded as 1, 2, 3, and 4; an additional category of 0 was added, indicating patients for whom a value was not recorded. Therefore, category 0 indicates no heart failure, and categories 1 through 4 correspond to the New York Heart Association classification. For continuous variables, missing values were imputed via a single median imputation, as was done in the existing full model.3 All variables had less than 50% missing, with preprocedural left ventricular ejection fraction having the highest missing rate (29.9%), followed by preprocedural hemoglobin (6.6%), glomerular filtration rate (5.6%), preprocedural creatinine (5.6%), and all other variables having little or no missing data (<1%).3
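
    The three missing-data rules above can be illustrated with a short sketch. The values and variable names are hypothetical, and `None` stands in for a missing registry field; the study's own preprocessing was done in R.

```python
from statistics import median

def fill_missing(values, fill):
    """Replace missing entries (None) with the given fill value."""
    return [fill if v is None else v for v in values]

def impute_median(values):
    """Replace missing continuous values with the median of the observed values."""
    observed = [v for v in values if v is not None]
    return fill_missing(values, median(observed))

# Binary variable: a missing value is treated as "no" (0).
hypertension = fill_missing([1, None, 0, 1], 0)

# Ordinal variable without a "no" level: missing becomes an added category 0.
nyha_class = fill_missing([2, None, 4, 1], 0)

# Continuous variable: single median imputation.
hemoglobin = impute_median([13.5, None, 11.0, 14.2, 12.8])
```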

    Outcome Definition

    Consistent with the existing NCDR CathPCI Registry bleeding risk model, post-PCI major bleeding was defined as any post-PCI, predischarge major bleeding within 72 hours (eAppendix in the Supplement).3 The NCDR data and outcomes, audited and derived from medical records, increase reliability by allowing only hospitals that pass quality checks.8

    Model Development

    Two methods were used to train models in this analysis. First, logistic regression with lasso regularization is a statistical technique selected to show the value of changing automated variable and model selection techniques.9 Second, gradient descent boosting, which selects variables while training, was used to demonstrate the power of machine learning techniques that account for higher-order, nonlinear interactions, particularly on a variety of data types, including binary and continuous variables. Gradient descent boosting creates a series of “boosted” decision trees of weaker individual predictors to create stronger final predictions, permitting analysis of higher-order interactions with varying variables and types.9 The particular method of gradient descent boosting was extreme gradient boosting (XGBoost, version 0.71.2; xgboost developers).10 The final model used 1000 trees, a learning rate of 0.1, and a maximum depth of each tree of 6, and it was trained with an objective function aimed at minimizing errors similar to logistic regression for binary classification (bleed vs nonbleed). These models, hyperparameters, and how they were selected are further described in the eAppendix in the Supplement.
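
    For orientation, the two model families can be written in their standard textbook forms. The notation here is an assumption for illustration (N cases, features x_i, outcomes y_i in {0, 1}, penalty weight \lambda, constant initial score F_0, trees f_m); the exact objective and tuning used in the study are described in its eAppendix.

```latex
% Logistic regression with lasso (L1) regularization
\[
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}
\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta) - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
+ \lambda\,\lVert \beta \rVert_1
\]

% Boosted trees with the reported settings: M = 1000 trees, learning rate \eta = 0.1,
% each tree f_m limited to depth 6, and a logistic link for the predicted bleeding risk
\[
F_M(x) = F_0 + \eta \sum_{m=1}^{M} f_m(x),
\qquad
\hat{p}(x) = \frac{1}{1 + e^{-F_M(x)}}
\]
```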

    Model Evaluation

    Receiver operating characteristic curves were used to estimate model discrimination by the C statistic, and the 5-fold cross-validation provides 5 C statistics that allow for a mean C statistic and 95% CI to be calculated. However, because C statistics do not indicate the ideal decision threshold or give patient-specific prediction, we evaluated each model’s patient-specific predictive ability by comparing the correctly and incorrectly identified bleeding cases and nonbleeding cases at a data-driven decision threshold. From these predictions, we calculated the number of true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) results. The positive predictive value (TP / [TP + FP]) and sensitivity (TP / [TP + FN]) were combined to yield the F score (harmonic mean of positive predictive value and sensitivity). The optimal threshold for each model was defined as that which maximized the F score along the receiver operating characteristic curve.11
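
    A data-driven threshold of this kind can be sketched as a scan over candidate cut points. The example below is an illustration on toy data; treating each observed risk as a candidate threshold and classifying a case as high risk when its risk is at or above the threshold are assumptions of the sketch, not details taken from the study.

```python
def confusion_counts(labels, risks, threshold):
    """Count TP, FP, TN, FN when risk >= threshold is classified as high risk."""
    tp = fp = tn = fn = 0
    for y, r in zip(labels, risks):
        high_risk = r >= threshold
        if high_risk and y == 1:
            tp += 1
        elif high_risk:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def f_score(tp, fp, fn):
    """Harmonic mean of positive predictive value and sensitivity."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def best_f_threshold(labels, risks):
    """Return the (threshold, F score) pair that maximizes the F score."""
    best = (None, -1.0)
    for t in sorted(set(risks)):
        tp, fp, tn, fn = confusion_counts(labels, risks, t)
        score = f_score(tp, fp, fn)
        if score > best[1]:
            best = (t, score)
    return best

# Toy predictions (hypothetical): 3 bleeding cases among 8 procedures.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
risks = [0.02, 0.05, 0.08, 0.10, 0.12, 0.20, 0.30, 0.40]
threshold, score = best_f_threshold(labels, risks)
```

    The identity 2TP / (2TP + FP + FN) is the harmonic mean of TP / (TP + FP) and TP / (TP + FN), so it needs no separate computation of the two components.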

    We first assessed the associations of machine learning techniques on the existing variable sets. We then extended the experiments to the blended model variable set (and top features set) to evaluate the gains from the variables, as well as methods that can leverage these variables. Using the best-performing model, we then evaluated the model calibration, analyzed the importance of variables, and assessed prediction performance at the data-driven decision thresholds. We used calibration plots to plot the observed bleeding rate for each decile of predicted risk, compared by the Brier score, and with continuous generalized additive models.12
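
    The decile-based calibration points and the Brier score can be computed as in the following sketch. It uses toy data and equal-count bins of risk-ordered cases; the binning details are an assumption, and the continuous (generalized additive model) calibration curve is not reproduced here.

```python
def brier_score(labels, risks):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((r - y) ** 2 for y, r in zip(labels, risks)) / len(labels)

def decile_calibration(labels, risks, bins=10):
    """Return (mean predicted risk, observed event rate) per risk-ordered bin."""
    ordered = sorted(zip(risks, labels))
    n = len(ordered)
    rows = []
    for b in range(bins):
        chunk = ordered[b * n // bins:(b + 1) * n // bins]
        if not chunk:
            continue  # Fewer cases than bins: skip empty bins.
        mean_risk = sum(r for r, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_risk, observed))
    return rows

# Toy predictions (hypothetical): 20 cases, events concentrated at high risk.
risks = [(i + 0.5) / 20 for i in range(20)]
labels = [0] * 14 + [1] * 6
rows = decile_calibration(labels, risks)
score = brier_score(labels, risks)
```

    A well-calibrated model yields points close to the diagonal (mean predicted risk equal to observed rate), and a lower Brier score indicates predictions closer to the observed outcomes.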

    In supplementary analyses (eAppendix in the Supplement), we sought to better understand the updated samples available with the longer date range. Specifically, because our date range does not match that of the prior work, we split analyses by year, for cases considered in the existing models vs newly collected cases, to confirm that changes in bleeding rates do not alter model discrimination (they did not). In addition, we performed supplementary analyses to ensure that our top features selection technique was a fair selector of variables, including avoiding overfitting and data leakage; the eAppendix also discusses why forward stepwise selection was a fair choice for comparison.

    Software Implementation

    All analyses were conducted in R (version 3.3.2; R Project for Statistical Computing), with GLMNET used for lasso regularization,13 XGBoost for gradient descent boosting,10 and pROC for C statistics.14 We used mgcv and sandwich for the continuous calibration curves, and SpecsVerification was used for the Brier score.15-17 Source code is available online.18

    Assessment of Predictor Variables

    We evaluated the importance of each variable in the best-trained model (XGBoost) using a variety of metrics.10 Each training iteration might present a different ordering of variables; therefore, the importance of variables was calculated from a model trained on all the data. We took these selected variables, listed in order of importance, and added each variable in the 5-fold cross-validation data in a forward stepwise selection to identify the mean incremental C statistic. In high-dimensional problems, backward selection techniques may lend themselves to solutions altered by greater noise.19 Instead, we focused on forward selection techniques because they have strong theoretical guarantees19,20 and excellent empirical behavior.9 The eAppendix in the Supplement includes details on ensuring a fair evaluation. Similar C statistics assure that we did not overfit in this additional analysis.
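
    The incremental evaluation can be sketched as a loop that adds ranked variables one at a time and records discrimination after each addition. In the sketch below, `c_statistic` is the pairwise (Mann-Whitney) form of the C statistic, and `toy_evaluate` is a hypothetical stand-in that simply sums the selected columns; in the study, the corresponding step retrains the model and reports the mean 5-fold cross-validated C statistic.

```python
def c_statistic(labels, scores):
    """Share of (event, non-event) pairs in which the event case scores higher; ties count half."""
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

def incremental_c_statistics(ranked_variables, evaluate):
    """Add variables in rank order and record the C statistic after each addition."""
    selected, curve = [], []
    for name in ranked_variables:
        selected.append(name)
        curve.append((name, evaluate(list(selected))))
    return curve

# Toy data (hypothetical variable names and values).
labels = [0, 0, 0, 1, 1, 1]
data = {
    "var_a": [0, 0, 1, 1, 1, 0],
    "var_b": [0, 1, 0, 1, 0, 1],
}

def toy_evaluate(variables):
    # Stand-in for model fitting: score each case by the sum of the selected columns.
    scores = [sum(data[v][i] for v in variables) for i in range(len(labels))]
    return c_statistic(labels, scores)

curve = incremental_c_statistics(["var_a", "var_b"], toy_evaluate)
```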

    Results
    Patient Characteristics

    This study included 3 316 465 PCI procedures (patients’ median age, 65 years; interquartile range, 56-73 years; 68.1% male) performed at 1538 sites (eFigure 1 in the Supplement). Major bleeding occurred in 4.5% of patients after PCI. Baseline patient characteristics by bleeding status are listed in Table 1. Candidate variables, their definitions and data types, and sources are listed in Table 2.

    Experiment 1: Machine Learning Techniques and the Existing Simplified Risk Score

    Results for the existing simplified risk score and the machine learning techniques are summarized in Table 3. The existing simplified risk score achieved a mean C statistic of 0.77 (95% CI, 0.77-0.77), similar to the 0.76 reported by Rao et al3 after excluding CABG cases. Adding lasso regularization did not alter the model’s discrimination. Using the same 10 variables, XGBoost improved discrimination to a mean C statistic of 0.81 (95% CI, 0.80-0.81).

    Table 4 lists the prediction results of the existing simplified risk score variable set and the existing simplified risk score variable set with XGBoost. Using the existing simplified risk score threshold for high risk of 65.0 points (6.5% risk),3 we correctly identified 105 316 cases as high risk for bleeding who experienced an event and 2 208 569 cases as low risk for bleeding who did not experience an event. However, we classified 44 408 cases as low risk for bleeding who experienced an event and identified 958 172 cases as high risk for bleeding who did not experience an event, yielding a false discovery rate of 90.1% and a positive predictive value of 9.9%. Using a data-driven threshold of 96.4 points (between 14.9% and 17.0% risk), we correctly identified 47 445 bleeding cases and 2 990 509 nonbleeding cases (21.8% of cases, or 21 833 corrected cases per 100 000 PCI cases), yielding a false discovery rate of 78.8% and a positive predictive value of 21.2%. The 10-variable existing simplified risk score variable set modeled with XGBoost, selecting a data-driven risk threshold of 15.1%, correctly identified 52 768 cases as high risk for bleeding who experienced an event and 3 019 006 cases as low risk for bleeding who did not experience an event (22.9% of cases, or 22 852 corrected cases per 100 000 PCI cases), yielding a false discovery rate of 73.7%.
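
    The false discovery rates in this paragraph follow directly from the reported counts, as the short check below shows. The false-positive counts at the two data-driven thresholds are not reported in the text; here they are derived by subtracting the true negatives from the total number of nonbleeding cases, which is an inference for illustration.

```python
def false_discovery_rate(tp, fp):
    """Share of cases flagged as high risk that did not bleed."""
    return fp / (tp + fp)

# Reported counts at the existing 6.5% risk threshold.
tp, tn, fn, fp = 105_316, 2_208_569, 44_408, 958_172
total = tp + tn + fn + fp
non_bleeds = tn + fp

fdr_existing = false_discovery_rate(tp, fp)
ppv_existing = 1 - fdr_existing

# Data-driven threshold on the simplified score: FP derived from reported TN.
tp_data, tn_data = 47_445, 2_990_509
fdr_data = false_discovery_rate(tp_data, non_bleeds - tn_data)

# Same 10 variables modeled with XGBoost: FP derived from reported TN.
tp_xgb, tn_xgb = 52_768, 3_019_006
fdr_xgb = false_discovery_rate(tp_xgb, non_bleeds - tn_xgb)

print(total, round(100 * fdr_existing, 1), round(100 * fdr_data, 1), round(100 * fdr_xgb, 1))
```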

    Experiment 2: Machine Learning Techniques and the Existing Full Model

    Results for the existing full model and the machine learning techniques are summarized in Table 3. The existing full model achieved a mean C statistic of 0.78 (95% CI, 0.78-0.78), similar to the 0.78 reported by Rao et al3 after excluding CABG cases. Neither lasso regularization nor XGBoost improved model discrimination.

    Experiment 3: Machine Learning Techniques and the Blended Model

    Results for the blended model variable set are summarized in Table 3 and eTable 1 in the Supplement. Logistic regression with lasso regularization achieved a mean C statistic of 0.78 (95% CI, 0.78-0.78). XGBoost improved model discrimination to a mean C statistic of 0.82 (95% CI, 0.82-0.82). The blended model had an F score of 0.31 (95% CI, 0.30-0.31) vs 0.26 (95% CI, 0.26-0.26) for the existing full model.

    Table 4 summarizes the progression of model improvement, from the existing full model variable set using logistic regression and the blended model NCDR bleeding risk variable set using logistic regression, to the blended model variable set modeled with XGBoost. The existing full model, with a data-driven risk threshold of 11.9%, correctly identified 49 967 cases as high risk for bleeding who experienced an event and 2 982 389 cases as low risk for bleeding who did not experience an event, yielding a false discovery rate of 78.7%. The blended model trained with logistic regression, at a data-driven risk threshold of 11.6%, correctly identified 51 840 cases as high risk for bleeding who experienced an event and 2 977 168 cases as low risk for bleeding who did not experience an event, an increase in 1873 cases identified as high risk for bleeding who experienced an event but a decrease in 5221 cases identified as low risk for bleeding who did not experience an event, yielding a false discovery rate of 78.5%. The blended model trained with XGBoost, at a data-driven risk threshold of 15.6%, correctly identified 55 527 cases as high risk for bleeding who experienced an event and 3 013 868 cases as low risk for bleeding who did not experience an event, an increase in 5560 cases identified as high risk for bleeding who experienced an event (3.7% of bleeding cases, or an additional 168 bleeding cases per 100 000 PCI cases) and 31 479 nonbleeding cases (1.0% of nonbleeding cases, or an additional 949 nonbleeding cases per 100 000 PCI cases), yielding a false discovery rate of 73.4%.
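
    The reclassification figures and F scores can be reproduced from the reported counts. In this check, the totals of bleeding and nonbleeding cases come from the counts reported for Experiment 1, and the false-positive and false-negative counts are derived from them rather than quoted from the article.

```python
total = 3_316_465
bleeds = 105_316 + 44_408            # bleeding cases, from the Experiment 1 counts
non_bleeds = total - bleeds

existing_tp, existing_tn = 49_967, 2_982_389   # existing full model
blended_tp, blended_tn = 55_527, 3_013_868     # blended model with XGBoost

extra_tp = blended_tp - existing_tp
extra_tn = blended_tn - existing_tn

def per_100k(count):
    """Express a count per 100 000 PCI procedures."""
    return round(count / total * 100_000)

def f_score(tp, tn):
    """F score from TP and TN, deriving FP and FN from the cohort totals."""
    fp = non_bleeds - tn
    fn = bleeds - tp
    return 2 * tp / (2 * tp + fp + fn)

share_of_bleeds = 100 * extra_tp / bleeds
share_of_non_bleeds = 100 * extra_tn / non_bleeds
fdr_blended = (non_bleeds - blended_tn) / (non_bleeds - blended_tn + blended_tp)

print(extra_tp, extra_tn, per_100k(extra_tp), per_100k(extra_tn))
```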

    Experiment 4: Understanding the Blended Model

    Calibration plots are shown for the existing full model and the blended model divided into deciles, with associated standard errors (Figure, A). Also shown are continuous calibration plot functions of risk and 95% CIs (Figure, B). The blended model demonstrated a closer calibration than the existing full model, with a Brier score of 0.039 vs 0.041. eFigures 2, 3, 4, and 5 in the Supplement show the improvement in predictions in the highest deciles of risk (eAppendix in the Supplement).
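
    For readers unfamiliar with these calibration measures, the following is a minimal sketch, under stated assumptions, of a Brier score and a decile-style calibration table. It is not the authors' code; the function names and toy data are hypothetical, and the grouping rule (sorting by predicted risk and splitting into near-equal groups) is one common convention.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the observed outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_by_bins(y_true, y_prob, n_bins=10):
    """Sort cases by predicted risk, split them into n_bins near-equal groups,
    and return (mean predicted risk, observed event rate) for each nonempty group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer cases than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows


# Hypothetical toy data: 1 = bleeding event, 0 = no event.
y = [0, 0, 1, 1]
p = [0.1, 0.2, 0.8, 0.9]

score = brier_score(y, p)
table = calibration_by_bins(y, p, n_bins=2)
```

    A lower Brier score indicates predictions closer to the observed outcomes, and a well-calibrated model has group-level mean predicted risk close to the observed event rate in every group.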

    eTables 2, 3, and 4 in the Supplement show the variables for the blended model, along with the forward stepwise selection C statistic when added in rank order to the model. Using only the top 10 predictive variables achieved a mean C statistic of 0.81 (eAppendix in the Supplement).

    Discussion

    Our study shows that machine learning algorithms can better characterize the risk of major bleeding after PCI. However, the ability of machine learning techniques to produce better results depends on whether the extracted variables are constrained. Our results comparing logistic regression with lasso regularization against XGBoost with the predefined, dichotomous variables show that implementing machine learning algorithms does not necessarily improve predictive results. When we did not constrain the available data to dichotomous variables before modeling and then applied machine learning techniques, we observed improvements in the models. These models were better able to identify high-risk individuals and reclassified the risk of a meaningful percentage of patients. Thus, machine learning techniques improved the models, but how the data are processed plays a large role in their effectiveness. Such models could be integrated into electronic medical records and made available at the point of care.

    Using machine learning techniques, we improved the discrimination of the existing full model of bleeding risk. By using the data-driven thresholds to optimize decision making, we correctly classified an additional 5560 cases as high risk for bleeding who experienced an event and 31 479 cases as low risk for bleeding who did not experience an event (an additional 168 bleeding cases per 100 000 PCI cases and an additional 949 nonbleeding cases per 100 000 PCI cases, respectively). Similarly, the machine learning model improved the discrimination of the existing simplified risk score model to 0.82 from 0.78 with no additional variables because gradient boosting is better equipped to fully analyze continuous and categorical variable ranges. This work demonstrates an improvement in model discrimination and calibration and includes an evaluation of the ability of models to identify specific bleeding cases and nonbleeding cases. The improved identification of both bleeding cases and nonbleeding cases is ultimately necessary to address how clinicians could use risk scores in real time.

    Nevertheless, prospective application of machine learning techniques requires more research. Other work (eg, that by Spertus et al21 and from an associated product called Patient Risk Information Services Manager [ePRISM]) attempts to link outcomes research, clinical decision making, and informed consent. We believe such work demonstrates a path forward for using a model for quality improvement, consent (as in the work by Spertus et al21), and additional risk mitigation on a prospective case-by-case basis. We chose a data-driven selection of the risk threshold to equalize misclassification errors. If the treatment for FP results and for FN results is found to differ in costs, time associated with treating them, and other important resources, the decision threshold may require adjustment.
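
    The last point, that the decision threshold may need to move if FP and FN results carry different costs, can be illustrated with a cost-weighted threshold search. This is only a sketch under stated assumptions: it is not the threshold selection procedure used in this study, and the cost weights, function names, and toy data are hypothetical.

```python
def expected_cost(y_true, y_prob, threshold, cost_fp=1.0, cost_fn=1.0):
    """Total misclassification cost when risk >= threshold is classified as high risk."""
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    return cost_fp * fp + cost_fn * fn


def best_threshold(y_true, y_prob, cost_fp=1.0, cost_fn=1.0):
    """Try each observed risk as a candidate threshold and return the lowest-cost one.

    When several thresholds share the minimum cost, the smallest is returned.
    """
    candidates = sorted(set(y_prob))
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, y_prob, t, cost_fp, cost_fn),
    )


# Hypothetical toy data: 1 = bleeding event, 0 = no event.
y = [0, 0, 1, 0, 1, 1]
p = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40]

t_fn_costly = best_threshold(y, p, cost_fp=1.0, cost_fn=5.0)  # 0.15
t_fp_costly = best_threshold(y, p, cost_fp=5.0, cost_fn=1.0)  # 0.30
```

    In this toy example, weighting missed bleeding cases more heavily keeps the threshold low, whereas weighting false alarms more heavily raises it.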

    The predictive enhancements demonstrated herein have 2 implications. The first involves variable selection, whereby thresholds that emerge from the data allow for the use and interpretation of continuous values rather than forcing preselection of dichotomous thresholds. For example, XGBoost analysis of pre-PCI hemoglobin as a continuous value rather than the dichotomous threshold (≤13 vs >13 g/dL) at a minimum reduces preprocessing efforts and potentially enables further insight into what the critical values are in predicting risk for patients. A second implication is monitoring of higher-order interactions. By finding potentially nonlinear combinations of values, a machine learning model can better characterize risk.
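
    The information lost by dichotomizing can be seen in a small sketch. The 13 g/dL cutoff is the one quoted above; the function name and the hemoglobin values are hypothetical and chosen only for illustration.

```python
def dichotomize_hemoglobin(hgb_g_dl, cutoff=13.0):
    """Return 1 if pre-PCI hemoglobin is at or below the cutoff (<=13 g/dL), else 0."""
    return 1 if hgb_g_dl <= cutoff else 0


# Hypothetical pre-PCI hemoglobin values in g/dL.
values = [8.0, 12.9, 13.1, 16.0]
flags = [dichotomize_hemoglobin(v) for v in values]

# Four distinct continuous values collapse to two categories, so a patient at
# 8.0 g/dL becomes indistinguishable from one at 12.9 g/dL.
distinct_before = len(set(values))
distinct_after = len(set(flags))
```

    A tree-based learner given the continuous value can instead place its own split points, which is the behavior the paragraph above describes.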

    Limitations and Future Work

    Our study has some limitations. First, we used a simple data imputation strategy because of the low rate of missing values in the data set. Although this strategy may introduce some error in model performance, our results may improve further with more advanced, patient-specific imputation techniques, which should be investigated. Second, despite the large number of variables we incorporated, we could not adjust for several other factors that may augment risk prediction. For example, we were unable to adjust for the access-site decision; to keep a strict comparison of modeling techniques possible, we did not include this information. Such a decision would result in downstream differences in treatments. We are now able to evaluate the influence that the additional variables collected in the registry will have on risk, particularly in the extreme low-risk and high-risk scenarios with access site and anticoagulation decisions.

    Evaluating the predictive differences in the existing simplified risk score model vs the blended model (which contains periprocedural variables) shows variations in prediction that identify changes in patient risk throughout the course of treatment. The development of models that include additional variables and decisions made during treatment would require multiple, staged models to identify patient risk. This would aid clinical decision making in elucidating the dynamic factors that change risk, understanding what variables are readily available in electronic health records for tool development, and further discussing the clinical implementations of such models. In addition, comparing results with the different potential types of bleeding may clarify the difference in FP and TP results. For example, the definition of major bleeding includes a hemoglobin decrease that may not actually correspond to major bleeding that was identified and treated. Similarly, other values may indicate major bleeding that does not fit the definitions. A further exploration of the misclassifications and their causes may provide additional clinical value.

    Conclusions

    Using machine learning methods strategically allows for improvements in predictive model performance. Knowing the data ranges measured and how data fit into machine learning techniques enables us to realize the potential of these techniques. When applied with an appropriate variable set, machine learning techniques improved risk prediction models for major bleeding after PCI. We demonstrated that machine learning techniques will not necessarily do the work of improving predictive value and that a key to successful implementation is the use of variables in a way that does not reduce information. We showed that the application of these methods improved model discrimination (C statistic, 0.82) and calibration and offered direct metrics of how the model would perform with a prospective cohort of patients (F score, 0.31). These findings lay the groundwork for future work in more advanced models with additional variables for further improved performance.

    Article Information

    Accepted for Publication: May 17, 2019.

    Published: July 10, 2019. doi:10.1001/jamanetworkopen.2019.6835

    Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2019 Mortazavi BJ et al. JAMA Network Open.

    Corresponding Author: Harlan M. Krumholz, MD, SM, Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, One Church Street, Ste 200, New Haven, CT 06510 (harlan.krumholz@yale.edu).

    Author Contributions: Drs Mortazavi and Krumholz had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Negahban and Krumholz contributed equally as senior authors.

    Concept and design: All authors.

    Acquisition, analysis, or interpretation of data: Mortazavi, Bucholz, Huang, Masoudi, Negahban, Krumholz.

    Drafting of the manuscript: Mortazavi, Desai.

    Critical revision of the manuscript for important intellectual content: All authors.

    Statistical analysis: Mortazavi, Bucholz, Huang.

    Obtained funding: Negahban.

    Administrative, technical, or material support: Desai, Masoudi, Shaw, Krumholz.

    Supervision: Negahban.

    Conflict of Interest Disclosures: Dr Curtis reported receiving support from the American College of Cardiology, the Centers for Medicare & Medicaid Services, and Medtronic. Dr Masoudi reported having a contract with the American College of Cardiology for his role as chief science officer of the National Cardiovascular Data Registry; and receiving support from the American College of Cardiology. Dr Negahban reported receiving grants from the National Institutes of Health and the National Science Foundation. Dr Krumholz reported having research agreements with Medtronic and Johnson & Johnson (Janssen), through Yale University, to develop methods of clinical trial data sharing; receiving a grant from Medtronic and the US Food and Drug Administration, through Yale University, to develop methods for postmarket surveillance of medical devices; working under contract with the Centers for Medicare & Medicaid Services to develop and maintain performance measures that are publicly reported; chairing a cardiac scientific advisory board for UnitedHealth; being a participant/participant representative of the IBM Watson Health life sciences board; serving as a member of the advisory board for Element Science and the physician advisory board for Aetna; and being the founder of Hugo, a personal health information platform. Dr Krumholz also reported receiving grants from the Shenzhen Center for Health Information; receiving personal fees from the National Center for Cardiovascular Diseases (Beijing), Arnold & Porter Law Firm, and Ben C. Martin Law Firm; and serving as a member of the advisory board to Facebook. No other disclosures were reported.

    Funding/Support: These research data were provided by the American College of Cardiology’s National Cardiovascular Data Registry, Washington, DC. This study was funded, in part, by the American College of Cardiology Foundation. This work was supported, in part, by the Texas A&M Engineering Experiment Station’s Center for Remote Health Technologies and Systems. Dr Negahban acknowledges support from National Science Foundation award DMS1723128.

    Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

    Disclaimer: The views expressed herein represent those of the authors and do not necessarily represent the official views of the National Cardiovascular Data Registry or its associated professional societies identified online (https://cvquality.acc.org/NCDR-Home).

    References
    1. Mehta SK, Frutkin AD, Lindsey JB, et al; National Cardiovascular Data Registry. Bleeding in patients undergoing percutaneous coronary intervention: the development of a clinical risk algorithm from the National Cardiovascular Data Registry. Circ Cardiovasc Interv. 2009;2(3):222-229. doi:10.1161/CIRCINTERVENTIONS.108.846741
    2. Rao SV, Kaul PR, Liao L, et al. Association between bleeding, blood transfusion, and costs among patients with non–ST-segment elevation acute coronary syndromes. Am Heart J. 2008;155(2):369-374. doi:10.1016/j.ahj.2007.10.014
    3. Rao SV, McCoy LA, Spertus JA, et al. An updated bleeding model to predict the risk of post-procedure bleeding among patients undergoing percutaneous coronary intervention: a report using an expanded bleeding definition from the National Cardiovascular Data Registry CathPCI Registry. JACC Cardiovasc Interv. 2013;6(9):897-904. doi:10.1016/j.jcin.2013.04.016
    4. Baklanov DV, Kaltenbach LA, Marso SP, et al. The prevalence and outcomes of transradial percutaneous coronary intervention for ST-segment elevation myocardial infarction: analysis from the National Cardiovascular Data Registry (2007 to 2011). J Am Coll Cardiol. 2013;61(4):420-426. doi:10.1016/j.jacc.2012.10.032
    5. Montalescot G, Salette G, Steg G, et al. Development and validation of a bleeding risk model for patients undergoing elective percutaneous coronary intervention. Int J Cardiol. 2011;150(1):79-83. doi:10.1016/j.ijcard.2010.02.077
    6. Mortazavi BJ, Downing NS, Bucholz EM, et al. Analysis of machine learning techniques for heart failure readmissions. Circ Cardiovasc Qual Outcomes. 2016;9(6):629-640. doi:10.1161/CIRCOUTCOMES.116.003039
    7. Brindis RG, Fitzgerald S, Anderson HV, Shaw RE, Weintraub WS, Williams JF. The American College of Cardiology–National Cardiovascular Data Registry (ACC-NCDR): building a national clinical data repository. J Am Coll Cardiol. 2001;37(8):2240-2245. doi:10.1016/S0735-1097(01)01372-9
    8. Messenger JC, Ho KK, Young CH, et al; NCDR Science and Quality Oversight Committee Data Quality Workgroup. The National Cardiovascular Data Registry (NCDR) Data Quality Brief: the NCDR Data Quality Program in 2012. J Am Coll Cardiol. 2012;60(16):1484-1488. doi:10.1016/j.jacc.2012.07.020
    9. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. Vol 1. Berlin, Germany: Springer; 2001. Springer Series in Statistics.
    10. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 13-17, 2016; San Francisco, CA.
    11. Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manage. 2009;45(4):427-437. doi:10.1016/j.ipm.2009.03.002
    12. Wood SN. Generalized Additive Models: An Introduction With R. Boca Raton, FL: Chapman & Hall/CRC; 2006. doi:10.1201/9781420010404
    13. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1-22. doi:10.18637/jss.v033.i01
    14. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(1):77. doi:10.1186/1471-2105-12-77
    15. Wood SN. Generalized Additive Models: An Introduction With R. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC; 2017. doi:10.1201/9781315370279
    16. Zeileis A. Econometric computing with HC and HAC covariance matrix estimators. J Stat Softw. 2004;11(10). doi:10.18637/jss.v011.i10
    17. Siegert S, Bhend J, Kroener I, De Felice M. Package “SpecsVerification.” https://cran.r-project.org/web/packages/SpecsVerification/SpecsVerification.pdf. Published March 7, 2017. Accessed June 7, 2017.
    18. GitHub. Source code for work on modeling major bleeding risk in post-PCI patients from the NCDR CathPCI Registry. https://github.com/bobakm/NCDR_CathPCI_MajorBleed_Public. Accessed May 28, 2019.
    19. Zhang T. On the consistency of feature selection using greedy least squares regression. J Mach Learn Res. 2009;10:555-568. http://www.jmlr.org/papers/volume10/zhang09a/zhang09a.pdf. Accessed November 11, 2017.
    20. Elenberg ER, Khanna R, Dimakis AG, Negahban S. Restricted strong convexity implies weak submodularity. https://arxiv.org/abs/1612.00804. Last revised October 12, 2017. Accessed November 11, 2017.
    21. Spertus JA, Bach R, Bethea C, et al. Improving the process of informed consent for percutaneous coronary intervention: patient outcomes from the Patient Risk Information Services Manager (ePRISM) study. Am Heart J. 2015;169(2):234-241.e1.