Comment & Response
September 29, 2021

Assessing Performance of Machine Learning—Reply

Author Affiliations
  • 1Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
  • 2Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, Connecticut
  • 3Computer Science & Engineering, Texas A&M University, College Station
  • 4Department of Health Policy and Management, Yale School of Public Health, New Haven, Connecticut
JAMA Cardiol. 2021;6(12):1466. doi:10.1001/jamacardio.2021.3715

In Reply In our study,1 we pursued an exhaustive cross-validated grid search to identify the optimal hyperparameters for the extreme gradient boosting model (XGBoost), a standard approach to hyperparameter selection.2 The search spanned the learning rate, the number of trees trained, the maximum tree depth, and the minimum loss reduction required to partition a leaf node of a tree. To permit comparison of areas under the receiver operating characteristic curve (AUROC), we focused on defining their variance in iterative cross-validation and reported it as a 95% CI. This approach also allowed comparison of other metrics, such as precision and recall, using a consistent method for reporting confidence intervals. As reported in the study, XGBoost did not have better discrimination for in-hospital mortality in acute myocardial infarction (AMI) than a logistic regression model (XGBoost: AUROC, 0.89; 95% CI, 0.88-0.89; logistic regression: AUROC, 0.88; 95% CI, 0.88-0.88), despite the large sample size and the selection of optimal hyperparameters.1
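The workflow described above can be sketched as follows. This is a minimal illustration, not the authors' code: it uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost (whose gamma parameter, the minimum loss reduction for a split, is approximated here by min_impurity_decrease), a synthetic imbalanced dataset in place of the AMI registry, and a normal-approximation 95% CI over repeated cross-validation folds.

```python
# Sketch of the two steps described in the reply:
# (1) an exhaustive cross-validated grid search over boosting hyperparameters;
# (2) iterated cross-validation to estimate AUROC variance, reported as a 95% CI.
# Assumptions: GradientBoostingClassifier stands in for XGBoost; the dataset,
# grid values, and fold counts are illustrative, not those of the study.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     cross_val_score)

# Synthetic, class-imbalanced data (placeholder for in-hospital mortality labels).
X, y = make_classification(n_samples=600, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# Step 1: grid over the four hyperparameter families named in the reply.
param_grid = {
    "learning_rate": [0.05, 0.1],          # learning rate
    "n_estimators": [50, 100],             # number of trees trained
    "max_depth": [2, 3],                   # maximum tree depth
    "min_impurity_decrease": [0.0, 0.01],  # analogue of XGBoost's gamma
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, scoring="roc_auc", cv=3)
search.fit(X, y)

# Step 2: repeated (iterated) cross-validation of the tuned model; summarize
# the per-fold AUROCs as mean +/- 1.96 * standard error.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=4, random_state=0)
aucs = cross_val_score(search.best_estimator_, X, y, scoring="roc_auc", cv=cv)
mean = aucs.mean()
half = 1.96 * aucs.std(ddof=1) / np.sqrt(len(aucs))
print(f"AUROC, {mean:.2f}; 95% CI, {mean - half:.2f}-{mean + half:.2f}")
```

The same repeated-cross-validation loop can be rerun with any other scorer (e.g., scoring="average_precision" for precision-recall) to report every metric's confidence interval by a consistent procedure, as the reply describes.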
