To the Editor: Machine learning (ML) has emerged as a hot topic in medical research. However, despite promising results, many studies have proved to be overoptimistic. Poor generalizability, owing to limited training data or single-site study designs, appears to be an Achilles heel of ML. Khera et al¹ address this issue by evaluating several ML approaches for predicting mortality after myocardial infarction and comparing them with a baseline logistic regression model. Apart from the huge registry-derived data set (n = 755 402), a key advantage of their study lies in its detailed evaluation of multiple predictive models using several performance metrics, including the (often underused) Brier score and shift tables.