The last decade’s growth in artificial intelligence, machine learning, and statistical methods for high-dimensional data has driven a zeitgeist of prediction (or forecasting) in medicine¹ and psychiatry.² Algorithms for prediction require a model that is governed by parameters whose values are estimated from exemplar training cases. Estimation (or training) of parameters ingrains uncertainty into the resulting algorithm arising from model assumptions in addition to bias and error in the data. The trained algorithm’s proficiency is tested on separate validation cases (not seen during training) and summarized as representative of the expected performance when used for making predictions about actual patients. The trained model yields a continuous score that is proportional to the probability of some outcome, commonly a diagnosis or the occurrence of an event. Most often, this continuous score is compared with an operating threshold (or cutoff) that implicitly defines a dichotomizing decision rule³ because this is compatible with summary measures of performance (SMP)⁴ such as the area under the receiver operating characteristic curve (AUROC), sensitivity/specificity, and balanced accuracy. Sometimes, the continuous scores are instead summarized as the Brier score, ranging from 0 (perfect) to 1 (worst). In this Viewpoint, we discuss an important but neglected issue: summary measures of performance obscure uncertainty in the algorithm’s predictions that may be relevant when deployed for clinical decision-making.
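As a minimal sketch of the summary measures named above, the following Python computes sensitivity, specificity, balanced accuracy, the Brier score, and AUROC from continuous scores on validation cases. The function name `summary_measures`, the threshold of 0.5, and the six example scores and outcomes are all hypothetical and chosen only for illustration; the scores are assumed to be probabilities for a binary outcome.

```python
def summary_measures(scores, labels, threshold=0.5):
    """Summarize predicted probabilities against binary (0/1) outcomes.

    Hypothetical helper for illustration; assumes at least one positive
    and one negative case are present.
    """
    pairs = list(zip(scores, labels))
    pos = [s for s, y in pairs if y == 1]  # scores of cases with the outcome
    neg = [s for s, y in pairs if y == 0]  # scores of cases without it

    # Dichotomizing decision rule: predict the outcome when score >= threshold.
    true_pos = sum(s >= threshold for s in pos)
    true_neg = sum(s < threshold for s in neg)
    sensitivity = true_pos / len(pos)
    specificity = true_neg / len(neg)
    balanced_accuracy = (sensitivity + specificity) / 2

    # Brier score: mean squared difference between score and outcome.
    brier = sum((s - y) ** 2 for s, y in pairs) / len(pairs)

    # AUROC: probability that a randomly chosen positive case scores higher
    # than a randomly chosen negative case (ties count as one half).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auroc = wins / (len(pos) * len(neg))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "brier": brier,
        "auroc": auroc,
    }


# Made-up validation cases: continuous scores and observed outcomes.
scores = [0.90, 0.80, 0.55, 0.45, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
result = summary_measures(scores, labels)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this toy example, scores of 0.55 and 0.90 receive the same dichotomized decision at the 0.5 threshold, and each returned value is a single number for the whole validation set, with no statement about the uncertainty of any individual prediction.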
Joyce DW, Geddes J. When Deploying Predictive Algorithms, Are Summary Performance Measures Sufficient? JAMA Psychiatry. Published online January 22, 2020. doi:10.1001/jamapsychiatry.2019.4484