Viewpoint
January 22, 2020

When Deploying Predictive Algorithms, Are Summary Performance Measures Sufficient?

Author Affiliations
  • 1National Institute of Health Research Oxford Health Biomedical Research Center, Warneford Hospital, Department of Psychiatry, University of Oxford, Oxford, England
JAMA Psychiatry. Published online January 22, 2020. doi:10.1001/jamapsychiatry.2019.4484

The last decade’s growth in artificial intelligence, machine learning, and statistical methods for high-dimensional data has driven a zeitgeist of prediction (or forecasting) in medicine1 and psychiatry.2 Algorithms for prediction require a model that is governed by parameters whose values are estimated from exemplar training cases. Estimation (or training) of parameters builds uncertainty into the resulting algorithm; this uncertainty arises from model assumptions as well as from bias and error in the data. The trained algorithm’s proficiency is tested on separate validation cases (not seen during training), and the result is summarized as representative of the expected performance when the algorithm is used for making predictions about actual patients. The trained model yields a continuous score that is proportional to the probability of some outcome, commonly a diagnosis or the occurrence of an event. Most often, this continuous score is compared with an operating threshold (or cutoff) that implicitly defines a dichotomizing decision rule3 because this is compatible with summary measures of performance (SMP)4 such as the area under the receiver operating characteristic curve (AUROC), sensitivity/specificity, and balanced accuracy. Sometimes, the continuous scores are instead summarized as the Brier score, ranging from 0 (perfect) to 1 (worst). In this Viewpoint, we discuss an important but neglected issue: summary measures of performance obscure uncertainty in the algorithm’s predictions that may be relevant when the algorithm is deployed for clinical decision-making.
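As an illustration of the summary measures named above, the following is a minimal Python sketch, not code from the article. The validation labels, the scores, and the threshold of 0.5 are hypothetical values chosen for the example, and the Brier score computation assumes the continuous score can be read directly as a probability between 0 and 1.

```python
# Hypothetical validation set: 1 = outcome occurred, 0 = it did not.
y_true = [0, 0, 0, 1, 1, 1]
# Hypothetical continuous scores from a trained model, read as probabilities.
scores = [0.10, 0.40, 0.55, 0.51, 0.80, 0.99]
threshold = 0.5  # hypothetical operating threshold (cutoff)


def sensitivity_specificity(labels, values, cutoff):
    """Dichotomize scores at the cutoff and return (sensitivity, specificity)."""
    preds = [1 if v >= cutoff else 0 for v in values]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels, values):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    pos = [v for y, v in zip(labels, values) if y == 1]
    neg = [v for y, v in zip(labels, values) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, values):
    """Mean squared difference between score and observed outcome."""
    return sum((v - y) ** 2 for y, v in zip(labels, values)) / len(labels)


sens, spec = sensitivity_specificity(y_true, scores, threshold)
balanced_accuracy = (sens + spec) / 2
auc = auroc(y_true, scores)
brier = brier_score(y_true, scores)

print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"balanced accuracy={balanced_accuracy:.3f} AUROC={auc:.3f} Brier={brier:.3f}")
```

In this toy data, the cases scored 0.51 and 0.99 both fall above the cutoff and receive the same dichotomized prediction, so sensitivity, specificity, and balanced accuracy cannot distinguish between them.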
