September 22, 2021

Risk of Bias and Error From Data Sets Used for Dermatologic Artificial Intelligence

Author Affiliations
  • 1Department of Dermatology, Medical University of Vienna, Vienna, Austria
JAMA Dermatol. 2021;157(11):1271-1273. doi:10.1001/jamadermatol.2021.3128

In this issue of JAMA Dermatology, Daneshjou et al1 report on bias of medical data sets used for artificial intelligence (AI) and underreporting of relevant metainformation. Their findings are in line with those of other reports,2-4 showing that most current data sets for machine learning are biased in various ways. Biased data sets, however, may render a machine learning model unfit for practical use. This is because data sets are not simply a small part of the machine learning pipeline, but the essence of it. Machine learning models are not “intelligent” in the broad human sense; rather, they learn ways of processing known training cases to build a representation map that can be used to map unknown test cases (Figure, A). An unknown test case is then classified according to the distribution of known diagnoses of similar training cases in the adjacent area of the representation map. If this distribution favors the correct diagnosis, we regard this as a successful prediction (Figure, B). This is an oversimplification of supervised machine learning, but it should help in understanding 2 important problems surrounding biased data sets.
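
The simplified picture above, in which an unknown test case is classified by the known diagnoses of nearby training cases, can be sketched as a nearest-neighbor vote. This is a minimal illustration and not the method of any model discussed here: the 2-dimensional coordinates, the diagnosis labels, the choice of k, and the function name `classify_by_neighbors` are all assumptions made up for the example, and real models learn the representation map itself rather than receiving it as given.

```python
import math
from collections import Counter


def classify_by_neighbors(train_points, train_labels, test_point, k=3):
    """Classify a test case from the diagnoses of its k nearest training cases.

    Returns the most common diagnosis among the neighbors and the
    distribution of diagnoses in that neighborhood.
    """
    # Distance from the test case to every training case on the map.
    distances = [
        (math.dist(point, test_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # The k closest training cases stand in for the "adjacent area".
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    counts = Counter(label for _, label in nearest)
    distribution = {label: n / len(nearest) for label, n in counts.items()}
    predicted = counts.most_common(1)[0][0]
    return predicted, distribution


# Hypothetical representation map: made-up coordinates and diagnoses.
train_points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
                (2.0, 2.0), (2.1, 1.8), (1.9, 2.2)]
train_labels = ["nevus", "nevus", "nevus",
                "melanoma", "melanoma", "melanoma"]

predicted, distribution = classify_by_neighbors(
    train_points, train_labels, test_point=(1.8, 1.9), k=3
)
print(predicted, distribution)
```

In this sketch the prediction is always drawn from the diagnoses present among the training cases, so the composition of the training set directly limits what the classifier can output.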
