Figure 1.
Diagrams of the General Procedures for Participants in Studies 1 and 2

After the first-step diagnostic algorithm, the 3387 SLs evaluated as melanocytic underwent the next 4 diagnostic algorithms. ABCD indicates asymmetry, border, color, and diameter; SLs, skin lesions.

a Melanocytic vs nonmelanocytic SLs.

b Melanoma vs benign melanocytic SLs.

c Malignant vs benign SLs.

Figure 2.
Effects of Group Size on True- and False-Positive Rates Using the Majority and the Quorum Rules

The first-step algorithm (study 1) aimed to differentiate between melanocytic and nonmelanocytic lesions. The pattern analysis algorithm (study 1) aimed to differentiate between melanoma and benign melanocytic lesions. The 3-point checklist algorithm (study 2) aimed to differentiate between malignant (melanoma and basal cell carcinoma) and benign skin lesions. With increasing group size, the true-positive rate increased and the false-positive rate decreased for each combination of collective intelligence approach and diagnostic algorithm. Data are expressed as mean values. The dashed lines represent the mean individual true- and false-positive rates (ie, group size of 1).

Figure 3.
Receiver Operating Characteristics Curves Using the Quorum Rule When the Quorum Threshold Is Varied

Each point is obtained by setting a different quorum threshold, starting at 0, with increments of 0.05 to 1. Data are shown for the first-step, pattern analysis, and 3-point checklist diagnostic algorithms. Data are based on a group size of 11.

Table 1.  
Characteristics of Study Participants
Table 2.  
Results of Applying the Majority and the Quorum Rules to the 6 Different Diagnostic Algorithms
References
1. Garbe C, Leiter U. Melanoma epidemiology and trends. Clin Dermatol. 2009;27(1):3-9.
2. Mayer JE, Swetter SM, Fu T, Geller AC. Screening, early detection, education, and trends for melanoma: current status (2007-2013) and future directions, part I: epidemiology, high-risk groups, clinical strategies, and diagnostic technology. J Am Acad Dermatol. 2014;71(4):599.e1-599.e12. doi:10.1016/j.jaad.2014.05.046.
3. Mayer JE, Swetter SM, Fu T, Geller AC. Screening, early detection, education, and trends for melanoma: current status (2007-2013) and future directions, part II: screening, education, and future directions. J Am Acad Dermatol. 2014;71(4):611.e1-611.e10. doi:10.1016/j.jaad.2014.05.045.
4. Kittler H, Pehamberger H, Wolff K, Binder M. Diagnostic accuracy of dermoscopy. Lancet Oncol. 2002;3(3):159-165.
5. Bafounta ML, Beauchet A, Aegerter P, Saiag P. Is dermoscopy (epiluminescence microscopy) useful for the diagnosis of melanoma? results of a meta-analysis using techniques adapted to the evaluation of diagnostic tests. Arch Dermatol. 2001;137(10):1343-1350.
6. Ascierto PA, Satriano RA, Palmieri G, Parasole R, Bosco L, Castello G. Epiluminescence microscopy as a useful approach in the early diagnosis of cutaneous malignant melanoma. Melanoma Res. 1998;8(6):529-537.
7. Vestergaard ME, Macaskill P, Holt PE, Menzies SW. Dermoscopy compared with naked eye examination for the diagnosis of primary melanoma: a meta-analysis of studies performed in a clinical setting. Br J Dermatol. 2008;159(3):669-676.
8. Rajpara SM, Botello AP, Townend J, Ormerod AD. Systematic review of dermoscopy and digital dermoscopy/artificial intelligence for the diagnosis of melanoma. Br J Dermatol. 2009;161(3):591-604.
9. Rubegni P, Burroni M, Cevenini G, et al. Digital dermoscopy analysis and artificial neural network for the differentiation of clinically atypical pigmented skin lesions: a retrospective study. J Invest Dermatol. 2002;119(2):471-474.
10. Ganster H, Pinz A, Röhrer R, Wildling E, Binder M, Kittler H. Automated melanoma recognition. IEEE Trans Med Imaging. 2001;20(3):233-239.
11. Garbe C, Eigentler TK. Diagnosis and treatment of cutaneous melanoma: state of the art 2006. Melanoma Res. 2007;17(2):117-127.
12. Krause J, Ruxton GD, Krause S. Swarm intelligence in animals and humans. Trends Ecol Evol. 2010;25(1):28-34.
13. Bonabeau E, Dorigo M, Theraulaz G. Swarm Intelligence: From Natural to Artificial Systems. Oxford, England: Oxford University Press; 1999.
14. Couzin ID. Collective cognition in animal groups. Trends Cogn Sci. 2009;13(1):36-43.
15. Woolley AW, Chabris CF, Pentland A, Hashmi N, Malone TW. Evidence for a collective intelligence factor in the performance of human groups. Science. 2010;330(6004):686-688.
16. Surowiecki J. The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. New York, NY: Knopf Doubleday Publishing Group; 2004.
17. Arrow KJ, Forsythe R, Gorham M, et al. Economics: the promise of prediction markets. Science. 2008;320(5878):877-878.
18. Clément RJ, Krause S, von Engelhardt N, Faria JJ, Krause J, Kurvers RH. Collective cognition in humans: groups outperform their best members in a sentence reconstruction task. PLoS One. 2013;8(10):e77943. doi:10.1371/journal.pone.0077943.
19. King AJ, Gehl RW, Grossman D, Jensen JD. Skin self-examinations and visual identification of atypical nevi: comparing individual and crowdsourcing approaches. Cancer Epidemiol. 2013;37(6):979-984.
20. Argenziano G, Soyer HP, Chimenti S, et al. Dermoscopy of pigmented skin lesions: results of a consensus meeting via the Internet. J Am Acad Dermatol. 2003;48(5):679-693.
21. Zalaudek I, Argenziano G, Soyer HP, et al; Dermoscopy Working Group. Three-point checklist of dermoscopy: an open Internet study. Br J Dermatol. 2006;154(3):431-437.
22. Sumpter DJ, Pratt SC. Quorum responses and consensus decision making. Philos Trans R Soc B Biol Sci. 2009;364(1518):743-753.
23. Hastie R, Kameda T. The robust beauty of majority rules in group decisions. Psychol Rev. 2005;112(2):494-508.
24. Sorkin RD, West R, Robinson DE. Group performance depends on the majority rule. Psychol Sci. 1998;9(6):456-463.
25. Wolf M, Kurvers RH, Ward AJ, Krause S, Krause J. Accurate decisions in an uncertain world: collective cognition increases true positives while decreasing false positives. Proc R Soc B. 2013;280(1756):20122777. doi:10.1098/rspb.2012.2777.
26. Kurvers RJM, Wolf M, Krause J. Humans use social information to adjust their quorum thresholds adaptively in a simulated predator detection experiment. Behav Ecol Sociobiol. 2014;68(3):449-456.
27. Swets JA. The science of choosing the right decision threshold in high-stakes diagnostics. Am Psychol. 1992;47(4):522-532.
28. Swets JA, Dawes RM, Monahan J. Psychological science can improve diagnostic decisions. Psychol Sci Public Interest. 2000;1(1):1-26.
29. Swets JA. Measuring the accuracy of diagnostic systems. Science. 1988;240(4857):1285-1293.
30. Argenziano G, Fabbrocini G, Carli P, De Giorgi V, Sammarco E, Delfino M. Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions: comparison of the ABCD rule of dermatoscopy and a new 7-point checklist based on pattern analysis. Arch Dermatol. 1998;134(12):1563-1570.
31. Conradt L, List C, Roper TJ. Swarm intelligence: when uncertainty meets conflict. Am Nat. 2013;182(5):592-610.
32. Kao AB, Couzin ID. Decision accuracy in complex environments is often maximized by small group sizes [published online April 23, 2014]. Proc R Soc B. doi:10.1098/rspb.2013.3305.
33. Krause S, James R, Faria JJ, Ruxton GD, Krause J. Swarm intelligence in humans: diversity can trump ability. Anim Behav. 2011;81(5):941-948.
34. Luan S, Katsikopoulos KV, Reimer T. When does diversity trump ability (and vice versa) in group decision making? a simulation study. PLoS One. 2012;7(2):e31043.
35. Hukkinen K, Kivisaari L, Vehmas T. Impact of the number of readers on mammography interpretation. Acta Radiol. 2006;47(7):655-659.
36. Duijm LEM, Louwman MWJ, Groenewoud JH, van de Poll-Franse LV, Fracheboud J, Coebergh JW. Inter-observer variability in mammography screening and effect of type and number of readers on screening outcome. Br J Cancer. 2009;100(6):901-907.
37. Farnetani F, Scope A, Braun RP, et al. Skin cancer diagnosis with reflectance confocal microscopy: reproducibility of feature recognition and accuracy of diagnosis [published online May 20, 2015]. JAMA Dermatol. doi:10.1001/jamadermatol.2015.0810.
38. Wolf M, Krause J, Carney PA, Bogart A, Kurvers RHJM. Collective intelligence meets medical decision making: the collective outperforms the best radiologist [published online August 12, 2015]. PLoS One. doi:10.1371/journal.pone.0134269.
39. Massone C, Brunasso AMG, Hofmann-Wellenhof R, Gulia A, Soyer HP. Teledermoscopy: education, discussion forums, teleconsulting and mobile teledermoscopy. G Ital Dermatol Venereol. 2010;145(1):127-132.
40. Massone C, Hofmann-Wellenhof R, Ahlgrimm-Siess V, Gabler G, Ebner C, Soyer HP. Melanoma screening with cellular phones. PLoS One. 2007;2(5):e483. doi:10.1371/journal.pone.0000483.
41. Kroemer S, Frühauf J, Campbell TM, et al. Mobile teledermatology for skin tumour screening: diagnostic accuracy of clinical and dermoscopic image tele-evaluation using cellular phones. Br J Dermatol. 2011;164(5):973-979.
42. Ebner C, Wurm EMT, Binder B, et al. Mobile teledermatology: a feasibility study of 58 subjects using mobile phones. J Telemed Telecare. 2008;14(1):2-7.

Original Investigation
December 2015

Detection Accuracy of Collective Intelligence Assessments for Skin Cancer Diagnosis

Author Affiliations
  • 1 Department of Biology and Ecology of Fishes, Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
  • 2 Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
  • 3 Faculty of Life Sciences, Humboldt-University of Berlin, Berlin, Germany
  • 4 Department of Dermatology, Second University of Naples, Naples, Italy
  • 5 Department of Dermatology and Venereology, Medical University of Graz, Graz, Austria
JAMA Dermatol. 2015;151(12):1346-1353. doi:10.1001/jamadermatol.2015.3149
Abstract

Importance  Incidence rates of skin cancer are increasing globally, and the correct classification of skin lesions (SLs) into benign and malignant tissue remains a continuous challenge. A collective intelligence approach to skin cancer detection may improve accuracy.

Objective  To evaluate the performance of 2 well-known collective intelligence rules (majority rule and quorum rule) that combine the independent conclusions of multiple decision makers into a single decision.

Design, Setting, and Participants  Evaluations were obtained from 2 large and independent data sets. The first data set consisted of 40 experienced dermoscopists, each of whom independently evaluated 108 images of SLs during the Consensus Net Meeting of 2000. The second data set consisted of 82 medical professionals with varying degrees of dermatology experience, each of whom evaluated a minimum of 110 SLs. All SLs were evaluated via the Internet. Image selection of SLs was based on high image quality and the presence of histopathologic information. Data were collected from July through October 2000 for study 1 and from February 2003 through January 2004 for study 2 and evaluated from January 5 through August 7, 2015.

Main Outcomes and Measures  For both collective intelligence rules, we determined the true-positive rate (ie, the hit rate or sensitivity) and the false-positive rate (ie, the false-alarm rate or 1 − specificity) and compared these rates with the performance of single decision makers. Furthermore, we evaluated the effect of group size on true- and false-positive rates.

Results  One hundred twenty-two medical professionals performed 16 029 evaluations. Use of either collective intelligence rule consistently outperformed single decision makers. The groups achieved an increased true-positive rate and a decreased false-positive rate. For example, individual decision makers in study 1, using the pattern analysis as diagnostic algorithm, achieved a true-positive rate of 0.83 and a false-positive rate of 0.17. Groups of 3 individuals achieved a true-positive rate of 0.91 and a false-positive rate of 0.14. These improvements increased with increasing group size.

Conclusions and Relevance  Collective intelligence might be a viable approach to increase diagnostic accuracy in skin cancer and reduce skin cancer–related mortality.

Introduction

Incidence rates of skin cancer have been increasing during the past 5 decades in the United States and many parts of Europe.1,2 Key to reducing the mortality rate due to skin cancer are early detection and correct classification of skin lesions (SLs).3,4 During the past 2 decades, several approaches have been shown to increase diagnostic accuracy, most notably dermoscopy4-7 and computer-aided diagnosis.8-11 We herein focus on an alternative and complementary collective intelligence approach that, to our knowledge, has received little attention in skin cancer research.

Collective intelligence refers to the ability of groups to outperform single individuals when performing cognitive tasks.12-16 Well-known examples of collective intelligence include the prediction of election outcomes and memory retrieval and number estimation tasks.12,17,18 Although collective intelligence can thus be used in a diverse range of tasks, at present, little is known about the scope for collective intelligence in medical diagnostics. We herein investigate whether a collective intelligence approach can be used to improve diagnostic accuracy in skin cancer detection.

Although most research on skin cancer detection focuses on single raters, King et al19 investigated a crowdsourcing approach in the context of skin self-examination. In their study, participants with little to no experience in dermatology independently classified 40 images as melanoma or nonmelanoma. Afterward, these decisions were combined into a single collective decision. Compared with decisions made by individuals, collective decisions achieved a higher true-positive rate (ie, hit rate or sensitivity) but also a higher false-positive rate (ie, false-alarm rate or 1 – specificity). However, collective decisions consisted of the mean ratings of 400 individuals, and the study did not include experienced dermatologists, who outperform individuals with little experience in dermatology. Thus, our understanding of how combining independent assessments by dermatologists affects diagnostic accuracy is limited, and the extent to which a collective intelligence approach can improve skin cancer detection is unclear.

Herein we investigated the potential of a collective intelligence approach in skin cancer detection. We used 2 large and independent data sets to investigate whether 2 well-known collective intelligence rules (majority rule and quorum rule) that combine the independent assessments of multiple raters can improve diagnostic accuracy when differentiating between different types of SLs.

Methods

We used 2 data sets based on 2 published studies in which patient data were deidentified.20,21 Institutional review board approval was waived by the Second University of Naples for both studies because they did not affect the routine procedures during clinical practice. Data were collected from July through October 2000 for study 1 and from February 2003 through January 2004 for study 2. Brief descriptions of each study follow.

Study 1

The first study was based on a consensus meeting via the Internet, known as the Consensus Net Meeting on Dermoscopy.20 In this study, 40 experienced clinical dermoscopists (with ≥5 years of experience in dermoscopy practice, teaching, and publishing) independently diagnosed 128 digital images of SLs. Skin lesion images were obtained from the Department of Dermatology, University Federico II; the Department of Dermatology, University of L’Aquila; the Department of Dermatology, University of Graz; the Sydney Melanoma Unit, Royal Prince Alfred Hospital; and the Skin and Cancer Associates, Plantation, Florida.20 Skin lesions were selected based on the photographic quality of the clinical and dermoscopic images. Histopathologic specimens of all SLs were available and judged by a histopathology panel. Diagnostic categories included melanoma (n = 33), benign melanocytic SLs (n = 70), basal cell carcinoma (n = 10), and other nonmelanocytic SLs (n = 15, comprising 10 seborrheic keratoses, 2 vascular lesions, 2 dermatofibromas, and 1 lichen planus–like keratosis).

Participants evaluated the dermoscopic images of the SLs via the Internet. Dermoscopists first underwent a training procedure consisting of 20 SLs, during which they received web-based tutorials to familiarize them with the definitions and procedures. Then, dermoscopists evaluated the remaining 108 SLs (Figure 1). First, each participant was asked to evaluate each SL using the first-step diagnostic algorithm, which differentiates melanocytic from nonmelanocytic lesions. Whenever a participant evaluated an SL as melanocytic, the participant was asked to classify the SL as melanoma or a benign melanocytic lesion. For this classification, each participant was instructed to use the following 4 diagnostic algorithms sequentially: (1) pattern analysis; (2) the ABCD (asymmetry, border, color, and diameter) rule; (3) the Menzies method; and (4) a 7-point checklist.20

Study 2

The second Internet-based study included 165 digital images of SLs and 170 participants.21 The 165 SLs were seen and selected at a specialized pigmented lesion clinic established by the Department of Dermatology, Second University of Naples. Skin lesions were selected based on high image quality and the presence of melanin or hemoglobin pigmentation in all or part of the lesion. Whereas study 1 focused on the differentiation between melanocytic and nonmelanocytic lesions (and within the melanocytic lesions, on the differentiation between melanoma and benign melanocytic lesions), study 2 focused on the differentiation between malignant (including melanoma and basal cell carcinoma) and benign SLs. Results of a histopathologic examination classified each lesion as malignant (n = 49) or benign (n = 116). The participants varied in their dermatology experience (Table 1). The participants evaluated the SLs via the Internet. After a training procedure consisting of 15 SLs, the participants were asked to evaluate the remaining 150 SLs using the 3-point checklist as a diagnostic algorithm. The 3-point checklist is based on 3 dermoscopic criteria (asymmetry, atypical network, and blue-white structure), whereby the presence of 2 or more criteria is considered indicative of malignancy. Not all the 170 participants evaluated all the images. We excluded all individuals who evaluated fewer than 110 images, which resulted in 82 participants.
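The decision rule of the 3-point checklist described above can be written as a short function. This is a minimal sketch for illustration only: the function name and the boolean inputs are assumptions, and it encodes nothing beyond the "2 or more of 3 criteria" rule stated in the text.

```python
def three_point_checklist(asymmetry: bool,
                          atypical_network: bool,
                          blue_white_structure: bool) -> bool:
    """Return True when the lesion is considered indicative of malignancy.

    Hypothetical helper: each argument says whether a rater judged that
    dermoscopic criterion to be present.
    """
    # Count how many of the 3 criteria the rater marked as present.
    score = sum([asymmetry, atypical_network, blue_white_structure])
    # Two or more criteria are considered indicative of malignancy.
    return score >= 2
```

For instance, a lesion with asymmetry and a blue-white structure but a typical network would score 2 and be flagged, whereas a lesion showing only asymmetry would not.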

To summarize, the participants in study 1 first used a diagnostic algorithm (first step) to differentiate melanocytic from nonmelanocytic SLs, and if a participant evaluated an SL as melanocytic, then 4 different diagnostic algorithms (pattern analysis, the ABCD rule, the Menzies method, and a 7-point checklist) were used to differentiate melanoma from benign melanocytic lesions. Participants in study 2 used a single diagnostic algorithm (a 3-point checklist) to differentiate malignant from benign SLs. To test the performance and robustness of a collective intelligence approach, we investigated the performance of both collective intelligence rules for each diagnostic algorithm.

Collective Intelligence Rules

We tested the performance of 2 well-known collective intelligence rules: the majority and the quorum rules. These rules aggregate the independent assessments of multiple raters. We applied the majority and the quorum rules to each of the 6 diagnostic algorithms described above. In the following description we will use the first-step diagnostic algorithm as an example. All other diagnostic algorithms were analyzed following the same procedure.

Majority Rule

The majority rule classifies each SL according to the majority opinion of the raters.22-24 For the first-step algorithm, majority rule implies that whenever the majority of the group members classifies an SL as melanocytic, the SL is classified as melanocytic; otherwise it is classified as nonmelanocytic. For a given group size (n; range, 1-11, only using odd numbers to avoid a tie-breaker rule), we randomly drew n evaluations for each SL in the data set. For each SL, we then determined the number of evaluations supporting melanocytic and nonmelanocytic classifications. Each SL was classified based on the option that received the most support. After classifying each SL in this way, we used the histopathologic records to determine the true- and false-positive rates of the majority rule. For each group size n, we repeated this procedure 2000 times. We report the mean (SEM) true- and false-positive rates per group size.
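The resampling procedure for the majority rule can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis code: the names `evaluations`, `truth`, and `simulate_majority` are hypothetical, each lesion is assumed to have at least n ratings coded 1 (condition present) or 0 (condition absent), and drawing without replacement is an assumption because the text does not specify it.

```python
import random


def rates(decisions, truth):
    """Return (true-positive rate, false-positive rate)."""
    tp = sum(1 for d, t in zip(decisions, truth) if d and t)
    fp = sum(1 for d, t in zip(decisions, truth) if d and not t)
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    return tp / n_pos, fp / n_neg


def majority_vote(votes):
    """True when more than half of the 0/1 votes are positive (odd group sizes)."""
    return sum(votes) > len(votes) / 2


def simulate_majority(evaluations, truth, n, repeats=2000, seed=0):
    """Mean TPR and FPR of the majority rule for groups of size n.

    evaluations[i] holds the individual 0/1 ratings of lesion i;
    truth[i] is the histopathologic reference (True = condition present).
    """
    rng = random.Random(seed)
    tpr_sum = fpr_sum = 0.0
    for _ in range(repeats):
        # Draw n ratings per lesion (without replacement: an assumption).
        decisions = [majority_vote(rng.sample(ev, n)) for ev in evaluations]
        tpr, fpr = rates(decisions, truth)
        tpr_sum += tpr
        fpr_sum += fpr
    return tpr_sum / repeats, fpr_sum / repeats
```

Calling `simulate_majority` for n = 1, 3, 5, 7, 9, and 11 would give one mean true- and false-positive rate per group size, which is the quantity plotted against group size in Figure 2.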

Quorum Rule

The quorum rule uses a so-called quorum threshold to classify an SL. Each SL is classified as condition present whenever the fraction of evaluations for condition present is at or above the quorum threshold; otherwise the SL is classified as condition absent. For example, for the first-step diagnostic algorithm and a quorum threshold of 0.3, an SL is classified as melanocytic whenever at least 30% of the raters in a group classify it as melanocytic; otherwise it is classified as nonmelanocytic. Compared with single raters, groups using the quorum rule are predicted to increase true-positive results and decrease false-positive results whenever the quorum threshold is set between the mean true- and false-positive rates of the raters.25,26

The procedure was as follows. First, we randomly assigned half of the SLs to a training set and the other half to a validation set. The training set was used to determine the quorum threshold. We calculated the mean true- and false-positive rates of the participants in the training set and set the quorum threshold halfway between both values (alternative ways of setting the quorum threshold are described below). We then investigated the performance of this quorum threshold in the validation set. For all SLs in the validation set, we randomly drew n (range, 1-11, only using odd numbers) evaluations. For each SL in the validation set, we then determined the fraction of evaluations supporting the condition-present classification (for the first-step algorithm, melanocytic). Whenever this fraction was higher than (or equal to) the quorum threshold, the SL was classified as condition present; otherwise as condition absent. After classifying each SL in the validation set in this way, we used the histopathologic records to determine the true- and false-positive rates of the quorum rule in the validation set. For each group size, we repeated this procedure 2000 times. We report the mean (SEM) true- and false-positive rates per group size. The SLs in the validation set are different from those in the training set. This cross-validation procedure ensures an independent evaluation of the performance of the quorum threshold, thereby preventing overfitting.
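One repetition of the quorum procedure might look like the sketch below. The function and variable names are hypothetical, ratings are assumed to be coded 1 (condition present) or 0 (condition absent), and the mean individual rates are approximated by pooling all ratings per class, which matches the per-rater mean only if every rater evaluated every lesion. The sketch also assumes that both halves of the random split contain lesions of both classes.

```python
import random


def quorum_decision(votes, threshold):
    """Condition present when the fraction of positive votes reaches the threshold."""
    return sum(votes) / len(votes) >= threshold


def midpoint_threshold(evaluations, truth):
    """Threshold halfway between the pooled true- and false-positive rates."""
    pos = [v for ev, t in zip(evaluations, truth) if t for v in ev]
    neg = [v for ev, t in zip(evaluations, truth) if not t for v in ev]
    mean_tpr = sum(pos) / len(pos)
    mean_fpr = sum(neg) / len(neg)
    return (mean_tpr + mean_fpr) / 2


def quorum_cross_validated(evaluations, truth, n, seed=0):
    """One repetition: fit the threshold on half the lesions, test on the rest."""
    rng = random.Random(seed)
    idx = list(range(len(truth)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, valid = idx[:half], idx[half:]
    threshold = midpoint_threshold([evaluations[i] for i in train],
                                   [truth[i] for i in train])
    # Draw n ratings per validation lesion (without replacement: an assumption).
    decisions = [quorum_decision(rng.sample(evaluations[i], n), threshold)
                 for i in valid]
    valid_truth = [truth[i] for i in valid]
    tp = sum(1 for d, t in zip(decisions, valid_truth) if d and t)
    fp = sum(1 for d, t in zip(decisions, valid_truth) if d and not t)
    n_pos = sum(1 for t in valid_truth if t)
    n_neg = len(valid_truth) - n_pos
    return threshold, tp / n_pos, fp / n_neg
```

With the threshold of 0.3 used as an example in the text, a group of 5 in which 2 raters vote "melanocytic" (fraction 0.4) would classify the lesion as melanocytic, whereas a single positive vote (fraction 0.2) would not. Averaging the returned rates over many seeds would mirror the repeated procedure described above.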

Best Individual Performance

We determined the performance of the best individual in each group using a similar cross-validation procedure. First, we randomly assigned half of the SLs to a training set and the other half to a validation set. The training set was used to identify the best individual. For a given group size n (range, 1-11, only using odd numbers), we randomly drew n individuals and determined the performance of these individuals in the training set. In the training set, we calculated the true-positive and true-negative rates of each individual and selected the best individual, giving equal weight to that individual’s true-positive and true-negative rates. We then calculated the true- and false-positive rates of this best individual using the SLs from the validation set. We repeated this procedure 2000 times per group size.
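The selection of the best individual can be sketched in the same spirit. The names below are hypothetical, the data layout (one complete list of 0/1 ratings per rater) is an assumption, and ties are broken by taking the first rater in the group, a detail the text does not specify.

```python
def tpr_tnr(ratings, truth, lesions):
    """True-positive and true-negative rates of one rater on the given lesions.

    ratings[i] is the rater's 0/1 call for lesion i; truth[i] is the reference.
    """
    pos = [i for i in lesions if truth[i]]
    neg = [i for i in lesions if not truth[i]]
    tpr = sum(ratings[i] for i in pos) / len(pos)
    tnr = sum(1 - ratings[i] for i in neg) / len(neg)
    return tpr, tnr


def best_individual(group, ratings_by_rater, truth, train):
    """Rater in `group` with the highest equal-weight mean of TPR and TNR
    on the training lesions (first rater wins ties: an assumption)."""
    def score(rater):
        tpr, tnr = tpr_tnr(ratings_by_rater[rater], truth, train)
        return (tpr + tnr) / 2
    return max(group, key=score)
```

The selected rater's performance would then be computed on the validation lesions only, for example with `tpr_tnr(ratings_by_rater[best], truth, valid)`, taking the false-positive rate as 1 minus the true-negative rate.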

Statistical Analysis

Data were analyzed from January 5 through August 7, 2015. We analyzed the effect of group size (ie, the number of independent evaluations) on the true- and false-positive rates using generalized linear models with binomial errors and a logit-link function because true- and false-positive rates were bound from 0 to 1. We used the built-in generalized linear modeling function in R (version 3.2.0; R Development Core Team 2010). Significance levels were derived from the z scores and associated P values and set at P < .05.
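As an illustration of what a binomial model with a logit link implies, the relationship between group size and a rate can be written as below. The linear predictor with group size n as the only covariate is an assumption made for this sketch; the text does not give the exact model specification.

```latex
% p_n: true- (or false-) positive rate at group size n
% beta_0, beta_1: intercept and group-size coefficient (hypothetical symbols)
\operatorname{logit}(p_n) = \ln\frac{p_n}{1 - p_n} = \beta_0 + \beta_1 n,
\qquad
p_n = \frac{1}{1 + e^{-(\beta_0 + \beta_1 n)}}
```

Because the inverse logit maps any real number into the interval from 0 to 1, fitted rates cannot fall outside the range that a proportion can take, which is the motivation given above for this choice of link.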

Results

Table 1 provides an overview of the demographics of the participants in studies 1 and 2. Figure 2 shows the results of applying the majority and quorum rules to the first-step, pattern analysis, and 3-point checklist diagnostic algorithms. We found that an increasing group size increased the diagnostic accuracy independently of the diagnostic algorithms. Compared with single decision makers, groups using the majority or the quorum rule achieved higher true-positive rates (Figure 2A, C, and E) and lower false-positive rates (Figure 2B, D, and F). These effects already occurred at a group size of 3 and further increased with increasing group size. To illustrate, the mean individual true-positive rate under the pattern analysis algorithm is 0.83 (Figure 2C) and the mean individual false-positive rate is 0.17 (Figure 2D). In contrast, combining 3 independent raters using the majority rule results in a true-positive rate of 0.91 (Figure 2C) and a false-positive rate of 0.14 (Figure 2D). When we compared the 2 different types of malignant lesions in study 2 (melanoma and basal cell carcinoma), we found that the true-positive rate increased with increasing group size for both types (eFigure 1 in the Supplement). Generally, improvements stabilized at a group size of approximately 10.

Groups using the quorum rule outperformed the best individual in that group. The quorum rule achieves higher true-positive rates (Figure 2A) and lower false-positive rates (Figure 2B) or higher true-positive rates (Figure 2C and E) and comparable or slightly higher false-positive rates (Figure 2D and F). The majority rule outperforms the best individual in some of the cases (Figure 2A-D) and achieves higher true-positive rates (Figure 2E) but also higher false-positive rates (Figure 2F) in other cases.

When applying the collective intelligence rules to any of the other 3 diagnostic algorithms aimed at differentiating between melanoma and benign melanocytic lesions (the ABCD rule, the Menzies method, and the 7-point checklist in study 1), we obtained qualitatively similar results (Table 2 and eFigures 2-4 in the Supplement).

In the analysis above, we always set the quorum threshold halfway between the mean true- and false-positive rates of the raters. However, a key advantage of the quorum rule compared with the majority rule is that it can be adjusted to put more weight on improving the true- or the false-positive rate.27,28 Generally, higher quorum thresholds tend to decrease true- and false-positive rates because more evaluations for condition present (eg, melanocytic SL) are needed to classify an SL as condition present. Conversely, lower quorum thresholds tend to increase true- and false-positive rates. This trade-off between true- and false-positive rates is well known for individual decision makers,25,29 and here it also appears at the group level.

To illustrate the flexibility of the quorum threshold, we used a range of different quorum thresholds (range, 0-1, with increments of 0.05) and calculated for each threshold the associated true- and false-positive rates. Figure 3 shows the results of these analyses for the first-step, pattern analysis, and 3-point checklist diagnostic algorithms illustrating the trade-off between the true- and false-positive rates at the collective level. The other 3 diagnostic algorithms showed a similar pattern (eFigure 5 in the Supplement).
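The threshold sweep behind such a receiver operating characteristic curve can be sketched as follows. The function name and inputs are hypothetical: `vote_fractions[i]` is assumed to be the fraction of a group's raters who voted condition present for lesion i, and `truth[i]` the histopathologic reference.

```python
def roc_points(vote_fractions, truth, step=0.05):
    """Return (threshold, TPR, FPR) for quorum thresholds 0, step, ..., 1."""
    n_steps = round(1 / step)
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    points = []
    for k in range(n_steps + 1):
        # Dividing by n_steps avoids accumulating floating-point error.
        threshold = k / n_steps
        decisions = [f >= threshold for f in vote_fractions]
        tp = sum(1 for d, t in zip(decisions, truth) if d and t)
        fp = sum(1 for d, t in zip(decisions, truth) if d and not t)
        points.append((threshold, tp / n_pos, fp / n_neg))
    return points
```

At a threshold of 0, every lesion is classified as condition present, so both rates equal 1; raising the threshold moves the operating point toward lower true- and false-positive rates, which is the trade-off the figure illustrates.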

Discussion

Our results show that 2 well-known collective intelligence rules that combine the independent assessments of multiple raters can improve performance in detection of skin cancer (ie, increase the true-positive rate and decrease the false-positive rate). Specifically, we show that collective intelligence increases the diagnostic accuracy of melanoma in study 1 and of melanoma and basal cell carcinoma in study 2. Given that we tested our collective intelligence approach in different scenarios (2 independent data sets and 6 diagnostic algorithms), this result appears to be particularly robust.

The majority rule requires no prior information before implementation because for any given SL, it follows the decision of the majority of the raters. In contrast, the quorum rule requires information before implementation because it is based on a quorum threshold that has to be set below the mean true-positive rate and above the mean false-positive rate of the raters. Although the quorum rule thus requires some prior information, it is more flexible than the majority rule. First, the majority rule only works well when the mean true-positive rate is well above 50% and the mean false-positive rate is well below 50%.22-24 The quorum rule, however, is more flexible and is predicted to work whenever the quorum threshold is set between the true- and false-positive rates of the raters.25 A second benefit of the quorum rule is that the threshold can be shifted upward or downward, depending on which of the 2 types of errors (ie, false-positive or false-negative) is deemed more important to prevent. In dermoscopy, the main goal is to maximize the true-positive rate while maintaining an acceptable degree of false-positive findings.30 A majority rule has a fixed threshold and thus does not allow for such adjustments.
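Why the majority rule depends on individual rates lying on the correct side of 50% can be illustrated with a simple binomial calculation. The sketch below assumes that raters answer independently with identical probabilities, an idealization that is not part of the study; the function name is hypothetical.

```python
from math import comb


def majority_positive_probability(p, n):
    """Probability that more than half of n independent raters vote positive,
    when each votes positive with probability p (n assumed odd)."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))
```

Using the individual pattern analysis rates reported in the Results (0.83 and 0.17) with n = 3, this idealized benchmark gives approximately 0.92 for the true-positive rate and 0.08 for the false-positive rate. The reported values for groups of 3 (0.91 and 0.14) are less favorable, which would be consistent with partly correlated errors among raters, although this sketch cannot establish that. For p = 0.5 the function returns 0.5 at every odd n, so under these assumptions grouping brings no gain when individual rates sit at 50%.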

An important future question is to understand the mechanisms underlying the observed collective improvement. A first mechanism could be that some poor performers bring down the mean individual accuracy. When combining decisions, these erroneous decisions get filtered out. This explanation seems unlikely because in our data sets, the vast majority of individual raters is outperformed by the collective approach. A second possibility is that raters differ in their relative ability to evaluate the different cues used for diagnosis. For example, the 3-point checklist requires the evaluation of 3 cues (asymmetry, atypical network, and blue-white structure). If the errors that different raters make when evaluating these different cues are not perfectly correlated, then this could give rise to collective improvement.31,32 Such a scenario would be an example of the importance of diversity for collective intelligence.33,34

At present, collective intelligence is rarely used in medical decision making, and few studies have investigated the potential of such an approach, including King et al,19 Hukkinen et al,35 Duijm et al,36 Farnetani et al,37 and Wolf et al.38 Technological developments could play an important role in facilitating a collective intelligence approach. Online exchange of information (eg, images) avoids the necessity of seeing a medical specialist and would allow for a relatively quick assessment by multiple experts. In skin cancer diagnostics, a number of technological developments are ongoing in this direction. For example, mobile teledermatology investigates the possibility of people taking pictures of SLs with mobile-phone apps, which are then made available to dermoscopists. This approach shows promising rates of accuracy39-42 and would be highly compatible with a collective intelligence approach.

An important cost of a collective intelligence approach is the extra viewing time by medical specialists. These additional costs have to be weighed against the potential benefits: a higher true-positive rate could decrease mortality risk, and a lower false-positive rate could reduce financial (fewer erroneous additional workups) and emotional costs. Further investigations will be necessary to quantify the precise costs and benefits of a collective approach.

Although we evaluated 2 independent data sets and used different diagnostic algorithms and collective classifiers, future studies should address the generality of our results within skin cancer diagnostics and medical diagnostics in general. Further, although the observations were made in a setting closely resembling clinical practice (eg, experienced dermoscopists evaluating images of real SLs), the setting was not identical to clinical practice. In clinical practice, the SL of a patient normally undergoes direct evaluation by a dermatologist using a dermatoscope.

Conclusions

We show that a collective intelligence approach can improve diagnostic accuracy substantially in skin cancer detection by increasing true-positive and decreasing false-positive rates. Our results, in combination with rapid developments in technological possibilities, suggest that collective intelligence might be a viable method in the ongoing efforts to reduce skin cancer–related mortality rates.

Article Information

Corresponding Author: Ralf H. J. M. Kurvers, PhD, Center for Adaptive Rationality, Max Planck Institute for Human Development, Lentzeallee 94, Berlin 14195, Germany (kurvers@mpib-berlin.mpg.de).

Accepted for Publication: July 25, 2015.

Published Online: October 21, 2015. doi:10.1001/jamadermatol.2015.3149.

Author Contributions: Dr Kurvers had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Kurvers, Krause, Wolf.

Acquisition, analysis, or interpretation of data: Kurvers, Krause, Argenziano, Zalaudek.

Drafting of the manuscript: Kurvers, Krause, Wolf.

Critical revision of the manuscript for important intellectual content: Kurvers, Krause, Argenziano, Zalaudek.

Statistical analysis: Kurvers.

Obtained funding: Krause, Wolf.

Administrative, technical, or material support: Zalaudek, Wolf.

Study supervision: Krause.

Conflict of Interest Disclosures: None reported.

Funding/Support: This study was supported by Leibniz Competition Grant SAW-2013-IGB-2 from the Leibniz Association (Drs Krause and Wolf) and by the Rubicon Grant 825.11.014 from the Netherlands Organisation for Scientific Research (Dr Kurvers).

Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

References
1. Garbe C, Leiter U. Melanoma epidemiology and trends. Clin Dermatol. 2009;27(1):3-9.
2. Mayer JE, Swetter SM, Fu T, Geller AC. Screening, early detection, education, and trends for melanoma: current status (2007-2013) and future directions, part I: epidemiology, high-risk groups, clinical strategies, and diagnostic technology. J Am Acad Dermatol. 2014;71(4):599.e1-599.e12. doi:10.1016/j.jaad.2014.05.046.
3. Mayer JE, Swetter SM, Fu T, Geller AC. Screening, early detection, education, and trends for melanoma: current status (2007-2013) and future directions, part II: screening, education, and future directions. J Am Acad Dermatol. 2014;71(4):611.e1-611.e10. doi:10.1016/j.jaad.2014.05.045.
4. Kittler H, Pehamberger H, Wolff K, Binder M. Diagnostic accuracy of dermoscopy. Lancet Oncol. 2002;3(3):159-165.
5. Bafounta ML, Beauchet A, Aegerter P, Saiag P. Is dermoscopy (epiluminescence microscopy) useful for the diagnosis of melanoma? results of a meta-analysis using techniques adapted to the evaluation of diagnostic tests. Arch Dermatol. 2001;137(10):1343-1350.
6. Ascierto PA, Satriano RA, Palmieri G, Parasole R, Bosco L, Castello G. Epiluminescence microscopy as a useful approach in the early diagnosis of cutaneous malignant melanoma. Melanoma Res. 1998;8(6):529-537.
7. Vestergaard ME, Macaskill P, Holt PE, Menzies SW. Dermoscopy compared with naked eye examination for the diagnosis of primary melanoma: a meta-analysis of studies performed in a clinical setting. Br J Dermatol. 2008;159(3):669-676.
8. Rajpara SM, Botello AP, Townend J, Ormerod AD. Systematic review of dermoscopy and digital dermoscopy/artificial intelligence for the diagnosis of melanoma. Br J Dermatol. 2009;161(3):591-604.
9. Rubegni P, Burroni M, Cevenini G, et al. Digital dermoscopy analysis and artificial neural network for the differentiation of clinically atypical pigmented skin lesions: a retrospective study. J Invest Dermatol. 2002;119(2):471-474.
10. Ganster H, Pinz A, Röhrer R, Wildling E, Binder M, Kittler H. Automated melanoma recognition. IEEE Trans Med Imaging. 2001;20(3):233-239.
11. Garbe C, Eigentler TK. Diagnosis and treatment of cutaneous melanoma: state of the art 2006. Melanoma Res. 2007;17(2):117-127.
12. Krause J, Ruxton GD, Krause S. Swarm intelligence in animals and humans. Trends Ecol Evol. 2010;25(1):28-34.
13. Bonabeau E, Dorigo M, Theraulaz G. Swarm Intelligence: From Natural to Artificial Systems. Oxford, England: Oxford University Press; 1999.
14. Couzin ID. Collective cognition in animal groups. Trends Cogn Sci. 2009;13(1):36-43.
15. Woolley AW, Chabris CF, Pentland A, Hashmi N, Malone TW. Evidence for a collective intelligence factor in the performance of human groups. Science. 2010;330(6004):686-688.
16. Surowiecki J. The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. New York, NY: Knopf Doubleday Publishing Group; 2004.
17. Arrow KJ, Forsythe R, Gorham M, et al. Economics: the promise of prediction markets. Science. 2008;320(5878):877-878.
18. Clément RJ, Krause S, von Engelhardt N, Faria JJ, Krause J, Kurvers RH. Collective cognition in humans: groups outperform their best members in a sentence reconstruction task. PLoS One. 2013;8(10):e77943. doi:10.1371/journal.pone.0077943.
19. King AJ, Gehl RW, Grossman D, Jensen JD. Skin self-examinations and visual identification of atypical nevi: comparing individual and crowdsourcing approaches. Cancer Epidemiol. 2013;37(6):979-984.
20. Argenziano G, Soyer HP, Chimenti S, et al. Dermoscopy of pigmented skin lesions: results of a consensus meeting via the Internet. J Am Acad Dermatol. 2003;48(5):679-693.
21. Zalaudek I, Argenziano G, Soyer HP, et al; Dermoscopy Working Group. Three-point checklist of dermoscopy: an open Internet study. Br J Dermatol. 2006;154(3):431-437.
22. Sumpter DJ, Pratt SC. Quorum responses and consensus decision making. Philos Trans R Soc B Biol Sci. 2009;364(1518):743-753.
23. Hastie R, Kameda T. The robust beauty of majority rules in group decisions. Psychol Rev. 2005;112(2):494-508.
24. Sorkin RD, West R, Robinson DE. Group performance depends on the majority rule. Psychol Sci. 1998;9(6):456-463.
25. Wolf M, Kurvers RH, Ward AJ, Krause S, Krause J. Accurate decisions in an uncertain world: collective cognition increases true positives while decreasing false positives. Proc R Soc B. 2013;280(1756):20122777. doi:10.1098/rspb.2012.2777.
26. Kurvers RJM, Wolf M, Krause J. Humans use social information to adjust their quorum thresholds adaptively in a simulated predator detection experiment. Behav Ecol Sociobiol. 2014;68(3):449-456.
27. Swets JA. The science of choosing the right decision threshold in high-stakes diagnostics. Am Psychol. 1992;47(4):522-532.
28. Swets JA, Dawes RM, Monahan J. Psychological science can improve diagnostic decisions. Psychol Sci Public Interest. 2000;1(1):1-26.
29. Swets JA. Measuring the accuracy of diagnostic systems. Science. 1988;240(4857):1285-1293.
30. Argenziano G, Fabbrocini G, Carli P, De Giorgi V, Sammarco E, Delfino M. Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions: comparison of the ABCD rule of dermatoscopy and a new 7-point checklist based on pattern analysis. Arch Dermatol. 1998;134(12):1563-1570.
31. Conradt L, List C, Roper TJ. Swarm intelligence: when uncertainty meets conflict. Am Nat. 2013;182(5):592-610.
32. Kao AB, Couzin ID. Decision accuracy in complex environments is often maximized by small group sizes [published online April 23, 2014]. Proc R Soc B. doi:10.1098/rspb.2013.3305.
33. Krause S, James R, Faria JJ, Ruxton GD, Krause J. Swarm intelligence in humans: diversity can trump ability. Anim Behav. 2011;81(5):941-948.
34. Luan S, Katsikopoulos KV, Reimer T. When does diversity trump ability (and vice versa) in group decision making? a simulation study. PLoS One. 2012;7(2):e31043.
35. Hukkinen K, Kivisaari L, Vehmas T. Impact of the number of readers on mammography interpretation. Acta Radiol. 2006;47(7):655-659.
36. Duijm LEM, Louwman MWJ, Groenewoud JH, van de Poll-Franse LV, Fracheboud J, Coebergh JW. Inter-observer variability in mammography screening and effect of type and number of readers on screening outcome. Br J Cancer. 2009;100(6):901-907.
37. Farnetani F, Scope A, Braun RP, et al. Skin cancer diagnosis with reflectance confocal microscopy: reproducibility of feature recognition and accuracy of diagnosis [published online May 20, 2015]. JAMA Dermatol. doi:10.1001/jamadermatol.2015.0810.
38. Wolf M, Krause J, Carney PA, Bogart A, Kurvers RHJM. Collective intelligence meets medical decision making: the collective outperforms the best radiologist [published online August 12, 2015]. PLoS One. doi:10.1371/journal.pone.0134269.
39. Massone C, Brunasso AMG, Hofmann-Wellenhof R, Gulia A, Soyer HP. Teledermoscopy: education, discussion forums, teleconsulting and mobile teledermoscopy. G Ital Dermatol Venereol. 2010;145(1):127-132.
40. Massone C, Hofmann-Wellenhof R, Ahlgrimm-Siess V, Gabler G, Ebner C, Soyer HP. Melanoma screening with cellular phones. PLoS One. 2007;2(5):e483. doi:10.1371/journal.pone.0000483.
41. Kroemer S, Frühauf J, Campbell TM, et al. Mobile teledermatology for skin tumour screening: diagnostic accuracy of clinical and dermoscopic image tele-evaluation using cellular phones. Br J Dermatol. 2011;164(5):973-979.
42. Ebner C, Wurm EMT, Binder B, et al. Mobile teledermatology: a feasibility study of 58 subjects using mobile phones. J Telemed Telecare. 2008;14(1):2-7.