Figure.
Comparison of the Diagnostic Accuracy of the Dermoscopic Algorithms

Receiver operating characteristic curves for 6 dermoscopic algorithms were evaluated. CASH indicates color, architecture, symmetry, and homogeneity.

Table 1.  
Comparison of Dermoscopic Criteria of Simplified Diagnostic Algorithms for Melanoma
Table 2.  
Participant Characteristics
Table 3.  
Association Between Dermoscopic Criteria With Melanoma Status
Table 4.  
Measures of Diagnostic Accuracy for 6 Dermoscopic Algorithms
References

1. Kittler H, Pehamberger H, Wolff K, Binder M. Diagnostic accuracy of dermoscopy. Lancet Oncol. 2002;3(3):159-165.
2. Argenziano G, Fabbrocini G, Carli P, De Giorgi V, Sammarco E, Delfino M. Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions: comparison of the ABCD rule of dermatoscopy and a new 7-point checklist based on pattern analysis. Arch Dermatol. 1998;134(12):1563-1570.
3. Stolz W, Riemann A, Cognetta AB, et al. ABCD rule of dermoscopy: a new practical method for early recognition of malignant melanoma. Eur J Dermatol. 1994;4(7):521-527.
4. Soyer HP, Argenziano G, Zalaudek I, et al. Three-point checklist of dermoscopy: a new screening method for early detection of melanoma. Dermatology. 2004;208(1):27-31.
5. Henning JS, Dusza SW, Wang SQ, et al. The CASH (color, architecture, symmetry, and homogeneity) algorithm for dermoscopy. J Am Acad Dermatol. 2007;56(1):45-52.
6. Menzies SW, Ingvar C, Crotty KA, McCarthy WH. Frequency and morphologic characteristics of invasive melanomas lacking specific surface microscopic features. Arch Dermatol. 1996;132(10):1178-1182.
7. Rosendahl C, Cameron A, McColl I, Wilkinson D. Dermatoscopy in routine practice: ‘chaos and clues’. Aust Fam Physician. 2012;41(7):482-487.
8. Dolianitis C, Kelly J, Wolfe R, Simpson P. Comparative performance of 4 dermoscopic algorithms by nonexperts for the diagnosis of melanocytic lesions. Arch Dermatol. 2005;141(8):1008-1014.
9. Pizzichetta MA, Talamini R, Marghoob AA, et al. Negative pigment network: an additional dermoscopic feature for the diagnosis of melanoma. J Am Acad Dermatol. 2013;68(4):552-559.
10. Balagula Y, Braun RP, Rabinovitz HS, et al. The significance of crystalline/chrysalis structures in the diagnosis of melanocytic and nonmelanocytic lesions. J Am Acad Dermatol. 2012;67(2):194.e1-194.e8.
11. Argenziano G, Soyer HP, Chimenti S, et al. Dermoscopy of pigmented skin lesions: results of a consensus meeting via the Internet. J Am Acad Dermatol. 2003;48(5):679-693.
12. Carli P, Quercioli E, Sestini S, et al. Pattern analysis, not simplified algorithms, is the most reliable method for teaching dermoscopy for melanoma diagnosis to residents in dermatology. Br J Dermatol. 2003;148(5):981-984.
13. Rosendahl C, Tschandl P, Cameron A, Kittler H. Diagnostic accuracy of dermatoscopy for melanocytic and nonmelanocytic pigmented lesions. J Am Acad Dermatol. 2011;64(6):1068-1073.
14. Argenziano G, Puig S, Zalaudek I, et al. Dermoscopy improves accuracy of primary care physicians to triage lesions suggestive of skin cancer. J Clin Oncol. 2006;24(12):1877-1882.
15. Menzies SW, Emery J, Staples M, et al. Impact of dermoscopy and short-term sequential digital dermoscopy imaging for the management of pigmented lesions in primary care: a sequential intervention trial. Br J Dermatol. 2009;161(6):1270-1277.
16. International Society for Digital Imaging of the Skin. http://isdis.net/isic-project/. Accessed October 7, 2015.
17. International Skin Imaging Collaboration (ISIC). Melanoma Project. ISIC Archive. https://isic-archive.com/. Accessed October 7, 2015.
18. Kurvers RH, Krause J, Argenziano G, Zalaudek I, Wolf M. Detection accuracy of collective intelligence assessments for skin cancer diagnosis. JAMA Dermatol. 2015;151(12):1346-1353.
19. Katalinic A, Waldmann A, Weinstock MA, et al. Does skin cancer screening save lives? an observational study comparing trends in melanoma mortality in regions with and without screening. Cancer. 2012;118(21):5395-5402.
Original Investigation
July 2016

Validity and Reliability of Dermoscopic Criteria Used to Differentiate Nevi From Melanoma: A Web-Based International Dermoscopy Society Study

Author Affiliations
  • 1Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
  • 2Melanoma Unit, Department of Dermatology, Hospital Clinic Barcelona, Institut d’Investigacions Biomèdiques August Pi i Sunyer, University of Barcelona, Centro de Investigacion Biomedica en red de enfermedades raras, Barcelona, Spain
  • 3Dermatology Unit, Second University of Naples, Naples, Italy
  • 4Department of Dermatology, University Hospital Zürich, Zürich, Switzerland
  • 5Dermatology Service, Aurora Skin Cancer Center, Universidad Pontificia Bolivariana, Medellín, Colombia
  • 6Department of Dermatology, Medical University of Vienna, Vienna, Austria
  • 7Sydney Melanoma Diagnostic Centre, Sydney Cancer Centre, Royal Prince Alfred Hospital, Camperdown, Australia
  • 8Discipline of Dermatology, The University of Sydney, New South Wales, Australia
  • 9Center for Environmental, Genetic, and Nutritional Epidemiology, Department of Diagnostic, Clinical, and Public Health Medicine, University of Modena and Reggio Emilia, Modena, Italy
  • 10Skin and Cancer Associates, Plantation, Florida
  • 11Department of Dermatology, Sheba Medical Center, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
  • 12Dermatology Research Centre, The University of Queensland, Brisbane, Queensland, Australia
  • 13School of Medicine, Translational Research Institute, Brisbane, Queensland, Australia
  • 14Clinic for Dermatology, Allergology, and Environmental Medicine, Klinik Thalkirchner Straße Städt, Klinikum München GmbH, Munich, Germany
  • 15Department of Dermatology, Medical University of Graz, Graz, Austria
 

Copyright 2016 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.

JAMA Dermatol. 2016;152(7):798-806. doi:10.1001/jamadermatol.2016.0624
Abstract

Importance  The comparative diagnostic performance of dermoscopic algorithms and their individual criteria are not well studied.

Objectives  To analyze the discriminatory power and reliability of dermoscopic criteria used in melanoma detection and compare the diagnostic accuracy of existing algorithms.

Design, Setting, and Participants  This was a retrospective, observational study of 477 lesions (119 melanomas [24.9%] and 358 nevi [75.1%]), which were divided into 12 image sets that consisted of 39 or 40 images per set. A link on the International Dermoscopy Society website from January 1, 2011, through December 31, 2011, directed participants to the study website. Data analysis was performed from June 1, 2013, through May 31, 2015. Participants included physicians, residents, and medical students, and there were no specialty-type or experience-level restrictions. Participants were randomly assigned to evaluate 1 of the 12 image sets.

Main Outcomes and Measures  Associations with melanoma and intraclass correlation coefficients (ICCs) were evaluated for the presence of dermoscopic criteria. Diagnostic accuracy measures were estimated for the following algorithms: the ABCD rule, the Menzies method, the 7-point checklist, the 3-point checklist, chaos and clues, and CASH (color, architecture, symmetry, and homogeneity).

Results  A total of 240 participants registered, and 103 (42.9%) evaluated all images. The 110 participants (45.8%) who evaluated fewer than 20 lesions were excluded, resulting in data from 130 participants (54.2%), 121 (93.1%) of whom were regular dermoscopy users. Criteria associated with melanoma included marked architectural disorder (odds ratio [OR], 6.6; 95% CI, 5.6-7.8), pattern asymmetry (OR, 4.9; 95% CI, 4.1-5.8), nonorganized pattern (OR, 3.3; 95% CI, 2.9-3.7), border score of 6 (OR, 3.3; 95% CI, 2.5-4.3), and contour asymmetry (OR, 3.2; 95% CI, 2.7-3.7) (P < .001 for all). Most dermoscopic criteria had poor to fair interobserver agreement. Criteria that reached moderate levels of agreement included comma vessels (ICC, 0.44; 95% CI, 0.40-0.49), absence of vessels (ICC, 0.46; 95% CI, 0.42-0.51), dark brown color (ICC, 0.40; 95% CI, 0.35-0.44), and architectural disorder (ICC, 0.43; 95% CI, 0.39-0.48). The Menzies method had the highest sensitivity for melanoma diagnosis (95.1%) but the lowest specificity (24.8%) of any method (P < .001). The ABCD rule had the highest specificity (59.4%). All methods had similar areas under the receiver operating characteristic curves.

Conclusions and Relevance  Important dermoscopic criteria for melanoma recognition were revalidated by participants with varied experience. Six algorithms tested had similar but modest levels of diagnostic accuracy, and the interobserver agreement of most individual criteria was poor.

Introduction

Use of dermoscopy by trained users, but not novices, improves diagnostic accuracy for cutaneous melanoma compared with naked-eye examination alone.1 Dermoscopy experts tend to review a dermoscopic image and reach a diagnosis without use of structured analytical criteria, a diagnostic process that can be referred to as pattern analysis. Multiple simplified dermoscopic algorithms, such as the ABCD rule, the Menzies method, the 7-point checklist, the 3-point checklist, chaos and clues, and CASH (color, architecture, symmetry, and homogeneity), were developed to facilitate a novice’s ability to distinguish melanomas from nevi with high diagnostic accuracy.2-7 A comparison of these algorithms reveals 2 diverging approaches to simplified melanoma detection (Table 1). The ABCD rule and CASH principally quantify the overall organization of a lesion by assessing features such as symmetry, architectural disorder, border sharpness, and heterogeneity in colors and structures. In contrast, the 7-point checklist relies on identifying atypical appearances of dermoscopic structures (eg, atypical network) as distinct from their otherwise normal counterparts or on identifying unique structures strongly associated with melanoma (eg, regression). Chaos and clues, the Menzies method, and the 3-point checklist include elements of both approaches.

Although each algorithm has unique criteria, there is significant overlap in their concepts, which may explain why the ABCD rule, the Menzies method, and the 7-point checklist have similar overall accuracy in the diagnosis of melanocytic lesions by novices.8 Beginners and instructors of dermoscopy are consequently unclear as to which, if any, algorithm(s) they should use and teach, respectively. In addition, no algorithm has been significantly revised since its initial publication to include newly identified dermoscopic features with high specificity for melanoma, such as negative network or white shiny structures.9,10 A critical need exists to better understand the comparative diagnostic performance of dermoscopic algorithms, in particular the discriminatory power and interobserver agreement of their individual criteria. The primary objective of this study was to measure the discriminatory power and interobserver agreement of individual dermoscopic criteria, including newly described dermoscopic features. A secondary objective was to compare the diagnostic accuracy of 6 existing simplified algorithms.


Key Points

  • Question What is the discriminatory power and reliability of dermoscopic criteria used in melanoma detection?

  • Findings In this survey-based study, the diagnostic importance of new and previously identified dermoscopic criteria for melanoma detection was validated; however, the majority of criteria had poor to fair interobserver agreement. Criteria with relatively strong discriminatory power and moderate levels of interobserver agreement included architectural disorder, pattern asymmetry, contour asymmetry, comma vessels, and absence of vessels.

  • Meaning Further efforts are needed to standardize terminology and definitions of dermoscopic criteria.

Methods

The Memorial Sloan Kettering Cancer Center Institutional Review Board approved this study without the requirement for written informed consent in accordance with the Helsinki Declaration. Data were deidentified.

Lesion Selection

Twelve pigmented lesion clinics from Australia, Austria, Germany, Italy, Spain, Switzerland, and the United States contributed study images. Each contributor provided up to 50 lesions with a 1:3 ratio of melanomas to nevi. Melanomas were required to have an unequivocal histopathologic diagnosis, and nevi were required to be histopathologically verified or to have demonstrated stability under sequential dermoscopic imaging over time. Contributors sequentially selected lesions from their patient records and used 1:1 randomization of lesions into polarized vs nonpolarized sets. Other requested data included anatomical location, patient age and sex, imaging modality (polarized vs nonpolarized), and a clinical close-up image.

A total of 580 lesions (140 melanomas and 440 nevi) were contributed to the study. Lesions were reviewed by Memorial Sloan Kettering Cancer Center investigators, and 103 were excluded because of (1) location on acral, mucosal, or facial sites, (2) inadequate image quality, (3) equivocal diagnosis after review of the pathology report or sequential imaging, (4) nonmelanocytic lesions, and (5) lesions from patients younger than 18 years. The final data set was composed of 477 unique lesions, of which 119 (24.9%) were melanomas. Lesions were randomized into 12 image sets that contained 39 (n = 8) or 40 (n = 7) unique lesions and 5 nonunique lesion images (2 melanoma, 3 benign) that were repeated in all sets.

Web-Based Study Interface

Algorithm tutorials were created and posted by dermoscopic experts through the International Dermoscopy Society (IDS) website. Review of tutorials was encouraged but not mandatory for participants, and links to tutorials were available on the main study site interface and the data collection form.

Participant Selection

A link present on the IDS website from January 1, 2011, through December 31, 2011, directed participants to the study website (www.dermoscopy-ids.org/). Data analysis was performed from June 1, 2013, through May 31, 2015. Participation was open to attending physicians, residents, and medical students and was not restricted by specialty type or experience level. Image contributors were excluded from the study. Participants were required to register and specify their specialty, years of clinical experience, preferred dermoscopic analysis method, dermoscopy frequency of use, predominant modality (polarized vs nonpolarized) of use, and experience. There was no incentive for study participation.

Two hundred forty participants registered for the study, and 103 (42.9%) completed all available images in their data sets. The 110 participants (45.8%) who evaluated fewer than 20 lesions were excluded, resulting in data from a total of 130 participants (54.2%) eligible for analysis.

Participant Evaluation

A comprehensive list of all dermoscopic structures from the dermoscopy algorithms was created, and overlapping criteria were merged into 1 criterion (eg, granularity and peppering were combined into 1 criterion). Newly identified dermoscopic structures with high specificity for melanoma (eg, negative network, chrysalis structures [shiny white or crystalline structures], polymorphous vessels, atypical vessels, and pink veil) were included. Criteria included (1) global pattern, (2) pattern organization, (3) symmetry of contour, (4) symmetry of pattern, (5) architectural disorder, (6) abruptness of lesion border, (7) colors, and (8) melanocytic structures, including network and vascular structures. Participants examined the close-up clinical image of each lesion before viewing the dermoscopic image. The modality (polarized vs nonpolarized) of dermoscopic images was specified. There were no time constraints. For each lesion, the participant indicated the presence or absence of all dermoscopic criteria on the same webpage. Users were unable to modify their responses for a lesion after submission of data.

Statistical Analysis

Descriptive statistics and graphic methods were used to describe participant and lesion characteristics and participant dermoscopic evaluations because block randomization was used and no participants evaluated all images. Data were assessed as individual dermoscopic evaluations and as consensus evaluations for participants who reviewed a given study lesion. For individual evaluations, prevalence of each dermoscopic feature was tabulated along with 95% CIs. To quantify the association for the presence or absence of each feature with melanoma status, tabular cross-classifications, χ2 statistics, and the associated odds ratios (ORs) and 95% CIs were calculated. Robust SEs were estimated to adjust for the clustered observations within reviewers. Intraclass correlation coefficients (ICCs) were estimated for each dermoscopic feature using 2-way random-effects models, with the dermoscopic raters treated as a random effect. This approach assumes that raters are randomly sampled from the larger population of raters with dermoscopic experience. The ICC is equal to 0 when the agreement is exactly what is expected by chance and 1 when there is perfect agreement. Intermediate values were interpreted as follows: poor, 0.01 to less than 0.2; fair, 0.2 to less than 0.4; moderate, 0.4 to less than 0.6; substantial, 0.6 to less than 0.8; and almost perfect agreement, greater than 0.8.
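As a rough illustration of the two summary statistics described above, the following Python sketch computes an odds ratio with a Woolf (log-based) 95% CI from a 2×2 table and a Shrout-Fleiss ICC(2,1) under a two-way random-effects model. This is a minimal sketch, not the study's analysis code (which used STATA with robust, cluster-adjusted SEs); the function names and toy data layout are assumptions.

```python
import math
import numpy as np

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) 95% CI for a 2x2 table:
    a = criterion present, melanoma;  b = criterion present, nevus;
    c = criterion absent,  melanoma;  d = criterion absent,  nevus."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

def icc_2_1(x):
    """Shrout-Fleiss ICC(2,1): two-way random-effects model, single
    rating, raters treated as a random sample of a larger population.
    x: (n lesions x k raters) array of ratings."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    rows = x.mean(axis=1)  # per-lesion means
    cols = x.mean(axis=0)  # per-rater means
    msr = k * ((rows - grand) ** 2).sum() / (n - 1)  # between lesions
    msc = n * ((cols - grand) ** 2).sum() / (k - 1)  # between raters
    resid = x - rows[:, None] - cols[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))   # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def agreement_label(icc):
    """The interpretation bands used in the paper."""
    if icc >= 0.8:
        return "almost perfect"
    if icc >= 0.6:
        return "substantial"
    if icc >= 0.4:
        return "moderate"
    if icc >= 0.2:
        return "fair"
    if icc >= 0.01:
        return "poor"
    return "chance"
```

With identical ratings from every rater, `icc_2_1` returns 1; ratings no better than chance drive it toward 0.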

For consensus evaluations, the presence or absence of each dermoscopic feature was calculated as the proportion of participants who identified the feature for a given lesion. When 50% or more of the participants identified a dermoscopic feature for a given study lesion, the attribute was considered present. We applied consensus evaluations to dermoscopic algorithms to evaluate performance. Using logistic regression models with the dichotomous outcome of melanoma vs nevus, we compared areas under the receiver operating characteristic (ROC) curve among the diagnostic algorithms. Analyses were performed with STATA statistical software, version 12.1 (StataCorp).
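The consensus rule and ROC comparison described above can be sketched in Python. This is a hypothetical illustration (the study used STATA 12.1); the AUC is computed via the Mann-Whitney identity rather than by fitting the logistic models directly, and the data layout is an assumption.

```python
import numpy as np

def consensus(votes, threshold=0.5):
    """Consensus call for one dermoscopic feature: 'present' when at
    least `threshold` of a lesion's reviewers marked it (the >=50% rule).
    votes: (n_lesions x n_reviewers) array of 0/1 responses."""
    votes = np.asarray(votes, dtype=float)
    return (votes.mean(axis=1) >= threshold).astype(int)

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney identity: the
    probability that a randomly chosen melanoma outscores a randomly
    chosen nevus, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

An algorithm's score for each lesion (built from consensus features) can then be passed to `auc` with the melanoma/nevus labels to compare methods.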

Results
Participants

The 130 participants who evaluated 20 lesions or more had a mean (SD) of 12 (8.7) years of dermatology experience. The mean (SD) percentages of their practice that was composed of skin cancer screening and the population at high risk for skin cancer were 33.5% (25.8%) and 14.4% (16.4%), respectively. A total of 73 participants (56.2%) reported being attending dermatologists, 122 (93.8%) were comfortable using dermoscopy, and 121 (93.1%) were regular users of dermoscopy (Table 2).

Lesion Evaluations

A total of 477 unique lesions were evaluated in the study. Each lesion was evaluated by a median of 12 participants, with the exception of the 5 lesions that were repeated in the 12 image sets and evaluated by all 130 participants, resulting in a total of 5670 unique lesion evaluations.

Interobserver Agreement of Dermoscopic Criteria

Most dermoscopic criteria had poor to fair interobserver agreement, including features such as atypical network (ICC, 0.21; 95% CI, 0.17-0.25), blue-white veil (ICC, 0.34; 95% CI, 0.30-0.39), regression (ICC, 0.11; 95% CI, 0.08-0.13), and atypical vessels (ICC, 0.26; 95% CI, 0.22-0.30) (Table 3).

Criteria with moderate levels of interobserver agreement included comma vessels (ICC, 0.44; 95% CI, 0.40-0.49), absence of vessels (ICC, 0.46; 95% CI, 0.42-0.51), dark brown color (ICC, 0.40; 95% CI, 0.35-0.44), and architectural disorder (ICC, 0.43; 95% CI, 0.39-0.48) (Table 3). Absence of network (ICC, 0.39; 95% CI, 0.34-0.43), pattern symmetry (ICC, 0.37; 95% CI, 0.32-0.41), contour symmetry (ICC, 0.37; 95% CI, 0.32-0.42), and total colors present (ICC, 0.36; 95% CI, 0.31-0.40) had similar levels of interobserver agreement.

Dermoscopic Criteria Associated With Melanoma Status

Criteria strongly associated with melanoma status (OR ≥3) included marked architectural disorder (OR, 6.6; 95% CI, 5.6-7.8), pattern asymmetry (OR, 4.9; 95% CI, 4.1-5.8), nonorganized pattern (OR, 3.3; 95% CI, 2.9-3.7), border score of 6 (OR, 3.3; 95% CI, 2.5-4.3), contour asymmetry (OR, 3.2; 95% CI, 2.7-3.7), polymorphous vessels (OR, 3.1; 95% CI, 2.4-4.0), border score of 5 (OR, 3.1; 95% CI, 2.3-4.2), and atypical vessels (OR, 3.0; 95% CI, 2.5-3.6) (P < .001 for all) (Table 3). Inability to determine features such as border score (OR, 4.1; 95% CI, 3.1-5.4), pattern symmetry (OR, 6.3; 95% CI, 3.6-10.8), and contour symmetry (OR, 6.3; 95% CI, 4.0-9.9) were also strongly associated with melanoma status (all P < .001). Other criteria associated with melanoma status are given in Table 3.

Criteria with a strong inverse association with melanoma status (OR <0.7) included comma vessels (OR, 0.4; 95% CI, 0.3-0.6), peripheral reticular with central hyperpigmentation global pattern (OR, 0.5; 95% CI, 0.4-0.6), globular global pattern (OR, 0.5; 95% CI, 0.4-0.6), 2-component symmetric global pattern (OR, 0.5; 95% CI, 0.3-0.7), regular brown dots (OR, 0.5; 95% CI, 0.4-0.6), regular brown globules (OR, 0.5; 95% CI, 0.4-0.7), absence of vessels (OR, 0.5; 95% CI, 0.4-0.5), regular blotch (OR, 0.4; 95% CI, 0.3-0.6), and light brown color (OR, 0.6; 95% CI, 0.5-0.7) (all P < .001) (Table 3).

The dermoscopic criteria with ICC levels of 0.37 or higher and relatively strong discriminatory power (OR ≥3.0 or <0.7) included comma vessels, absence of vessels, marked architectural disorder, pattern asymmetry, and contour asymmetry.

Newly Identified Dermoscopic Criteria

Negative network (OR, 1.4; 95% CI, 1.1-1.8; P = .005) and white shiny structures (OR, 2.5; 95% CI, 1.8-3.5; P < .001) were significantly associated with melanoma status. However, both had poor interobserver agreement levels (negative network: ICC, 0.15; 95%, CI 0.12-0.18; white shiny structures: ICC, 0.16; 95% CI, 0.13-0.19).

Comparison of Diagnostic Accuracy of the 6 Simplified Algorithms

Measures of diagnostic accuracy for the ABCD rule, the Menzies method, the 7-point checklist, the 3-point checklist, chaos and clues, and CASH are given in Table 4. Note that this analysis was artificially constructed by using the participants’ consensus evaluation of individual criteria (ie, when ≥50% of the participants identified a dermoscopic feature for a given study lesion, the attribute was considered present) and that participants did not directly score algorithms in a head-to-head comparison scenario. For these analyses, the data are presented with defined cut points for melanoma diagnosis. The Menzies method had the highest sensitivity for melanoma detection (95.1%; 95% CI, 89.0%-98.4%), significantly higher than any other method (P < .001), and the 3-point checklist had the lowest (68.9%; 95% CI, 59.8%-77.1%). The ABCD rule had the highest specificity (59.4%; 95% CI, 54.0%-64.6%), and the Menzies method had the lowest (24.8%; 95% CI, 20.1%-30.1%), significantly lower than that of any other method (P < .001). Chaos and clues (40.2%; 95% CI, 35.1%-45.5%) had significantly lower specificity than the ABCD rule and the 3- and 7-point checklists. The Figure shows the ROC curves of the 6 algorithms. No significant differences in ROC areas were observed among CASH, the 7-point checklist, the 3-point checklist, chaos and clues, and the ABCD rule (P = .44). However, the Menzies method had a lower ROC area compared with CASH, the 7-point checklist, the 3-point checklist, the ABCD rule, and chaos and clues, with P values for each comparison of .03, .03, .007, .001, and <.001, respectively.
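The sensitivity and specificity figures reported above reduce to simple confusion-matrix ratios over the consensus-based melanoma calls. A minimal sketch, with a hypothetical function name and toy data rather than the study's own:

```python
def sens_spec(truth, calls):
    """Sensitivity and specificity of an algorithm's melanoma calls
    against histopathologic truth (1 = melanoma, 0 = nevus)."""
    tp = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 1)
    fn = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 0)
    tn = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 0)
    fp = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

The Menzies method's pairing of highest sensitivity with lowest specificity is the expected behavior of a lenient cut point: more melanomas are flagged, but so are more nevi.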

Discussion

In this study, which involved participants of varied backgrounds who reported comfort with and regular use of dermoscopy, we revalidated the diagnostic importance of well-described criteria associated with melanoma, such as atypical network, irregular blotch, regression, streaks, pseudopods, atypical dots or globules, atypical vessels, any blue or white color, and blue-white veil. However, we found that these criteria had poor to fair levels of interobserver agreement. Criteria with the highest levels of discriminatory power and interobserver agreement included features not always highlighted in existing algorithms, such as comma vessels and absence of vessels, as well as subjective features that quantify the overall organization of a lesion, namely, architectural disorder and symmetry of pattern and contour. We further found that 6 simplified dermoscopy algorithms had similar but modest levels of diagnostic accuracy.

Few reproducibility studies of dermoscopic features have been performed, particularly studies investigating the discriminatory power and the interobserver and intraobserver agreement of specific criteria. An Internet consensus meeting of dermoscopy experts in 2003 found that pattern analysis, the ABCD rule, the 7-point checklist, and the Menzies method all have high sensitivity and specificity for the diagnosis of melanoma.11 However, the interobserver agreement of the diagnostic methods was moderate, and many individual diagnostic structures had poor levels of interobserver agreement. The authors suggested that this discrepancy might be attributable to the importance of the overall dermoscopic gestalt of a given lesion to the assignment of a final diagnosis, independent of the recognition of individual criteria.11 Indeed, experts usually do not apply algorithms. In other words, evaluators may assign a diagnosis based on the overall impression of a lesion and then search for criteria to fit their decision. To avoid this potential bias, participants in our study evaluated the presence and absence of dermoscopic features but did not apply an algorithm or make a diagnosis. A comparative study8 of pattern analysis and the different algorithms among nonexperts has also found generally poor interobserver agreement for most individual dermoscopic criteria but much better results for the method as a whole. This interpretation is supported by a study12 of dermatology residents that found that pattern analysis, defined by the authors as the “simultaneous assessment of the diagnostic value of all dermoscopy features shown by the lesion,”12(p981) had a higher diagnostic accuracy compared with the ABCD rule of dermoscopy and the 7-point checklist.

Of interest, in the present study, several features that indicate overall organization and symmetry had the highest agreement and discriminatory power, such as architectural disorder, contour asymmetry, and dermoscopic pattern asymmetry. These concepts have previously been summarized as disarrangement in appearance or chaos and support the usefulness of chaos and clues7 and the 3-point checklist,13 which were created for use in melanocytic and nonmelanocytic lesions. Reassuringly, well-designed, prospective clinical studies7,8,14,15 have found that use of dermoscopy significantly improves the ability of general practitioners to evaluate pigmented lesions in the primary care setting. Indeed, when tested in a clinical setting, the 3-point checklist improved primary care physicians’ triage of lesions suggestive of skin cancer by 25.1% compared with naked-eye examination alone.14 However, it remains unknown to what extent general practitioners or novices rely on overall dermoscopic gestalt vs application of a dermoscopic algorithm when using dermoscopy in the daily clinical setting. To more broadly promote the use of dermoscopy in the primary care setting, our results suggest that significant efforts are needed to standardize and improve dermoscopic terminology, which is one of the central goals of the International Skin Imaging Collaboration Melanoma Project.16,17

Our data suggest that features that quantify the overall organization of a lesion (eg, architectural disorder and pattern asymmetry) have higher levels of interobserver agreement and discriminatory power than many well-known dermoscopic structures (eg, atypical network or irregular blotch); thus, criteria for overall organization of a lesion may not be sufficiently emphasized in dermoscopic algorithms for melanoma diagnosis. Specific dermoscopic structures with low prevalence, such as negative network, may still be robust criteria for melanoma diagnosis but had poor agreement and low discriminatory power in this study because participants may have received insufficient training to accurately identify them. Accordingly, criteria that are useful in melanoma diagnosis should not be abandoned but rather readdressed and potentially refined through further study. This point also highlights the evolving nature and current lack of standardization of dermoscopy teaching worldwide and the critical need to determine effective teaching methods of dermoscopy.

Several factors may contribute to the poor interobserver agreement levels observed in this study. First, participants may not have received sufficient training in the definitions of criteria or, despite training, they used different definitions of criteria, potentially influenced by their personal experience with dermoscopy. To help mitigate these potential factors, we created algorithm tutorials with definitions of criteria. However, completion of tutorials was not required for participation. Second, the interobserver agreement levels may reflect the range of expertise levels of participants in that certain criteria require significant training for mastery. Third, a participant’s gestalt diagnosis of a lesion may have affected their criteria selection; if so, a participant may have preferentially assigned some criteria and ignored others. Lastly, criteria may simply be inherently unreliable. For this point, it is important to recognize that tests in medicine are frequently subject to limitations in human judgment and generally do not exceed fair levels of interobserver agreement. In addition, interpretation of the ICC as levels of agreement among reviewers has limitations. When the ICC is high, we can be assured that the agreement level for a given attribute is good. However, a low ICC may be attributable to a suboptimally designed evaluation process. For example, small technical differences in imaging, such as variations in focus or contrast, can have large effects on measure of agreement. In addition, evaluations were performed online, and users viewed images under noncalibrated conditions (eg, variable image display monitors and room lighting).

There are multiple limitations of this study. First, there was a relatively low rate of study completion with likely participation bias for more experienced dermoscopists. As a result, our results may not be generalizable to beginners. Second, we assessed diagnostic accuracy through the artificial scenario of a reader study, which may not be representative of decisions made during live patient examinations. Third, the image data set was not representative of the entire spectrum of melanocytic lesions because it excluded facial, acral, and amelanotic lesions and was biased toward diagnostically challenging lesions with few banal nevi included. In addition, nonmelanocytic lesions were excluded, and the study assumes that participants would apply these criteria after reliably identifying lesions as melanocytic in origin (ie, 2-step algorithm). Thus, comparison of measures of diagnostic accuracy for the included algorithms may not accurately reflect real-life sensitivities and specificities. Finally, diagnostic performance of algorithms was assessed based on consensus evaluations (≥50%) for individual criteria and not directly by individual participants or experts.

Conclusions

Algorithms are generally accepted as helpful for training novices in discrimination tasks. The criteria of an ideal algorithm should therefore be easy to learn, valid, and reliable. Unfortunately, to our knowledge, no dermoscopic algorithm with these characteristics has emerged for melanoma recognition. Our results confirm the need to further improve dermoscopic terminology, criteria, and algorithms. To do so, future studies may benefit from crowdsourcing and collective intelligence approaches,18 as well as from the public image archive being created in the International Skin Imaging Collaboration Melanoma Project, which permits analysis and comparison of the areas within a lesion that users select as having unique dermoscopic structures.16,17 We hope these efforts will lead to a unified dermoscopy algorithm, automated detection of criteria, and clinical decision support systems that facilitate population-based melanoma screening efforts.19

Article Information

Accepted for Publication: February 18, 2016.

Corresponding Author: Ashfaq A. Marghoob, MD, Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, 16 E 60th St, New York, NY 10022 (marghooa@mskcc.org).

Published Online: April 13, 2016. doi:10.1001/jamadermatol.2016.0624.

Author Contributions: Drs Carrera and Marghoob had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Carrera and Marchetti are co-first authors.

Study concept and design: Dusza, Braun, Halpern, Jaimes, Malvehy, Puig, Scope, Hofmann-Wellenhof, Marghoob.

Acquisition, analysis, or interpretation of data: Carrera, Marchetti, Dusza, Argenziano, Braun, Jaimes, Kittler, Malvehy, Menzies, Pellacani, Rabinovitz, Scope, Soyer, Stolz, Hofmann-Wellenhof, Zalaudek, Marghoob.

Drafting of the manuscript: Carrera, Marchetti, Dusza.

Critical revision of the manuscript for important intellectual content: Carrera, Marchetti, Dusza, Argenziano, Braun, Halpern, Jaimes, Kittler, Malvehy, Menzies, Pellacani, Puig, Rabinovitz, Scope, Soyer, Stolz, Hofmann-Wellenhof, Zalaudek, Marghoob.

Statistical analysis: Dusza.

Administrative, technical, or material support: Kittler, Stolz, Zalaudek, Marghoob.

Study supervision: Dusza, Halpern, Marghoob.

Conflict of Interest Disclosures: Dr Halpern reported receiving consulting fees from Canfield Scientific Inc, DermTech, and SciBase. Dr Rabinovitz reported serving as a clinical investigator for 3 Gen LLC and Canfield and as a speaker for 3 Gen LLC. Dr Soyer reported receiving support in part from Australian National Health and Medical Research Council Practitioner Fellowship APP1020145 and being a shareholder of e-derm consult GmbH and MoleMap by Dermatologists Ltd Pty. He provides teledermatologic reports regularly for both companies. Dr Hofmann-Wellenhof reported being a shareholder of e-derm consult GmbH. He provides teledermatologic reports regularly for this company. No other disclosures were reported.

Funding/Support: This research was funded in part through National Institutes of Health/National Cancer Institute Cancer Center Support Grant P30 CA008748. The research at the Melanoma Unit in Barcelona is partially funded by grants 12/00840 and PI15/00716 from Fondo de Investigaciones Sanitarias, the CIBER de Enfermedades Raras of the Instituto de Salud Carlos III, the AGAUR 2014_SGR_603 of the Catalan Government, a grant from Fundació La Marató de TV3, 201331-30, and grant CE_CIP-ICT-PSR-13-7 from the European Commission under the 7th Framework Programme (Diagnoptics).

Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and the decision to submit the manuscript for publication.

Additional Contributions: We are indebted to the participants and members of the International Dermoscopy Society who completed the study; Al Kopf, MD, who contributed his knowledge and personal collection of images; and Gerald Gabler, MSc, who developed and created the study website.

References
1. Kittler H, Pehamberger H, Wolff K, Binder M. Diagnostic accuracy of dermoscopy. Lancet Oncol. 2002;3(3):159-165.
2. Argenziano G, Fabbrocini G, Carli P, De Giorgi V, Sammarco E, Delfino M. Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions: comparison of the ABCD rule of dermatoscopy and a new 7-point checklist based on pattern analysis. Arch Dermatol. 1998;134(12):1563-1570.
3. Stolz W, Riemann A, Cognetta AB, et al. ABCD rule of dermoscopy: a new practical method for early recognition of malignant melanoma. Eur J Dermatol. 1994;4(7):521-527.
4. Soyer HP, Argenziano G, Zalaudek I, et al. Three-point checklist of dermoscopy: a new screening method for early detection of melanoma. Dermatology. 2004;208(1):27-31.
5. Henning JS, Dusza SW, Wang SQ, et al. The CASH (color, architecture, symmetry, and homogeneity) algorithm for dermoscopy. J Am Acad Dermatol. 2007;56(1):45-52.
6. Menzies SW, Ingvar C, Crotty KA, McCarthy WH. Frequency and morphologic characteristics of invasive melanomas lacking specific surface microscopic features. Arch Dermatol. 1996;132(10):1178-1182.
7. Rosendahl C, Cameron A, McColl I, Wilkinson D. Dermatoscopy in routine practice: ‘chaos and clues’. Aust Fam Physician. 2012;41(7):482-487.
8. Dolianitis C, Kelly J, Wolfe R, Simpson P. Comparative performance of 4 dermoscopic algorithms by nonexperts for the diagnosis of melanocytic lesions. Arch Dermatol. 2005;141(8):1008-1014.
9. Pizzichetta MA, Talamini R, Marghoob AA, et al. Negative pigment network: an additional dermoscopic feature for the diagnosis of melanoma. J Am Acad Dermatol. 2013;68(4):552-559.
10. Balagula Y, Braun RP, Rabinovitz HS, et al. The significance of crystalline/chrysalis structures in the diagnosis of melanocytic and nonmelanocytic lesions. J Am Acad Dermatol. 2012;67(2):194.e1-194.e8.
11. Argenziano G, Soyer HP, Chimenti S, et al. Dermoscopy of pigmented skin lesions: results of a consensus meeting via the Internet. J Am Acad Dermatol. 2003;48(5):679-693.
12. Carli P, Quercioli E, Sestini S, et al. Pattern analysis, not simplified algorithms, is the most reliable method for teaching dermoscopy for melanoma diagnosis to residents in dermatology. Br J Dermatol. 2003;148(5):981-984.
13. Rosendahl C, Tschandl P, Cameron A, Kittler H. Diagnostic accuracy of dermatoscopy for melanocytic and nonmelanocytic pigmented lesions. J Am Acad Dermatol. 2011;64(6):1068-1073.
14. Argenziano G, Puig S, Zalaudek I, et al. Dermoscopy improves accuracy of primary care physicians to triage lesions suggestive of skin cancer. J Clin Oncol. 2006;24(12):1877-1882.
15. Menzies SW, Emery J, Staples M, et al. Impact of dermoscopy and short-term sequential digital dermoscopy imaging for the management of pigmented lesions in primary care: a sequential intervention trial. Br J Dermatol. 2009;161(6):1270-1277.
16. International Society for Digital Imaging of the Skin. http://isdis.net/isic-project/. Accessed October 7, 2015.
17. International Skin Imaging Collaboration (ISIC). Melanoma Project. ISIC Archive. https://isic-archive.com/. Accessed October 7, 2015.
18. Kurvers RH, Krause J, Argenziano G, Zalaudek I, Wolf M. Detection accuracy of collective intelligence assessments for skin cancer diagnosis. JAMA Dermatol. 2015;151(12):1346-1353.
19. Katalinic A, Waldmann A, Weinstock MA, et al. Does skin cancer screening save lives? an observational study comparing trends in melanoma mortality in regions with and without screening. Cancer. 2012;118(21):5395-5402.