Case 6 (A) and case 7 (B) as shown in the study interface presented to participants. Navigation buttons at the bottom of the screen allowed participants to display number tags on the moles evaluated in the study (A) and to view clinical close-up images (B). Lesion 13 in case 7 was a malignant melanoma that was generally apparent as different but not generally apparent as completely different. It is irregular and larger than the other lesions but displays only 2 colors, brown and black, which are not dissimilar to those seen in some of the benign lesions (eg, lesion 9).
Tabulation of participant responses for the “ugly duckling” evaluation of the moles in cases 6 and 7. The participants (n = 34) are tabulated in rows and the lesions in columns (12 lesions in case 6 and 14 lesions in case 7). Lesions recognized as different are identified by color (black cells, completely different; gray cells, somewhat different). Lesion 13 in case 7 (red) was a melanoma (see image of lesions in Figure 1); all but 4 participants (blank cells) observed this lesion as different, and most participants (n = 20) saw it as completely different. In contrast, in case 6, there was no general agreement on which lesion was the ugly duckling.
Joint 95% confidence region plots for diagnostic accuracy of melanoma detection, grouped by participants' expertise level. A point closer to the upper left corner denotes better diagnostic accuracy. A, For “ugly ducklings,” with lesions perceived as somewhat and completely different combined. B, For lesions selected for biopsy. Diagnostic performance was best for experts and showed a decreasing trend with decreasing level of expertise.
Scope A, Dusza SW, Halpern AC, Rabinovitz H, Braun RP, Zalaudek I, Argenziano G, Marghoob AA. The “Ugly Duckling” Sign: Agreement Between Observers. Arch Dermatol. 2008;144(1):58-64. doi:10.1001/archdermatol.2007.15
Copyright 2008 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
To assess whether multiple observers can identify the same pigmented lesion(s) as being different from a patient's other moles (“ugly duckling” [UD] sign) and to explore whether the UD sign is sensitive for melanoma detection.
Baseline back images of 12 patients were obtained from a database of standardized patient images. All patients had at least 8 atypical moles on the back, and in 5 patients, one of the lesions was a histologically confirmed melanoma. The overview back images were supplemented with close-up clinical images of lesions. Participants were asked to evaluate whether the images showed any lesions on the back that differed from other nevi.
Dermatology clinic specializing in pigmented lesions.
Images were evaluated by 34 participants, including 8 pigmented lesion experts, 13 general dermatologists, 5 dermatology nurses, and 8 nonclinical medical staff.
Main Outcome Measures
A lesion was considered a generally apparent UD if it was perceived as different by at least two-thirds of the participants. Sensitivity was defined as the fraction of melanomas identified as different.
All 5 melanomas (100%) and only 3 of 140 benign lesions (2.1%) were generally apparent as different. The sensitivity of the UD sign for melanoma detection was 0.9 for the whole group, 1.0 for experts, 0.89 for general dermatologists, 0.88 for nurses, and 0.85 for nonclinicians. A limitation of the study is that assessment was done in virtual settings.
In the present study, melanomas were generally apparent as UDs. The potential of the UD sign for melanoma screening should be further assessed.
The incidence of malignant melanoma (MM) continues to rise at high rates.1 Nevi are among the most important known risk factors for the disease in adults, and MM is more common in individuals with many nevi and/or atypical (dysplastic) nevi.2 The cornerstone of MM prevention remains the early detection of disease at a stage when surgical excision of the tumor is curative.3 The challenge for clinicians who diagnose and treat pigmented skin lesions is to distinguish between MM and benign simulants. The relatively low sensitivity and specificity of the clinical diagnosis of MM, even among dermatologists,4,5 underscore this challenge: the overlap in clinical features between MMs and benign nevi6 leads both to missed MMs and to excessive excision of benign lesions. Various methods have been suggested to improve diagnostic acumen in differentiating atypical moles from MMs. In 1998, Grob and Bonerandi7 introduced the “ugly duckling” (UD) concept, which holds that nevi in the same individual tend to resemble one another and that MM often deviates from the individual's nevus pattern.
Many dermatologists have accepted the UD concept in clinical practice.8 However, physicians may differ in visual perception and clinical experience, so different observers may not agree on which lesions are different and which are similar. Also, the usefulness of the UD sign for MM screening has scarcely been studied.9 Yet if the UD sign proves to be a valuable screening tool, it could be taught to primary care physicians, to nurse practitioners, and even to patients performing skin self-examination.
The principal aim of this study was to investigate whether different observers, evaluating images of melanocytic lesions on patients' backs, will perceive the same lesions as clinical outliers, ie, as UDs. We used images of patients with multiple atypical nevi, with or without MM, since the UD concept is particularly pertinent to the screening of such high-risk patients. A secondary aim was to preliminarily explore whether the UD sign is sensitive for MM detection.
Images of moles were obtained from a database of standardized patient images provided by a New Zealand–based teledermatology company (MoleMap, Newmarket, Auckland, New Zealand). The images were fully anonymized, both for us and for the study participants. They were taken by trained medical photographers according to a standard protocol. First, images of the patient's entire skin surface were acquired in sectors (total body photography) with a digital camera (Hewlett-Packard, Palo Alto, California). All pigmented lesions that met preselected criteria (specified below) then underwent fixed-distance standardized close-up clinical and ×10 dermoscopic digital imaging with custom lighting and an optics module attached to the digital camera lens. The close-up images were tagged to the relevant mole on the sector image. Lesions qualified for close-up imaging if they (1) were larger than 6 mm, (2) had at least 2 clinical criteria (asymmetry, border irregularity, color variability, or diameter >3 mm), (3) had at least 1 dermoscopic criterion (structure asymmetry, abrupt peripheral cutoff of network, >2 colors, presence of peripheral streaks, blue-gray veil, regression, peripheral dots/globules, or multiple irregular dots/globules), or (4) had a history of change.
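The close-up imaging rule is a disjunction of 4 criteria, 2 of which are counts over criterion lists. The sketch below encodes that triage logic as stated in the text; the `Lesion` record, the string labels, and `needs_closeup` are hypothetical names for illustration, and this is not the teledermatology company's actual software. It assumes that "diameter >3 mm" is evaluated from the measured diameter rather than recorded separately.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the criteria listed in the text.
CLINICAL = frozenset({"asymmetry", "border irregularity", "color variability"})
DERMOSCOPIC = frozenset({
    "structure asymmetry", "abrupt peripheral cutoff of network", ">2 colors",
    "peripheral streaks", "blue-gray veil", "regression",
    "peripheral dots/globules", "multiple irregular dots/globules",
})


@dataclass
class Lesion:
    """Hypothetical record of one lesion as seen at total body photography."""
    diameter_mm: float
    clinical: set = field(default_factory=set)
    dermoscopic: set = field(default_factory=set)
    history_of_change: bool = False


def needs_closeup(lesion: Lesion) -> bool:
    """True if the lesion meets any of the 4 close-up imaging criteria."""
    # Assumption: diameter >3 mm counts as one clinical criterion, derived from the measurement.
    clinical_hits = len(lesion.clinical & CLINICAL) + (lesion.diameter_mm > 3)
    dermoscopic_hits = len(lesion.dermoscopic & DERMOSCOPIC)
    return (
        lesion.diameter_mm > 6        # (1) larger than 6 mm
        or clinical_hits >= 2         # (2) at least 2 clinical criteria
        or dermoscopic_hits >= 1      # (3) at least 1 dermoscopic criterion
        or lesion.history_of_change   # (4) history of change
    )


# A 4-mm asymmetric lesion meets criterion (2): asymmetry plus diameter >3 mm.
print(needs_closeup(Lesion(diameter_mm=4, clinical={"asymmetry"})))  # prints True
```

Because the criteria are joined by "or," a lesion only needs to satisfy one of them; a 5-mm lesion with no other findings scores a single clinical criterion and is not imaged.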
For the study, baseline images of patients' backs were used; the relatively flat surface of the back lends itself well to photography, and the nevus phenotype of the back has been shown to be a good marker of MM risk.10 Images of 12 cases were included, all with multiple atypical moles apparent on the overview images of the back; in 5 of the cases, one of the lesions was a histologically confirmed MM. The criteria for including cases in the study were as follows: (1) at least 8 clinically atypical nevi were apparent on the back; (2) most of the lesions on the back and all of the atypical nevi had close-up clinical digital images; (3) 1-year follow-up images (close-up clinical and dermoscopic images) were available to show that lesions considered benign were in fact biologically indolent, ie, unchanged; and (4) the image quality of both the overview and the close-up images was acceptable, as assessed by 2 of us (A.S. and A.A.M.). All images were acquired after January 2003. Consecutive cases fulfilling these criteria were selected from the database of the teledermatology company.
Images were evaluated by 34 participants who were grouped into 4 subgroups by clinical expertise: group 1, pigmented lesion experts (n = 8); group 2, dermatologists who were considered nonexperts in pigmented lesion evaluation (n = 13); group 3, dermatology nurses (n = 5, including 1 dermatology medical photographer); and group 4, nonclinical medical staff (n = 8). Participants were told that the aim of the study was to assess the clinical management of pigmented lesions, without explanation of the specific aims and hypotheses underlying the study. The study was sent electronically to participants as a PowerPoint file (Microsoft Corp, Redmond, Washington) containing the clinical image interface and a Word document containing the questionnaire and response forms (see details in next paragraph).
All participants were asked to fill out a 1-page questionnaire (Word document) regarding their experience with the evaluation of pigmented lesions. The PowerPoint presentation included a short tutorial on the digital interface (eg, how to advance slides and view close-up images) and referred to the forms that had to be filled out in response to each presented case. Study images were presented consecutively on an interactive PowerPoint interface; each slide contained an overview image of the back, with pigmented lesions identified by number tags. For most of the tagged nevi and all of the atypical nevi, close-up clinical images could be viewed at the click of a button (Figure 1). Participants were asked to look at the overview of the back and identify any nevi that differed from the other nevi. For each lesion deemed different, participants had to mark the lesion number on the form, grade it as either completely different or somewhat different from the other moles, give a short qualitative description of how the lesion differed, and report whether they would like to have a biopsy (Bx) performed on the lesion. Observers were not limited in the number of lesions they could select as different. Observers were also asked to classify all the other tagged nevi into groups and to describe common characteristics of each group; no lower or upper limit on the number of groups was imposed. Participants could also indicate moles within these groups that they felt required Bx. Results on the grouping of lesions will be reported in a separate article, as the current study focused on agreement regarding the outlier lesions. Although the time required to complete the study was not specifically assessed, most participants estimated it at approximately 1 hour.
The participants were not shown dermoscopic images. However, dermoscopic images of lesions (with a 1-year follow-up dermoscopic image) were available to the investigators to verify that lesions considered benign did not show dermoscopic features suggestive of malignancy, and the 1-year follow-up images confirmed that the lesions were in fact biologically indolent by revealing no change.
The primary aim of the study was to assess whether, in high-risk individuals with multiple nevi, the same pigmented lesions are generally apparent to various observers as different from the other nevi. We used the word different based on Grob and Bonerandi’s7 definition of the UD. A lesion was defined as generally apparent if it was observed as different by at least two-thirds (ie, ≥66.7%) of participants. A secondary aim was to evaluate whether the UD sign may be a valuable screening method for pigmented lesions. The data were analyzed to explore differences in the sensitivity and specificity of the UD method for MM detection among the subgroups of participants. We recognized that sensitivity values obtained from these subgroups may not reflect actual sensitivity values in the real world. Ugly duckling sensitivity for MM detection was defined as the number of MMs identified as different divided by the total number of MMs evaluated. Ugly duckling specificity was defined as the number of nevi not identified as different divided by the total number of nevi evaluated. In addition, to compare the perception of lesions as different with the selection of lesions for Bx, sensitivity and specificity were analyzed for the decision about Bx: Bx sensitivity was defined as the number of MMs selected for Bx divided by the total number of MMs evaluated, and Bx specificity was defined as the number of nevi not selected for Bx divided by the total number of nevi evaluated. The sensitivity and specificity of Bx were compared with those of the UD method.
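The definitions above can be sketched as a few functions over a participant-by-lesion response table like the one in Figure 2. This is a minimal illustration under stated assumptions: the 0/1/2 grade coding and all function names are hypothetical, and the sensitivity and specificity are simple pooled proportions over all participant-lesion ratings, which does not reproduce the marginal regression modeling (or the confidence intervals) used in the study. The same `sensitivity_specificity` function applies to Bx decisions by passing yes/no biopsy flags instead of dichotomized grades.

```python
NOT_DIFFERENT, SOMEWHAT, COMPLETELY = 0, 1, 2  # hypothetical coding of the 3 grades


def flags_from_grades(grades, min_grade=SOMEWHAT):
    """Dichotomize grades; by default somewhat and completely different are combined."""
    return [g >= min_grade for g in grades]


def agreement_rate(flags):
    """Fraction of participants who flagged the lesion."""
    return sum(flags) / len(flags)


def generally_apparent(flags):
    """True if at least two-thirds of participants flagged the lesion.
    The integer comparison 3k >= 2n avoids floating-point rounding at the cutoff."""
    return 3 * sum(flags) >= 2 * len(flags)


def sensitivity_specificity(flags_by_lesion, is_melanoma):
    """Pooled raw proportions over all participant-lesion ratings.
    flags_by_lesion: {lesion_id: [bool per participant]} (UD flags or Bx decisions).
    is_melanoma: {lesion_id: bool} (reference diagnosis)."""
    tp = n_mm = tn = n_benign = 0
    for lesion, flags in flags_by_lesion.items():
        if is_melanoma[lesion]:
            tp += sum(flags)                 # MMs flagged
            n_mm += len(flags)
        else:
            tn += len(flags) - sum(flags)    # benign lesions not flagged
            n_benign += len(flags)
    return tp / n_mm, tn / n_benign


# Toy data (hypothetical, not study data): 3 participants, 2 lesions.
grades = {"A": [COMPLETELY, SOMEWHAT, NOT_DIFFERENT],
          "B": [NOT_DIFFERENT, NOT_DIFFERENT, SOMEWHAT]}
is_mm = {"A": True, "B": False}
ud_flags = {k: flags_from_grades(v) for k, v in grades.items()}
sens, spec = sensitivity_specificity(ud_flags, is_mm)
```

Passing `min_grade=COMPLETELY` to `flags_from_grades` gives the stricter "completely different" analysis. With 34 participants, the two-thirds cutoff means a lesion needs at least 23 flags to be generally apparent.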
The primary outcome variable for the study was the assessment of lesion difference graded by each participant on 3 levels: not different, somewhat different, and completely different. Descriptive frequencies and relative frequencies were calculated to describe the data. For the analysis of agreement, sensitivity, and specificity, the primary outcome variable was dichotomized by combining somewhat and completely different and comparing this combined group with lesions assessed as not different. κ Statistics were calculated for interobserver agreement on lesion difference assessments. κ Values were interpreted according to the scale published by Landis and Koch11: 0.00 to 0.20 represents slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and above 0.80, almost perfect agreement. For the analysis of sensitivity and specificity, a marginal regression modeling framework was used to account for the correlated nature of the data (each participant rated multiple lesions, and each lesion was rated by multiple participants). All statistical analyses were performed with commercially available software (Stata, version 9.1; Stata Corp, College Station, Texas).
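The text does not state which multirater κ was computed. As an assumption, the sketch below uses Fleiss' κ, a common choice when many raters assign each item to a category, together with a helper that maps a κ value to the Landis and Koch labels listed above (their scale also labels values below 0 as poor). The function names are hypothetical; for the dichotomized outcome, each row of `counts` would be `[n_different, n_not_different]` for one lesion.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for many raters.
    counts[i][j] = number of raters who put lesion i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("need the same number (>= 2) of ratings for every lesion")
    n_cats = len(counts[0])
    # Mean observed pairwise agreement across lesions.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the overall category proportions.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal label from the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Labels for some of the kappa values reported in the Results.
for k in (0.30, 0.42, 0.32, 0.17, 0.21):
    print(k, landis_koch(k))
```

Applied to the reported values, the helper reproduces the labels used in the text: 0.30 overall is fair, 0.42 for experts is moderate, and 0.17 for nurses is slight.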
Of the 145 pigmented lesions that were assessed (5 MMs and 140 benign lesions) from 12 cases, 8 were generally apparent as different (Table 1). All 5 MMs (100%) and only 3 benign lesions (2.1%; 2 nevi and 1 seborrheic keratosis) were generally apparent as different. The MMs were apparent as being different to at least 85% of participants, whereas the agreement rate on the benign lesions perceived as being different was 76% at most. Figure 2 shows an example of high agreement rate in a case with MM and an example of low agreement rate in a case without MM. Four lesions were generally apparent as completely different, all 4 being MMs. The MM that was not generally apparent as completely different (case 7, Figure 1) was an irregular brown macule, 1 cm in diameter, with a central black area.
Overall agreement on the lesions identified as different was fair, with a κ statistic of 0.30 (P < .001); the κ statistic was 0.42 for experts (group 1), 0.32 for dermatologists (group 2), 0.17 for nurses (group 3), and 0.21 for nonclinicians (group 4). The κ statistics for all groups were significant (P < .001). The overall κ statistic for lesions perceived as completely different was 0.41 (P < .001).
Sensitivity and specificity values for the UD method and for the decision to Bx are shown in Table 2. The sensitivity and specificity of the UD method were highest for the subgroup of experts (1.0 and 0.89, respectively) and showed a decreasing trend across subgroups with less clinical expertise (P < .001 for trend). The diagnostic accuracy of the UD sign for MM, considering all lesions perceived as different (completely and somewhat different), was 0.96 for group 1, 0.91 for group 2, and 0.87 for groups 3 and 4 (P = .001 for the difference between groups). The joint 95% confidence regions for diagnostic accuracy are plotted in Figure 3. When only lesions perceived as completely different were considered, there was no statistically significant difference in diagnostic accuracy among the 4 groups (P = .25). Agreement between perceiving lesions as different (completely and somewhat different) and selecting them for Bx was substantial overall (κ = 0.7; group 1, κ = 0.6; group 2, κ = 0.76; group 3, κ = 0.77; and group 4, κ = 0.66). When only lesions perceived as completely different were considered, agreement with the decision to Bx was moderate overall (κ = 0.53; group 1, κ = 0.53; group 2, κ = 0.45; group 3, κ = 0.62; and group 4, κ = 0.54).
Although the UD concept appeals to many clinicians in practice, it has rarely been formally studied. Gachon et al9 described image recognition as composed of 3 main mental processes: overall pattern recognition (gestalt, eg, instant recognition of an image of an elephant); analytic criteria recognition (identifying an object or making a diagnosis by combining criteria, eg, the ABCD criteria of melanoma); and differential recognition (recognizing the differences between objects, eg, the UD sign). In their study, Gachon and colleagues surveyed dermatologists for the perception parameters that prompted surgical removal of pigmented lesions. Differential recognition (eg, the UD sign), along with overall pattern recognition and history of change, was the most discriminatory between melanoma and nevi and retained significance in the multivariate analysis.9
For the UD sign to be useful for MM screening by primary health care providers and for skin self-examination by patients, MMs should stand out to multiple observers as different from other moles. Teaching what constitutes an outlier lesion and what degree of difference should be sought may prove more difficult than teaching analytic criteria recognition. Also, appreciation of different lesions may depend on experience; if so, the UD concept may not be applicable to MM screening by nonexperts. In the current study, all 5 MMs were apparent as different to the vast majority of participants. In contrast, only 3 of 140 (2.1%) benign lesions were generally perceived as UDs. Agreement on all UDs (ie, somewhat or completely different) was moderate among the subgroup of experts, and for lesions perceived as completely different, agreement was moderate among the entire group of participants. Agreement on UDs decreased in the lower-expertise groups. For the entire data set, there was only fair agreement on UDs, suggesting that when benign lesions are considered, which lesion is an outlier is less obvious to different observers.
Four of the 5 MMs (80%), but none of the benign lesions, were generally apparent to participants as completely different. However, one of the MMs was not generally apparent as completely different (case 7, Figure 2); it was an irregular brown macule, 1 cm in diameter, with a darker central area. It appeared against a background of multiple atypical moles, some of which were irregular and similar in color to the MM, although roughly half its diameter. Moreover, even the other 4 MMs, which most observers perceived as completely different, were not perceived as such by all participants. This observation may be in line with the concept introduced by Mascaro and Mascaro,12 who used the term little red riding hood. These authors emphasized that in a high-risk patient with many atypical-appearing moles, an MM may not stand out as completely different. Therefore, a more careful examination is required to unveil subtle differences from adjacent nevi, analogous to Little Red Riding Hood noticing the sharp protruding teeth of the wolf pretending to be her grandmother in the fairy tale.
Interestingly, one of the 3 benign lesions that were generally apparent as different was clinically confirmed, based on the close-up and dermoscopic images, to be a seborrheic keratosis, the only nonmelanocytic lesion in the study set. Pigmented seborrheic keratosis and MM can simulate one another.13-16 In fact, because seborrheic keratoses are unlikely to resemble a patient's prevalent nevus pattern, they may be the Achilles' heel of the UD strategy.
Although the sensitivity, specificity, and diagnostic accuracy of the UD sign depended on clinical expertise, the values of these parameters were good in all subgroups of participants. Interestingly, identification of the UD showed good sensitivity (0.85), specificity (0.83), and diagnostic accuracy (0.87) for the detection of MM, even among nonclinicians. These preliminary findings suggest that the UD sign may prove to be a useful screening strategy for primary health care providers and even for skin self-examination. The nonclinicians in this study plausibly had some exposure to clinical practice by virtue of being medical staff members; therefore, the findings may not apply to screening by the general public. However, patients at higher MM risk who perform skin self-examination may also be better informed and more motivated toward MM surveillance.17-19 Higher-risk patients, particularly those with multiple atypical moles, are probably the individuals to whom melanoma screening efforts2,20-23 are most applicable. Indeed, Grob and Bonerandi7 perceived the UD sign as most beneficial in patients with atypical nevus syndrome.
There was substantial agreement between the perception of lesions as different (completely or somewhat different) and the consideration of these lesions for Bx, and moderate agreement between the perception of lesions as completely different and the decision to Bx. Taken together, these results suggest that many lesions selected for Bx were not perceived as completely different. Owing to the small number of MMs included in the study, we could not detect a difference in MM detection sensitivity between the UD sign and the final decision to Bx. It may be noteworthy, however, that although the UD sensitivity for experts was 100%, their Bx sensitivity was 87%, indicating that experts recognized all MMs as different but did not always opt for Bx. There was a similar, albeit smaller, trend for group 2 physicians (UD sensitivity, 89%; Bx sensitivity, 85%). These results suggest that in addition to differential recognition, other recognition processes, such as analytic criteria recognition, are also involved in the decision to Bx9; integrating different recognition strategies may, in some cases, distract the clinician from correctly detecting MM. Also, the fact that even experts did not select every MM for Bx suggests that the MMs included in the study were not all “obvious-from-the-doorway” cases. Because we did not specifically ask observers why they suggested Bx for specific lesions, we can only hypothesize that some lesions were perceived by experts as “different” but were not diagnosed as MMs because of the lack of additional information, such as patient history and dermoscopic evaluation. It is quite probable that experts rely on multiple cues, and not just the UD sign, before recommending Bx. In contrast, for nonphysicians (groups 3 and 4), UD and Bx sensitivities were identical, suggesting that the UD sign may have been the only criterion prompting Bx in these groups.
Our study has several limitations. First, evaluation of lesions on images, albeit of reasonable quality, may differ from real-time skin examination; perception of color, hue, and texture may be dissimilar. We tried to compensate for some of the differences between virtual and real examinations by adding clinical close-up images, simulating the clinical encounter in which the physician views an overview of the skin and then homes in on certain lesions for greater detail. Second, dermoscopic images were not shown to participants, possibly affecting diagnostic accuracy for participants who frequently use dermoscopy in clinical practice. Only clinical images were included because the UD sign was introduced as part of clinical, not dermoscopic, assessment.7 Third, sensitivity and specificity values were used primarily to measure quantitative differences between expertise levels, and between the UD sign and the Bx decision, in this set of cases. These values probably do not reflect actual sensitivity and specificity in the real world. The prevalence of MMs in this study (5 of 145 lesions) may have affected diagnostic accuracy values; in dermoscopic studies, for example, diagnostic accuracy was inversely correlated with the prevalence of MMs in the sample.24 Future studies could analyze how the predictive value of the UD sign varies with MM prevalence, including the lower prevalence seen in nonspecialized clinics. Fourth, we did not precisely define the term different for observers. A lesion can differ in size, color, shape, or a combination of attributes, so observers were unlikely to share a standard concept of “different.” However, there is currently no standard definition of which lesion attributes are important for recognizing the UD, and we did not wish to bias observers.
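The point about prevalence can be made concrete with Bayes' rule. The sketch below computes per-lesion positive and negative predictive values from a sensitivity, a specificity, and an assumed prevalence. The inputs are illustrative only: 0.85 and 0.83 are the UD sensitivity and specificity reported for nonclinicians in this study, 5/145 is the MM proportion in this lesion set, and the lower prevalences are arbitrary assumptions rather than estimates for any real clinic. Because the study values came from an enriched image set, the outputs should not be read as real-world predictive values.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' rule: return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence              # MMs flagged as different
    fp = (1 - specificity) * (1 - prevalence)  # benign lesions flagged
    fn = (1 - sensitivity) * prevalence        # MMs missed
    tn = specificity * (1 - prevalence)        # benign lesions not flagged
    return tp / (tp + fp), tn / (tn + fn)


# Study-set prevalence, then two arbitrary lower prevalences (assumptions).
for prevalence in (5 / 145, 0.01, 0.001):
    ppv, npv = predictive_values(0.85, 0.83, prevalence)
    print(f"prevalence {prevalence:.4f}: PPV {ppv:.3f}, NPV {npv:.4f}")
```

With sensitivity and specificity held fixed, PPV falls as prevalence falls, which is why the same UD performance could yield many more false-positive outliers per detected MM in a nonspecialized setting.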
In future work, we plan to identify the lesion attributes that lead observers to perceive lesions as different and, in particular, the attributes that appear to be most predictive of MM.
In conclusion, the MMs included in this study were generally apparent as UDs to participants across different expertise levels. Although some nevi were also identified as outlier lesions by different observers, only a small minority of benign lesions (3 of 140) were generally apparent as such. The sensitivity and specificity of the UD sign were relatively high, even for nonexperts, suggesting that the usefulness of this method for MM screening by primary health care providers and for patient self-examination should be further assessed. Further evaluation of the attributes that lead an MM to be perceived as different (eg, specific color, size, and shape) may allow refinement of the UD sign and yield better interobserver agreement.
Correspondence: Ashfaq A. Marghoob, MD, Dermatology Service, Memorial Sloan-Kettering Cancer Center, 160 E 53rd St, New York, NY 10022 (firstname.lastname@example.org).
Accepted for Publication: July 1, 2007.
Author Contributions: All authors have contributed to the study in a manner that meets authorship criteria, have seen the final draft of the manuscript, and approve the validity of the data presented. Study concept and design: Scope, Halpern, and Marghoob. Acquisition of data: Scope, Dusza, Rabinovitz, Braun, Zalaudek, Argenziano, and Marghoob. Analysis and interpretation of data: Scope, Dusza, Halpern, and Marghoob. Drafting of the manuscript: Scope, Dusza, and Marghoob. Critical revision of the manuscript for important intellectual content: Halpern, Rabinovitz, Braun, Zalaudek, Argenziano, and Marghoob. Statistical analysis: Dusza. Administrative, technical, and material support: Marghoob. Study supervision: Halpern and Marghoob.
Financial Disclosure: None reported.
Additional Contributions: The physicians, nurses, and staff members generously dedicated their time to participating in this study and provided useful insights for future research. MoleMap (MoleSafe), Auckland, New Zealand, provided an anonymized set of images from its database for this study.