A, Unpaired analysis. B, Paired analysis. A color image of squamous cell carcinoma was not included in unpaired image set; thus, there is no corresponding marginal probability of correct response for comparison to the gray-scale image.
a Significantly higher or lower scores for color image assessment (P < .001).
A and B, Multicolored dermatofibroma; 92.4% of participants correctly diagnosed it in gray-scale compared with 56.9% in color (P < .001). The most common incorrect answer for the color image was melanoma. C and D, The central white lines in this dermatofibroma are more conspicuous in gray-scale, in which nearly double the participants were able to render a correct diagnosis (6.5% in color vs 15.0% in gray-scale; P = .03). E and F, The presence of a blue-white veil in this seborrheic keratosis likely led 44.1% of participants to incorrectly diagnose this as a melanoma. In gray-scale, this lesion was correctly diagnosed by 69.9% of participants (P = .01). G and H, Melanoma in which a nearly equal percentage of participants made the correct diagnosis in both images (85.5% vs 86.8%, respectively; P = .74).
eTable. Distribution of Lesions
eMethods. Survey Administered
Bajaj S, Marchetti MA, Navarrete-Dechent C, Dusza SW, Kose K, Marghoob AA. The Role of Color and Morphologic Characteristics in Dermoscopic Diagnosis. JAMA Dermatol. 2016;152(6):676–682. doi:10.1001/jamadermatol.2016.0270
Importance
Both colors and structures are considered important in the dermoscopic evaluation of skin lesions but their relative significance is unknown.
Objective
To determine if diagnostic accuracy for common skin lesions differs between gray-scale and color dermoscopic images.
Design, Setting, and Participants
A convenience sample of 40 skin lesions (8 nevi, 8 seborrheic keratoses, 7 basal cell carcinomas, 7 melanomas, 4 hemangiomas, 4 dermatofibromas, 2 squamous cell carcinomas [SCCs]) was selected and shown to attendees of the 2014 Memorial Sloan Kettering Cancer Center dermoscopy course. Twenty lesions were shown only once, either in gray-scale (n = 10) or color (n = 10) (nonpaired). Twenty lesions were shown twice, once in gray-scale (n = 20) and once in color (n = 20) (paired). Participants provided their diagnosis and confidence level for each of the 60 images. Of the 261 attendees, 158 (60.5%) participated in the study. The largest group was attending physicians (n = 76 [48.1%]), and most participants were practicing or training in dermatology (n = 144 [91.1%]). The median (interquartile range) experience of participants in evaluating skin lesions and using dermoscopy was 6 (13.5) and 2 (4.0) years, respectively.
Main Outcomes and Measures
Diagnostic accuracy and confidence level of participants evaluating gray-scale and color images. Two separate analyses were performed: (1) an unpaired evaluation comparing gray-scale and color images shown either once or for the first time, and (2) a paired evaluation comparing pairs of gray-scale and color images of the same lesion.
Results
In univariate analysis of unpaired images, color images were less likely to be diagnosed correctly compared with gray-scale images (odds ratio [OR], 0.8; P < .001). Using gray-scale images as the reference, multivariate analyses of both unpaired and paired images found no association between correct lesion diagnosis and use of color images (OR, 1.0; P = .99, and OR, 1.2; P = .82, respectively). Stratified analysis of paired images using a color by diagnosis interaction term showed that participants were more likely to make a correct diagnosis of SCC and hemangioma in color (P < .001 for both comparisons) and dermatofibroma in gray-scale (P < .001).
Conclusions and Relevance
Morphologic characteristics (ie, structures and patterns), not color, provide the primary diagnostic clue in dermoscopy. Use of gray-scale images may improve teaching of dermoscopy to novices by emphasizing the evaluation of morphology.
Dermoscopy is a noninvasive skin imaging technique that improves diagnostic accuracy for melanoma.1-3 The diagnosis of skin cancer, however, is complex, and individual clinicians place varying degrees of emphasis on patient history, clinical evaluation, dermoscopic examination, and “gestalt” feeling.4,5 Although many individual features likely contribute to the recognition of skin cancer, one feature that is given significant emphasis in both clinical (eg, ABCDE mnemonic) and dermoscopic (eg, ABCD rule, Menzies method, CASH algorithm, 7- and 3-point checklists, and chaos and clues) diagnostic algorithms is color.6-12
Color is the brain’s subjective perception of the electromagnetic radiation that arrives on the retina. Interestingly, color matching tests have shown that there is significant variation among individuals in the perception of color, which is thought to be due to a combination of differences in preretinal absorption, photopigment density, and the positioning of cone pigments.13 Precise color recognition may not be as important as identifying the degree of color variegation within a lesion. In our experience as educators of dermoscopy,14,15 we have observed that color and color variegation can distract dermoscopy users from recognizing or placing appropriate emphasis on useful and/or diagnostic morphologic structures and patterns present in skin lesions, leading to erroneous diagnostic and management decisions. In other words, sometimes the presence of colors seems to incorrectly trump the presence of structures and patterns when rendering a diagnosis.
The primary objective of this study was to compare the diagnostic accuracy and associated confidence level of attendees of a dermoscopy course evaluating gray-scale vs color dermoscopic images of common skin lesions.
Question Does diagnostic accuracy for common skin lesions differ between gray-scale and color dermoscopic images?
Findings In this cross-sectional reader study involving 158 participants, there was no association between correct lesion diagnosis and dermoscopic image type for most common skin neoplasms.
Meaning Recognition of dermoscopic morphologic characteristics, and not colors, is the primary determinant leading to correct dermoscopic diagnosis.
The study was conducted via survey during the 10th annual 2-day dermoscopy course held at Memorial Sloan Kettering Cancer Center (MSKCC) (October 17-18, 2014; New York, New York). The didactic contents of this course were similar to previous years’ courses and were in no way modified with respect to the study objectives. Furthermore, 2 of the 3 lecturers had no knowledge of the study objectives before the course’s start. The study was approved by the MSKCC institutional review board without the requirement for written informed consent in accordance with the Helsinki Declaration. Participation was voluntary; the survey was offered to all attendees and was administered on day 2, halfway through the course. Participants received no compensation. The full didactic schedule of the course can be viewed in the eTable and eMethods in the Supplement.
Dermoscopic color images were retrospectively selected from histopathologically proven skin lesions biopsied from June to October 2014 and included melanoma, Spitz nevus, basal cell carcinoma (BCC), and squamous cell carcinoma (SCC). In addition, a convenience sample of consecutively imaged color pictures of benign lesions, including melanocytic nevus, dermatofibroma (DF), seborrheic keratosis (SK)/solar lentigo, and hemangioma, was retrospectively identified from the MSKCC image database from the same time period. Representative examples of all of these diagnoses (n = 40), as determined by consensus agreement by 2 dermoscopists (A.M. and M.M.), were included in the study (eTable 1 in the Supplement).
Of the 40 lesions, 20 were randomly assigned to a “nonpaired set” (10 to be shown in gray-scale only, 10 to be shown in color only). The remaining 20 lesions were assigned to a “paired set” in which images were shown twice: 10 lesions were first shown in gray-scale (paired set, A1) and after being flipped in the vertical axis shown again in color (paired set, B1), and 10 lesions were shown first in color (paired set, A2) and after being flipped in the vertical axis shown again in gray-scale (paired set, B2). Therefore, the total number of images displayed was 60.
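The allocation described above can be sketched in a few lines of Python. This is only an illustration of the design, not the procedure the investigators actually ran: the function names, the group labels, and the fixed random seed are all assumptions introduced here.

```python
import random

def allocate_lesions(lesion_ids, seed=0):
    """Randomly split 40 lesions into the four groups of the study design.

    The seed and the dictionary keys are illustrative assumptions.
    """
    shuffled = list(lesion_ids)
    if len(shuffled) != 40:
        raise ValueError("the design described in the text uses exactly 40 lesions")
    random.Random(seed).shuffle(shuffled)
    return {
        "nonpaired_gray": shuffled[0:10],       # shown once, gray-scale only
        "nonpaired_color": shuffled[10:20],     # shown once, color only
        "paired_gray_first": shuffled[20:30],   # A1 (gray-scale), then B1 (color, flipped)
        "paired_color_first": shuffled[30:40],  # A2 (color), then B2 (gray-scale, flipped)
    }

def count_images(groups):
    """Nonpaired lesions contribute 1 image each, paired lesions contribute 2."""
    nonpaired = len(groups["nonpaired_gray"]) + len(groups["nonpaired_color"])
    paired = len(groups["paired_gray_first"]) + len(groups["paired_color_first"])
    return nonpaired + 2 * paired

groups = allocate_lesions(range(1, 41))
total_images = count_images(groups)
print(total_images)  # 60
```

The count confirms the arithmetic in the text: 20 nonpaired images plus 2 × 20 paired images gives 60 images.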
At the dermoscopy course, images were projected from a Panasonic DZ8700U projector onto a screen that was 18 feet in width and 13.5 feet in height with a 4:3 aspect ratio. The dermoscopic images projected were 1024 × 768 pixels in resolution and were scaled up to fit the screen. They were displayed in PowerPoint in the following sequence: paired sets A1 and A2 (n = 20), nonpaired set (n = 20), and paired set B1 and B2 (n = 20). The order of appearance was arranged by simple randomization of the slides with a PowerPoint code. For the conversion of images from color to gray-scale, we used the PowerPoint image converter that uses a gray-scale value for a pixel with the following formula:
0.19 × Red + 0.73 × Green + 0.08 × Blue, according to our calculations on the study data set. The engineering standard formula for conversion of color to gray-scale images places proportional emphasis on green > red > blue color channels, which is similar to the proportional emphasis placed on each channel by the PowerPoint color to gray-scale conversion.16
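A minimal sketch of this weighted-sum conversion, applied pixel by pixel, is given below. The weights 0.19, 0.73, and 0.08 are those reported in the text; the function names, the 8-bit integer rounding, and the nested-list image representation are assumptions made for illustration. The ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are included for comparison as one widely used standard; whether this is the standard the cited reference describes is also an assumption.

```python
# Channel weights estimated by the authors for the PowerPoint conversion.
POWERPOINT_WEIGHTS = (0.19, 0.73, 0.08)
# ITU-R BT.601 luma weights, shown only for comparison (assumed to be the
# kind of engineering standard the text refers to).
BT601_WEIGHTS = (0.299, 0.587, 0.114)

def to_gray(pixel, weights=POWERPOINT_WEIGHTS):
    """Convert one (R, G, B) pixel with 0-255 channels to a gray value."""
    red, green, blue = pixel
    w_red, w_green, w_blue = weights
    return round(w_red * red + w_green * green + w_blue * blue)

def convert_image(pixels, weights=POWERPOINT_WEIGHTS):
    """Convert a row-major grid (list of rows) of RGB pixels to gray-scale."""
    return [[to_gray(pixel, weights) for pixel in row] for row in pixels]

# Pure red, green, and blue pixels show the relative emphasis on each channel.
primaries = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
print(convert_image(primaries))                 # [[48, 186, 20]]
print(convert_image(primaries, BT601_WEIGHTS))  # [[76, 150, 29]]
```

Both sets of weights rank the channels green > red > blue, although the PowerPoint-derived weights place more emphasis on green.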
The survey collected participant information including age, sex, highest attained degree, dermatology training, practice specialty, and both years evaluating skin lesions and years using dermoscopy. Each image was presented for 15 seconds. For each of the 60 images presented, participants were asked to provide a single diagnosis from a fixed number of diagnostic choices ([a] nevus/Spitz, [b] melanoma, [c] BCC, [d] SCC, [e] DF, [f] SK/solar lentigo, [g] hemangioma) along with an estimate of diagnostic confidence, ranked 1 through 5, with 1 indicating the lowest level of confidence and 5 indicating the highest level of confidence. Participants were blinded to the study objectives. The survey instrument can be found in the eTable and eMethods in the Supplement.
Relative frequencies and means were used to describe the study participants and the lesions selected for review. Two separate analyses were performed with the participant responses: (1) an unpaired, independent, parallel group evaluation, comparing 20 gray-scale (paired set, A1, and the 10 gray-scale images from the nonpaired set) and 20 color lesion images (paired set, A2, and the 10 color images from the nonpaired set), and (2) a paired evaluation comparing 20 pairs of gray-scale and color images of the same lesion as detailed herein. For the paired analysis, the order of lesion presentation was randomly assigned, and the sequence of presentation ensured that evaluations of gray-scale and color images of the same lesion would be separated by different intervening images to reduce potential visual recall. Participant responses from paired set A were included in the independent, parallel group evaluation because it was the participants’ first evaluation of the lesion. Paired t tests were used to assess differences in diagnostic confidence between gray-scale and color images. The main outcome was a dichotomous variable with a value of 1 if the participant's diagnosis matched the lesion diagnosis, and 0 otherwise. Random effects logistic regression models were used to assess the association between a correct response and whether the lesion was presented as a color image (yes/no). In these models, a color by diagnosis interaction term was included to assess the potential differential effect of color presentation by lesion diagnosis. Participant age and sex were included in all models. Marginal probabilities were estimated from the regression models to help visualize the interaction between lesion diagnosis and image type (color vs gray-scale). All analyses were performed using Stata statistical software (version 14.0; Stata Corp).
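The outcome coding and the meaning of an odds ratio with gray-scale as the reference can be illustrated with a short Python sketch. It computes only a crude, unadjusted odds ratio from a 2 × 2 table and is not the random effects logistic regression used in the study, which also accounts for repeated evaluations by the same participant, age, and sex. The function names and the response data are hypothetical.

```python
def code_outcome(response, true_diagnosis):
    """Dichotomous outcome: 1 if the response matches the diagnosis, else 0."""
    return 1 if response == true_diagnosis else 0

def crude_odds_ratio(outcomes_color, outcomes_gray):
    """Unadjusted odds of a correct response in color relative to gray-scale."""
    correct_color = sum(outcomes_color)
    wrong_color = len(outcomes_color) - correct_color
    correct_gray = sum(outcomes_gray)
    wrong_gray = len(outcomes_gray) - correct_gray
    if 0 in (wrong_color, correct_gray, wrong_gray):
        raise ValueError("a zero cell makes the crude odds ratio undefined")
    return (correct_color / wrong_color) / (correct_gray / wrong_gray)

# Hypothetical responses to a single dermatofibroma (DF) image.
truth = "DF"
color_responses = ["DF", "melanoma", "DF", "DF", "melanoma",
                   "DF", "nevus", "DF", "DF", "melanoma"]
gray_responses = ["DF"] * 8 + ["nevus", "melanoma"]

color_outcomes = [code_outcome(r, truth) for r in color_responses]
gray_outcomes = [code_outcome(r, truth) for r in gray_responses]
odds_ratio = crude_odds_ratio(color_outcomes, gray_outcomes)
print(odds_ratio)  # 0.375
```

In this invented example, 6 of 10 color responses and 8 of 10 gray-scale responses are correct, so the odds are 1.5 and 4.0 and the odds ratio is 0.375; a value below 1 indicates lower odds of a correct diagnosis in color.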
There were 261 attendees at the dermoscopy course; of these, 159 (60.9%) participated in the study (Table 1). Of the 159 participant surveys, 1 was illegible and was excluded from analysis, resulting in 158 total participants (109 women, 45 men, and 4 who did not report sex). The largest group of participants (n = 76 [48.1%]) were attending physicians, followed by residents (n = 34 [21.5%]). One hundred forty-four participants (91.1%) listed dermatology as their primary specialty. Most participants had experience evaluating skin lesions, with a median of 6 years (interquartile range [IQR], 13.5 years) evaluating skin lesions and 2 years (IQR, 4.0 years) using dermoscopy. Only 48 participants (30.4%) reported previous training in dermoscopy.
There were 9329 total unique image evaluations. The mean number of image evaluations per participant was 59 (out of a possible 60). Univariate analysis of unpaired images showed that participants were 18% less likely to provide a correct lesion diagnosis for color compared with gray-scale dermoscopic images (odds ratio [OR], 0.8 [95% CI, 0.7-0.9]; P < .001) (Table 2). Despite this, there was no significant difference in the diagnostic confidence of participants between gray-scale and color images (mean difference, 0.33; 95% CI, −0.46 to 1.11; P = .42). Overall, participants correctly identified 84.5% of hemangiomas, 69.5% of nevi, 63.5% of BCC, 62.8% of melanoma, 57.2% of SK, 49.4% of DF, and 20.3% of SCC. Compared with a participant age range of 21 to 30 years, participant ages 31 to 50 years and older than 50 years were both associated with correct lesion diagnosis (OR, 1.4 [95% CI, 1.1-1.7]; P = .004; and OR, 1.4 [95% CI, 1.1-1.7]; P = .01, respectively). Sex was not associated with correct lesion diagnosis (P = .90).
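The evaluation counts above, and the relation between an odds ratio and a percentage change in odds, can be checked with simple arithmetic. The sketch below uses only figures stated in the text (158 participants, 60 images, 9329 evaluations); the function names are illustrative, and the odds ratio of 0.82 is an assumed unrounded value chosen because it would be consistent with both the reported OR of 0.8 and the reported 18%.

```python
def evaluation_summary(n_participants, n_images, n_evaluations):
    """Summarize how complete the collected image evaluations are."""
    possible = n_participants * n_images
    return {
        "possible": possible,
        "mean_per_participant": n_evaluations / n_participants,
        "completion_rate": n_evaluations / possible,
    }

def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio (negative = lower)."""
    return (odds_ratio - 1.0) * 100.0

summary = evaluation_summary(n_participants=158, n_images=60, n_evaluations=9329)
print(summary["possible"])                        # 9480
print(round(summary["mean_per_participant"], 1))  # 59.0
print(round(summary["completion_rate"] * 100, 1)) # 98.4

# An assumed unrounded OR of 0.82 corresponds to 18% lower odds.
print(round(percent_change_in_odds(0.82), 1))     # -18.0
```

Note that the percentage refers to the odds of a correct diagnosis, which is not the same as the probability of a correct diagnosis.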
Participants who reported prior dermoscopy training performed 50% better (OR, 1.5 [95% CI, 1.3-1.7]; P < .001) on lesion evaluations, regardless of imaging modality, compared with those without training. Effect modification was also observed when exploring the interaction between experience and imaging modality. Participants without prior dermoscopy training performed similarly on gray-scale and color images. However, experienced participants performed better on color than gray-scale images, with an OR of 1.4 (95% CI, 1.1-1.8; P = .01) for the interaction between prior training and imaging modality.
In multivariate analysis using gray-scale images as the referent, there was no association between color images and correct lesion diagnosis for either unpaired (OR, 1.0 [95% CI, 0.7-1.3]; P = .99) or paired (OR, 1.2 [95% CI, 0.9-1.6]; P = .82) data (Table 3). In the unpaired analysis color by diagnosis interaction, using nevi as the referent, melanoma (OR, 0.4 [95% CI, 0.2-0.5]), BCC (OR, 0.4 [95% CI, 0.3-0.6]), and DF (OR, 0.1 [95% CI, 0.1-0.2]) were significantly less likely to be correctly identified in color than in gray-scale images (P < .001 for all comparisons). In the paired analysis color by diagnosis interaction, using nevi as the referent, DF was significantly less likely to be correctly identified in color than in gray-scale images (OR, 0.2 [95% CI, 0.1-0.3]; P < .001), and SCC and hemangioma were more likely to be correctly identified in color than in gray-scale images (OR, 2.5 [95% CI, 1.6-3.9]; P < .001; and OR, 3.3 [95% CI, 1.9-5.6]; P < .001, respectively). Figure 1 presents a visual depiction of the interaction terms by plotting the model-based predicted probabilities of a correct response for each lesion diagnosis by imaging type (color vs gray-scale). Figure 2 shows representative individual examples of skin lesions included in the paired image set.
First, our results reiterated the findings of many other studies, which have shown that those participants with prior training in dermoscopy were more likely to render correct dermoscopic diagnoses than those who had never previously been trained.17-19 Our results uniquely indicate that there is no statistically significant difference in the ability of participants to correctly diagnose common cutaneous neoplasms in gray-scale vs color dermoscopic images. This implies that participants were able to render a diagnosis based on morphologic characteristics alone (ie, structure and pattern) without any appreciable loss in their diagnostic confidence level when evaluating gray-scale images.
The dermatology field has traditionally placed considerable importance on subtle differences in the color of skin lesions in aiding diagnosis of skin neoplasms. This may explain why color was also given significant importance in dermoscopy.6,7,20 However, dermoscopy allows for the visualization of numerous subsurface skin structures (eg, lines, clods, circles, dots) that provide diagnostic clues independent of color.21 In other words, while many dermoscopic structures can be qualified by color, most point to a diagnosis based on their unique morphologic characteristics, size, and distribution. This concept can be demonstrated using the representative example of “ovoid nests” seen in basal cell carcinomas, which have traditionally been referred to as “blue-gray” in color. However, a blue-gray ovoid nest under nonpolarized dermoscopy may appear brown under polarized dermoscopy.1,22,23 Color qualifiers of dermoscopic structures may be unnecessary and even distracting if they can be identified by morphologic characteristics alone.
In support of this interpretation, our paired images showed that there was no statistically significant difference in the ability to make an accurate diagnosis of melanoma in color vs gray-scale dermoscopic images, and our unpaired image-set indicated that perhaps in specific images, melanoma was easier to diagnose in gray-scale. This suggests that in the absence of color, the visualization of melanoma-specific structures under dermoscopy leads an observer toward the correct diagnosis.
Dermoscopic diagnosis based on structures and patterns may prove to be more objective and accurate compared with diagnosis based on color. In addition, with different dermatoscopes on the market, each with differing lens characteristics and illumination spectra, color will likely prove difficult to standardize.6,7,9 The perception of color is heavily influenced by variation in retinal cone photoreceptors within the population (as seen in color blindness), as well as by experience, memories, and context. When a person is asked to identify an object whose color is a shade between yellow and orange, he or she is more likely to categorize an object in the shape of a banana as “yellow” and an object of identical color in the shape of a carrot as “orange.”8 With respect to dermatologic diagnosis, this suggests that while color can be useful in diagnosis, it carries a certain degree of subjective bias that varies among observers.
Although precise color perception may be variable and difficult to standardize, this does not imply that the perception of variegation in color is not important in both clinical and dermoscopic dermatologic diagnosis. For example, in the diagnosis of melanoma, color variegation has long been a major facet aiding both clinical and automated diagnosis.10,24-26 For pink tumors, such as SCC and hemangiomas, and likely for featureless lesions (ie, lesions that dermoscopically reveal few to no structures), color may in fact help in diagnosis.
This study raises important unanswered questions regarding the ideal way to evaluate and teach dermoscopic diagnosis. It is interesting to speculate on the role of gray-scale images as an adjunct modality for evaluation of cutaneous lesions. By enhancing contrast, it is possible that gray-scale images will make dermoscopic structures and patterns more conspicuous or make it easier to teach novices to recognize dermoscopic structures. Although the outcomes of teaching dermoscopy using the analytic vs heuristic approach have previously been explored,27 the idea of using gray-scale dermoscopic images for teaching is novel. Interestingly, most devices for computer-aided diagnosis that evaluate skin lesions for malignancy rely on gray-scale or binary (2-toned) images for both lesion segmentation and assessment of texture.28 Gray-scale and binary images enhance contrast, which facilitates delineation of lesion borders.28-30
Our pilot study is notably limited by a small sample size that does not represent the full spectrum of skin lesions. While we intended to select classic-appearing examples of the most common cutaneous neoplasms encountered in practice, we acknowledge that the images included consisted of a retrospective convenience sample that may have been subject to unconscious selection bias. In addition, our sample did not include lesions such as “featureless” melanoma or blue nevi. In such lesions devoid of structure and pattern, it is likely that color will more favorably influence the diagnostic accuracy of the observer. Furthermore, because the survey was administered to participants of a dermoscopy course, the possibility that the results may have been subject to confounding bias from preceding lectures must be considered. However, we do not think such bias would have significantly affected our results, as the course content followed the standard format of all previous MSKCC dermoscopy courses. In addition, 2 of the 3 lecturers had no knowledge of the study objectives prior to the course’s start.
Our study sheds preliminary insight into the relative importance of morphologic characteristics and color in dermoscopic diagnosis. Overall, we found no association between correct lesion diagnosis and use of color vs gray-scale dermoscopic images, suggesting that morphologic characteristics primarily drive dermoscopic diagnosis. The role of these findings in systematic evaluation of pigmented and nonpigmented lesions and in teaching requires future study.
Corresponding Author: Ashfaq A. Marghoob, MD, Dermatology Service, Memorial Sloan-Kettering Cancer Center, 800 Veterans Memorial Highway, Second Floor, Hauppauge, NY 11788 (firstname.lastname@example.org).
Accepted for Publication: January 30, 2016.
Published Online: March 23, 2016. doi:10.1001/jamadermatol.2016.0270.
Author Contributions: Ms Bajaj and Dr Marghoob had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Bajaj, Marchetti, Navarrete-Dechent, Marghoob.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Bajaj, Marchetti, Navarrete-Dechent, Dusza.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Bajaj, Dusza.
Administrative, technical, or material support: Marchetti, Navarrete-Dechent, Marghoob.
Study supervision: Marchetti, Dusza, Marghoob.
Conflict of Interest Disclosures: None reported.
Funding/Support: This research was funded in part through the National Institutes of Health/National Cancer Institute (NIH/NCI) Cancer Center Support Grant P30 CA008748.
Role of the Funder/Sponsor: The NIH/NCI had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.