eTable. Definition of RCM Features and Their Histopathologic Correlates
Farnetani F, Scope A, Braun RP, Gonzalez S, Guitera P, Malvehy J, Manfredini M, Marghoob AA, Moscarella E, Oliviero M, Puig S, Rabinovitz HS, Stanganelli I, Longo C, Malagoli C, Vinceti M, Pellacani G. Skin Cancer Diagnosis With Reflectance Confocal Microscopy: Reproducibility of Feature Recognition and Accuracy of Diagnosis. JAMA Dermatol. 2015;151(10):1075-1080. doi:10.1001/jamadermatol.2015.0810
Importance
Reflectance confocal microscopy (RCM) studies have been performed to identify criteria for diagnosis of skin neoplasms. However, RCM-based diagnosis is operator dependent. Hence, reproducibility of RCM criteria needs to be tested.
Objective
To test interobserver reproducibility of recognition of previously published RCM descriptors and accuracy of RCM-based skin cancer diagnosis.
Design, Setting, and Participants
Observational retrospective web-based study of a set of RCM images collected at a tertiary academic medical center. Nine dermatologists (6 of whom had ≥3 years of RCM experience) from 6 countries evaluated an RCM study set from 100 biopsy-proven lesions, including 55 melanocytic nevi, 20 melanomas, 15 basal cell carcinomas, 7 solar lentigines or seborrheic keratoses, and 3 actinic keratoses. Between June 15, 2010, and October 21, 2010, participating dermatologists, blinded to histopathological diagnosis, evaluated 3 RCM mosaic images per lesion for the presence of predefined RCM descriptors.
Main Outcomes and Measures
The main outcome was identification of RCM descriptors with fair to good interrater agreement (κ statistic, ≥0.3) and independent correlation with malignant vs benign diagnosis on discriminant analysis. Additional measures included sensitivity and specificity for diagnosis of malignant vs benign for each evaluator, for majority diagnosis (rendered by ≥5 of 9 evaluators), and for experienced vs recent RCM users.
Results
Eight RCM descriptors showed fair to good reproducibility and were independently associated with a specific diagnosis. Of these, the presence of pagetoid cells, atypical cells at the dermal-epidermal junction, and irregular epidermal architecture were associated with melanoma. Aspecific junctional pattern, basaloid cords, and ulceration were associated with basal cell carcinomas. Ringed junctional pattern and dermal nests were associated with nevi. The mean sensitivity for the group of evaluators was 88.9% (range, 82.9%-100%), and the mean specificity was 79.3% (range, 69.2%-90.8%). Majority diagnosis showed sensitivity of 100% and specificity of 80.0%. Sensitivity was higher for experienced vs recent RCM users (91.0% vs 84.8%), but specificity was similar (80.0% vs 77.9%).
Conclusions and Relevance
The study highlights key RCM diagnostic criteria for melanoma and basal cell carcinoma that are reproducibly recognized among RCM users. Diagnostic accuracy increases with experience. The higher accuracy of majority diagnosis suggests that there is intrinsically more diagnostic information in RCM images than is currently used by individual evaluators.
In vivo reflectance confocal microscopy (RCM) is a novel technique that allows noninvasive examination of the epidermis and papillary dermis at cell-level resolution.1 Reflectance confocal microscopy studies2,3 have been performed to identify criteria for diagnosis of melanocytic and nonmelanocytic skin neoplasms. In addition, algorithms have been developed for RCM-based diagnosis of melanoma and for distinction between melanocytic and nonmelanocytic neoplasms.4,5 Use of RCM in research and clinical settings has shown that RCM improves diagnostic accuracy for melanoma6-9 and for basal cell carcinoma (BCC).10-12
Like many other morphology-based methods, both pattern identification and diagnostic decisions made with RCM are operator dependent and often related to experience. Because the knowledge base of RCM is still being formed and because formal training programs have been launched only recently, heterogeneity in criteria recognition and in diagnostic accuracy is expected among different RCM evaluators.
The aim of the present study was to test interobserver reproducibility in recognition of previously published RCM descriptors. We also sought to measure accuracy of RCM-based skin cancer diagnosis among evaluators with various levels of experience.
This multicenter web-based study involved evaluators with various levels of RCM experience from 6 countries. Blinded to histopathological diagnosis, participants were asked to evaluate a series of RCM images that were uploaded to a designated web-based platform (http://invivo.com). For each image, participants were asked to complete an online evaluation form that included pattern description and diagnostic judgment. The investigation was conducted in accord with the Declaration of Helsinki. Ethics committee approval, at the institution Azienda Ospedaliero-Universitaria di Modena Policlinico, University of Modena and Reggio Emilia, Modena, Italy, was waived because the study was based on a deidentified image database.
The series of images was derived from 100 diagnostically equivocal lesions that had been excised for histopathological diagnosis because they were clinically or dermoscopically suspicious for melanoma, but a specific clinical and dermoscopic diagnosis could not be rendered with certainty. The series of 100 cases was consecutively and retrospectively selected by an expert dermoscopist (G.P.) blinded to the final histopathological diagnosis. All included lesions had undergone RCM imaging of acceptable quality. No additional selection criteria were considered in case selection such as the presence or lack of pigmentation, diameter, elevation, or other clinical or dermoscopic attributes.
All included RCM images were collected at the Department of Dermatology of the University of Modena and Reggio Emilia (Modena, Italy), and all were rendered a histopathological diagnosis. The final case series included 55 melanocytic nevi, 20 melanomas, 15 BCCs, 7 solar lentigines or seborrheic keratoses, and 3 actinic keratoses.
Each case for evaluation had a high-resolution dermoscopic image obtained with a dermoscopic lens (Dermlite Photo; 3Gen) that was attached to a digital camera (Canon G15; Canon Inc). Three RCM mosaic images (6 × 6 to 8 × 8 mm) were acquired with a commercially available instrument (Vivascope 1500; MAVIG GmbH). Images were horizontal optical sections acquired at different anatomic levels, including one image at the granular spinous layers of the epidermis, one image at the basal layer of the epidermis and dermal-epidermal junction (DEJ), and one image at the level of the superficial dermis. The RCM image acquisition methods and techniques have been described elsewhere.6 No additional clinical information (eg, age and melanoma or lesion history) was provided to evaluators.
Between June 15, 2010, and August 24, 2010, participants were asked to evaluate 10 cases per week for 10 consecutive weeks. They were asked to complete all the evaluations within 6 months from study initiation. The last reader’s evaluation was received on October 21, 2010. Fifteen individuals were invited, 9 of whom agreed to participate as evaluators. They were prospectively categorized as experienced RCM users if they had used RCM for at least 3 years in a clinical setting and categorized as recent RCM users if they had used RCM for less than 3 years. The definitions for each RCM descriptor, the anatomic level in the skin, and the histopathological correlates are listed in the eTable in the Supplement. In addition, an online glossary of representative examples of RCM descriptors is available (http://www.confocaltraining.com/tutorial/).
For calculation of diagnostic accuracy, sensitivity, and specificity, diagnosis was dichotomized as malignant for melanomas and BCCs and as benign for nevi, solar lentigines or seborrheic keratoses, and actinic keratoses. Diagnostic accuracy (calculated as sensitivity × prevalence + specificity × [1 − prevalence]), sensitivity, and specificity were calculated separately for each evaluator, for the entire group of evaluators, and for the subgroups (ie, experienced RCM users vs recent RCM users). Sensitivity and specificity were also calculated based on diagnoses rendered by the majority of evaluators (ie, consensual diagnosis by ≥5 of 9 evaluators). We computed κ statistics to calculate interrater agreement. The κ statistics indicated good reproducibility (>0.5), fair reproducibility (0.3-0.5), or poor reproducibility (<0.3). We also performed Pearson χ² analysis (malignant vs benign diagnosis) to test the hypothesis that frequencies in a 2-way table were independent. Moreover, discriminant analysis (stepwise method) was carried out to identify the parameters independently correlated with malignant vs benign diagnosis.
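The dichotomization, the accuracy formula, and the majority rule described above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study. The label strings, function names, and toy data are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical label strings; the study dichotomized melanomas and BCCs as malignant.
MALIGNANT_LABELS = {"melanoma", "bcc"}


def is_malignant(diagnosis: str) -> bool:
    """Dichotomize a histopathological diagnosis as malignant (True) or benign (False)."""
    return diagnosis.lower() in MALIGNANT_LABELS


def sensitivity_specificity(truth: Sequence[bool], calls: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity of one evaluator's malignant/benign calls."""
    true_pos = sum(t and c for t, c in zip(truth, calls))
    true_neg = sum((not t) and (not c) for t, c in zip(truth, calls))
    n_malignant = sum(truth)
    n_benign = len(truth) - n_malignant
    return true_pos / n_malignant, true_neg / n_benign


def diagnostic_accuracy(sens: float, spec: float, prevalence: float) -> float:
    """Accuracy = sensitivity x prevalence + specificity x (1 - prevalence)."""
    return sens * prevalence + spec * (1.0 - prevalence)


def majority_call(votes: Sequence[bool], threshold: int = 5) -> bool:
    """Majority diagnosis: malignant when at least `threshold` evaluators say malignant."""
    return sum(votes) >= threshold


# Toy data (not study data): 4 lesions read by one evaluator.
truth = [True, True, False, False]
calls = [True, False, False, False]
sens, spec = sensitivity_specificity(truth, calls)
acc = diagnostic_accuracy(sens, spec, prevalence=sum(truth) / len(truth))
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

With 9 evaluators and a binary call, a threshold of 5 always yields a strict majority, so no tie-breaking rule is needed in this sketch.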
Of 9 evaluators, 6 (A.S., S.G., P.G., E.M., M.O., and H.S.R.) were classified as experienced RCM users, and 3 (R.P.B., A.A.M., and I.S.) were classified as recent RCM users. Three evaluators (A.A.M., M.O., and H.S.R.) were from the United States, 4 from Europe (Spain [S.G.], Switzerland [R.P.B.], and Italy [E.M. and I.S.]), 1 from Australia (P.G.), and 1 from Israel (A.S.).
Diagnostic RCM features showing good to fair reproducibility that reached statistical significance (P < .05) are listed in Table 1. The distribution of these reproducible diagnostic RCM features by the final histopathological diagnosis is also summarized. Diagnostic features showing poor or nonsignificant reproducibility are not included.
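The article does not state which κ statistic was computed for the 9 evaluators. As an illustration only, the sketch below assumes Fleiss' κ, a common choice for more than 2 raters, and applies the reproducibility thresholds given in the statistical methods. The function names and toy counts are hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters assigning lesion i to category j.

    Every row must sum to the same number of raters. The result is undefined
    (division by zero) when all ratings fall in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Share of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per lesion, averaged over lesions.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1.0 - p_expected)


def reproducibility(kappa: float) -> str:
    """Categories from the methods: good (>0.5), fair (0.3-0.5), poor (<0.3)."""
    if kappa > 0.5:
        return "good"
    if kappa >= 0.3:
        return "fair"
    return "poor"


# Toy data (not study data): 9 raters score a descriptor as [present, absent].
unanimous = [[9, 0], [0, 9], [9, 0], [0, 9]]
split = [[5, 4], [5, 4], [5, 4], [5, 4]]
kappa_unanimous = fleiss_kappa(unanimous)
kappa_split = fleiss_kappa(split)
print(reproducibility(kappa_unanimous), reproducibility(kappa_split))
```

The split example shows why raw agreement is not enough: 5 of 9 raters agree on every lesion, yet κ falls below 0 because that level of agreement is no better than chance given the overall rating frequencies.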
Discriminant analysis identified 6 RCM features independently associated with malignancy (Table 1). Three of 6 discriminatory RCM features were more frequently observed in melanoma. Of these melanoma criteria, the presence of pagetoid cells and the presence of atypical cells at the DEJ showed good interrater reproducibility, while irregular epidermal architecture showed fair interrater reproducibility. In addition, 3 of the discriminatory RCM features were more frequently observed in BCCs. All 3 BCC criteria (basaloid cord–like structures, presence of ulceration, and aspecific DEJ pattern) showed good interrater reproducibility.
Discriminant analysis identified 2 RCM features independently associated with benign neoplasms. Ringed DEJ pattern was seen more frequently in nevi and showed good interrater reproducibility, while the presence of dermal nests was seen only slightly more frequently in nevi than in melanomas and showed fair interrater reproducibility.
Diagnostic accuracy, sensitivity, and specificity values obtained by each evaluator are listed in Table 2. Evaluators attained a mean diagnostic accuracy of 82.7% (range, 76.0%-89.0%), with a mean sensitivity of 88.9% (range, 82.9%-100%) and a mean specificity of 79.3% (range, 69.2%-90.8%). One evaluator from the subgroup of experienced RCM users had a sensitivity of 100% and a specificity of 72.3%. Considering sensitivity and specificity values by the subgroups based on RCM experience, experienced RCM users showed higher sensitivity but similar specificity compared with recent RCM users (Table 3). Considering RCM diagnosis based on the majority of evaluators (rendered by ≥5 of 9 evaluators), a sensitivity of 100% and a specificity of 80.0% were obtained.
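As a consistency check, the reported mean accuracy can be recomputed from the reported mean sensitivity and specificity using the accuracy formula from the statistical methods and the case mix of the series (35 malignant and 65 benign lesions). The Python sketch below does this. The majority-diagnosis accuracy it prints is derived here from the reported sensitivity and specificity and is not a figure stated in the article.

```python
# Case mix reported for the series of 100 lesions.
N_MALIGNANT = 20 + 15    # melanomas + BCCs
N_BENIGN = 55 + 7 + 3    # nevi + solar lentigines or seborrheic keratoses + actinic keratoses
PREVALENCE = N_MALIGNANT / (N_MALIGNANT + N_BENIGN)  # 0.35


def diagnostic_accuracy(sens: float, spec: float, prevalence: float = PREVALENCE) -> float:
    """Accuracy = sensitivity x prevalence + specificity x (1 - prevalence)."""
    return sens * prevalence + spec * (1.0 - prevalence)


# Reported mean sensitivity (88.9%) and mean specificity (79.3%) across evaluators.
mean_accuracy = diagnostic_accuracy(0.889, 0.793)

# Reported majority diagnosis: sensitivity 100%, specificity 80.0%.
# The resulting accuracy is derived from the formula, not reported in the text.
majority_accuracy = diagnostic_accuracy(1.0, 0.80)

print(f"mean accuracy:     {mean_accuracy * 100:.1f}%")
print(f"majority accuracy: {majority_accuracy * 100:.1f}%")
```

Because the formula is linear in sensitivity and specificity and all evaluators read the same 100 lesions, applying it to the mean values gives the same result as averaging the per-evaluator accuracies.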
In the evolution of RCM, an operator-dependent, morphology-based diagnostic technique, formal teaching of standardized imaging technique and of image evaluation has been lacking, and personal experience has been gained in parallel by experts from different academic centers and countries. Indeed, comparing and integrating the experience of various RCM users would constitute an important milestone for the implementation of RCM in clinical practice. For teaching novices and for disseminating diagnostic algorithms, we need to better understand which RCM criteria for diagnosis of skin cancers are reproducibly recognized and which criteria are currently inconsistently detected. In addition, we need to determine the current diagnostic performance of RCM users in recognition of skin cancer. Finally, by considering the combined experience of different RCM users, as well as the performance of the top experts, we can better understand the present state of the art of RCM, namely, the best diagnostic accuracy that can be obtained with RCM using current criteria.
We found that RCM criteria for melanoma diagnosis showing the highest usefulness across RCM users (based on discriminant analysis and acceptable interobserver agreement) were the presence of pagetoid cells (observed in 86.1% of melanomas), the presence of atypical cells at the DEJ (observed in 90.6% of melanomas), and the presence of irregular epidermal architecture with disruption of the normal honeycomb or cobblestone pattern of epidermal keratinocytes (observed in 51.7% of melanomas). Considering RCM criteria used in the algorithms published by Pellacani et al4 and by Segura et al,5 the presence of atypical cells and the presence of pagetoid cells are relevant criteria for melanoma diagnosis. The criterion of nonedged papillae at the DEJ (a major criterion in the algorithm by Pellacani et al4) showed fair interobserver agreement but did not reach significance in discriminant analysis. Nonedged papillae denote dermal papillae that are not sharply demarcated from the surrounding epidermis, which harbors noncohesive aggregation of refractile cells.13 The ringed pattern, a protective criterion in the algorithm by Segura et al5 for melanoma diagnosis when present throughout the lesion, showed good interobserver agreement and was seen more frequently in nevi in our study. Ringed pattern (seen at low magnification) and edged papillae (the corresponding high-magnification descriptor) describe the pattern whereby the dermal papillae are clearly rimmed by rete ridges with individually highlighted basal keratinocytes. However, cerebriform nests in the dermis (a minor criterion in the algorithm by Pellacani et al4) did not show significant agreement. We conjecture that RCM criteria found in more superficial anatomic layers may be more readily identified because there is decay in RCM laser light intensity with increasing imaging depth and hence decrease in optical resolution. 
In addition, RCM criteria considered by users as more relevant for diagnosis may be more frequently detected because their perceived relevance may induce the evaluator to more carefully search for these parameters throughout the image. Moreover, cerebriform nests represent a specific (but rarely detected) RCM criterion for melanoma diagnosis, being characteristic of nodular melanoma or of the nodular component in a superficial spreading melanoma. The low frequency (1.3%) of cerebriform nests in our data set likely accounts for the statistical insignificance of this RCM criterion in interobserver agreement analyses. The other positive criterion for melanoma diagnosis in the algorithms by Pellacani et al4 and by Segura et al5 (ie, the presence of nucleated cells in dermis) was not reproducibly recognized in this study, probably because of the low frequency of detection of this RCM criterion.
We also established that RCM criteria for BCC diagnosis showing the highest usefulness across RCM users (again based on discriminant analysis and interobserver agreement) were basaloid cord–like structures representing neoplastic aggregates of BCCs, presence of ulceration, and aspecific DEJ pattern. The latter denotes the overall, mosaic view (akin to low magnification) appearance of a lesion with a flattened DEJ. The corresponding criterion on individual RCM images (akin to higher magnification) is nonvisible papillae, denoting the complete absence of observable interface between dermal papillae and epidermal structures in the same horizontal RCM section.13 In line with our study, basaloid cord–like structures and nonvisible papillae were also emphasized as key RCM criteria by Guitera et al11 in their BCC diagnostic algorithm. However, other important criteria in the BCC algorithms by Guitera et al11 and by Nori et al10 (namely, polarization of nuclei and epidermal shadow) did not show significant agreement in our study.
Taken together, these findings confirm that there is consistent recognition of several key diagnostic criteria for melanoma and BCC. However, a few of the published diagnostic criteria for melanoma and BCC were inconsistently recognized: agreement on their identification should be further tested, or RCM users could be better trained in their recognition. Future research could also aim at simplification of RCM semiology by identifying a shorter list of key RCM diagnostic criteria to facilitate the learning and application of RCM technology by practicing clinicians.
Overall, the different evaluators in our study, including those with shorter experience with RCM, showed good diagnostic performance, with a mean sensitivity of 88.9% and a mean specificity of 79.3%. These data are in agreement with previous studies,4-6,10-12,14,15 confirming the ability of RCM users to correctly diagnose most skin cancers and a significant proportion of benign lesions. Despite variability in use and in reproducibility of single criteria, good diagnostic performance has been previously shown in imaging morphology–based studies.16 Experienced RCM users had higher sensitivity than more novice RCM users (91.0% vs 84.8%), while their specificity was similar (80.0% vs 77.9%). This suggests that RCM users improve in skin cancer recognition with experience. Only 1 of 9 evaluators achieved 100% sensitivity for melanoma diagnosis. Evaluators performing a retrospective analysis of cases and lacking responsibility for not missing a melanoma in a real-life patient may be inclined toward achieving higher specificity, sacrificing sensitivity.17 However, sensitivity in the present study may be lower than that in actual practice because included cases were all clinically and dermoscopically equivocal and because clinical information (which may be critically pertinent for diagnosis and management decisions) was missing.8,18,19 With regard to specificity, the presence on RCM of irregular epidermal architecture and the presence of few atypical cells or sparse pagetoid cells were criteria responsible for most biopsy-proven nevi misclassified as melanoma by RCM evaluators. In the presence of limited clinical information (eg, long-standing history of lesion stability), the differentiation between an early melanoma and a nevus with atypical RCM features may be extremely difficult at times.20
When we analyzed majority diagnosis (based on the diagnosis of ≥5 of 9 evaluators), 100% sensitivity was observed, with a considerably high specificity of 80.0%. Given variability in recognition of some RCM criteria, it is likely that majority diagnosis is derived from a more balanced weight attributed to each diagnostic feature detectable within difficult-to-diagnose lesions, consequently minimizing the impact of subjective interpretation. This finding implies that there is sufficient intrinsic morphologic information in the RCM images to render a correct diagnosis in most cases.21 In the context of telemedicine RCM diagnosis with limited access to the full spectrum of clinical information that is available during face-to-face examination, evaluations performed by more than a single RCM reader may minimize the diagnostic error and consequent liability.22 In our study, diagnostic accuracy of RCM diagnosis (82.7% overall) was actually lower than that in other studies based on RCM images alone. This is likely due to the inclusion of lesions that were dermoscopically challenging for diagnosis by dermoscopy experts. In addition, dermoscopy images are available to evaluators in real-life RCM-based diagnosis for bedside or telemedicine diagnosis.
Our study has limitations. The present series is restricted in the spectrum of diagnostic entities and lacks squamous cell carcinomas. While a broader range of diagnoses could better highlight the usefulness of RCM in skin cancer diagnosis, the present study focused on RCM imaging of lesions that were clinically and dermoscopically suspicious for melanoma.
In conclusion, this study highlights the key RCM diagnostic criteria for melanoma and BCC that are reproducibly recognized among RCM users. Equally important, the study also delineates RCM criteria that are considered significant for diagnosis but are inconsistently identified by RCM users. Although the mean diagnostic performance of individual RCM users was high, our findings also suggest that there is intrinsically more diagnostic information in RCM images than is currently used by individual evaluators. This emphasizes the need for the community of RCM users to engage in sharing of RCM cases and to continue to improve interobserver agreement on RCM criteria.
Accepted for Publication: March 10, 2015.
Corresponding Author: Alon Scope, MD, Sheba Medical Center, Department of Dermatology, Sackler School of Medicine, Tel Aviv University, Tel Hashomer 52621, Israel (firstname.lastname@example.org).
Published Online: May 20, 2015. doi:10.1001/jamadermatol.2015.0810.
Author Contributions: Drs Farnetani and Pellacani had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Farnetani and Scope contributed equally to this work.
Study concept and design: Farnetani, Pellacani.
Acquisition, analysis, or interpretation of data: Scope, Braun, Gonzalez, Guitera, Malvehy, Manfredini, Marghoob, Moscarella, Oliviero, Puig, Rabinovitz, Stanganelli, Longo, Malagoli, Vinceti.
Drafting of the manuscript: Farnetani, Scope, Pellacani.
Critical revision of the manuscript for important intellectual content: Braun, Gonzalez, Guitera, Malvehy, Manfredini, Marghoob, Moscarella, Oliviero, Puig, Rabinovitz, Stanganelli, Longo, Malagoli, Vinceti.
Statistical analysis: Malagoli, Vinceti.
Study supervision: Pellacani.
Conflict of Interest Disclosures: Dr Gonzalez reported serving on an advisory board for Caliber I.D. (manufacturer of a commercial confocal microscope). Dr Malvehy reported receiving equipment for research from MAVIG GmbH. Dr Puig reported receiving equipment for research from MAVIG GmbH. Dr Rabinovitz reported being an investigator in a study coordinated by Caliber I.D., reported receiving funding for a fellowship program and equipment from Caliber I.D., and reported serving as a consultant and receiving equipment from 3-Gen (manufacturer of a polarized dermascope). Dr Pellacani reported serving on an advisory board for Caliber I.D. and reported receiving honoraria from MAVIG GmbH for teaching courses. No other disclosures were reported.