Key Points
Question
For cutaneous melanocytic lesions that are difficult to diagnose, do second opinions rendered by pathologists who have board certification and/or fellowship training in dermatopathology improve overall reliability of diagnosis?
Findings
In this diagnostic study of 240 melanocytic lesions from the Melanoma Pathology Study data set, misclassification of melanocytic lesions was lowest when first, second, and third consulting reviewers were subspecialty-trained dermatopathologists and when all lesions were subject to second opinions; misclassification was highest when reviewers were all general pathologists lacking subspecialty training. Variability of in situ and thin invasive melanoma was relatively intractable to all examined strategies.
Meaning
The findings suggest that second opinions rendered by dermatopathologists improve overall reliability of diagnosis of melanocytic lesions but do not eliminate or substantially reduce misclassification.
Importance
Histopathologic criteria have limited diagnostic reliability for a range of cutaneous melanocytic lesions.
Objective
To evaluate the association of second-opinion strategies by general pathologists and dermatopathologists with the overall reliability of diagnosis of difficult melanocytic lesions.
Design, Setting, and Participants
This diagnostic study used samples from the Melanoma Pathology Study, which comprises 240 melanocytic lesion samples selected from a dermatopathology laboratory in Bellevue, Washington, and represents the full spectrum of lesions from common nevi to invasive melanoma. Five sets of 48 samples were evaluated independently by 187 US pathologists from July 15, 2013, through May 23, 2016. Data analysis was performed from April 2016 through November 2017.
Main Outcomes and Measures
Accuracy of diagnosis, defined as concordance with an expert consensus diagnosis of 3 experienced pathologists, was assessed after applying 10 different second-opinion strategies.
Results
Among the 187 US pathologists examining the 240 lesion samples, 113 were general pathologists (65 men [57.5%]; mean age at survey, 53.7 years [range, 33.0-79.0 years]) and 74 were dermatopathologists (49 men [66.2%]; mean age at survey, 46.4 years [range, 33.0-77.0 years]). Among the 8976 initial case interpretations, physicians desired second opinions for 3899 (43.4%), most often for interpretation of severely dysplastic nevi. The overall misclassification rate was highest when interpretations did not include second opinions and initial reviewers were all general pathologists lacking subspecialty training (52.8%; 95% CI, 51.3%-54.3%). When considering different second opinion strategies, the misclassification of melanocytic lesions was lowest when the first, second, and third consulting reviewers were subspecialty-trained dermatopathologists and when all lesions were subject to second opinions (36.7%; 95% CI, 33.1%-40.7%). When the second opinion strategies were compared with single interpretations without second opinions, the reductions in misclassification rates for some of the strategies were statistically significant, but none of the strategies eliminated diagnostic misclassification. Melanocytic lesions in the middle of the diagnostic spectrum had the highest misclassification rates (eg, moderately or severely dysplastic nevus, Spitz nevus, melanoma in situ, and pathologic stage [p]T1a invasive melanoma). Variability of in situ and thin invasive melanoma was relatively intractable to all examined strategies.
Conclusions and Relevance
The results of this study suggest that second opinions rendered by dermatopathologists improve reliability of melanocytic lesion diagnosis. However, discordance among pathologists remained high.
Second opinions in medicine are used to reach consensus about the diagnosis and management of patients with the goal of improving patient care. Second opinions may include (1) a brief verbal or written opinion, (2) a formal written review often in compliance with institutional policies, and (3) an independent detailed review by a recognized expert. Given the high rates of diagnostic disagreement for certain skin biopsy samples, especially those of challenging melanocytic proliferations,1 mandatory second opinions have been advocated.2 Because nearly 5 million adults are treated for skin cancer annually, with mean treatment costs of more than $8 billion each year, ensuring quality diagnoses is imperative.3 With increases in the Medicare population over time, an increased number of skin cancer procedures have been performed each year.4 Quantifying the value of mandatory second-opinion review, however, is essential before instituting mandates.
Although studies suggest that second opinions reduce disagreement rates,5 the optimal selection of cases for review and the best practices for conducting such reviews are unknown. With millions of biopsies performed annually,6 obtaining second opinions for all skin pathology cases is not feasible.6 Therefore, it is a priority to identify strategies and policies for second opinions that both are practical and can best improve clinical care.
In a recent survey of 207 pathologists,7 most respondents perceived that second opinions increase accuracy (78%) and protect them from malpractice lawsuits (62%). Although a small number of respondents reported that second opinions are mandated by laboratories for lesions such as melanoma in situ (26%) and invasive melanoma (30%), pathologists desire second opinions most often for melanocytic tumors of uncertain malignant potential (85%) and atypical Spitz tumors (88%).7
The present study follows recent work8 that reported an association between fellowship training and board certification in dermatopathology and improved diagnostic reliability. Given the importance of accurate diagnosis and the clinical consequences of errors, the present study compared different second-opinion strategies using data from individual interpretations made by pathologists, including general pathologists and board-certified and/or fellowship-trained dermatopathologists. For different strategies, we assessed rates of overinterpretation and underinterpretation and overall misclassification relative to a consensus reference diagnosis.
Test Set Cases and Consensus Reference Diagnoses
This diagnostic study used samples from the Melanoma Pathology Study, details for which are reported elsewhere.1,9 The institutional review board at the University of Washington, Seattle, approved all study procedures, and all pathologists provided informed consent electronically. Data analysis was performed from April 2016 to November 2017. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guideline for diagnostic studies.
In brief, skin specimens, represented by 1 slide per case, were selected from a dermatopathology laboratory in Bellevue, Washington. Three experienced pathologists (M.W.P., D.E.E., and R.L.B.) interpreted each case independently before arriving at a consensus reference diagnosis by modified Delphi technique.10 Reference diagnoses were categorized using the Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis (MPATH-Dx) schema.11 The final 240 Melanoma Pathology Study samples were randomly assigned to 5 test sets stratified by patient age (20-49 years, 50-64 years, and ≥65 years) and by MPATH-Dx category, with example diagnostic terms and suggested treatment as follows: (I) common nevus or mildly dysplastic nevus, with no treatment required (10%); (II) moderately dysplastic nevus or Spitz nevus, with suggested reexcision with less than 5-mm margins (15%); (III) severely dysplastic nevus or melanoma in situ, with reexcision with 5-mm to less than 1-cm margins (25%); (IV) thin invasive melanoma (pathologic stage [p]T1a), with wide excision with at least 1-cm margins (24%); and (V) thicker invasive melanoma (pT1b or greater), with wide excision with at least 1-cm margins and consideration of additional diagnostic workup or adjuvant therapy (25%). Cases of MPATH-Dx categories III through V were oversampled.1
Participating Pathologists
Pathologists from 10 states were invited to participate (California, Connecticut, Hawaii, Iowa, Kentucky, Louisiana, New Jersey, New Mexico, Utah, and Washington), as described elsewhere.1,12 Pathologists were eligible if they had interpreted skin biopsies in the previous year, planned to continue interpreting them for the following 2 years, and were not in resident or fellowship training. A web-based survey queried participants about demographics, clinical practices, use of second opinions in their practice, and interpretive experience.7 General pathologists (no board certification or fellowship training in dermatopathology) and dermatopathologists (board certification and/or fellowship training in dermatopathology) were defined based on survey responses. Of 301 eligible pathologists, 207 (68.8%) were enrolled and 187 (62.1%) completed independent phase 1 interpretations. More details about the recruitment and follow-up of participants can be reviewed elsewhere.1
Participants were randomized to examine 1 of 5 sets of 48 samples, recording their interpretations using an online MPATH-Dx tool from July 15, 2013, through May 23, 2016. For each sample, participants indicated whether a second opinion was personally desired before finalizing the diagnosis, whether it would be required by their laboratory policies for the particular diagnosis, or both.
Strategies for Obtaining Second Opinions
Interpretations that incorporated second opinions were defined by considering each possible pair of participants interpreting a case independently and, when 2 interpretations disagreed, resolution by a third, independent interpretation. Resolution was achieved by assigning the sample to the MPATH-Dx diagnostic category identified by 2 of 3 participants or, if all 3 physicians disagreed, assigning the middle diagnosis (Figure 1). All possible paired combinations were considered because we sought to examine, on average, how second opinions were associated with accuracy. We evaluated 10 strategies for obtaining a second opinion, detailed in eTable 1 in the Supplement. The first 6 strategies were based on the initial diagnostic interpretation, whereas the last 4 strategies were based on the primary and consulting physicians’ professional qualifications.
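The resolution rule described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' Stata implementation; the function name and the coding of MPATH-Dx categories I through V as integers 1 through 5 are assumptions made for the example. Because the categories are ordered, the majority-or-middle rule for 3 opinions reduces to taking the median.

```python
from typing import Optional


def resolve_final_diagnosis(first: int, second: int,
                            third: Optional[int] = None) -> int:
    """Return the final MPATH-Dx class (assumed coding: 1-5).

    If the first two opinions agree, that shared class is final and no
    third opinion is consulted. Otherwise the third opinion is added and
    the median of the three is returned: this equals the majority class
    when 2 of 3 agree and the middle class when all 3 differ.
    """
    if first == second:
        return first
    if third is None:
        raise ValueError("a third opinion is needed when the first two disagree")
    return sorted((first, second, third))[1]


print(resolve_final_diagnosis(3, 3))     # agreement, no third opinion
print(resolve_final_diagnosis(2, 4, 4))  # majority of 2 of 3
print(resolve_final_diagnosis(2, 4, 3))  # all differ, middle class
```

Taking the median also makes the result independent of the third opinion whenever the first two agree, which is why considering every ordered triple of readers gives the same answer as the sequential process.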
We calculated rates of overinterpretation, underinterpretation, and overall misclassification relative to reference consensus diagnoses. Overinterpretation was defined as classification at a more severe MPATH-Dx category than the reference, underinterpretation as classification at a less severe category, and misclassification as either overinterpretation or underinterpretation. For implementation, we combined the independent interpretations of the study participants (Figure 1). When a second opinion agreed with the initial diagnosis, that shared diagnosis became the final diagnosis. When a second opinion disagreed with the initial diagnosis, a third opinion was obtained. The final diagnosis was then either the majority or middle diagnosis (in cases in which all 3 opinions differed). The percentage of cases in which a second and third opinion was required for each of the strategies is shown in eTable 2 and eTable 3 in the Supplement.
Second-opinion strategies were implemented by creating ordered data records of interpretations for every case and for every 3 participants who interpreted the case, and the majority or middle interpretation was selected as the final assessment. This approach was analytically equivalent to that described in the preceding paragraph and correctly weighted interpretations of cases in which the second opinion agreed and disagreed with the initial interpretation. The approach produced 11 603 808 data records of 3 independent interpretations of the same case: 39 pathologists interpreted the 48 cases in test set A, 36 interpreted test set B, 38 interpreted test set C, 36 interpreted test set D, and 38 interpreted test set E, yielding a total of 48 × (39 × 38 × 37 + 36 × 35 × 34 + 38 × 37 × 36 + 36 × 35 × 34 + 38 × 37 × 36) = 11 603 808 ordered triple interpretations.
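The record count can be reproduced directly from the reader counts per test set. The following Python sketch is illustrative only (the analyses themselves were run in Stata); the variable and function names are hypothetical, and the toy readings passed to `rates_for_case` are invented for the example rather than taken from the study data.

```python
from itertools import permutations
from math import perm

# Readers per test set, as reported in the text.
readers_per_set = {"A": 39, "B": 36, "C": 38, "D": 36, "E": 38}
CASES_PER_SET = 48

# Ordered triples of distinct readers: n * (n - 1) * (n - 2) per case.
ordered_triples = CASES_PER_SET * sum(perm(n, 3) for n in readers_per_set.values())
print(ordered_triples)  # 11603808


def rates_for_case(readings, reference):
    """Over- and underinterpretation rates over all ordered reader triples.

    readings: one class (assumed coding 1-5) per distinct reader of the case.
    reference: the consensus reference class for the case.
    """
    over = under = total = 0
    for a, b, c in permutations(readings, 3):
        # Majority-or-middle rule; c is ignored when a and b agree.
        final = a if a == b else sorted((a, b, c))[1]
        over += final > reference
        under += final < reference
        total += 1
    return over / total, under / total


# Hypothetical case: 4 readers, reference class 3.
print(rates_for_case([2, 2, 3, 4], reference=3))  # (0.0, 0.5)
```

Enumerating ordered rather than unordered triples matters because the first position is the initial reader: strategies that depend on the initial interpretation or on the initial reader's qualifications select a subset of these ordered records.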
The 95% CIs for the overinterpretation and underinterpretation and overall misclassification rates used percentiles of the bootstrap distribution of each rate in which resampling of participants was performed 1000 times. Second-opinion interpretations that included the same participant for second or third interpretations were discarded from the bootstrapped estimates. The 2-sided P values for the Wald test of a difference in rates between the single participant and second-opinion strategies were derived from the bootstrap SE of the difference in rates.
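A percentile bootstrap that resamples participants, together with a Wald test based on a bootstrap SE, might look like the sketch below. It is a simplified Python illustration under stated assumptions: it bootstraps a single-interpretation misclassification rate only (not the triple-based strategy rates, and without the step that discards triples containing a repeated participant), the percentile index convention is one of several in common use, and the data and all names are hypothetical.

```python
import random
from statistics import NormalDist


def bootstrap_percentile_ci(errors_by_reader, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for a misclassification rate, resampling readers.

    errors_by_reader: dict mapping reader id -> list of 0/1 flags
    (1 = the reader's interpretation differed from the reference).
    """
    rng = random.Random(seed)
    readers = list(errors_by_reader)
    rates = []
    for _ in range(n_boot):
        # Resample whole readers with replacement to respect clustering.
        sample = rng.choices(readers, k=len(readers))
        flags = [f for r in sample for f in errors_by_reader[r]]
        rates.append(sum(flags) / len(flags))
    rates.sort()
    lo = rates[round(alpha / 2 * n_boot)]
    hi = rates[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def wald_p_value(difference, bootstrap_se):
    """2-sided P value for a rate difference, given its bootstrap SE."""
    z = abs(difference) / bootstrap_se
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical toy data: 6 readers, 4 cases each.
errors_by_reader = {
    "r1": [1, 0, 0, 0], "r2": [1, 1, 0, 0], "r3": [1, 0, 1, 0],
    "r4": [1, 1, 1, 0], "r5": [0, 1, 1, 0], "r6": [0, 0, 0, 1],
}
all_flags = [f for flags in errors_by_reader.values() for f in flags]
point = sum(all_flags) / len(all_flags)
lo, hi = bootstrap_percentile_ci(errors_by_reader)
print(point, lo, hi)
```

Resampling readers rather than individual interpretations keeps each reader's 48 correlated interpretations together, which is the reason for bootstrapping at the participant level.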
A secondary analysis examined agreement rather than accuracy. To quantify agreement between single interpretations, rates were calculated from a simple cross-tabulation of all pairs of single interpretations of the same cases; because the computational burden of this approach for the 11 603 808 assessments involving second opinions was not tenable, we used a different approach. We paired all triple readings of each case with a random permutation of those triple readings for that case resulting in a mean of 48 350 triple reading pairs per case. After excluding triple reading pairs that involved the same reader in both readings, we combined data for all cases and calculated agreement statistics from the cross-tabulation. We replicated this procedure 1000 times and report mean agreement and κ statistics. All analyses were conducted with Stata, version 14 (StataCorp LLC).
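For readers unfamiliar with the agreement statistics, the sketch below computes observed pairwise agreement and a κ statistic from a list of paired classifications. It is a Python illustration, not the Stata code used in the study; it assumes the unweighted Cohen κ, which the text does not specify, and the function name and example pairs are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(pairs):
    """Observed agreement and unweighted Cohen kappa for paired readings.

    pairs: list of (class_a, class_b) tuples, one per pair of
    interpretations of the same case.
    """
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Chance agreement from the marginal distributions of the cross-tabulation.
    expected = sum(left[k] * right[k] for k in left) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical pairs of MPATH-Dx classes (assumed coding 1-5).
pairs = [(1, 1), (2, 2), (3, 3), (3, 4), (4, 3), (5, 5), (2, 3), (1, 1)]
observed, kappa = agreement_and_kappa(pairs)
print(observed, kappa)
```

In the study's procedure, a function of this kind would be applied to the pooled cross-tabulation from each of the 1000 random pairings and the results averaged.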
Of 187 pathologists participating in this study, 113 were general pathologists (65 men [57.5%]; mean [range] age at survey, 53.7 [33.0-79.0] years) and 74 were dermatopathologists (49 men [66.2%]; mean [range] age at survey, 46.4 [33.0-77.0] years). Board-certified and/or fellowship-trained dermatopathologists were more likely than other participating pathologists to be affiliated with an academic medical center. The dermatopathologists reported fewer years of experience interpreting melanocytic skin lesions, a higher proportion of caseload involving interpretations of these lesions, and being regarded by their peers as experts in their interpretation (Table 1).
Among the 8976 initial case interpretations, physicians desired second opinions for 43.4% (n = 3899), most often for interpretation of severely dysplastic nevi and least often for interpretations of benign lesions (Figure 2). Second opinions were desired more often than mandated by policy except for interpretations of class V invasive melanoma. Physicians’ report of policy-mandated second opinions increased from MPATH-Dx class I (10.5%) through class II (18.1%), class III other (25.0%), class III melanoma in situ (40.1%), class IV (44.6%), and class V (47.8%).
Strategies 1 Through 6—Based on Initial Interpretation of the Case
The highest misclassification rate within diagnostic categories after a single interpretation was for moderately dysplastic nevus (75.3%; 95% CI, 73.2%-77.5%), followed by severely dysplastic nevus or melanoma in situ (59.6%; 95% CI, 57.4%-61.8%), invasive pT1a melanoma (57.2%; 95% CI, 54.9%-59.7%), at least invasive pT1b melanoma (27.9%; 95% CI, 26.0%-29.9%), and benign nevi (7.8%; 95% CI, 6.2%-9.4%) (Table 2). The overall misclassification rate for a single interpretation (47.9%; 95% CI, 46.7%-49.1%) was the performance reference for second-opinion strategies 1 through 6. The lowest misclassification rate resulted when second opinions were applied to all skin biopsy samples (strategy 1: 44.8%; 95% CI, 42.5%-47.1%; P < .001). With strategy 1, the rates for overinterpretation decreased from 6.3% (95% CI, 5.7%-6.9%) to 3.1% (95% CI, 2.4%-4.0%) and were unchanged for underinterpretation (41.6% [95% CI, 40.3%-43.0%] to 41.7% [95% CI, 39.2%-44.2%]).
For strategies 2 through 4 (second opinions obtained when initial interpretations were melanoma in situ or invasive melanoma), the overall misclassification rates were similar to the single interpretation misclassification rate. For strategies 5 and 6 (physician noted that they desire a second opinion [45.5%; 95% CI, 43.5%-47.5%] or second opinions would be desired or required in their practice for this type of case [45.3%; 95% CI, 43.2%-47.4%], respectively), misclassification rates were decreased compared with the single interpretation rate (47.9%; 95% CI, 46.7%-49.1%) (P < .001).
Strategies 7 Through 10—Based on Primary and Consulting Physicians’ Experience
When the initial physician was neither board certified nor fellowship trained in dermatopathology (ie, a general pathologist), the highest misclassification rate after single interpretation was for moderately dysplastic nevi (79.3%; 95% CI, 77.0%-81.6%), followed by severely dysplastic nevi or melanoma in situ (67.1%; 95% CI, 64.7%-69.7%), invasive pT1a melanomas (66.2%; 95% CI, 63.3%-69.5%), at least invasive pT1b melanomas (29.4%; 95% CI, 26.9%-31.9%), and benign melanocytic lesions (6.5%; 95% CI, 4.7%-8.2%) (Table 3). When the initial physician was board certified and/or fellowship trained (ie, a dermatopathologist), the misclassification rates after single interpretation decreased for moderately dysplastic nevi (69.3%; 95% CI, 65.5%-73.2%), severely dysplastic nevi or melanoma in situ (48.1%; 95% CI, 44.8%-51.0%), invasive pT1a melanomas (43.5%; 95% CI, 40.3%-46.6%), and at least pT1b melanomas (25.7%; 95% CI, 22.8%-28.5%).
The overall misclassification rate for single interpretations by general pathologists was 52.8% (95% CI, 51.3%-54.3%). For second-opinion strategies 7 through 9, misclassification rates were reduced, with the rate for strategy 9 (both second and third opinions obtained from dermatopathologists) being lowest (40.7%; 95% CI, 38.4%-43.1%; P < .001). With strategy 9, the rates for overinterpretation decreased from 6.1% (95% CI, 5.4%-6.9%) to 3.3% (95% CI, 2.4%-4.4%) and decreased for underinterpretation from 46.7% (95% CI, 44.9%-48.3%) to 37.4% (95% CI, 35.0%-40.0%) (Table 3).
For strategy 10, all of the interpretations (initial, second, and, if needed, third opinions) were obtained from dermatopathologists. The overall misclassification rate for single interpretations by dermatopathologists was 40.5% (95% CI, 38.8%-42.1%). With second and third opinions by dermatopathologists, the rate was reduced to 36.7% (95% CI, 33.1%-40.7%) (P = .01). With this strategy, the rates for overinterpretation decreased from 6.5% (95% CI, 5.5%-7.4%) to 3.8% (95% CI, 2.2%-6.1%), and the rates for underinterpretation were 34.0% (95% CI, 32.0%-35.9%) for single interpretations and 33.0% (95% CI, 29.0%-37.5%) with second opinions.
Agreement Among Interpretations
Our secondary analysis evaluated agreement among interpretations. The mean between-physician pairwise agreement rate for single interpretations of a case was 54.8%, whereas the corresponding rate for interpretations that included second opinions was 64.3%. The corresponding κ statistics were 0.42 and 0.55, indicating, at most, moderate levels of agreement.
Studies from the past several decades have uncovered limited diagnostic reliability of histopathologic criteria in a clinically important segment of melanocytic lesion categories.13-16 The largest controlled study to date found that limitations predominantly affect the spectrum from dysplastic nevus, through melanoma in situ, and pT1a melanoma,1,17 corresponding to MPATH-Dx classes II through IV.11 A population-based analysis estimated that the spectrum of melanocytic skin biopsies affected by poor diagnostic reliability represents approximately 15% of all melanocytic lesions subject to biopsy.18
The diagnostic process derives from physician assessment of phenotypic alterations along criteria scales, among which are size (ie, diameter), symmetry, circumscription, pagetoid scatter, nuclear atypia, and maturation effects. These attributes constitute more or less continuous scales of variability, with no objective break points. Thus, the diagnostic process for current classifications devolves to pathologist-dependent threshold setting, in which the pathologist determines that sufficient abnormalities do or do not exist along the scales for definitive classification. When histopathologic imagery is parsed to increasingly fine detail, the limits of subjective histopathologic interpretation are reflected in escalating rates of diagnostic discordance.
Our study found that second-opinion strategies of primary and consulting pathologists in which first, second, and third reviewers were fellowship-trained and/or board-certified dermatopathologists yielded the lowest misclassification rates. Conversely, the highest misclassification rates occurred with general pathologists. However, the absolute magnitude of calculated differences, while statistically significant, was modest; improvement in rates for overdiagnosis and underdiagnosis and for misclassification ranged from less than 5% to 10% for most comparisons with single physician results. Misclassification rates were also improved with the strategy of requiring second opinions for all lesions, but this protocol is not feasible and likely not cost-effective in general practice. Among melanomas in situ and thin invasive (pT1a per the American Joint Committee on Cancer) melanomas, only small improvements in misclassification rates occurred when second and third opinions were factored into the diagnostic algorithm. Of note, it is this portion of the melanocytic lesion spectrum for which general policy mandates are often in place. The same intractability to second-opinion strategies applies to gradations of melanocytic dysplasia. An evaluation of strategies for breast histopathologic examination19 reported that second opinions improved diagnostic agreement for benign and invasive lesions but less so for lesions with intermediate atypia.
The observation that none of the second-opinion strategies eliminated or substantially reduced diagnostic variability underscores the need to advance the basic science of melanocytic proliferations and implement alternative diagnostic technologies. At present, simplifying current classification schemes to eliminate clinically irrelevant distinctions could be considered. This approach was adopted in the recent World Health Organization classification,20 which combined the categories of mild dysplasia and banal nevi, yielding a 2-tier categorization of low- and high-grade dysplasia.
Among the limitations of this study, all interpretations were by completely independent reviewers, with blinding of the primary and consulting assessments and no consensus discussion. A consultant in the real world is usually cognizant of the primary reviewer’s diagnosis and not blinded; however, studying differing second-opinion strategies in clinical practice would not be feasible because of complexity, cost, liability issues, privacy concerns, and/or regulatory constraints. Our approach provided the most feasible current estimates of how differing second-opinion strategies could affect diagnostic fidelity. The composition of Melanoma Pathology Study cases is different from clinical practice because case selection was intentionally weighted to achieve statistical power for lesions in the intermediate sector of melanocytic proliferations (MPATH-Dx classes II-IV). Because of this case composition, we provided results by the individual MPATH-Dx class. Furthermore, we did not address the role of advanced clinical experience, the nonequivalency of consultants, and the corresponding effects of such factors on accuracy in histologic diagnosis.
This study’s results suggest that there is uncertainty in the interpretations of some proportion of difficult melanocytic lesions by both primary observers and the consultants who render second opinions. However, a second opinion contributed by an expert may facilitate consensus about the appropriate management of a difficult melanocytic lesion.
Accepted for Publication: August 15, 2019.
Published: October 11, 2019. doi:10.1001/jamanetworkopen.2019.12597
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2019 Piepkorn MW et al. JAMA Network Open.
Corresponding Author: Joann G. Elmore, MD, MPH, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, 1100 Glendon Ave, Ste 900, Los Angeles, CA 90024 (jelmore@mednet.ucla.edu).
Author Contributions: Drs Elmore and Pepe had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Piepkorn, Reisch, Elder, Pepe, Tosteson, Nelson, Onega, Elmore, Barnhill.
Acquisition, analysis, or interpretation of data: Piepkorn, Longton, Elder, Pepe, Kerr, Nelson, Knezevich, Radick, Shucard, Onega, Carney, Elmore, Barnhill.
Drafting of the manuscript: Piepkorn, Reisch, Radick, Shucard, Elmore, Barnhill.
Critical revision of the manuscript for important intellectual content: Piepkorn, Longton, Reisch, Elder, Pepe, Kerr, Tosteson, Nelson, Knezevich, Radick, Onega, Carney, Elmore, Barnhill.
Statistical analysis: Longton, Pepe, Barnhill.
Obtained funding: Tosteson, Onega, Elmore.
Administrative, technical, or material support: Piepkorn, Reisch, Tosteson, Radick, Shucard, Carney, Barnhill.
Supervision: Knezevich, Carney, Barnhill.
Conflict of Interest Disclosures: Drs Piepkorn, Elder, Knezevich, and Barnhill are practicing dermatopathologists and provide second opinions in their clinical practices. Mr Longton reported receiving grants from the University of Washington during the conduct of the study. Dr Elder reported receiving personal fees from Myriad Genetics outside the submitted work. Dr Elmore reported receiving royalties from Wolters Kluwer outside the submitted work. No other disclosures were reported.
Funding/Support: This work was supported by grants R01 CA151306 and CA201376 from the National Cancer Institute, National Institutes of Health.
Role of the Funder/Sponsor: The National Institutes of Health had no role in the design or conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Disclaimer: The content of this study is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Additional Contributions: We thank the study participants for their commitment to improving clinical care in the field of dermatopathology.
1. Elmore JG, Barnhill RL, Elder DE, et al. Pathologists’ diagnosis of invasive melanoma and melanocytic proliferations: observer accuracy and reproducibility study. BMJ. 2017;357:j2813. doi:10.1136/bmj.j2813
9. Carney PA, Frederick PD, Reisch LM, et al. How concerns and experiences with medical malpractice affect dermatopathologists’ perceptions of their diagnostic practices when interpreting cutaneous melanocytic lesions. J Am Acad Dermatol. 2016;74(2):317-324. doi:10.1016/j.jaad.2015.09.037
10. Carney PA, Reisch LM, Piepkorn MW, et al. Achieving consensus for the histopathologic diagnosis of melanocytic lesions: use of the modified Delphi method. J Cutan Pathol. 2016;43(10):830-837. doi:10.1111/cup.12751
16. Xiong MY, Rabkin MS, Piepkorn MW, et al. Diameter of dysplastic nevi is a more robust biomarker of increased melanoma risk than degree of histologic dysplasia: a case-control study. J Am Acad Dermatol. 2014;71(6):1257-1258.e4. doi:10.1016/j.jaad.2014.07.030
19. Elmore JG, Tosteson AN, Pepe MS, et al. Evaluation of 12 strategies for obtaining second opinions to improve interpretation of breast histopathology: simulation study. BMJ. 2016;353:i3069. doi:10.1136/bmj.i3069
20. Elder D, Massi D, Scolyer RA, Willemze R, eds. WHO Classification of Skin Tumours. Vol 11. 4th ed. Geneva, Switzerland: World Health Organization; 2018.