Receiver operating characteristic curves: biopsy or referral accuracy of dermatologists vs primary care physicians (PCPs). Squares represent the dermatologist data and triangles represent the PCP data. The crossing of the 2 curves indicates that there is either no or insufficient evidence to distinguish between the biopsy or referral accuracy of dermatologists and PCPs. TPR indicates true-positive rate; FPR, false-positive rate.
Robustness analysis: primary care physicians (PCPs) with 10% worse sensitivity and specificity. Squares represent the dermatologist data and triangles represent the PCP data. The location of the dermatologist curve (above and to the left) relative to the PCP curve suggests that dermatologists are more accurate. TPR indicates true-positive rate; FPR, false-positive rate.
Robustness analysis: primary care physicians (PCPs) with 10% higher sensitivity but 10% lower specificity than dermatologists. Squares represent the dermatologist data and triangles represent the PCP data. The curves appear as though they might cross, suggesting that there is insufficient evidence to distinguish between dermatologists and PCPs in biopsy or referral accuracy. TPR indicates true-positive rate; FPR, false-positive rate.
Chen SC, Bravata DM, Weil E, Olkin I. A Comparison of Dermatologists' and Primary Care Physicians' Accuracy in Diagnosing Melanoma: A Systematic Review. Arch Dermatol. 2001;137(12):1627-1634. doi:10.1001/archderm.137.12.1627
Damiano Abeni, MD, MPH; Michael Bigby, MD; Paolo Pasquini, MD, MPH; Moyses Szklo, MD, MPH, DrPH; Hywel Williams, MD
To compare the accuracy of dermatologists and primary care physicians (PCPs) in identifying pigmented lesions suggestive of melanoma and making the appropriate management decision to perform a biopsy or to refer the patient to a specialist.
Studies published between January 1966 and October 1999 in the MEDLINE, EMBASE, and CancerLit databases; reference lists of identified studies; abstracts from recent conference proceedings; and direct contact with investigators. Medical subject headings included melanoma, diagnosis, screening, primary care, family practitioner, general practitioner, internal medicine, dermatologist, and skin specialist. Articles were restricted to those involving human subjects.
Studies that presented sufficient data to determine the sensitivity and specificity of dermatologists' or PCPs' ability to correctly diagnose lesions suggestive of melanoma and to perform biopsies on or refer patients with such lesions.
Two reviewers independently abstracted data regarding the sensitivity and specificity of the dermatologists and PCPs for diagnostic and biopsy or referral accuracy. Disagreements were resolved by discussion. The quality of the studies was also evaluated.
Thirty-two studies met inclusion criteria; 10 were prospective studies. For diagnostic accuracy, sensitivity was 0.81 to 1.00 for dermatologists and 0.42 to 1.00 for PCPs. None of the studies reported specificity for dermatologists; one reported specificity for PCPs (0.98). For biopsy or referral accuracy, sensitivity ranged from 0.82 to 1.00 for dermatologists and 0.70 to 0.88 for PCPs; specificity, 0.70 to 0.89 for dermatologists and 0.70 to 0.87 for PCPs. Receiver operating characteristic curves for biopsy or referral ability were inconclusive.
The published data are inadequate to demonstrate differences in dermatologists' and PCPs' diagnostic and biopsy or referral accuracy of lesions suggestive of melanoma. We offer study design suggestions for future studies.
Many managed care companies require primary care physicians (PCPs) to screen patients with pigmented lesions and refer only patients with worrisome lesions to dermatologists. However, many dermatologists argue that they are able to more efficiently diagnose pigmented lesions and provide ongoing care for these patients. Additionally, because of the subtleties of presentation and the consequences of missing a melanoma, PCPs have reported feeling uncomfortable screening skin for melanomas and would prefer to refer lesions suggestive of melanoma to dermatologists.1 The purpose of this study is to perform a systematic review of the literature that informs the policy debate regarding direct access to dermatologists vs a "gatekeeper system" for the management of pigmented lesions.
Evidence supporting or refuting dermatologists' claim to be better than PCPs at melanoma diagnosis and management should come from studies that directly compare the 2 physician types on the basis of 2 characteristics: diagnostic accuracy (DA) and biopsy or referral (B/R) accuracy. Diagnostic accuracy refers to the ability of a physician to correctly identify a lesion as melanoma or nonmelanoma. Biopsy or referral accuracy refers to the ability of a physician to correctly determine that a lesion may be malignant and to make the appropriate management decision either to order a biopsy or to refer the patient to a melanoma specialist. These 2 skills are related, but in the clinical setting, the latter is more important because, even if the physician does not correctly identify the lesion at first glance, the lesion will be diagnosed correctly if the appropriate steps are taken.
Biopsy or referral accuracy can be defined in many ways, depending on the gold standard used. For example, some argue that the pathologic condition should be the gold standard. In this case, physicians who perform biopsies or refer patients with lesions with benign histologic features to specialists on the basis of the clinical features of the melanoma would be considered to have poor B/R accuracy. In contrast, if the gold standard is a list of clinical criteria compiled by a panel of melanoma experts, regardless of the pathologic condition, then the physicians who perform biopsies or refer patients with lesions with benign histologic features to specialists on the basis of clinical features would be considered to have good B/R accuracy. Our review includes articles that use either of these gold standards.
To compare dermatologists and PCPs with respect to their DA and B/R accuracy, both types of physicians should be rigorously evaluated using the assessment techniques usually applied to diagnostic and screening tests.2 Sensitivity and specificity of both types of physicians should be directly and concurrently compared. In terms of melanoma DA, sensitivity is the ability to detect (or suspect) a melanoma in a patient with a melanoma. Specificity is the ability to reassure a patient who does not have a melanoma that he or she does not have any lesions suggestive of melanoma. Dermatologists may have both higher sensitivity and specificity than PCPs, but as is often the case with diagnostic tests, one characteristic may be enhanced at the expense of the other.3,4 A receiver operating characteristic (ROC) curve represents the trade-off between sensitivity and specificity for a diagnostic test.5,6 To date, several small studies have evaluated PCPs and/or dermatologists in terms of their sensitivity and/or specificity of melanoma DA and/or B/R accuracy. However, the results are conflicting and have not included ROC curves.
We systematically reviewed the literature comparing the DA and B/R accuracy of dermatologists and PCPs for melanoma. The purpose of our study was to synthesize these data and construct summary ROC curves to determine whether dermatologists have better DA and B/R accuracy for potential melanomas than PCPs.
We considered studies eligible for this analysis if they reported either sensitivity or specificity for either DA or B/R accuracy of either PCPs or dermatologists, or if they reported sufficient data to calculate these values. Studies were excluded if they did not specify the type of physician making the diagnosis or if they did not report data specific to melanoma. Studies whose main text was not presented in English were included if an existing English-translated abstract was available. We contacted the authors of all studies not reporting data in a usable format.
One author (E.W.), in collaboration with a research librarian, performed systematic searches of MEDLINE, EMBASE, and CancerLit databases from January 1966 to October 1999, including the following medical subject headings: melanoma, diagnosis, screening, primary care, family practitioner, general practitioner, internal medicine, dermatologist, and skin specialist. We also searched bibliographies of retrieved articles, the proceedings of the American Academy of Dermatology (AAD) national conferences from 1997 to 1999, and the Science Citation Index.
One author (E.W.) reviewed all titles and abstracts identified in the search for potentially relevant articles. Two authors (S.C.C. and E.W.) independently abstracted data from the included studies. We resolved discrepancies between abstractors by repeated review and discussion. Abstracted data included information regarding study design, type of physician, lesion characteristics (eg, early melanoma or late melanoma), DA (reported either as sensitivities or number of correctly identified lesions), and B/R accuracy, as defined by the study's gold standard.
One study categorized their subjects as "skin care specialists."7 These included a dermatologist, a plastic surgeon, and an oncologist who had been highly trained to identify melanoma. In another study, a surgeon was considered part of the melanoma management team.8 We categorized these subjects in the dermatology group. The dermatologists included those in the academic setting and private practice. The PCP category included those providers who were identified as family practitioners, internists, and generalists. We included residents in both the dermatologist and PCP categories, but excluded medical students.
We assessed the designs of each study based on an established assessment framework for rating the appropriateness of diagnostic test evaluations.9 We evaluated each study to determine if it satisfied each of 7 study design criteria: (1) reported both sensitivity and specificity; (2) directly compared dermatologists and PCPs; (3) compared fully trained physicians with other similarly trained physicians (ie, did not include resident physicians); (4) provided at least a cursory description of the lesions shown to the subjects (ie, it described lesions as "early" or "late" and as "superficial spreading," "lentigo maligna," or "nodular melanoma" rather than just describing a lesion as "melanoma" without further description); (5) showed subjects more than one lesion; (6) included both early- and late-stage melanomas; and (7) included an adequate sample size of dermatologists and PCPs. To assess the number of physicians needed to be included in an analysis, we performed a power calculation based on a comparison of 2 sensitivities.10 Using the summary sensitivities for PCPs and dermatologists, our power calculations determined that 69 dermatologists and 69 PCPs were required for the study to be adequately powered to detect a 0.20 difference in sensitivity.
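The power calculation described above can be sketched with the standard normal-approximation formula for comparing 2 independent proportions. This is only an illustration: the review does not state which formula, significance level, or power was used, so the two-sided alpha of .05 and power of 0.80 below are assumptions, as is the function name `physicians_per_group`. The example sensitivities (0.90 and 0.70) are hypothetical and are not the summary sensitivities used in the review, so the result is not expected to reproduce the 69 physicians per group reported above.

```python
import math
from statistics import NormalDist


def physicians_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a difference between
    two proportions (here, two sensitivities) with a two-sided test.

    alpha and power are assumed defaults, not values taken from the review.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must differ and lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil((null_term + alt_term) ** 2 / (p1 - p2) ** 2)


# Hypothetical inputs: sensitivities of 0.90 and 0.70 (a 0.20 difference).
n = physicians_per_group(0.90, 0.70)
```

The required number rises quickly as the difference to be detected shrinks, which is why the size of the difference must be fixed before the study is designed.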
Note that even if a study had been appropriately designed, readers cannot appreciate this if the authors did not report their methods and results adequately. We make this distinction since we based our assessment of study design on what was reported.
We used a computer spreadsheet program (Microsoft Excel 5.0; Microsoft Corp, Redmond, Wash) for database management and statistical analyses.
For those studies that did not provide sensitivity or specificity but reported sufficient data to calculate it, 2 authors (S.C.C. and E.W.) independently calculated these values. We defined sensitivity as the number of melanomas that were identified correctly (true positives) divided by the total number of melanomas (true positives plus false negatives). We defined specificity as the number of nonmelanomas that were identified correctly (true negatives) divided by the total number of nonmelanomas (true negatives plus false positives).
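The two definitions above translate directly into code. The following is a minimal sketch; the function names and the example counts are hypothetical and are not taken from any included study.

```python
def sensitivity(true_positives, false_negatives):
    """Melanomas identified correctly divided by all melanomas."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no melanomas in the sample")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """Nonmelanomas identified correctly divided by all nonmelanomas."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no nonmelanomas in the sample")
    return true_negatives / total


# Hypothetical counts: 9 of 10 melanomas and 7 of 10 nonmelanomas identified correctly.
sens = sensitivity(true_positives=9, false_negatives=1)
spec = specificity(true_negatives=7, false_positives=3)
```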
From those studies that reported sensitivity and specificity for both dermatologists and PCPs, we calculated summary ROC curves. The ROC curves are a graphical method for comparing 2 or more diagnostic tests. They depict the trade-off between the true-positive rate (TPR) or sensitivity (plotted on the y-axis) and the false-positive rate (FPR) or 1 − specificity (x-axis).5,6 The best diagnostic test is the one with the highest TPR and the lowest FPR (ie, the one occupying the upper left-hand corner of the ROC curve plot).3,5 If the ROC curves cross, a single best diagnostic test cannot be identified; rather, we conclude that either there is no difference between the tests or that there are insufficient data to distinguish between the tests.3,5
To quantitatively synthesize the results of multiple studies, we used a meta-analytic method to construct summary ROC curves.11,12 The ROC curves illustrate the trade-off between sensitivity and specificity as the threshold for defining a positive test result varies from most stringent to least stringent. The method we used to calculate a summary ROC curve is based on the assumption that each individual study represents a unique point on a common ROC curve. Our method for constructing summary ROC curves is described in greater detail elsewhere.9,13 Briefly, we logistically transformed the TPR and FPR and fit a summary ROC curve with linear regression.11,12 Although the method for developing a summary ROC curve differs from the method for developing an ROC curve from a single study, the summary ROC curve also estimates the trade-off between sensitivity and specificity for a diagnostic test.3
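As an illustration of the logit-and-regression step, the sketch below uses one common formulation of this approach, in which the difference D = logit(TPR) − logit(FPR) is regressed on the sum S = logit(TPR) + logit(FPR) and the fitted line is converted back to ROC space. The text above does not spell out the exact regression, so this formulation, the function names, and the use of unweighted least squares are assumptions rather than a description of the authors' implementation.

```python
import math


def logit(p):
    """Log odds of a rate strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_sroc(points):
    """Fit D = a + b * S by ordinary least squares.

    points: list of (tpr, fpr) pairs, one per study, each strictly in (0, 1).
    Returns the intercept a and slope b.
    """
    d = [logit(tpr) - logit(fpr) for tpr, fpr in points]
    s = [logit(tpr) + logit(fpr) for tpr, fpr in points]
    n = len(points)
    s_mean = sum(s) / n
    d_mean = sum(d) / n
    s_var = sum((x - s_mean) ** 2 for x in s)
    if s_var == 0:
        raise ValueError("studies must differ in S to estimate a slope")
    b = sum((x - s_mean) * (y - d_mean) for x, y in zip(s, d)) / s_var
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr, a, b):
    """Expected TPR on the fitted summary ROC curve at a given FPR."""
    z = (a + (1.0 + b) * logit(fpr)) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-z))
```

Rates of exactly 0 or 1 (for example, a sensitivity of 1.00) have no finite logit, so they would need a continuity correction before being passed to `fit_sroc`; that step is not shown here.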
Robustness analyses test the stability of conclusions from an analysis over a range of plausible estimates of the critical variables used in the analysis. We used robustness analyses to test the hypothesis that the data are insufficient to draw conclusions between dermatologists and PCPs. If the conclusion changes by varying the values of sensitivity and specificity, then the robustness analysis suggests that more data are necessary. The more commonly accepted term for these calculations is sensitivity analysis, but to avoid confusion with sensitivity in the sense of sensitivity and specificity, we will use the phrase robustness analyses.
We performed a series of robustness analyses of the summary ROC curves by varying the PCP sensitivity and specificity relative to that of the dermatologists to model a range of plausible clinical scenarios. For example, one clinical scenario could be that PCPs who might be somewhat unsure of their diagnostic skills refer every patient with a pigmented lesion to dermatologists (ie, they never miss a patient with melanoma but they refer many patients with benign lesions). In this case, PCPs would have higher sensitivity but lower specificity than dermatologists. In our robustness analysis of this scenario, we calculated new summary ROC curves, assuming PCPs to have 10% higher sensitivity but 10% lower specificity than dermatologists. The other clinically plausible scenario is one in which PCPs are both less sensitive and less specific than dermatologists. We modeled this scenario by setting PCPs' sensitivity and specificity to 10% less than that of dermatologists'. We then compared the new summary ROC curves to the baseline summary ROC curves. If the conclusion drawn from the ROC curves changed as to which physician type had better B/R accuracy, then we reported that the baseline ROC curve results were not robust but were highly dependent on the values of physician sensitivity and specificity.
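The two scenarios can be sketched by deriving hypothetical PCP values from dermatologist values and checking whether either physician type is at least as good on both measures. The text does not say whether "10%" is a relative or an absolute change; the sketch assumes a relative (multiplicative) change. The function names, the cap of 0.999, and the dermatologist values in the example are illustrative assumptions.

```python
def scale_rate(rate, factor, cap=0.999):
    """Scale a rate by a relative factor, capped just below 1."""
    return min(rate * factor, cap)


def pcp_scenario(derm_sens, derm_spec, sens_factor, spec_factor):
    """Hypothetical PCP (sensitivity, specificity) derived from dermatologist values."""
    return (scale_rate(derm_sens, sens_factor), scale_rate(derm_spec, spec_factor))


def dominates(a, b):
    """True if a is at least as good as b on both sensitivity and specificity
    and the two pairs are not identical (ie, a lies above and to the left of b)."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


# Illustrative dermatologist values (sensitivity, specificity).
derm = (0.82, 0.70)

# Scenario 1: PCPs 10% lower on both measures.
pcp_worse = pcp_scenario(*derm, sens_factor=0.9, spec_factor=0.9)

# Scenario 2: PCPs 10% higher sensitivity but 10% lower specificity.
pcp_tradeoff = pcp_scenario(*derm, sens_factor=1.1, spec_factor=0.9)
```

In the first scenario the dermatologist point dominates the PCP point; in the second neither point dominates the other, which is the single-point analogue of ROC curves that may cross.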
Our literature search identified 582 titles of potentially relevant articles. Of these, 462 were excluded because they were studies of basic science pathophysiology, new diagnostic tests, etiology, pediatric populations, therapeutics, or noncutaneous manifestations of melanoma, leaving 120 titles. On further inspection of the bibliographies of these 120 articles, we obtained an additional 39 references, for a total of 159 titles. Two studies14,15 were excluded because they represented duplications of included studies.16,17 Another study did not report the number of melanomas that were shown to the subjects.18 We also excluded a study19 that evaluated referrals to a pigmented lesion clinic because referral to this clinic would imply that all the lesions were deemed suggestive of melanoma by the PCPs, yet the study reported that only a portion of the referrals were suggestive of melanoma to the referring physician. We excluded another study20 because the lesions shown to the subjects were cosmetically applied melanomas to volunteer "patients," whereas all the other studies were evaluations of actual lesions.
We did not analyze the DA of dysplastic nevi separately from that of melanoma, because only one study21 looked at the DA of dysplastic nevi. We excluded the dysplastic nevi DA study. We used dysplastic nevi reported in other studies as lesions for the B/R accuracy analysis. Another study22 reported the results of dysplastic nevi and melanoma together in such a way that we could use their data in our B/R accuracy analysis but not for melanoma DA. In total, 32 studies were eligible for our analysis (Table 1).7,8,16,17,22-49 No additional articles were found by reviewing the proceedings of the AAD national conferences or by using the Science Citation Index.
The study designs of the included studies differed considerably, falling into 2 main categories: prospective assessments and retrospective histopathologic reviews. The prospective assessments evaluated a physician's ability to correctly identify a lesion that had been previously confirmed by histologic analysis. Subjects were shown slides, photographs, or patients' lesions that were suggestive of melanoma and asked to identify the lesions as either melanoma or nonmelanoma. Some studies permitted the physicians to give a list of differential diagnoses. Other researchers accepted only one diagnosis or only considered the first diagnosis if a list was given. Several of the studies mailed photographs and questionnaires to the physicians. None of the prospective studies were foreign language articles.
In the retrospective reviews, registries from pathology laboratories were reviewed for melanomas. The requisition forms that accompanied the pathology specimens and the medical records were reviewed for the clinical diagnosis provided by the physicians when they sent the biopsy specimen to the pathology laboratory. The number of correct clinical diagnoses was reported in these studies. Only dermatologists were evaluated in the retrospective disease reviews. A few studies were retrospective reviews from skin cancer screening programs. Three of the retrospective disease reviews and one of the retrospective screening studies were from the English abstracts of foreign language articles.
Among the 32 included studies, there were 10 prospective studies of which 9 provided data for DA, and 1 only reported data for B/R accuracy.22 These 9 prospective studies provided data from 583 dermatologists and 2314 PCPs and represent the best available evidence regarding the comparison of dermatologists and PCPs for melanoma DA (Table 1). All 9 prospective studies are from different institutions; therefore, we assume that no physician was included more than once. The retrospective studies did not report data on the number of physicians, so we could not ascertain the number of dermatologists included in these studies.
Five of the 9 prospective studies plus the study by Dolan et al22 provided data for the B/R accuracy analysis. These studies included data on 106 dermatologists and 886 PCPs (Table 1). Two of the studies used histopathologic analysis as the gold standard to define whether the B/R decision was correct, whereas the other 4 used an expert panel of physicians (Table 2). There was one retrospective B/R accuracy study8; it used histopathologic analysis as the gold standard but did not report data on the number of physicians. All 7 studies were from different institutions, so we assumed that no physician was included more than once in our analyses.
We report the DA in terms of sensitivity and specificity for each of the included studies in Table 2 and Table 3, but discuss only the values for the prospective studies.
For DA, the range of sensitivity was 0.81 to 1.00 for dermatologists (calculated from 6 studies) and 0.42 to 1.00 for PCPs (from 9 studies) (Table 3). None of the studies reported specificity for dermatologists. Only one study reported specificity for PCPs (0.98).
For B/R accuracy, the sensitivity ranged from 0.82 to 1.00 (from 5 studies) for dermatologists and 0.70 to 0.88 (from 6 studies) for PCPs. The range of specificity was 0.70 to 0.89 (from 3 studies) for dermatologists and 0.70 to 0.87 (from 4 studies) for PCPs.
Only one study reported specificity of DA; therefore, summary ROC curves for DA could not be generated. We were able to calculate summary ROC curves from the 3 studies that reported sufficient data for B/R accuracy.7,17,25 The summary ROC curves from these studies crossed (Figure 1), indicating that the available evidence does not demonstrate a difference in the B/R accuracy of dermatologists and PCPs.
We tested the robustness of the summary B/R accuracy ROC curve in 2 analyses. First, we reduced PCP B/R sensitivity and specificity to values that were 10% lower than those of the dermatologists. Predictably, the PCP curve was below and to the right of the dermatologist curve (Figure 2), indicating that under these assumptions the B/R accuracy of dermatologists could be demonstrated to be better than that of PCPs. Second, we set PCP sensitivity to 10% better than that of dermatologists and PCP specificity to 10% worse than that of dermatologists. The curves are skewed and appear likely to cross if more data were available (Figure 3), indicating that under these assumptions there would be no difference in the B/R accuracy of the 2 types of physicians. Note that although the ROC curve in Figure 1 is derived from the analysis of the prospective studies, the ROC curves in Figure 2 and Figure 3 are based on hypothetical but plausible scenarios.
We evaluated the 7 study design characteristics of the prospective studies (9 DA and 6 B/R accuracy) (Table 4). The most important design characteristic to determine the relative accuracy between the provider types is the reporting of sensitivity and specificity for dermatologists and PCPs. Although most studies did include both types of physicians (78% of DA and 83% of B/R accuracy studies), few (11%) of the DA studies included measures of both sensitivity and specificity. Sixty-seven percent of the B/R accuracy studies reported both sensitivity and specificity. For both types of studies, at least half (67% DA and 50% B/R accuracy) were confounded by including subjects at varying levels of medical training. No more than half of either type of study (33% DA and 50% B/R accuracy) adequately described the type of lesions shown to their subjects. For both types of studies, most (56% DA and 83% B/R accuracy) showed more than one potential melanoma to their subjects. For the DA studies, we found that 33% showed at least one early-stage melanoma and 33% showed at least one late-stage melanoma to subjects. Eleven percent of the DA and none of the B/R accuracy studies had an appropriate sample size.
We systematically reviewed and quantitatively synthesized the literature pertaining to the DA and B/R accuracy of dermatologists and PCPs for melanoma. Our results demonstrate that the currently available data are insufficient to support any policy regarding the use of either a gatekeeper system or direct access to dermatologists for melanoma care. We conclude that more well-designed studies are needed to determine either the superiority of dermatologists or the adequacy of PCPs.
In our discussion, we have distinguished between DA and B/R accuracy. Although B/R accuracy, or the ability to correctly identify a malignant lesion suggestive of melanoma, is more clinically relevant, most of the published studies evaluated only DA. Three B/R accuracy studies provided sensitivity and specificity for both PCPs and dermatologists and could be combined into summary ROC curves. The summary ROC curves (Figure 1) of dermatologists and PCPs intersect, suggesting that the available data are insufficient to demonstrate any difference between PCPs and dermatologists regarding B/R accuracy. This is not surprising, since these curves were calculated from only 3 studies. The robustness analyses showed that with as little as 10% difference in sensitivity and/or specificity, the conclusion of whether PCPs or dermatologists are more accurate becomes equivocal, further emphasizing the need for additional data.
Design limitations of the included studies must be considered in the interpretation of these results. The study designs of the B/R accuracy studies differed because they had different purposes. Two of the studies included in the ROC curve evaluated subjects on whether or not they correctly identified a lesion as being suggestive of melanoma, whereas the third study queried subjects as to the appropriate course of action for a given lesion. We analyzed these heterogeneous studies in a dichotomous manner (the subjects either suspected a potentially malignant lesion or they did not) and thus were able to combine them with the ROC curve. Additionally, our results may have been biased in favor of dermatologists, since we considered as part of the dermatologist groups those nondermatologists who had received additional specialized training in the identification and management of pigmented lesions.
The results of our assessment of study design characteristics also underscore the need for better data. We found that most of the studies did not report both sensitivity and specificity, did not have an adequate sample size, and did not adequately describe the lesions shown to the subjects. Most of the prospective studies incorporated resident physicians. Although it may be important to assess the DA of resident physicians for other purposes, given that the ability of resident and attending physicians to accurately diagnose and manage melanoma lesions is likely to differ significantly, for the purpose of comparing across specialties, residents and attending physicians should be evaluated separately. Most of the studies that evaluated DA did not study lesions of various stages of malignancy (eg, early melanoma and late melanoma). Another common limitation was the small number of subjects and the small number of lesions shown to the subjects. In several studies only one melanoma was shown to varying numbers of physicians, whereas in other studies multiple pigmented lesions were shown to a smaller number of subjects.
This heterogeneity in study design is predominantly a result of the fact that the included studies were designed to evaluate other clinical questions. Of the 10 prospective studies that fulfilled our inclusion criteria (9 used for the melanoma DA and 1 for B/R accuracy), only one27 was designed specifically to evaluate dermatologists' and PCPs' DA for melanoma. Six studies7,17,22-25 were designed to test subjects on their DA for a variety of dermatologic diseases, whereas 2 studies28,29 were designed to test the DA for all types of skin cancer.
A study designed for the purpose of comparing dermatologists' and PCPs' DA and B/R accuracy should, at a minimum, satisfy the following criteria: (1) evaluate and report sensitivity and specificity for dermatologists and PCPs; (2) include an adequate sample size of physicians and lesions; (3) evaluate and report results of fully trained physicians separately from those in training; and (4) present an adequate number of lesions, reflecting the full clinical spectrum of melanoma.
We used a power calculation based on the summary sensitivities to determine the sample size needed to detect a 0.20 difference between the sensitivities of PCPs and dermatologists. For an ideal prospective study, investigators will have to either consider both sensitivity and specificity in their power calculation or justify why only one was used when determining the appropriate numbers of physicians and lesions to include in their evaluation.
Only data obtained from studies that meet strict design criteria, such as those described herein, should be used to inform guidelines regarding the use of gatekeepers or direct access of patients with lesions suggestive of melanoma to dermatologists. Additionally, future studies should compare dermatologists to PCPs with and without additional specialized training in the diagnosis and management of pigmented lesions. It should be noted that although DA is an important factor in the direct access argument, it should be combined with outcomes data. If dermatologists prove to be better in their DA and their suspicion accuracy but the outcomes of patients do not differ between the gatekeeper approach and the direct access approach, it will be difficult to convince policymakers of the advantages of direct access. Similarly, if dermatologists prove to be better but much more costly, policymakers may not feel that the added cost of direct access is worth the added benefit.
A cooperative effort of the Clinical Epidemiology Unit of the Istituto Dermopatico dell'Immacolata–Istituto di Ricovero e Cura a Carattere Scientifico (IDI-IRCCS) and the Archives of Dermatology
Accepted for publication August 8, 2001.
This study was supported by a Dermatology Foundation (Evanston, Ill) Health Care Policy Clinical Career Development Award (Dr Chen), a Veterans Affairs Ambulatory Care Fellowship (Dr Bravata), Veterans Affairs Health Care System, Palo Alto, Calif, and National Science Foundation (Arlington, Va) grant DMS 96 26 265 (Dr Olkin).
Presented at the International Dermatoepidemiology Association held in association with the 2000 Annual Meeting of the Society for Investigative Dermatology, Chicago, Ill, May 13, 2000.
We thank Janet Morrison for her assistance with literature searching.
Corresponding author and reprints: Suephy C. Chen, MD, MS, 5001 Woodruff Memorial Bldg, 1639 Pierce Dr, Atlanta, GA 30322 (e-mail: firstname.lastname@example.org).