A, Receiver operator characteristic curve for melanomas vs all benign pigmented nonmelanomas; B, receiver operator characteristic curve for melanomas vs benign melanocytic lesions.
The normalized frequency of melanoma and benign melanocytic lesions as a function of algorithm index.
The SolarScan (Polartechnics Ltd, Sydney, Australia) “probability of lesion being melanoma” (Pmel) output as a function of algorithm index. The algorithm probability of melanoma is plotted as a function of algorithm index (solid line) (see the “Methods” section). The cutoff between melanoma and nonmelanoma is shown by the dashed line (index, 0.246; probability, 7.25%). The box plots of the median and interquartile ranges of absolute differences of probability between repeated images (intrainstrument error) are shown within the index ranges of 0 to 0.2, 0.2 to 0.4, 0.4 to 0.6, 0.6 to 0.8, and 0.8 to 1.0.
Menzies SW, Bischof L, Talbot H, Gutenev A, Avramidis M, Wong L, Lo SK, Mackellar G, Skladnev V, McCarthy W, Kelly J, Cranney B, Lye P, Rabinovitz H, Oliviero M, Blum A, Virol A, De’Ambrosis B, McCleod R, Koga H, Grin C, Braun R, Johr R. The Performance of SolarScan: An Automated Dermoscopy Image Analysis Instrument for the Diagnosis of Primary Melanoma. Arch Dermatol. 2005;141(11):1388-1396. doi:10.1001/archderm.141.11.1388
To describe the diagnostic performance of SolarScan (Polartechnics Ltd, Sydney, Australia), an automated instrument for the diagnosis of primary melanoma.
Images from a data set of 2430 lesions (382 were melanomas; median Breslow thickness, 0.36 mm) were divided into a training set and an independent test set at a ratio of approximately 2:1. A diagnostic algorithm (absolute diagnosis of melanoma vs benign lesion and estimated probability of melanoma) was developed and its performance described on the test set. High-quality clinical and dermoscopy images with a detailed patient history for 78 lesions (13 of which were melanomas) from the test set were given to various clinicians to compare their diagnostic accuracy with that of SolarScan.
Seven specialist referral centers and 2 general practice skin cancer clinics from 3 continents. Comparison between clinician diagnosis and SolarScan diagnosis was by 3 dermoscopy experts, 4 dermatologists, 3 trainee dermatologists, and 3 general practitioners.
Images of the melanocytic lesions were obtained from patients who required either excision or digital monitoring to exclude malignancy.
Main Outcome Measures
Sensitivity, specificity, the area under the receiver operator characteristic curve, median probability for the diagnosis of melanoma, a direct comparison of SolarScan with diagnoses performed by humans, and interinstrument and intrainstrument reproducibility.
The melanocytic-only diagnostic model was highly reproducible in the test set and gave a sensitivity of 91% (95% confidence interval [CI], 86%-96%) and specificity of 68% (95% CI, 64%-72%) for melanoma. SolarScan had comparable or superior sensitivity and specificity (85% vs 65%) compared with those of experts (90% vs 59%), dermatologists (81% vs 60%), trainees (85% vs 36%; P = .06), and general practitioners (62% vs 63%). The intraclass correlation coefficient of intrainstrument repeatability was 0.86 (95% CI, 0.83-0.88), indicating excellent repeatability. There was no significant interinstrument variation (P = .80).
SolarScan is a robust diagnostic instrument for pigmented or partially pigmented melanocytic lesions of the skin. Preliminary data suggest that its performance is comparable or superior to that of a range of clinician groups. However, these findings should be confirmed in a formal clinical trial.
Although early detection of melanoma is critical for controlling mortality from the disease, it is clear that diagnostic accuracy in the field is suboptimal.1,2 Therefore, a considerable effort has gone into producing automated diagnostic instruments (so-called machine diagnosis) for primary melanoma of the skin. Studies conducted before March 2002 (reference 3) and after March 2002 (references 4-11) were reviewed; from these reviews, basic quality requirements for describing such instruments were outlined3: (1) selection of lesions should be random or consecutive; (2) inclusion and exclusion criteria should be clearly stated; (3) all lesions clinically diagnosed as melanocytic should be analyzed; (4) the study setting should be clearly defined; (5) to avoid verification bias, clearly benign lesions that were not excised should be included, with the diagnostic gold standard being short-term follow-up with digital monitoring; (6) instrument calibration should be reported; (7) repeatability analysis should be carried out (interinstrument and intrainstrument); (8) classification should be carried out on an independent test set; and (9) computer diagnosis should be compared with human diagnosis.
We have previously published12 pilot data on an automated diagnostic instrument (Mk1 Skin PolarProbe; Polartechnics Ltd, Sydney, Australia), which uses image analysis of dermoscopy (surface microscopy) features of pigmented skin lesions. Following that report, the digital surface microscopy (dermoscopy) video instrument SolarScan (Polartechnics Ltd) was developed, and data were collected from 9 clinical sites around the world. Herein we assess the performance of this instrument in terms of these quality requirements.
Between June 15, 1998, and September 30, 2003, images of pigmented skin lesions were taken using SolarScan at 9 clinical centers. Of these, 7 were specialist referral centers: the Sydney Melanoma Unit (Sydney Melanoma Diagnostic Centre), Sydney, Australia; Skin and Cancer Associates, Miami, Fla; Department of Dermatology, University of Tübingen, Tübingen, Germany; the Skin and Cancer Foundation, Sydney; KellyDerm, the private clinic of one of the authors (J.K.), Melbourne, Australia; and South East Dermatology and the Princess Alexandra Hospital, Brisbane, Australia. Two centers were private skin cancer clinics in Australia, both staffed by general practitioners: Central Coast Skin Cancer Clinic, Gosford, and the Chatswood Skin Cancer Clinic, Sydney. Images were taken after patients gave formal written consent, and the research protocol was reviewed by the local ethics committee of each clinic site.
The instrument specifications of the SolarScan have been described previously.13 In addition to imaging, a patient history was recorded that indicated whether the lesion had, within the previous 2 years, bled without being scratched, changed in color or pattern, or increased in size (answer choices: yes, no, uncertain). In all but 1 clinic site, the sole indication for imaging was that the pigmented lesion was to be excised, usually because of a clinical suspicion. However, clinics were inconsistent in imaging excised lesions from their own practices, with some clinics obtaining images of lesions with a predominately high probability of melanoma. Reports of histopathologic findings provided by each clinic were then used as the gold standard for diagnosis. These lesions made up 71% of the data set. In 1 clinic site (Sydney Melanoma Unit), some images were taken of nonmelanocytic pigmented lesions that were diagnosed clinically but not excised. These lesions represented only 3% of the total image set. Also at the Sydney Melanoma Unit, melanocytic lesions that underwent short-term digital monitoring over a 3-month period and remained unchanged were classified as benign according to the previously described protocol.13 These lesions were either moderately atypical melanocytic lesions without a patient history of change or mildly atypical lesions with a history of change. These images represented 26% of the data set. In all centers, some repeated images were taken to permit a reproducibility analysis.
Lesions were excluded from analysis if they were outside the field of view (24 × 18 mm), could not be calibrated reliably because of contamination of calibration surfaces, or had excess artifacts (hair, air bubbles, or movement artifacts). Clipping excess hair before imaging was suggested. Lesions that were nonpigmented, ulcerated, or at an acral site, or that were diagnosed as pigmented basal cell carcinoma, pigmented Bowen disease, or squamous cell carcinoma were also excluded. Although pure amelanotic lesions were excluded (using dermoscopy imaging of absent brown, blue, gray, or black pigmentation), partially pigmented or lightly pigmented lesions were included. Finally, lesions from anatomical areas that could not be imaged adequately using the SolarScan headpiece (eg, eyelids, some parts of the pinna, some genital sites, and perianal and mucosal surfaces) could not be assessed. The diagnostic frequencies of the 2430 analyzed lesions are shown in Table 1.
Each image was calibrated using a procedure of black and white balance, shading correction, setup of camera dynamic range, and capture of an image of a reference surface of known reflectivity, followed by tracing of the colors of the captured lesion to a color space common for all SolarScan instruments, as previously described13 (System and Method for Examining, Recording and Analyzing Dermatological Conditions; US Patent filing No. 09/473270). The lesion border was then determined by a semiautomated procedure and confirmed as accurate by 2 clinicians (S.W.M. and H.K.). For those lesions in which the border was not correctly segmented by this procedure (24%), the lesion border was manually created. An automated procedure was then performed to mask out hair and air bubble artifacts. A total of 103 automated image analysis variables consisting of various properties of color, pattern, and geometry were extracted from the segmented lesion images (Diagnostic Feature Extraction in Dermatological Examination; US Patent filing No. 10/478078).
The entire set of 2430 lesions was divided into a training set and an independent test set at a ratio of approximately 2:1, respectively. These sets were created by a random allocation of lesions stratified by diagnostic category and Breslow thickness. Before algorithm development, each lesion diagnostic category was assigned a “weight” on a linear scale (range, 0.25-20) representing the importance of correctly classifying the lesion as benign or melanoma. These weights were arbitrarily determined based on danger of misdiagnosis, ease of clinical diagnosis, and frequency of diagnosis in the field. Melanomas were weighted as a function of Breslow thickness (weight, 5 × Breslow thickness in millimeters), from 1.0 (in situ) to 20 (≥4.0-mm Breslow thickness). Examples of other diagnostic weights are dysplastic or Spitz nevi, 0.25; other benign melanocytic lesions, 0.5; and seborrheic keratoses, blue nevi, and hemangiomas requiring clinical diagnosis without excision, 0.75.
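The weighting rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the instrument's code: the function and dictionary names are hypothetical, and clamping invasive melanomas thinner than 0.2 mm to the stated minimum of 1.0 is an assumption, because the text gives only the end points of the range.

```python
# Hypothetical names; only the numeric weights come from the text.
BENIGN_WEIGHTS = {
    "dysplastic_or_spitz_nevus": 0.25,
    "other_benign_melanocytic": 0.5,
    "clinically_diagnosed_sk_blue_nevus_hemangioma": 0.75,
}


def melanoma_weight(breslow_mm, in_situ=False):
    """Training weight for a melanoma: 5 x Breslow thickness (mm)."""
    if in_situ:
        return 1.0  # stated weight for in situ melanoma
    # Assumption: clamp to the stated range of 1.0 to 20.
    return min(20.0, max(1.0, 5.0 * breslow_mm))


if __name__ == "__main__":
    for thickness in (0.4, 1.0, 2.0, 6.0):
        print(thickness, melanoma_weight(thickness))
```

Under this rule a 2.0-mm melanoma carries 40 times the weight of a dysplastic nevus, which is how the training step is biased toward detecting thicker melanomas.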
The patient history features described in the “Data Collection” subsection and the 103 image analysis variables, in combination with the diagnostic weights, were used in the training set to model 2 diagnostic algorithms (see the “Algorithm Model” subsection). First, we created a model differentiating melanomas from all pigmented benign nonmelanomas. Second, we formed a model differentiating melanomas from pigmented benign melanocytic lesions. We determined the diagnostic accuracy by running these optimized models on the independent test set.
The algorithm model used by SolarScan is an optimized set of fixed discriminant variables with associated weighting factors and relationships features (Australian Patent application No. 20022308395 and Australian Patent No. 2003905998).
We used the distributions of algorithm indices within our data set for melanoma and benign melanocytic cases to calculate a point estimate of the probability of melanoma as a function of an index value. In this way, a new lesion could be analyzed and an algorithm index value and estimate of the probability of melanoma (based solely on our data set) derived. The method used to derive this probability function is as follows. The frequency distribution for melanoma cases as a function of algorithm index was fitted using Gaussian models with 2, 3, or 4 mixture components using an expectation maximization algorithm. The best fit was obtained with a 3-component model. A separate model for benign melanocytic lesions was developed using a similar method, and in this case the best fit was obtained using a 2-component model. Both distributions were then normalized and scaled to the number of cases of each type to yield the relative likelihood, expressed as a function of the index value. The posterior probability of melanoma was then derived as the ratio of the value of the melanoma likelihood to the total likelihood. This method was applied both to the evaluation set alone and to the combined data set. No significant difference between the point estimates was observed except for areas with low representation in the evaluation set. Because the total data set is less prone to statistical noise for extreme values of index, the probability derived from the entire data set is used within the instrument.
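The final step of this derivation, the posterior probability computed from two fitted mixture densities, can be sketched as follows. The sketch assumes the mixtures have already been fitted; every component parameter below is made up for illustration and is not a fitted SolarScan value. The case counts are the melanoma and benign melanocytic totals reported in the “Results” section, and all function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture; components = [(weight, mean, sd), ...]."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def probability_of_melanoma(index, mel_components, n_mel, ben_components, n_ben):
    """Melanoma likelihood divided by total likelihood at a given index."""
    mel = n_mel * mixture_pdf(index, mel_components)  # scaled to case count
    ben = n_ben * mixture_pdf(index, ben_components)
    return mel / (mel + ben)


# Illustrative (not fitted) parameters: 3 components for melanoma,
# 2 for benign melanocytic lesions, as in the text.
MEL = [(0.3, 0.35, 0.10), (0.4, 0.60, 0.12), (0.3, 0.85, 0.08)]
BEN = [(0.7, 0.12, 0.07), (0.3, 0.30, 0.12)]

if __name__ == "__main__":
    for idx in (0.1, 0.246, 0.5, 0.9):
        p = probability_of_melanoma(idx, MEL, 382, BEN, 1835)
        print(f"index {idx:.3f}: Pmel = {100 * p:.1f}%")
```

Scaling each density by its case count is what makes the ratio reflect the melanoma prevalence of the data set, so the resulting probability applies only to a population with a similar case mix.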
Two sets of repeated images were used to test the intrainstrument reproducibility of the diagnostic algorithm. First, repeated images with an orientation of 90° rotation were taken of 387 lesions. Second, 304 images of lesions that were undergoing 3-month digital monitoring and that remained unchanged were collected and compared with their baseline image taken 3 months before. These were taken at the same orientation. In both of these sets, the images were processed as described herein and the algorithm probability calculated. The intraclass correlation coefficient (ICC) (3,1)14 was used to assess the intramachine reliability. Here, a coefficient greater than 0.75 indicates excellent reliability.15 We also denoted the reproducibility by describing the median of the algorithm probability differences between the repeated images and median experimental error. Here, the experimental error equals the difference between repeated lesion probabilities times 100 divided by the lesion probability. Finally, the repeatability of the algorithm diagnosis using the arbitrary index cutoff (ie, the percentage of lesions that have the same diagnosis in their repeats) was described for both true melanomas and true nonmelanomas.
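The three descriptive measures defined above (median absolute difference, median experimental error, and diagnosis repeatability) can be sketched as below. Two points are assumptions: the text does not say which of the two probabilities serves as the denominator of the experimental error, so the mean of the pair is used here, and the 7.25% cutoff is taken from the “Results” section. The function name and sample values are hypothetical.

```python
from statistics import median

CUTOFF_PERCENT = 7.25  # probability above which melanoma is diagnosed


def summarize_repeats(pairs):
    """pairs: [(probability_1, probability_2), ...] in percent, nonzero."""
    abs_diffs = [abs(a - b) for a, b in pairs]
    # Assumption: "the lesion probability" is the mean of the two repeats.
    errors = [abs(a - b) * 100.0 / ((a + b) / 2.0) for a, b in pairs]
    same_dx = sum((a > CUTOFF_PERCENT) == (b > CUTOFF_PERCENT) for a, b in pairs)
    return median(abs_diffs), median(errors), 100.0 * same_dx / len(pairs)


if __name__ == "__main__":
    sample = [(10.0, 12.0), (2.0, 2.0), (7.0, 8.0), (50.0, 40.0)]
    diff, err, agreement = summarize_repeats(sample)
    print(diff, round(err, 2), agreement)
```

In the sample, the pair (7.0, 8.0) straddles the cutoff and therefore counts as a discordant diagnosis even though its absolute difference is small, which illustrates why diagnosis repeatability is reported separately from the probability differences.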
A total of 48 lesion images were taken on 3 SolarScan instruments (3 repeated images per instrument). The images were processed, the algorithm probabilities calculated, and the mean value of the repeats given. The ICC (2,1) was used to assess the intermachine reliability.14 Again, a coefficient greater than 0.75 indicates excellent reliability. In this experimental design, the calculated interinstrument experimental percentage error is the sum of the intrainstrument and the true interinstrument percentage errors. Hence, the true interinstrument error can be calculated. For this study, the experimental percentage error was the standard error of the mean (repeats) times 100 divided by the mean lesion probability.
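For readers unfamiliar with the two coefficients, the sketch below computes ICC (3,1) and ICC (2,1) from a lesions-by-measurements table using the standard two-way analysis of variance mean squares of the Shrout and Fleiss notation. It is a generic illustration, not the authors' analysis code; the function names are hypothetical and the example table is the widely reproduced 6 × 4 teaching data set, not study data.

```python
def _mean_squares(ratings):
    """Two-way ANOVA mean squares for an n-lesion x k-measurement table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-lesion mean square
    jms = ss_cols / (k - 1)                 # between-measurement mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return bms, jms, ems, n, k


def icc_3_1(ratings):
    """Consistency form: measurements (columns) treated as fixed."""
    bms, _, ems, _, k = _mean_squares(ratings)
    return (bms - ems) / (bms + (k - 1) * ems)


def icc_2_1(ratings):
    """Agreement form: measurements (columns) treated as random."""
    bms, jms, ems, n, k = _mean_squares(ratings)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


if __name__ == "__main__":
    table = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
             [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_3_1(table), 2), round(icc_2_1(table), 2))
```

ICC (2,1) penalizes systematic offsets between columns, which is why it suits the interinstrument comparison, where a constant bias between instruments would matter; ICC (3,1) ignores such offsets and is used for repeats on a single instrument.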
To assess performance of the SolarScan diagnostic melanocytic algorithm vs diagnoses performed by humans, all melanocytic lesions from the independent test set taken at the Sydney Melanoma Unit that had clinical and dermoscopy photographic images (taken with a Heine Dermaphot camera, Heine Ltd, Herrsching, Germany); patient details of age, sex, and lesion site; and a recorded history of whether the lesion had, within the past 2 years, bled without being scratched, changed in color or pattern, or increased in size (answer choices: yes, no, uncertain) were collected. All lesions had diagnoses based on histological findings. This resulted in a set of 78 melanocytic lesions (Table 2). These images and patient histories were given to 13 independent clinicians who were not involved in the data collection for the study. Three were international dermoscopy experts who headed pigmented lesion clinics (C.G., R.B., and R.J.), 4 were practicing dermatologists from the Sydney metropolitan area, 3 were dermatology registrars (trainee dermatologists), and 3 were primary care physicians from the Sydney metropolitan area. For each of these lesions, the following questions were answered: diagnosis of (1) melanoma (in situ or invasive) or (2) benign nevus (including dysplastic); probability of melanoma (0%-100%), where 0% is certain for being benign and 100% represents certain melanoma; management by (1) excision or referred for a second opinion, (2) close observation (eg, monitoring for 3 months), or (3) routine observation.
From the training set of 1644 lesions, of which 260 were melanomas (97 in situ and 163 invasive; overall median Breslow thickness, 0.37 mm), a diagnostic algorithm was developed to distinguish melanomas from all benign pigmented lesions. This model was run on an independent test set of 786 lesions, 122 of which were melanomas (47 in situ and 75 invasive; overall median Breslow thickness, 0.36 mm) (see the “Methods” section and Table 1). The receiver operator characteristic curve of both diagnostic models is shown in Figure 1A. Here, the performance of the algorithm is shown to be reproducible, with little difference of the area under the receiver operator characteristic between the test and training set (0.871 vs 0.877, respectively; P = .78 for 2-sided Z test). Using an arbitrary cutoff developed in the training set, the sensitivity for melanoma was 90% (95% confidence interval [CI], 86%-94%) and specificity 61% (95% CI, 58%-64%). In the test set, this was shown to be reproducible with a sensitivity of 91% (95% CI, 86%-96%) and a specificity of 65% (95% CI, 61%-69%). On examination of the algorithm performance as a function of diagnostic categories, no difference existed in the proportion of correctly classified lesions in the training or test set (Table 3). However, although the algorithm performed well on melanocytic lesions, it performed poorly on benign nonmelanocytic lesions. In particular, seborrheic keratoses that were diagnosed on routine dermoscopy examination were correctly classified by the algorithm in only 6 (13%) of 47 cases (combined test and training sets). In addition, hemangiomas and dermatofibromas were correctly classified in less than 50% of cases.
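The confidence intervals quoted above are consistent with a normal-approximation (Wald) interval for a proportion; whether the authors used exactly this method is an assumption. In the sketch below, the count of 111 correctly classified melanomas is hypothetical, chosen only because 111 of 122 rounds to the reported 91%; the function name is also hypothetical.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Proportion with a Wald (normal-approximation) 95% interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Hypothetical count: 111 of the 122 test-set melanomas detected.
    sens, low, high = proportion_ci(111, 122)
    print(f"sensitivity {100 * sens:.0f}% "
          f"(95% CI, {100 * low:.0f}%-{100 * high:.0f}%)")
```

With only 122 melanomas in the test set, the interval spans about 10 percentage points, which is worth keeping in mind when comparing the training and test sensitivities.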
Because the developed algorithm failed to adequately distinguish melanomas from pigmented nonmelanocytic lesions, a new algorithm was developed to distinguish melanomas from benign melanocytic lesions. Here, the training set consisted of 260 melanomas and 1239 benign melanocytic lesions, and the test set, 122 melanomas and 596 benign melanocytic lesions, as detailed in Table 3. The median Breslow thickness was 0.37 mm. The optimum model remained that described for all pigmented lesions. Figure 1B shows the receiver operator characteristic curves of both diagnostic models. The area under the curve is larger than that of the algorithm for modeling all pigmented lesions (Figure 1A), and again, there is good reproducibility between the performance of the algorithm in the test and training set (0.881 vs 0.887 receiver operator characteristic curve areas, respectively; P = .77 for 2-sided Z test). Using an arbitrary cutoff developed in the training set, the sensitivity for melanoma was 90% (95% CI, 86%-94%) and specificity was 64% (95% CI, 61%-67%). In the test set, this result was shown to be reproducible with a sensitivity of 91% (95% CI, 86%-96%) and a specificity of 68% (95% CI, 64%-72%). The model performance as a function of diagnostic category is described in Table 3. As stated, there was excellent reproducibility in the test set, with no significant difference in the proportion of correctly classified lesions as a function of their diagnostic category in the training and test sets.
Rather than expressing the algorithm classifier as diagnosing melanomas vs benign melanocytic lesions using an arbitrary cutoff, more information is given to the clinician by signifying the probability of a lesion being melanoma. In this regard, the probability of melanoma as a function of algorithm index was created. As seen in Figure 2, a good separation of the benign melanocytic lesions and melanoma exists as a function of algorithm index, with a lower index indicating benign lesions (full index range, 0-1). From these data, the probability of a lesion being melanoma as a function of the algorithm index was derived (Figure 3). Essentially, this probability represents the percentage of lesions with a particular algorithm index in our combined data set that were melanomas (see the “Methods” section). Because the curves of the test and training sets overlap (Figure 2), combining the data sets allowed a more confident estimate of the probability. In relation to the arbitrary cutoff used to signify a precise diagnosis, when the probability exceeds 7.25% (index, 0.246), a diagnosis of melanoma is made. The median probability of the melanomas in the training set was 78%; the nonmelanoma set, 2.2%. The median probability of the melanomas in the test set was 29%; the nonmelanoma set, 1.5%.
The algorithm was weighted to preferentially detect thicker melanomas over thinner lesions (see the “Methods” section). In this regard, a significant difference existed in the Breslow thicknesses between the true-positive (correctly classified) melanomas (median, 0.4 mm) and those misclassified (median, in situ) in the training set. However, this difference failed to reach significance in the independent test set (Table 4). Similarly, a significant difference existed in the mean algorithm probability of melanoma between in situ lesions, invasive melanomas thinner than 1 mm, and lesions at least 1 mm thick in the training set (P<.001). Again, this difference failed to reach significance in the test set (P = .13; Kruskal-Wallis test) (Table 5).
The intrainstrument reproducibility was analyzed in 2 ways. First, repeated images were taken of 387 melanocytic lesions, with different orientations on the same instrument, and the algorithm (probability) assessed (see the “Methods” section). The ICC (3,1) was 0.86 (95% CI, 0.83-0.88), which indicates an excellent correlation. The median absolute difference of the probabilities between the repeated images was 1.2%, with a median melanoma probability of the lesion set of 12%. The median experimental error was 7.6%. These errors have been plotted as a function of algorithm index in Figure 3. Finally, the algorithm diagnosis reproducibility was 95% for true melanomas and 83% for true benign melanocytic lesions.
Second, repeated images of 304 lesions were taken 3 months after the baseline images using the same instrument. All of these were morphologically unchanged and hence benign. The ICC (3,1) was 0.73 (95% CI, 0.67-0.78). The median absolute difference of the probabilities between the repeated images was 0.14%, with a median melanoma probability of the lesion set of 2.9%. The median experimental error was 4.4%. Finally, the algorithm diagnosis reproducibility was 84% (all true benign melanocytic lesions).
To assess whether there was any effect of having a lesion border generated by the manual or automated method (see the “Methods” section), we analyzed 22 paired lesion images taken on the same instrument with an automated border generated on one and a manual border on the other. The ICC (3,1) was 0.89 (95% CI, 0.74-0.95), which indicated an excellent correlation.
We examined the interinstrument reproducibility of the algorithm (algorithm probability) by analyzing 48 pigmented lesions on 3 SolarScan instruments. The ICC (2,1) was 0.88 (95% CI, 0.82-0.93), well above the 0.75 limit of excellent reliability. There was no significant difference between the experimental percentage errors of the interinstrument (11.4%) and intrainstrument (11.8%) reproducibility (P = .80, Wilcoxon signed rank test). This indicates that no significant true interinstrument variation exists among the 3 instruments.
To test the performance of the diagnostic algorithm (for melanocytic lesions only), all lesions that had good-quality clinical and dermoscopy images and complete patient and lesion history details from the independent test set were collected from 1 site (Sydney Melanoma Unit) and compared with a range of clinician groups (see the “Methods” section and Table 2). When we compared the diagnoses performed by humans with those of the SolarScan algorithm (based on the index cutoff as described herein), no statistically significant difference existed in the sensitivity (based on either the absolute diagnosis or the decision to excise as diagnosing melanoma) between any clinician group and the algorithm (Table 6). However, a significant power problem secondary to the small sample of melanomas may confound these results. In this regard, SolarScan had a sensitivity comparable with that of dermoscopy experts, dermatologists, and trainee dermatologists, and had a substantially superior sensitivity compared with general practitioners. For analysis of specificity, SolarScan’s performance was superior to that of trainee dermatologists (P = .01), and its specificity was higher than that of all 4 clinical groups (based either on absolute diagnosis or the decision to not excise a benign lesion).
On the assumption that the prevalence of melanoma was the same in the clinical test as in the population of excised lesions in the field, the positive predictive value (the probability that the lesion is melanoma when diagnosed as melanoma) and negative predictive value (the probability that the lesion is benign when diagnosed as benign) were compared between SolarScan and the clinician groups. The SolarScan positive predictive and negative predictive values were equal or superior to those of all clinical groups, whether based on diagnosis or the decision to excise. This reached statistical significance only for the positive predictive value of trainee dermatologists and negative predictive value for general practitioners (Table 6).
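The dependence of the predictive values on prevalence can be made concrete with a short sketch. The inputs are the SolarScan sensitivity and specificity quoted in the abstract (85% and 65%) and the prevalence of the clinician comparison set (13 melanomas among 78 lesions); the function name is hypothetical, and the printed values are illustrative arithmetic that may not match Table 6.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    ppv, npv = predictive_values(0.85, 0.65, 13 / 78)
    print(f"PPV {100 * ppv:.1f}%, NPV {100 * npv:.1f}%")
```

Because both values shift with prevalence, they carry over to clinical practice only to the extent that the stated assumption about the prevalence of melanoma among excised lesions holds.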
For analysis of the probability that a lesion is melanoma in the melanoma set, a significantly increased average confidence (probability) of melanoma existed in all clinical groups compared with SolarScan (P<.001) (Table 7). Conversely, a significantly increased confidence (decreased probability of melanoma) by SolarScan existed compared with all clinical groups on analysis of the benign melanocytic set (P<.001).
Numerous systems that automatically diagnose pigmented lesions have been described.3-11 These have a wide range of sensitivities and specificities, with some investigators reporting sensitivities and specificities approaching 100%. However, the diagnostic performance of a system depends on the difficulty of lesions included for analysis (measured by the median Breslow thickness of the melanoma set and the proportion of atypical nevi in the benign set) and its performance on an independent test set. Clearly, the only way to accurately compare the diagnostic accuracy of systems is by directly comparing their results for the same set of lesions.
The data for algorithm development and testing were collected from 9 centers in 3 continents. Such a design increases the generalizability of the instrument’s performance. In all but 1 clinic site, the lesions collected were excised or monitored because of clinical suspicion. To reduce verification bias, lesions thought to be benign by the clinician but that required short-term digital monitoring for confirmation of their benign nature were included.3 Furthermore, a small sample of lesions that were clearly benign and diagnosed by classic dermoscopy features was included. This again reduces verification bias.
All image analysis features isolated by SolarScan were automated (ie, without input from the clinician). The clinical history features taken were modeled but not used in the final diagnostic algorithm. The only exception to the completely automated nature of the algorithm was the creation of the lesion border. Here, a 3-tiered system is used. First, an automated best-guess lesion boundary is created. If this boundary is rejected by the clinician, then a second series of automated boundaries is created. If neither of these is considered accurate, then a manual border is created by the clinician. We believe that it is an essential responsibility of the clinician to define the true lesion boundary for analysis. It is also important that the lesion border does not oversegment the lesion, that is, normal skin should not be included within the lesion boundary. If this occurs, the algorithm output changes significantly. For this reason, 24% of the lesions require a manual procedure to create the border. However, our results showed no significant difference in algorithm performance when comparing manual and automated boundaries.
The first diagnostic model attempted to correctly classify all pigmented benign lesions that were not melanomas. However, nonmelanocytic pigmented lesions such as seborrheic keratoses and hemangiomas were poorly discriminated. Because these lesions were weighted relatively highly in the benign set to be correctly classified during the algorithm development, it is likely that they are morphologically too similar to melanomas when using the image analysis features selected. A less important but possible contributing reason for the poor discrimination of the nonmelanocytic lesions was their relatively small sample size.
For these reasons, a model designed to discriminate only pigmented melanocytic lesions from melanoma was developed. The model was highly reproducible in the test set and gave a sensitivity of 91% and specificity of 68% for melanoma. The median probability of the melanomas in the test set was 29% and only 1.5% in the nonmelanoma set, which indicates a good separation of the 2 classes. However, because the nonmelanoma set included predominately suspicious lesions that required either excision or short-term digital monitoring for management, the true specificity in the field will be much greater. Furthermore, the median Breslow thickness was only 0.36 mm, which indicates a relatively difficult set of thin melanomas.
There is clearly a clinical limitation for an instrument that does not diagnose pigmented nonmelanocytic lesions. However, because there are strict dermoscopy criteria for distinguishing melanocytic from nonmelanocytic lesions, this clinical limitation should have less impact in a specialist setting. Nevertheless, it remains to be seen whether this is a significant limitation in general practice.
The final requirement of an automated diagnostic system is to compare its performance with diagnoses performed by humans. Although palpation of the lesion is not included for assessment by the participating clinicians, this experimental approach allows direct comparison of performance within the various clinician groups examined. It is important that none of these clinicians were involved in data collection for SolarScan algorithm development. SolarScan’s sensitivity was comparable with that of dermoscopy experts, dermatologists, and trainee dermatologists, and had a substantially superior sensitivity (which did not, however, reach statistical significance) compared with that of general practitioners. In analysis of specificity, SolarScan’s performance was superior to that of trainee dermatologists and had a higher specificity than all 4 clinical groups (based either on absolute diagnosis or the decision to not excise a benign lesion).
The analysis of the human performance compared with that of SolarScan is somewhat limited by the relatively small sample size examined. The next stage in assessment of diagnosis by humans compared with that of SolarScan should be a formal clinical trial that incorporates both suspicious lesions and randomly selected banal lesions. Nevertheless, it seems clear from the data reported herein that SolarScan can be expected to perform well against all clinician groups in such a setting and hence would be a valuable asset for both dermatologists and primary care physicians.
The aim of this project is to produce an instrument that gives an automated diagnosis of melanoma. Because such instrumentation will never achieve 100% diagnostic accuracy, and because the gold standard of histopathologic diagnosis suffers from significant interobserver discordance, the computer diagnosis will likely never be used as an absolute clinical diagnosis. Rather, it is more likely to be used as an expert second opinion, an auxiliary for clinical decision making.
Correspondence: Scott W. Menzies, MB, BS, PhD, Sydney Melanoma Diagnostic Centre, Sydney Cancer Centre, Royal Prince Alfred Hospital, Camperdown 2050, New South Wales, Australia (firstname.lastname@example.org).
Accepted for Publication: May 18, 2005.
Author Contributions: Study concept and design: Menzies, Bischof, Talbot, Gutenev, Mackellar, and Skladnev. Acquisition of data: Menzies, Gutenev, Avramidis, McCarthy, Kelly, Cranney, Lye, Rabinovitz, Oliviero, Blum, Virol, De’Ambrosis, McCleod, Koga, Grin, Braun, and Johr. Analysis and interpretation of data: Mackellar, Lo, Gutenev, Wong, and Menzies. Drafting of the manuscript: Menzies, Gutenev, and Mackellar. Critical revision of the manuscript for important intellectual content: All authors. Statistical analysis: Wong, Lo, Mackellar, and Menzies. Obtained funding: Skladnev. Administrative, technical, and material support: Skladnev and Menzies. Study supervision: Menzies.
Financial Disclosure: Dr Menzies is a paid consultant for Polartechnics Ltd, the company with full ownership of the intellectual property for SolarScan. Polartechnics Ltd has filed for patents for the System and Method for Examining, Recording, and Analyzing Dermatological Conditions (US Patent filing No. 09/473270), the Boundary Finding in Dermatological Examination (US Patent filing No. 10/478077), and the Diagnostic Feature Extraction in Dermatological Examination (US Patent filing No. 10/478078). Polartechnics Ltd has filed for patents on the Diagnostic Feature Extraction in Dermatological Examination (Australian Patent application No. 20022308395 and Australian Patent No. 2003905998).
Funding/Support: This research was funded in part by an Australian Federal Government Research and Development Syndication Grant (13812/18/01) in 1994 and Research and Development Start Grant (STG 00186) in 1997.
Previous Presentation: An interim analysis of SolarScan performance (not the final data as shown herein) was presented at the American Academy of Dermatology 62nd Annual Meeting; February 2004; Washington, DC.