This forest plot illustrates the effect of education on participants’ abilities. Some reports included separate analysis of (1) different types of trainees (residents or medical students), (2) different educational interventions (instruction with text, with images, or with text and images; computer-based learning [CBL]; or a dermatology elective [DE]), (3) different tasks (categorize or identify lesions), or (4) different types of dermatoses (lesions or eruptions). The results of these analyses appear separately in the forest plot, and their difference is labeled. SMD indicates standardized mean difference.
This forest plot represents the random effects meta-analysis of the practices used to improve participants’ abilities to diagnose skin lesions. Some reports included separate analysis of different types of trainees, educational practices, tasks, or dermatoses. These results appear separately in the forest plot, and their difference is labeled. SMD indicates standardized mean difference.
This forest plot represents the effect of the educational practices on the 4 groups of trainees. Some reports included separate analysis of (1) different types of trainees (residents or medical students), (2) different educational interventions (instruction with text, with images, or with text and images; computer-based learning [CBL]; or a dermatology elective [DE]), (3) different tasks (categorize or identify lesions), or (4) different types of dermatoses (lesions or eruptions). The results of these analyses appear separately in the forest plot, and their difference is labeled. SMD indicates standardized mean difference.
eFigure 1. Full MEDLINE Query
eFigure 2. PRISMA Flow
eTable 1. Summary of the Methodological Quality of the Studies
Liam Rourke, Sarah Oberholtzer, Trish Chatterley, Alain Brassard. Learning to Detect, Categorize, and Identify Skin Lesions: A Meta-analysis. JAMA Dermatol. 2015;151(3):293–301. doi:10.1001/jamadermatol.2014.3300
Educators use a variety of practices to train laypersons, medical students, residents, and primary care providers to diagnose skin lesions. Researchers have described these methods for decades, but there have been few attempts to catalog their scope or effectiveness.
To determine the scope and effectiveness of educational practices to improve the detection, categorization, and identification of skin lesions.
Literature indexed in MEDLINE, EMBASE, CINAHL, ERIC, PsycINFO, and BIOSIS Previews from inception until April 1, 2014, using terms cognate with skin disease, diagnosis, and education.
Studies in which the educational objective was operationalized as the ability to detect, categorize, or identify skin lesions, and the intervention was evaluated through comparisons of participants’ abilities before and after the intervention.
Data Extraction and Synthesis
Information about trainees, educational practices, educational outcomes, and study quality was extracted; it was synthesized through meta-analysis using a random effects model. Effect sizes were calculated by dividing the differences between preintervention and postintervention means by the pooled standard deviation (ie, standardized mean difference [SMD]). Heterogeneity was assessed using the I² statistic.
Main Outcomes and Measures
Pooled effect size across all studies and separate effect sizes for each of the educational practices.
Thirty-seven studies reporting 47 outcomes from 7 educational practices met our inclusion criteria. The pooled effect of the practices on participants’ abilities was large, with an SMD of 1.06 (95% CI, 0.81-1.31) indicating that posttest scores were approximately 1 SD above pretest scores. Effect sizes varied categorically between educational practices: the dermatology elective (SMD = 1.64; 95% CI, 1.17-2.11) and multicomponent interventions (SMD = 2.07; 95% CI, 0.71-3.44) had large effects; computer-based learning (SMD = 0.64; 95% CI, 0.36-0.92), lecture (SMD = 0.59; 95% CI, 0.28-0.90), pamphlet (SMD = 0.47; 95% CI, –0.11 to 1.05), and audit and feedback (SMD = 0.58; 95% CI, 0.10-1.07) had moderate effects; and moulage had a small effect (SMD = 0.15; 95% CI, –0.26 to 0.57).
Conclusions and Relevance
A number of approaches are used to improve participants’ abilities to diagnose skin lesions; some are more effective than others. The most effective approaches engage participants in a number of coordinated activities for an extended period, providing learners with the breadth of knowledge and practice required to change the mechanisms underlying performance.
Skin diseases result in billions of dollars in direct medical costs annually and a substantial toll on patients.1 Early detection, correct categorization, and accurate identification are pivotal for the successful treatment of skin diseases. Unfortunately, these skills are underdeveloped in the groups most centrally affected. Individuals at risk for skin cancer have difficulty detecting new or changing lesions, which increases the rates of morbidity and mortality.2 Their primary care providers have difficulty differentiating benign skin lesions from melanomas,3 resulting in unnecessary excisions and referrals, and medical students have difficulty learning to identify even a small number of common lesions.4
Concerned groups petition for more training, but it is not clear that training sharpens these abilities or, if it does, which types of training are most effective. Two recent systematic reviews explore aspects of these questions. One found that the presentation of images with text improved laypersons’ knowledge and self-efficacy regarding skin self-examination; however, the effect on participants’ ability to detect lesions was inconclusive.5 The second review focused on primary care physicians and reported improvements in knowledge and confidence; however, of the 7 studies that measured participants’ abilities to categorize or identify lesions, 4 reported no improvement.6
These foci do not provide a full examination of the issue. The purpose of the current review is to determine the full scope and effectiveness of educational practices that are used to improve participants’ abilities to detect, categorize, and identify skin lesions. As a review of literature, with no involvement of human or animal subjects, the approval of the University of Alberta’s Research Ethics Board was not required.
We conducted a meta-analysis of the literature cataloged in MEDLINE, EMBASE, CINAHL, ERIC, PsycINFO, and BIOSIS Previews from inception to April 1, 2014. We used terms cognate with the Medical Subject Headings skin disease, diagnosis, and education (see eFigure 1 in the Supplement for the full MEDLINE search strategy). References of retrieved articles were scanned for additional studies. We also examined the articles included in the 2 previous systematic reviews.5,6
We included studies that met 5 criteria:
The educational objective was to improve participants’ ability to detect a skin lesion, categorize a lesion into a finite set of categories (eg, benign or malignant), or identify a lesion by its name, either through multiple-choice or constructed-response formats.
The educational objective was operationalized as the participants’ ability to detect, categorize, or identify lesions presented with photographs or patients.
The means through which the educational objectives were addressed was described in sufficient detail to allow classification.
The effectiveness of the educational practice was evaluated through comparisons of participants’ abilities before and after the intervention or to the abilities of nonparticipants.
Sufficient information to calculate a standardized mean difference (SMD) was available in the article or from the authors through a direct request.
We excluded studies in which:
The intervention used to improve detection, categorization, or identification was not education but rather a technological method (eg, dermoscopy, computer-aided diagnosis).
The objective of the educational intervention was to improve the treatment of a skin lesion rather than its detection, categorization, or identification.
The objective was to improve the frequency with which participants sought clinical skin examinations, thus improving the rate of detection, categorization, or identification, but not the ability of participants to do so.
Two investigators (L.R. and S.O.) worked independently and in duplicate to screen all titles and abstracts returned by the search query, and subsequently, the full texts of items marked provisionally for inclusion. Disagreements were resolved through discussion.
From the articles selected for inclusion, 2 reviewers (L.R. and S.O.) independently extracted information about:
The population being trained. Across studies, participants were of 4 types: laypersons, medical students, residents, or primary care providers.
The type of educational practice used to improve participants’ abilities.
The educational objective, which was 1 of 3 types: detect, which was to discover the presence of a skin lesion on oneself, a patient, or a standardized patient; categorize, which was to assign a skin lesion to a finite set of categories (eg, benign or malignant); and identify, which was to provide the correct name of a lesion or select the correct name from a list of alternatives.
The quality of studies, which was evaluated using the Medical Education Research Study Quality Instrument.7 The instrument highlights 6 dimensions of study quality: research design, sampling strategy, type of data collected, soundness of the measurement procedures, appropriateness and complexity of the data analysis procedures, and types of outcomes that are measured. Since its introduction in 2007, the psychometric properties of the Medical Education Research Study Quality Instrument have been investigated in multiple studies, and evidence is accruing of its validity and reliability.
Meta-analysis was performed using a random effects model. Effect sizes were calculated as SMD and computations were carried out in Review Manager 5.2 (The Cochrane Collaboration; https://tech.cochrane.org/revman/about-revman-5). When the data required to calculate an SMD were not included in an article, we requested it from the study’s authors, and when information was not forthcoming, we imputed missing information using formulas recommended by The Cochrane Collaboration.8 Heterogeneity was assessed using the I² statistic. An aggregate effect size was calculated, as were effect sizes for each of the educational practices. Additional subgroup analyses were planned a priori based on a review of previous meta-analyses of medical education topics. These included subgroups for duration, study design, population, and assessment task.
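The calculation described in this section can be illustrated with a short Python sketch. It is not the Review Manager implementation: it assumes the DerSimonian-Laird estimator for the between-study variance, uses a common large-sample approximation for the variance of each SMD that treats pretest and posttest scores as independent groups, and omits any small-sample correction. The function names and the three example studies are hypothetical and are not data from this review.

```python
import math

def smd(mean_pre, mean_post, sd_pre, sd_post, n_pre, n_post):
    """Return the standardized mean difference and its approximate variance.

    Assumption: pretest and posttest scores are treated as independent
    groups, and no small-sample correction is applied.
    """
    pooled_sd = math.sqrt(
        ((n_pre - 1) * sd_pre ** 2 + (n_post - 1) * sd_post ** 2)
        / (n_pre + n_post - 2)
    )
    d = (mean_post - mean_pre) / pooled_sd
    var = (n_pre + n_post) / (n_pre * n_post) + d ** 2 / (2 * (n_pre + n_post))
    return d, var

def random_effects(effects, variances):
    """Pool effects with a DerSimonian-Laird random effects model.

    Returns the pooled effect, its 95% CI, tau^2, and I^2 (as a percentage).
    """
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]  # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2

# Hypothetical studies: (mean_pre, mean_post, sd_pre, sd_post, n_pre, n_post)
studies = [
    (10.0, 14.0, 4.0, 4.0, 30, 30),
    (12.0, 13.0, 5.0, 5.0, 50, 50),
    (8.0, 15.0, 3.5, 3.5, 20, 20),
]
results = [smd(*s) for s in studies]
effects = [d for d, _ in results]
variances = [v for _, v in results]
pooled, ci, tau2, i2 = random_effects(effects, variances)
print(f"Pooled SMD = {pooled:.2f} (95% CI, {ci[0]:.2f}-{ci[1]:.2f}); I2 = {i2:.0f}%")
```

With effects as dissimilar as those in the hypothetical input, the I² value exceeds 50%, the threshold this review describes as large heterogeneity.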
The initial database queries returned 2758 unique items.5,6 Ultimately, 37 studies met all inclusion criteria (Table 1).4,9-43 A flowchart tracing the selection process is available as a supplement (eFigure 2 in the Supplement).
The methodologic quality of the studies was measured with the Medical Education Research Study Quality Instrument. Conventionally, this instrument offers a possible score of 18; however, one of its dimensions—response rate—was not relevant to the designs included in this review. This reduced the total possible score to 16.5, which was standardized to present a score out of 18 that could be compared with other reports. Among the studies included in our review, scores ranged from 7.09 to 18.00, with a mean (SD) of 11.09 (1.97) (eTable 1 in the Supplement).
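The text does not give the formula used to standardize the scores. The sketch below assumes a simple proportional rescaling from the reduced maximum of 16.5 to the conventional maximum of 18; the function name and the raw scores are hypothetical.

```python
def rescale_score(raw, max_raw=16.5, max_target=18.0):
    """Proportionally rescale a quality score to the conventional 18-point scale.

    Assumption: the standardization is a linear rescaling; the review does
    not state the formula explicitly.
    """
    if not 0 <= raw <= max_raw:
        raise ValueError("raw score must lie between 0 and max_raw")
    return raw * max_target / max_raw

# Hypothetical raw scores on the reduced 16.5-point scale
for raw in (6.5, 10.0, 16.5):
    print(f"raw {raw:>4} -> standardized {rescale_score(raw):.2f}")
```

Under this assumption, a raw score of 6.5 maps to about 7.09 and a raw score of 16.5 maps to 18.00.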
Four types of learners participated in training. The frequency (f) with which they are represented in our review is as follows: medical students (f = 12), primary care providers (f = 10), laypersons (f = 9), and residents (internal medicine residents, f = 3; primary care residents, f = 2; and family medicine residents, f = 2).
Seven educational practices were used to enhance participants’ skills. The practices, their descriptions, and the frequency with which they appear in the literature are presented in Table 2. Lecture was most frequent, while moulage was the least frequent.
The effect of the interventions, pooled across populations and educational practices, was large: SMD = 1.06 (95% CI, 0.81-1.31) (Figure 1). Examined by educational practice and presented in order of magnitude, the effect size for each practice was: multicomponent interventions, SMD = 2.07 (95% CI, 0.71-3.44); dermatology elective, SMD = 1.64 (95% CI, 1.17-2.11); computer-based learning, SMD = 0.64 (95% CI, 0.36-0.92); formal lecture, SMD = 0.59 (95% CI, 0.28-0.90); audit and feedback, SMD = 0.58 (95% CI, 0.10-1.07); pamphlet, SMD = 0.47 (95% CI, –0.11 to 1.05); and moulage, SMD = 0.15 (95% CI, –0.26 to 0.57) (Figure 2).
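For readers who want to relate a reported interval to its standard error, the following sketch may help. It assumes the interval is symmetric and based on a normal approximation (point estimate plus or minus 1.96 standard errors); the function names are hypothetical, and the input is the pooled interval reported above.

```python
def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error from a symmetric 95% CI.

    Assumption: the interval was built as estimate +/- z * SE.
    """
    return (upper - lower) / (2 * z)

def midpoint(lower, upper):
    """Point estimate implied by a symmetric interval."""
    return (lower + upper) / 2

# Pooled effect reported in this review: SMD = 1.06 (95% CI, 0.81-1.31)
lower, upper = 0.81, 1.31
print(round(midpoint(lower, upper), 2))    # 1.06
print(round(se_from_ci(lower, upper), 3))  # 0.128
```

The midpoint of the interval reproduces the reported pooled SMD of 1.06, and the implied standard error is about 0.13.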
Examined by population, the effect sizes for trainee groups were: laypersons, SMD = 1.40 (95% CI, 0.36-2.45); medical students, SMD = 1.31 (95% CI, 0.95-1.67); residents (family medicine, primary care, and internal medicine), SMD = 0.64 (95% CI, 0.72-1.37); and primary care providers, SMD = 0.45 (95% CI, 0.30-0.60) (Figure 3).
Heterogeneity was large (>50%) in the aggregate analysis and the analysis by educational practice and population. It was not attenuated by subgroup analyses, which included analyses by study design (single-group, pre-post, randomized controlled trial, or controlled trial), study quality (low or high), task (detect, categorize, and identify), or response format on pretests and posttests (multiple-choice or constructed response).
The purpose of this review was to investigate the scope and effectiveness of the educational practices that are commonly used to improve participants’ abilities to diagnose skin lesions. Five practices were recurrent in the literature: dermatology electives, lectures, computer-based learning, pamphlets, and multicomponent interventions. Dermatology electives and multicomponent interventions had large effects, improving participants’ abilities by 1½ to 2 SDs. Computer-based learning, lectures, and pamphlets had moderate effects, improving participants’ abilities by half a standard deviation.
Two issues are pertinent in the interpretation of these effect sizes. First, in 34 of the 37 studies, the educational intervention was compared with no intervention. Previous reviewers of medical education studies have shown that effect sizes are predictably large under these conditions.44 Second, despite substantial improvement during training, abilities often remain unsatisfactory. For example, in one evaluation of a dermatology elective, medical students identified on average 3 of 25 lesions at pretest and 8 of 25 at posttest.4 This gain yielded a large mean difference, yet after 4 weeks of intensive training, participants remained unable to identify 17 of 25 common lesions.
Setting aside these concerns, it is clear that across and within educational practices, larger effects were associated with approaches that engaged participants in a wider variety of activities for longer durations. Previous meta-analyses have uncovered similar associations when examining continuing medical education,45 Internet-based learning,46 and simulation training.47 Models of expertise in diagnostic image interpretation may account for the association between the variety of educational activities and learning. One model suggests that diagnosis draws on 3 types of knowledge: basic science (eg, physiology, anatomy, and microbiology), clinical (eg, manifestations of disease and epidemiology), and experiential (eg, exemplars and cases).48 In this frame, the dermatology elective—comprising supervised direct patient care, lectures, readings, demonstrations, and various types of rounds—is most likely to equip trainees with each type of knowledge. Although the dermatology elective is an option only for medical students and residents, multicomponent interventions incorporate a variety of educational activities, and they also generate large effects with laypersons and primary care providers.
In addition to the number of components, learning was associated with the duration of training—the median durations for multicomponent interventions and dermatology electives were longer than they were for the other practices. Duration, however, may be an index to the volume of lesions that trainees encounter, and volume may be the variable that is associated with learning. Several theories of expert visual diagnosis stress the importance of diagnosing large numbers of lesions in developing a mental library of lesions, mental prototypes of lesion categories, implicit rules for categorizing lesions, or changes in one’s visual information processing structures and mechanisms.49,50 This account is speculative. Few studies provided the precise number of lesions that participants encountered.
Despite the educational advantages of comprehensive training, dermatology educators are not focused only on such approaches. Many studies were conducted explicitly to investigate the effectiveness of brief, inexpensive interventions. Computer-based learning (with a median duration of 45 minutes), lectures (with a median duration of 45 minutes), and pamphlets (with a median exposure time of 5 minutes) yielded moderate effects across contexts and types of learners.
There are limitations to these conclusions. To synthesize the literature through meta-analysis, we excluded a large number of studies of educational practices in dermatology. Qualitative studies were excluded altogether, as were quantitative studies designed in a manner that did not produce statistics required for meta-analytic procedures. Furthermore, within this subset of studies, we excluded research on a broad range of outcomes that are studied regularly, including trainee knowledge, confidence, and lesion treatment.
In addition, heterogeneity was large in the main analysis and the analysis by educational practices. This draws attention to variance at 3 levels: effect sizes varied between educational practices, within each of the educational practices, and between participants engaged in a specific intervention. This is consistent with previous meta-analyses of medical education issues whose authors have identified several predictable sources of variation, including learners, content, instructional design, research methods, and outcome measures.45-47 Among the studies in the current review, each of these sources of variance was apparent. The main analysis encompassed 4 types of learners, 7 educational practices, several categories of lesions, 3 research designs, and 37 researcher-designed outcome measures. Subgroup analyses did not eliminate the heterogeneity because several sources of variance operated concurrently, and too few studies were available for nested subgroup analyses. However, despite the prevalence of clinical, methodologic, and statistical heterogeneity, our estimates of the mean effect sizes for the main analysis and the analysis by educational practices were robust and interpretable.
A pervasive source of variance that should be addressed in subsequent studies is the difficulty of the diagnostic task. Between studies, participants were required variously to detect a lesion, categorize a lesion into 1 of 2 categories (eg, benign or malignant), categorize a lesion into one of several ordinal categories (eg, do nothing, keep an eye on it, show someone else, show physician at next visit, or show physician immediately), identify a lesion using multiple-choice format, or identify a lesion using constructed-response format. Although our subgroup analysis did not establish a systematic relationship, presumably these tasks are increasingly difficult for participants. Another source of variance in task difficulty was the type of lesions included in the tests. Among the studies that presented data on the participants’ abilities to diagnose specific types of lesions, it was apparent that some were more difficult to identify than others, and that for some of these lesions, abilities did not improve substantially through training.
The study quality instrument we used underscores this problem on the dimension labeled validity of evaluation instrument. Of the 3 points that are available for this dimension, the mode across the included studies was zero. Scoring well on this dimension would have meant providing evidence of the soundness of tests that were used to estimate participants’ abilities and thereby the effectiveness of the interventions. Subsequent studies should provide evidence that the tests are reliable measures of participants’ abilities and that the types of lesions, their instances, and quantity are adequately representative. Together, the test’s properties should lead to appropriate decisions about test-takers’ abilities. The American Board of Dermatology provides guidelines for test development, and the literature includes studies that exemplify this process.
Subsequent researchers should also consider framing their investigations in theories or models of visual information processing, perceptual learning, or diagnostic image interpretation. Researchers from diverse fields, including medical education, have been examining these questions for decades, and several conceptual frameworks are available to guide the design of studies and the interpretation of results.48-50 Such frameworks are useful for synthesizing disparate observations and building a body of literature that is unified, generalizable, and progressive.
The early detection and accurate diagnosis of skin lesions have a substantial effect on patient outcomes and health system resources. There are a number of approaches to imparting these skills, and a review of 4 decades of evaluative research suggests that some approaches are more effective than others. The most effective approaches engaged participants in a number of coordinated activities for a substantial period.
Accepted for Publication: August 23, 2014.
Corresponding Author: Liam Rourke, PhD, Department of Medicine, University of Alberta, 5-125 Clinical Sciences Building, 11350-83 Ave, Edmonton, AB T6G 2G3 Canada (firstname.lastname@example.org).
Published Online: January 7, 2015. doi:10.1001/jamadermatol.2014.3300.
Author Contributions: Dr Rourke had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Rourke, Brassard.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Rourke, Oberholtzer.
Critical revision of the manuscript for important intellectual content: Rourke, Chatterley, Brassard.
Statistical analysis: Rourke.
Administrative, technical, or material support: Rourke, Oberholtzer, Chatterley.
Study supervision: Rourke, Brassard.
Conflict of Interest Disclosures: None reported.
Additional Contributions: Ben Vandermeer, MSc, Alberta Research Centre for Health Evidence, provided assistance with the statistical analysis. He was not compensated for his assistance.