Figure 1. Dermatologic Assessment Form Forearm Photographic Assessment Scale.
Figure 2. A, Distribution of maximum fine wrinkling scoring deviation from the reference standard for each dermatologist (A is the reference standard dermatologist; B-F are the community dermatologists). B, Distribution of maximum coarse wrinkling scoring deviation from the reference standard for each dermatologist. C, Distribution of maximum abnormal pigmentation scoring deviation from the reference standard for each dermatologist. D, Distribution of maximum global assessment scoring deviation from the reference standard for each dermatologist.
Figure 3. Global assessment: severity score 0 (A); 1 (B); 2 (C); 3 (D); 4 (E); 5 (F); 6 (G); 7 (H); 8 (I); and 9 (J).
McKenzie NE, Saboda K, Duckett LD, Goldman R, Hu C, Curiel-Lewandrowski CN. Development of a Photographic Scale for Consistency and Guidance in Dermatologic Assessment of Forearm Sun Damage. Arch Dermatol. 2011;147(1):31-36. doi:10.1001/archdermatol.2010.392
To develop a photographic sun damage assessment scale for forearm skin and test its feasibility and utility for consistent classification of sun damage.
For a blinded comparison, 96 standardized 8 × 10 digital photographs of participants' forearms were taken. Photographs were graded by an expert dermatologist using an existing 9-category dermatologic assessment scoring scale until all categories contained photographs representative of each of 4 clinical signs. Triplicate photographs were provided in identical image sets to 5 community dermatologists for blinded rating using the dermatologic assessment scoring scale.
Academic skin cancer prevention clinic with high-level experience in assessment of sun-damaged skin.
Volunteer sample including participants from screenings, chemoprevention, and/or biomarker studies.
Main Outcome Measures
Reproducibility and agreement of grading among dermatologists by Spearman correlation coefficient to assess the correlation of scores given for the same photograph, κ statistics for ordinal data, and variability of scoring among dermatologists, using analysis of variance models with evaluating physician and photographs as main effects and interaction effect variables to account for the difference in scoring among dermatologists.
Correlations (73% to >90%) between dermatologists were all statistically significant (P < .001). Scores showed good to substantial agreement but were significantly different (P < .001) for each of 4 clinical signs and the difference varied significantly (P < .001) among photographs.
With good to substantial agreement, we found the development of a photographic forearm sun damage assessment scale highly feasible. In view of significantly different rating scores, a photographic reference for assessment of sun damage is also necessary.
The quest for consistency in clinical assessment of sun damage has led to the development of objective grading methods for characterization and quantification of sun damage. The methods include descriptive,1,2 visual analog,3,4 and photographic grading scales.2,5 Published scales have been for facial assessment only, but when skin biopsies are required, forearms are preferable to cosmetically sensitive facial areas.
Weiss and colleagues2 developed a descriptive scale for the assessment of overall cutaneous photoaging to be used along with facial photographic samples but did not discuss agreement or validity. The R.W. Johnson Pharmaceutical Research Institute descriptive scale1 achieved a chance-corrected agreement (κ coefficient) of 0.11. Dermatologic research protocols rely on consistent clinical identification, description, and quantification of sun damage in forearm skin. To date, no valid and reliable photographic assessment scale of forearm skin sun damage has been developed.
The clinical assessment of human skin for sun damage is a highly subjective but vital part of evaluating the effectiveness of agents and interventions for their ability to reduce or reverse sun damage. Since histopathologic evaluation is a regulatory requirement along with clinical evaluation to assess safety and efficacy of test articles, biopsied tissue must be obtained. Human subjects considerations suggest that forearm skin, rather than facial skin, should be used for this purpose. This consideration alone makes an objective grading scale for forearm skin essential. Furthermore, a standardized teaching set will be valuable for developing a reproducible method and can support the comparison of findings from a variety of studies. The objective of this study was to begin the development of a consistent photographic assessment scale of sun damage in forearm skin, complemented with a descriptive scale, that can become a criterion standard in dermatologic studies. This study is the first step toward this objective.
A criterion standard is a performance standard with which experts or peers agree and with which individual practice can be compared.6 Establishing such a criterion standard requires a strong empirical relationship between the scale and the variable it represents.7
Forearm photodamage assessment in current studies8-10 is performed using a subjective 10-point scale for each of 4 clinical signs of UV-induced skin damage: fine wrinkling, coarse wrinkling, abnormal pigmentation, and a global assessment. The global assessment is used to give an overall impression of sun damage. Each clinical sign is ranked and subdivided as follows: absent (0), mild (1-3), moderate (4-6), and severe (7-9). This approach is similar to the R.W. Johnson Pharmaceutical Research Institute descriptive scale,1,11 which is used for assessment of photodamage in facial skin. Our scale, the Dermatologic Assessment Form Forearm Photographic Assessment Scale, is presented in Figure 1.
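The banding of the 10-point scale can be written out as a small lookup. The following is a minimal Python sketch of the ranges stated above; the function name `severity_category` and the constant `CLINICAL_SIGNS` are hypothetical and are not part of the published form.

```python
# The 4 clinical signs scored on the form, as listed in the text.
CLINICAL_SIGNS = (
    "fine wrinkling",
    "coarse wrinkling",
    "abnormal pigmentation",
    "global assessment",
)


def severity_category(score: int) -> str:
    """Map a 0-9 score to its descriptive band: absent, mild, moderate, severe."""
    if not isinstance(score, int) or not 0 <= score <= 9:
        raise ValueError("score must be an integer from 0 to 9")
    if score == 0:
        return "absent"
    if score <= 3:
        return "mild"      # scores 1-3
    if score <= 6:
        return "moderate"  # scores 4-6
    return "severe"        # scores 7-9
```

Under this mapping, a global assessment score of 5 falls in the moderate band.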
In the spring and summer of 2007, a total of 48 adults (26 women [54.2%] with a mean age of 52 years and 22 men [45.8%] with a mean age of 63 years) were recruited for this study. Participants identified themselves as white (n = 47) or African American (n = 1). Participants further identified themselves as Hispanic (n = 6) or non-Hispanic (n = 36); 6 did not provide any ethnic identification. The sample included community volunteers and participants taking part in screenings and clinical studies. Individuals whose dorsal forearms were unsuitable for use in a photographic scale, including those with significant inflammation or irritation, tattoos, or other markings, were not eligible. Individuals on the extremes—almost no sun damage and very severe sun damage—had to be sought by referral and invited to participate.
One academic physician (C.C.-L.) and 5 community dermatologists agreed to assist with the study as raters. Of these, the academic physician was designated as the project's expert dermatologist and reference standard. This dermatologist is the primary study physician leading our clinical trials and therefore has the most experience assessing skin photodamage involving the forearm. This physician's initial grading was designated as the reference standard for subsequent gradings using the photographic scale.
This study was approved by the institutional review board of the University of Arizona, which has a Federalwide Assurance with the US Office of Human Research Protections and functions under a Statement of Compliance. All participants provided signed informed consent.
Digital photographs were taken of the dorsal forearms from knuckle (metacarpal-phalangeal) to elbow to avoid personal identification. Both forearms of each of the 48 participants were photographed, for a total of 96 unique photographs. A Nikon COOLPIX 4300 digital camera (Nikon, Tokyo, Japan) was used with standardized methods to ensure consistency. Standardized lighting consisted of available overhead lighting in a windowless studio with no separate skin illumination. The Anytime Flash setting was used with maximum aperture (preset between 2.8 and 7.6), and all photographs were taken on a uniform blue background. Additional settings included image size, 2272 × 1704; image quality, fine; focus, macro close-up automatic single mode; and sensitivity, 100 ISO. The focal length of the COOLPIX lens system is 8 to 24 mm.
The expert dermatologist scored the photographs by clinical sign using our existing clinical sun damage assessment scale until all score categories were saturated for each clinical sign.
Each photograph was printed unedited in triplicate, coded, and paired with a blank dermatologic assessment scale form (Figure 1). The expert dermatologist performed the initial grading of the photographs, thus establishing our reference standard for comparison (Table 1). The triplicate image sets, consisting of 288 photographic pages, were randomly ordered in binders and delivered to the 5 evaluating dermatologists. They each blindly evaluated the 96 unique photographs 3 times. Finally, the dermatologist designated as the reference standard repeated evaluation of the randomized set of photographs.
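The assembly of the triplicate image set can be illustrated as follows. This is a sketch only: the random ordering in the study was done in Stata, and the function name, the (photograph, replicate) page coding, and the seed value here are assumptions made for the example.

```python
import random


def build_rating_binder(n_photos=96, n_replicates=3, seed=2007):
    """Return a shuffled list of (photo_id, replicate) pages.

    The seed is an arbitrary, hypothetical value used only so that the same
    page order can be reproduced for every rater's binder.
    """
    pages = [
        (photo_id, replicate)
        for photo_id in range(1, n_photos + 1)
        for replicate in range(1, n_replicates + 1)
    ]
    rng = random.Random(seed)  # private generator; leaves global state untouched
    rng.shuffle(pages)
    return pages


binder = build_rating_binder()
```

With the defaults, `binder` holds 288 pages and each of the 96 photographs appears exactly 3 times.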
Two dermatologists were able to review only 287 photographs due to a missing image at the time of the evaluation. The remaining dermatologists evaluated the complete set of 288 photographs. The 2 missed evaluations were treated as missing data and imputed using an average of available data for the same reviewer and photograph.
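The imputation rule described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the imputed value is the unrounded mean of the remaining replicate scores; the text does not say whether the imputed score was rounded to a whole grade.

```python
from statistics import mean


def impute_missing(replicate_scores):
    """Fill None entries with the mean of the available scores.

    replicate_scores holds one rater's scores for one photograph and one
    clinical sign, with None marking a missed evaluation.
    """
    available = [s for s in replicate_scores if s is not None]
    if not available:
        raise ValueError("no available scores to impute from")
    fill = mean(available)  # assumption: no rounding is applied
    return [fill if s is None else s for s in replicate_scores]
```

For example, a rater who scored a photograph 4 and 6 and missed the third copy would be assigned 5 for the missing evaluation.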
The nonparametric Spearman ρ was used to study the correlation of all scores given for each photograph. Analysis of variance models with random effects were used to study the difference in scores by different dermatologists. All analyses, random ordering, and graphs were carried out in Stata version 10 (StataCorp LP, College Station, Texas).
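For readers who want to see what the Spearman ρ computes, the sketch below implements it with the Python standard library as the Pearson correlation of tie-averaged ranks. It is an illustration, not the Stata routine used in the study, and the function names are hypothetical. Averaging tied ranks matters here because scores on a 0-9 scale produce many ties.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Here `x` would hold the reference standard scores for one clinical sign and `y` the scores of one community dermatologist for the same photographs.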
We first analyzed the relationship among the 4 scores given to each photograph by the expert dermatologist (once when setting the reference standard and again at reassessment 3 months later). Table 2 summarizes the Spearman ρ correlation coefficients. The expert dermatologist's assessments of the same photographs over time were highly and significantly correlated, with coefficients near or above 90% for all 4 clinical signs.
The correlations between the expert dermatologist's assessment and the scores given by the 5 community dermatologists ranged from 73% to above 90% (Table 3) and were all statistically significant (P < .001). These results show that assessments by all dermatologists had a strong linear relationship with the reference standard scores. However, strongly correlated scores can be quite different in magnitude and ultimately fail to show agreement.12 Therefore, to quantify agreement among the community dermatologists and the reference standard, we calculated the κ statistic for ordinal data. The κ statistic compares the observed agreement with the agreement expected by chance. All κ statistics (Table 4) fell between 0.28 and 0.76. Guidelines for interpretation of κ vary. Landis and Koch13 would categorize 0.28 as “fair” and 0.76 as “substantial.” Percentage of agreement among raters, calculated as part of the κ statistic (Table 4), showed that raters agreed with the reference standard 71% to 92% of the time. The highest percent agreement was between the original and the final blinded rating sessions of the expert dermatologist.
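A κ statistic for ordinal data is usually a weighted κ, in which near misses earn partial credit. It has the form κ = (p_o − p_e) / (1 − p_e), where p_o is the observed weighted agreement and p_e the weighted agreement expected by chance. The sketch below assumes linear agreement weights, 1 − |i − j| / (k − 1). The weighting scheme used in the study is not stated, so this choice, like the function name, is an assumption made for illustration.

```python
def weighted_kappa(reference, other, n_categories=10):
    """Linearly weighted kappa for two raters scoring the same photographs.

    Scores are integers from 0 to n_categories - 1. Returns a tuple
    (kappa, observed weighted agreement).
    """
    if len(reference) != len(other) or not reference:
        raise ValueError("both raters must score the same non-empty set")
    n, k = len(reference), n_categories
    # Assumed linear weights: 1 on the diagonal, 0 at the maximal distance.
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    table = [[0] * k for _ in range(k)]
    for a, b in zip(reference, other):
        table[a][b] += 1
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_obs = sum(w[i][j] * table[i][j] / n for i in range(k) for j in range(k))
    p_exp = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if p_exp == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp), p_obs
```

The second return value corresponds to a percentage of agreement computed alongside κ. With weights, a 1-point disagreement still counts as mostly in agreement, so the weighted percentage can be high even when exact matches are less frequent.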
Figure 2 shows the distribution of maximum deviation from the reference standard for each dermatologist and each clinical sign. Deviation is defined as the difference between a given score and the reference standard, and the maximum deviation is the one with the greatest magnitude (positive or negative) among the 3 scores for each photograph. Here, a deviation of ±3 was not rare and could exceed 5.
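The definition of maximum deviation translates directly into code. In this minimal sketch the function name is hypothetical, and when a positive and a negative deviation have the same magnitude the first one encountered is returned; that tie rule is an assumption, since the text does not specify one.

```python
def max_deviation(reference_score, replicate_scores):
    """Signed deviation of greatest magnitude among a rater's replicate scores."""
    deviations = [score - reference_score for score in replicate_scores]
    # max() with key=abs keeps the sign and returns the first value on ties.
    return max(deviations, key=abs)
```

For a photograph with a reference score of 4 and replicate scores of 5, 2, and 4, the deviations are +1, −2, and 0, so the maximum deviation is −2.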
We used 2-way analysis of variance to examine the dermatologist effect and the photograph effect. All of the expert dermatologist's assessments were excluded from the data to avoid potential bias. Analysis of variance indicated that the scores given by the 5 remaining dermatologists were significantly different (P < .001) for each of the 4 clinical signs, and the differences tended to vary among photographs (P < .001).
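The variance partition behind a 2-way analysis of variance with replication can be sketched as below for a balanced design (every rater scores every photograph the same number of times, as with 5 raters, 96 photographs, and 3 replicates). This is an illustration rather than the Stata model used in the study. The function name is hypothetical, and the F ratios are formed against the residual mean square, which is a fixed-effects convention assumed here; a random-effects model would test the main effects differently. P values are omitted because they require an F distribution that the standard library does not provide.

```python
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with replication.

    data[i][j] is the list of replicate scores rater i gave photograph j.
    Returns (sums of squares, degrees of freedom, F ratios vs residual).
    """
    a, b, r = len(data), len(data[0]), len(data[0][0])
    if r < 2:
        raise ValueError("at least 2 replicates per cell are required")
    cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]
    rater = [mean(cell[i]) for i in range(a)]
    photo = [mean(cell[i][j] for i in range(a)) for j in range(b)]
    grand = mean(rater)  # valid because the design is balanced
    ss = {
        "rater": b * r * sum((m - grand) ** 2 for m in rater),
        "photo": a * r * sum((m - grand) ** 2 for m in photo),
        "interaction": r * sum(
            (cell[i][j] - rater[i] - photo[j] + grand) ** 2
            for i in range(a) for j in range(b)
        ),
        "residual": sum(
            (x - cell[i][j]) ** 2
            for i in range(a) for j in range(b) for x in data[i][j]
        ),
    }
    df = {
        "rater": a - 1,
        "photo": b - 1,
        "interaction": (a - 1) * (b - 1),
        "residual": a * b * (r - 1),
    }
    ms_residual = ss["residual"] / df["residual"]
    f = {}
    if ms_residual > 0:
        for term in ("rater", "photo", "interaction"):
            f[term] = (ss[term] / df[term]) / ms_residual
    return ss, df, f
```

In this layout, a large F ratio for the rater term corresponds to raters differing in overall severity, and a large interaction term corresponds to those differences varying from photograph to photograph.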
Current clinical protocols rely on consistent clinical assessment of sun damage in forearm skin to evaluate baseline and efficacy. To date, no valid and reliable photographic assessment scale of forearm skin sun damage has been developed. The purpose of this study was to develop and test a forearm photographic assessment scale that can be used to ensure such consistency when adopted by study dermatologists who are required to clinically assess photodamage. We plan to subject the scale to expanded testing in order to propose this scale as a criterion standard for general use in dermatologic studies. Weiss and colleagues,2 in studying the effect of topical tretinoin, used a paper scale that included clinical signs for the assessment of overall improvement in cutaneous photoaging of the face to be used along with photographic samples, but they did not discuss agreement or validity. Griffiths and colleagues1 developed a photonumeric scale that included the most common features of interest in the evaluation of photodamage of facial skin. The R.W. Johnson Pharmaceutical Research Institute descriptive scale1 included a detailed description of the manifestations of sun damage with a chance-corrected agreement (κ coefficient) of 0.11 without, and 0.31 with, accompanying facial photographs. Chance-corrected agreement ranges from −1 to +1, with scores of 0.40 to 0.75 considered fair and greater than 0.75 considered excellent or substantial.13,14 This scale is similar to our clinical assessment scale, but for facial skin. On our scale, hyperpigmentation and mottling have been combined into a single clinical sign and renamed abnormal pigmentation because, in the opinion of all of our principal investigators, pigmentation is difficult to separate into 2 different features.
Visual analog scales rely on health care practitioners to estimate features visually on a metrically defined horizontal line. Developers of such scales for assessment of sun damage3,4 have described them as more sensitive than descriptive scales and highly reproducible, but they have not reported chance-corrected agreement or repeatability. Our 10-step clinical assessment scale consists of 3 levels of severity: mild, moderate, and severe. Each of these is subdivided into 3 numerical grades, allowing for a more nuanced scale not unlike a visual analog scale.
Photographic scales have the advantage of providing a consistent visual frame of reference, thus minimizing variability in perception and subjectivity. The photographic scale of Larnier et al5 consists of a set of 3 standardized photographs to represent each of 6 grades of sun damage, ranging from mild to very severe. The photographs were taken in a standard manner, from the same angle and of the same side (left) and region of the face. On assessment of interobserver agreement, chance-corrected κ scores ranged from 0.44 to 0.76 on the first and second occasions. In addition, dermatologists with and without experience with sun-damaged skin scored similarly, supporting the notion that a photographic scale increases objectivity and standardization. Testing of our scale achieved similar or better interobserver agreement using blinded image sets. Figure 3 shows the global assessment photographs with the best agreement.
An upper-extremity photonumeric scale was developed to assess skin aging in smokers and nonsmokers on the protected upper inner arm.15 The scale was effective in showing greater skin aging in smokers than nonsmokers. Efficacy and safety of a topical agent were evaluated using a photographic method consisting of baseline and repeated side-by-side projection of before-and-after images during 36 weeks of treatment,16 but the standard was relative and relevant only to that study. The quality of digital photography has improved greatly since the original description of the photographic method,17 justifying the establishment of an absolute standard for photodamage in forearm skin.
Photographic evaluation of photodamage improvement has also been used in laser resurfacing and remodeling18; however, the photographs were facial and therefore not applicable to our scale. Forearm skin is also used to establish combination laser procedures before clinical use, and the availability of a forearm skin scale may be useful in nonpharmaceutical approaches to photodamage.
Shoshani and colleagues19 made a case for a clinically validated scale for the assessment of facial wrinkling. We propose that a forearm scale is equally necessary. Our findings support the ability of blinded, independent dermatologists to achieve good to excellent agreement and strong linear correlation among their scores as well as internal consistency of ratings, all at a level of high statistical significance. Nevertheless, there were differences in how the dermatologists rated the photographs. All dermatologists in this study have similar years of experience, and we cannot immediately explain the differences in how the community dermatologists rated the photographs, although one of them sees primarily a retiree population and did rate the photographs less severely. The size of maximum differences may be related to the type of patients typically seen in the practices of the community dermatologists. However, even without training, our dermatologists achieved high agreement and significant correlation in how they rated the photographs. The high percentage of agreement testifies to the potential for improvement in consistency with training among dermatologists for whom agreement is vital.
The inability of our photographic scale to account directly for hyperkeratotic features, for both extension of skin surface involvement and thickness, must be acknowledged as a limitation of our study. The next phase of scale development will include an objective form of categorical validation, such as optical coherence tomography20 or microscopy. We also acknowledge that the composition of our sample with regard to race and ethnicity, being mainly white, may limit generalization across all populations. Our sample heterogeneity is representative of our US and local populations; however, it will be expanded in the next phase of scale development.
Based on these results, the expanded Dermatologic Assessment Form Forearm Photographic Assessment Scale has great potential to yield highly consistent scoring of forearm sun damage in study participants. Further steps are needed to create a training image set that can be considered the criterion standard for forearm sun damage.
Correspondence: Naja E. McKenzie, PhD, RN, Arizona Cancer Center, University of Arizona, PO Box 245024, 1515 N Campbell Ave, Tucson, AZ 85724-5024 (email@example.com).
Accepted for Publication: July 2, 2010.
Author Contributions: All authors had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: McKenzie, Saboda, Duckett, Goldman, and Curiel-Lewandrowski. Acquisition of data: Saboda, Duckett, and Curiel-Lewandrowski. Analysis and interpretation of data: McKenzie, Saboda, Hu, and Curiel-Lewandrowski. Drafting of the manuscript: McKenzie, Goldman, and Curiel-Lewandrowski. Critical revision of the manuscript for important intellectual content: McKenzie, Saboda, Duckett, Hu, and Curiel-Lewandrowski. Statistical analysis: McKenzie, Saboda, and Hu. Obtained funding: Goldman. Administrative, technical, or material support: Duckett and Goldman. Study supervision: Hu and Curiel-Lewandrowski.
Financial Disclosure: None reported.
Funding/Support: This study was supported by grant P01 CA27502 from the Chemoprevention of Skin Cancer Program Project (principal investigator David S. Alberts, MD) and grant R25T CA78447 from the Cancer Prevention and Control Training Program (principal investigator David S. Alberts, MD).
Additional Contributions: We gratefully acknowledge Elka Eisen, MD, Stuart Salasche, MD, Gerald N. Goldberg, MD, Linda Ilizaliturri, MD, and Richard C. Miller, MD, the dermatologists who graded our image sets.