A dendrogram generated using hierarchical agglomerative clustering with Ward’s method. Through a discussion with experts in alopecia areata, 5 clusters with distinctive patterns of hair loss were identified. The heatmap was expressed with 3 or 4 levels according to the extent of hair loss in each subregion. Cluster 1 (A) was referred to as grade 1, which showed less extensive hair loss than the other clusters. Clusters 2 (B) and 3 (C) were referred to as grades 2A and 2B, respectively, each showing a distinctive distribution pattern. Grade 2A tended to primarily involve the fronto-occipital area along the midline of the scalp, whereas grade 2B tended to predominantly involve the temporal area. Clusters 4 (D) and 5 (E) were referred to as grades 3 and 4, respectively, which showed markedly more extensive scalp involvement than the other clusters. Grade 4 showed total or near-total scalp hair loss.
A, The algorithm was built using a tree analysis with 2 composite variables (total SALT score and temporal SALT) and its pruning into 5 end nodes. Of the participants, 266 (82.9%) were allocated to correct clusters by this approach. B, Schematic illustration and description of the grading system. Given our prediction model (Table), the patients with grade 1 or 2A could be considered having a lower risk compared with patients with grade 2B, 3, or 4. The probability and hazard ratio of major and complete regrowth within 12 months in each grade are summarized. SALT indicates Severity of Alopecia Tool; TOAST, Topography-based Alopecia areata Severity Tool.
Despite some overlapping observed during follow-up, the overall probability of hair regrowth at 12 months was inversely associated with an increment in grade. Therapeutic response was evaluated using the criteria adopted in our recent meta-analysis.9 A, Major regrowth; and B, complete regrowth. SALT indicates Severity of Alopecia Tool.
eFigure 1. SALT II and its modification for ascertainment of scalp hair loss in patients with alopecia areata
eFigure 2. Statistical approach for determining the number of clusters
eFigure 3. SALT score and its subsets by cluster
eTable 1. Clinical characteristics of patients with alopecia areata
eTable 2. Discriminant function model of cluster differentiation
eTable 3. Quadratic weighted kappa statistics for interobserver agreement
Lee S, Kim BJ, Lee C, Lee W. Topographic Phenotypes of Alopecia Areata and Development of a Prognostic Prediction Model and Grading System: A Cluster Analysis. JAMA Dermatol. 2019;155(5):564–571. doi:10.1001/jamadermatol.2018.5894
What topographic phenotypes of alopecia areata can be identified from unsupervised learning, and are the differentiations associated with hair regrowth prognosis?
In this cohort study of 321 patients, 5 characteristic phenotypes were identified using cluster analysis, stratifying prognosis with high accuracy and performance. In addition to extent of hair loss, temporal area involvement was independently associated with worse prognosis.
The topographic characteristics of hair loss should be considered when assessing patients with alopecia areata for better prognostic prediction. Our grading system might aid clinicians in describing the extent and distribution of hair loss and predicting the prognosis.
Diverse assessment tools and classification have been used for alopecia areata; however, their prognostic values are limited.
To identify the topographic phenotypes of alopecia areata using cluster analysis and to establish a prediction model and grading system for stratifying prognoses.
Design, Setting, and Participants
A retrospective cohort study of 321 patients with alopecia areata who visited a single tertiary referral center between October 2012 and February 2017 and underwent 4-view photographic assessment.
Clinical photographs were reviewed to evaluate hair loss using the Severity of Alopecia Tool 2. Topographic phenotypes of alopecia areata were identified using hierarchical clustering with Ward’s method. Differences in clinical characteristics and prognosis were compared across the clusters. The model was evaluated for its performance, accuracy, and interobserver reliability by comparison to conventional methods.
Main Outcomes and Measures
Topographic phenotypes of alopecia areata and their major (60%-89%) and complete regrowth probabilities (90%-100%) within 12 months.
A total of 321 patients were clustered into 5 subgroups. Grade 1 (n = 200; major regrowth, 93.4%; complete regrowth, 65.2%) indicated limited hair loss, whereas grades 2A (n = 66; major regrowth, 87.8%; complete regrowth, 64.2%) and 2B (n = 20; major regrowth, 73.3%; complete regrowth, 45.5%) exhibited greater hair loss than grade 1. The temporal area was predominantly involved in grade 2B, but not in grade 2A, despite being comparable in total extent of hair loss. Grade 3 (n = 20; major regrowth, 45.5%; complete regrowth, 25.5%) included diffuse or extensive alopecia areata, and grade 4 (n = 15; major regrowth, 28.2%; complete regrowth, 16.7%) corresponded to alopecia (sub)totalis. No significant differences in prognosis (hazard ratio [HR] for major regrowth, 0.79; 95% CI, 0.56-1.12) were found between grades 2A and 1, whereas grades 2B (HR, 0.41; 95% CI, 0.21-0.81), 3 (HR, 0.24; 95% CI, 0.12-0.50), and 4 (HR, 0.16; 95% CI, 0.06-0.39) had significantly poorer response. Among multiple models, the cluster solution had the greatest prognostic performance and accuracy. The tree model of the cluster solution was converted into the Topography-based Alopecia Areata Severity Tool (TOAST), which revealed an excellent interobserver reliability among 4 dermatologists (median quadratic-weighted κ, 0.89).
Conclusions and Relevance
Temporal area involvement should be independently measured for better prognostic stratification. The TOAST is an effective tool for describing the topographical characteristics and prognosis of hair loss and may enable clinicians to establish better treatment plans.
Alopecia areata (AA) is a chronic autoimmune disorder that results in nonscarring hair loss. As a major contributor to the global burden of skin diseases,1 its prevalence rate and lifetime risk are estimated to be 0.1%-0.2% and 2.1%, respectively.2,3 The primary target organs in AA are the hair follicles of the scalp, although body hair loss and nail dystrophy are also common manifestations.4 Moreover, AA is associated with diverse systemic and psychiatric disorders.5,6
The prognoses of patients with AA are diverse and difficult to predict.7 Patients with mild AA can show spontaneous improvement within 6 months without any treatment,8 whereas those with severe AA are less likely to have hair regrowth despite rigorous treatments. Our recent meta-analysis revealed that complete regrowth was achieved in only 24.9% of patients with total hair loss after contact immunotherapy.9
Alopecia areata is usually classified into patchy alopecia (PA), alopecia totalis (AT), and alopecia universalis (AU), and a few special subtypes such as alopecia ophiasis and acute diffuse and total alopecia.10 Despite various factors associated with AA prognosis,11 the extent of hair loss is considered the single most important factor.12 Therefore, diverse instruments, including the Severity of Alopecia Tool (SALT), are used to obtain objective measurements.13 Despite its usefulness, SALT score alone was insufficient to provide quantitative information on the probability of satisfactory regrowth. In daily practice, most forms of AA other than AT or AU are barely distinguished by specific patterns of hair loss and tend to be lumped together and referred to as PA. Furthermore, although some subtypes with a specific hair loss pattern have been known to have a distinctive prognosis,10 quantification of their difference in prognosis has been rarely established. Consequently, tailored prognostic information has not been provided to patients with AA according to their disease severity and characteristics.
Collectively, the current description or classifications of AA are limited by their subjectivity and inability to differentiate prognoses. An earlier identification of patients with poorer prognosis might enable clinicians to implement more intensive treatment regimens and monitor their disease for better outcomes. Therefore, we postulated that unsupervised data mining of topographic data on hair loss in patients with AA may have advantages compared with the conventional classification systems in the identification of phenotypes of AA, and this differentiation may improve the prediction of disease prognosis.
This retrospective study included patients with AA who visited the Department of Dermatology at Yonsei University Wonju Severance Christian Hospital between October 2012 and February 2017. In accordance with the AA investigational assessment guideline,10,14 all patients underwent standardized photographic assessment using the 4-view approach at the initial visit and during follow-up, except those who declined to be photographed or had only extrascalp hair loss. This study was approved by the institutional review board of Yonsei University Wonju College of Medicine, and a waiver of written informed consent was granted owing to the retrospective nature of the study and the deidentified data used.
We reviewed the electronic medical records and clinical photographs of the patients. Information on clinical variables, including age, sex, age at onset, disease duration, marital status, family history of AA, history of atopic diseases, extrascalp hair loss, primary treatment (either topical corticosteroid or diphenylcyclopropenone), and progress during follow-up, was collected.
By using SALT 2,15 we mapped the topographic distribution of scalp hair loss into a predefined data set (eFigure 1A in the Supplement). SALT 2 is an assessment tool that uses a systematic approach for AA where the scalp is equally divided into 100 parts by 1% scalp surface area to ascertain scalp hair loss. Consequently, topographic data consisting of 100 cells for each patient were collected. However, because of the complexity and collinearity across the data structure, these were merged into 36 subregional variables, taking their anatomical association and proximity into account (eFigure 1B in the Supplement).
Cluster analysis is an unsupervised data mining method that identifies a few subgroups on the basis of their common or distinctive characteristics in the input data distribution without any prior knowledge.16 Among various methods for data aggregation and subset identification, this method was adopted for analyzing topographic data because unsupervised learning was required to describe previously unidentified phenotypes and because it is advantageous for handling many parameters.17-19 This method was used in previous studies to identify phenotypes of diverse diseases, including atopic dermatitis,20 asthma,21,22 and vitiligo.23
Hierarchical clustering was performed with the Ward minimum-variance method of the squared Euclidean distance.16 In this agglomerative approach, participants are merged into larger clusters to minimize the within-cluster sum of squares at each generation of clusters. No weighting for any variable was applied. Connectivity-based clustering was preferred to other clustering algorithms for the following reasons18,24,25: (1) it can identify the structures within each cluster, (2) the number of clusters was not prespecified, (3) the clustering was more likely to represent gradual margin with topographic data set, and (4) low-level dimensional differentiation was likely to be inefficient because we expected the clustering to be affected by most of the input variables rather than by a few dominant variables.
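The merging rule described above can be illustrated with a minimal sketch. It is not the authors' code (the study reports using R); it is a naive standard-library Python version of Ward's criterion, in which the cost of merging clusters a and b is n_a n_b / (n_a + n_b) times the squared Euclidean distance between their centroids. The function names and the toy data (6 hypothetical patients with 3 hypothetical subregional scores) are assumptions made for illustration only.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = a["n"], b["n"]
    sq_dist = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * sq_dist


def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with Ward's minimum-variance criterion.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    clusters = [
        {"members": [i], "n": 1, "centroid": list(p)}
        for i, p in enumerate(points)
    ]
    while len(clusters) > n_clusters:
        # Pick the pair whose merge adds the least within-cluster variance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        n = a["n"] + b["n"]
        merged = {
            "members": sorted(a["members"] + b["members"]),
            "n": n,
            # Size-weighted mean of the two centroids.
            "centroid": [
                (a["n"] * x + b["n"] * y) / n
                for x, y in zip(a["centroid"], b["centroid"])
            ],
        }
        # combinations() guarantees i < j, so delete j first to keep i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return sorted(c["members"] for c in clusters)


# Hypothetical toy data, NOT study data: rows are patients, columns are
# made-up subregional hair loss scores.
toy = [
    [0, 0, 1], [1, 0, 0],  # limited loss
    [5, 5, 0], [6, 5, 1],  # loss concentrated in the first two subregions
    [0, 6, 9], [1, 5, 9],  # loss concentrated in the last subregion
]
groups = ward_clustering(toy, n_clusters=3)
```

This pairwise search is cubic in the number of participants and is meant only to make the criterion concrete; a production analysis would normally rely on an established statistical package.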
The number of clusters was specified via an exploratory approach primarily by visual inspection of the dendrogram and heatmap.26 Although this decision was made after discussion with external AA specialists, we further investigated the statistically optimal number of clusters for finding a better cluster solution.27
To evaluate our cluster solution, the Fisher linear discriminant function was constructed and recursive partitioning analysis was performed to generate a classification and regression tree.28,29 The discriminant function was built with all 36 parameters, whereas the tree was modeled with only a few composite variables and pruned until the number of end nodes equaled the number of clusters, so that it could be applied as a tree-based algorithm for grading AA. These simplifications were required to make the model more feasible for application in clinical practice and to reduce overfitting of the model.
For stratifying disease prognosis in patients with AA, we established a prognostic prediction model with the cluster allocation according to the topographic phenotypes derived from the cluster analysis of 36 subregional data. By using the criteria used in a meta-analysis,9 we ascertained major and complete regrowth within 12 months as surrogates for successful treatment. To validate the proposed model, we evaluated prognostic accuracy and performance and compared these parameters with those of 5 other models for classifying AA (models 1-5): model 1 consisted of PA with a SALT score of less than 100 and AT with 2 levels. Model 2 comprised PA with SALT scores of less than 50 and greater than or equal to 50, and AT with 3 levels because a SALT score of 50 is often regarded as clinically important in several guidelines and meta-analyses.7,9,30,31 Model 3 was stratified by 5 levels (S1-S5) for SALT subgrouping.13,14 Finally, models 4 and 5 were stratified by 5 and 10 levels that equally divided the disease extent from SALT scores 0 to 100 by 20 and 10, respectively.
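The SALT-only comparison models can be written down as simple stratification rules. The sketch below covers models 1, 2, 4, and 5 as described above; model 3 (S1-S5) is omitted because its cutoffs are not stated in this text. The function names, the label strings, and the handling of boundary values (bins closed on the right, a score of 0 placed in the lowest level) are assumptions for illustration, not details taken from the study.

```python
def model1(salt):
    """Model 1: patchy alopecia (SALT < 100) vs alopecia totalis (SALT = 100)."""
    return "AT" if salt >= 100 else "PA"


def model2(salt):
    """Model 2: PA with SALT < 50, PA with SALT >= 50, and AT."""
    if salt >= 100:
        return "AT"
    return "PA, SALT >= 50" if salt >= 50 else "PA, SALT < 50"


def equal_width_level(salt, width):
    """Models 4 and 5: 1-based level from equal-width SALT bins.

    Assumption: bins are closed on the right, so with width 20 a SALT
    score of 20 is level 1 and a score of 100 is level 5.
    """
    if not 0 <= salt <= 100:
        raise ValueError("expected a SALT score between 0 and 100")
    # Ceiling division; a score of 0 is placed in the lowest level.
    return max(1, int(-(-salt // width)))


# Median SALT score reported for the cohort was 8.
levels_for_median = (
    model1(8),
    model2(8),
    equal_width_level(8, 20),  # model 4: 5 levels
    equal_width_level(8, 10),  # model 5: 10 levels
)
```

None of these rules uses the location of hair loss, which is the main way they differ from the cluster-based allocation.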
The tree model was adopted for defining the algorithm for grading AA. Four dermatologists blinded to the cluster allocation independently reviewed clinical photographs and classified them according to the defined system. Its interobserver reliability was examined with quadratic-weighted κ statistics.32
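For readers unfamiliar with the statistic, a minimal sketch of the quadratic-weighted κ for two raters follows. It uses the standard definition, κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij with w_ij = (i − j)² / (k − 1)², where O is the observed cross-tabulation and E the table expected from the marginal totals. Coding the 5 grades as the integers 0 through 4 in order of severity, and the example ratings themselves, are assumptions made for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic disagreement weights.

    Ratings are integer codes 0..n_categories-1 on an ordinal scale.
    Undefined (division by zero) if both raters use a single category.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    k = n_categories

    # Observed cross-tabulation of the two raters.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = row_totals[i] * col_totals[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings, NOT study data: grades 1, 2A, 2B, 3, 4 coded 0..4.
reference = [0, 0, 1, 2, 3, 4]
same = quadratic_weighted_kappa(reference, reference, 5)
reversed_scale = quadratic_weighted_kappa([0, 1, 2, 3, 4], [4, 3, 2, 1, 0], 5)
```

Because the weights grow with the square of the distance between grades, confusing grade 1 with grade 4 is penalized far more heavily than confusing two adjacent grades, which suits an ordered grading system.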
To compare the clinical characteristics of each identified cluster, analysis of variance and χ2 test for trend were used as appropriate. The Bonferroni method was used in pairwise comparisons of SALT score and its subsets. For the prognostic validation of the models, cumulative incidence and Cox proportional hazards analyses were performed. The performance of the cluster solution and its global concordance probability (integrated area under the curve [iAUC]33) were compared with those of the other 5 models. The iAUC is a weighted average of the area under the curve across a follow-up period and a measure of predictive accuracy of the model during follow-up, with higher iAUC indicating better predictive accuracy. Homogeneity (small differences in prognosis among patients in the same grade within each model) and discriminatory ability (greater differences among patients in different grades) were compared using the likelihood ratio χ2 and linear trend χ2, respectively.34 The Akaike information criterion (AIC) was calculated to determine which model is more explanatory in predicting survival,35 with the smaller AIC indicating the preferred model. Bootstrap resampling for 1000 times was used to compare multiple prediction models. This study adhered to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement of reporting.36 All analyses were performed using R statistical software (version 3.4.1, R Foundation for Statistical Computing) at a significance level of 5%.
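As a reminder of how the AIC comparison works, the criterion is 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood (the partial likelihood for a Cox model). The sketch below applies this definition; the model names, log-likelihood values, and parameter counts are hypothetical placeholders, not results from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is preferred)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits, NOT study results: (maximized log-likelihood, parameters).
hypothetical_fits = {
    "model with 2 levels": (-520.0, 1),
    "model with 5 levels": (-505.0, 4),
}
scores = {name: aic(ll, k) for name, (ll, k) in hypothetical_fits.items()}
preferred = min(scores, key=scores.get)
```

The penalty term 2k means that a model with more strata is preferred only if its gain in likelihood outweighs the added parameters.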
We evaluated 321 patients with AA (mean [SD] age, 37.1 [16.3] years), of whom 161 (50.2%) were women (eTable 1 in the Supplement). The mean (SD) age at onset was 35.2 (16.4) years, with a mean (SD) AA duration of 8.3 (18.7) months. Of the 238 patients aged 25 years or older, 175 (73.5%) were ever married. A family history of AA was identified in 43 patients (13.4%), and a history of atopic diseases in 49 patients (15.3%). The median SALT score was 8 (interquartile range [IQR], 3-20), and extrascalp hair loss was observed in 43 patients (13.4%). Topical corticosteroid and diphenylcyclopropenone were used as primary treatment in 191 (59.5%) and 130 patients (40.5%), respectively.
A dendrogram was generated using the hierarchical agglomerative approach (Figure 1). Five clusters with distinctive patterns of hair loss were identified through visual inspection. The robustness of our decision was further supported by the statistical approaches indicating the optimal number of clusters as 5 (eFigure 2 in the Supplement).27
Cluster 1 (n = 200, 62.3%) (Figure 1A) describes the most common phenotype, with limited hair loss involvement, and was referred to as “grade 1.” With some exceptions, this cluster was distinguished from others by its sparse and irregular involvement pattern. Clusters 2 (n = 66, 20.6%) (Figure 1B) and 3 (n = 20, 6.2%) (Figure 1C) described subgroups with considerably greater hair loss than those in cluster 1 but less than those in clusters 4 and 5. These clusters had distinctive hair loss distributions rather than differences in the total amount of hair loss. The temporal area tended to be spared in cluster 2 but showed a predominant involvement in cluster 3, referred to as “grade 2A” and “grade 2B,” respectively. Clusters 4 (n = 20, 6.2%) (Figure 1D) and 5 (n = 15, 4.7%) (Figure 1E) described subgroups with prominently extensive hair loss compared with those in other clusters. Cluster 4, named “grade 3,” included disease previously described as diffuse, reticular, or extensive type. By contrast, cluster 5, named “grade 4,” corresponded to AT and alopecia subtotalis.
The clinical characteristics of the 5 clusters (grades 1-4) are shown in eTable 1 in the Supplement. The proportion of male patients was significantly greater in the higher grades. The AA duration and the proportion of never-married patients tended to be greater in higher grades, although these differences were not statistically significant. In the higher grades, extrascalp manifestation was significantly more prevalent, and diphenylcyclopropenone therapy was more frequently used as primary treatment.
Significant differences in SALT score and its subsets (SALT score of the fronto-occipital area along the midline of the scalp [midline SALT], parietal area [parietal SALT], and temporal area [temporal SALT]) were found among the 5 grades (eFigure 3 in the Supplement). However, grades 2A and 2B showed no statistical difference in SALT score and parietal SALT. Instead, midline SALT and temporal SALT were significantly greater in grades 2A and 2B, respectively. In summary, grades 2A and 2B showed no significant differences in the total amount of hair loss and were characterized by differences in hair loss distribution.
The cluster solution identified an imbalanced distribution consisting of 1 large cluster (grade 1, 62.3%) and 4 small clusters. However, we did not regard this solution as inefficient or skewed because most patients with AA have mild disease and only 5% will progress to extensive disease or AT (grade 4 in our solution, 4.67%).4,37 This asymmetry was also supported by several previous studies.38-42 In addition, further splitting of the largest cluster in our solution did not reveal any advantage in characterizing the pattern of hair loss or prognostic prediction. As a result, we considered the ecological validity of this cluster solution fair and acceptable.
The discriminant function test identified all the parameters of the cluster solution as significant determinants (eTable 2 in the Supplement). The accuracy was 94.7%, indicating high reliability for predicting case allocation to the clusters. In addition, a tree analysis was performed with 2 composite variables, SALT score and temporal SALT (Figure 2A). Although the tree was built with only 2 variables and pruned into 5 end nodes, 266 participants (82.9%) were still assigned to the correct cluster by the algorithm. Therefore, we decided to adopt this cluster solution for further analyses.
The cumulative incidence of major and complete regrowth within 12 months was analyzed in each cluster (Figure 3). Owing to the limited number of patients with severe disease, the cumulative incidence curves for grades 2B through 4 were irregular in shape and overlapped one another during follow-up. Nevertheless, regrowth probabilities at 12 months decreased in order of increasing grade. The prognostic accuracy and performance of the cluster solution and of models 1 through 5 were compared (Table). In all the analyzed models, the higher grades were associated with poorer outcome. Grades 1 and 2A in our solution showed no significant difference in prognosis, although SALT score was higher in grade 2A than in grade 1. Despite grades 2A and 2B having similar SALT scores, grade 2B was less likely to exhibit hair regrowth than grades 1 and 2A. Among the 6 models, the proposed model derived from the cluster solution had the greatest prognostic accuracy, discriminatory ability, homogeneity, and explanatory power.
We converted the tree model of our cluster solution into an algorithm-based grading system (Topography-based Alopecia areata Severity Tool [TOAST]; Figure 2). With TOAST, the median quadratic weighted κ from the results of the cluster analysis and the classified result from 4 dermatologists was 0.89 (range, 0.76-0.92), indicating excellent interobserver reliability (eTable 3 in the Supplement).43
In this study, we identified topographic phenotypes derived from data of real-world patients and suggested a prediction model that yielded better prognosis estimation. We also presented TOAST, an algorithm-based grading system integrated with total SALT score and temporal SALT, for classifying the topographical phenotype and predicting the prognosis of patients with AA. Moreover, our study demonstrated the importance of the measurement of temporal area involvement in prognostic prediction.
From our cohort, 5 clusters were primarily identified according to the severity of hair loss. However, grades 2A and 2B were differentiated by their area of predominant involvement. Patients with higher grades were more likely to have additional poor prognostic factors, including male sex,39,44,45 family history of AA,46,47 and history of atopic diseases.47,48 In addition, early-onset age,12,31,49 another poor prognostic factor, may be associated with the greater proportion of never-married patients in higher grades.
In the model, the prognosis became poorer with an increment in grade, as expected. However, grade 2A did not significantly differ from grade 1 in prognosis, whereas grade 2B had a significantly worse prognosis than grades 1 and 2A. This suggests that the prognosis of AA with temporal area involvement is worse despite a similar extent of hair loss. This pattern seems to partially share some characteristics of alopecia ophiasis, previously known as a poor prognostic subtype. However, lower occipital area involvement (O3-4) was not significantly associated with either better or worse prognosis with respect to hair regrowth in subsequent analysis with adjustment for SALT score. Although the mechanism of the association of temporal area involvement with poorer outcome could not be elucidated in this study, this might be a step toward a better understanding of the importance of topographical distribution of hair loss in AA.
Among multiple prediction models, the proposed model had the greatest accuracy and performance. This difference may have resulted from the lack of consideration of topographic characteristics in the other models. In addition, the proposed model could have greater predictive power because it was developed by unsupervised data mining rather than by subjective artificial classification. Moreover, our grading system, TOAST, which was built on this solution, showed excellent repeatability and reliability.
Although various instruments are being developed, clinicians cannot always obtain all measurements required to assess patients with AA in their daily practices. Nevertheless, critical measurements, including standardized clinical photographs and SALT, should still be obtained for the objective measurement of disease characteristics. The TOAST might provide a simple, intuitive, and rapid approach for clinicians to comprehensively describe the extent and topographic characteristics of hair loss and the prognosis within 12 months. Furthermore, this system would enable us to improve counseling of patients with AA and establish a better treatment plan for them. Based on our results, patients with a lower risk (grades 1 and 2A) are more likely to have a favorable prognosis, and less intensive treatments, including topical corticosteroid, could be provided. In contrast, a regimen with more rigorous or combination treatment may be considered for patients with a higher risk (grades 2B, 3, and 4).
The limitations of this study include its retrospective nature and the deterioration in statistical power resulting from the limited number of patients with severe AA. Nevertheless, this study proposed a prediction model based on phenotypes derived from real-world data, and its accuracy and performance proved to be superior to those of other conventional methods. Moreover, this study presents a concise tree-based grading system with high reliability that may be easily applied in clinical practice. Larger studies that would further validate this grading system will be required.
In this study, we identified the topographical phenotypes of AA using an unsupervised data mining method and found that the extent of temporal area involvement is a crucial measurement in assessing patients with AA. Given its considerable predictive accuracy and reliability, the TOAST will be a simple but powerful tool for describing the topographical characteristics and prognosis of hair loss in patients with AA. Moving forward, this might allow for an improvement in current clinical practices and help achieve a better outcome in patients with AA.
Corresponding Author: Won-Soo Lee, MD, PhD, Department of Dermatology, Yonsei University Wonju College of Medicine, 20 Ilsan-ro, Wonju, Gangwon 26426, Republic of Korea.
Accepted for Publication: December 22, 2018.
Published Online: March 27, 2019. doi:10.1001/jamadermatol.2018.5894
Author Contributions: Drs Solam Lee and Won-Soo Lee had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: S. Lee, W. Lee.
Acquisition, analysis, or interpretation of data: S. Lee, Kim, C. Lee.
Drafting of the manuscript: S. Lee.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: S. Lee, Kim.
Administrative, technical, or material support: C. Lee, W. Lee.
Conflict of Interest Disclosures: None reported.