Beresniak A, de Linares Y, Krueger GG, et al. Validation of a New International Quality-of-Life Instrument Specific to Cosmetics and Physical Appearance: BeautyQoL Questionnaire. Arch Dermatol. 2012;148(11):1275–1282. doi:10.1001/archdermatol.2012.2696
Objective To develop a new quality-of-life (QoL) instrument with international validity that specifically assesses cosmetic products and physical appearance.
Design In the first phase, semidirected interviews involved 309 subjects. In the second stage, an acceptability study was performed on 874 subjects. Thereafter, we recruited a total of 3231 subjects, each of whom completed the BeautyQoL questionnaire, a clinical checklist for the skin, the generic QoL 36-Item Short Form Health Survey, and a sociodemographic questionnaire. A retest was performed 8 days later on a subgroup of 652 subjects.
Setting Populations in France, the United Kingdom, Germany, Spain, Sweden, Italy, Russia, the United States, Brazil, Japan, India, China, and South Africa, representing 16 languages.
Participants The general adult healthy population, including women and men.
Main Outcome Measures Psychometric properties, construct validity, reproducibility, and internal and external consistency.
Results General acceptability was very good in the 16 languages, with a very low rate of no answers. The validation phase reduced the questionnaire to 42 questions structured in the following 5 dimensions that explained 76.7% of the total variance: social life, self-confidence, mood, energy, and attractiveness. Internal consistency was high (Cronbach α coefficients, 0.93-0.98). Reproducibility at 8 days was satisfactory in all dimensions. Results of external validity testing revealed that BeautyQoL scores correlated significantly with all 36-Item Short Form Health Survey scores except for physical function.
Conclusion These results demonstrate the validity and reliability of the BeautyQoL questionnaire as the very first international instrument specific to cosmetic products and physical appearance.
Health-related quality of life (QoL) is an important clinical objective that has gained prominence during the past few decades.1,2 Research shows that an improvement in facial attractiveness is associated with positive changes in emotional and social dimensions of one's life, such as personality, interpersonal relationships, and self-esteem.3,4 Although most clinical research on cosmetic intervention focuses on psychological benefits of cosmetic camouflage, the additional benefits of facial attractiveness achieved with makeup should not be neglected because they may affect someone's life positively.5,6
People whose physical appearance has been altered because of a transient or chronic clinical condition, such as pigmentary disorders, are at a higher risk of negative emotional distress due to their altered facial characteristics.7 A study in China established that patients with vitiligo experienced significantly impaired QoL and unstable marital relationships.8 Patients with acne have been shown to experience levels of social, psychological, and emotional distress similar to those reported in patients with asthma, epilepsy, and diabetes.9 To mask their dermatological conditions, these patients may resort to simple makeup application, such as concealing oily skin or acne10,11; to mechanical camouflage, such as for vitiligo7,12,13; to invasive interventions, such as injectable facial rejuvenations in patients with immunodeficiency syndrome who have lipoatrophy4; or to cosmetic surgery. All these studies report significant improvement in patients' well-being after their cosmetic intervention.
Constructing a new QoL instrument requires a systematic conceptual approach and a robust scientific method. Most studies that use QoL questionnaires to assess cosmetic effects were conducted on a limited number of subjects. Moreover, most studies used a narrow validation process specific to the culture of the population being studied. This process can result in findings that are not valid owing to a potential bias from researchers who may select questions from various existing questionnaires, to random effects within a small number of patients, or to a potentially poor interest in a measure that may not be relevant in the country where the study is being conducted. Our objectives were to develop a new QoL instrument that specifically assesses QoL relevant to cosmetic products and physical appearance and to ensure its international validation.
The validation of the BeautyQoL questionnaire was conducted among healthy adults in the context of an international multicenter study coordinated by a steering committee composed of 2 senior dermatologists (G.G.K. and S.T.), 1 evaluation expert (K.T.), 1 expert in health-related QoL (A.B.), 1 expert in applied mathematics (G.D.), and 1 expert from the cosmetic industry (Y.D.L.). The subjects were recruited from February 1, 2006, through May 31, 2009, in the following 13 countries representing 16 languages: France, the United Kingdom, Germany, Spain, Sweden, Italy, Russia, the United States, Brazil, Japan, India (representing Hindi and English speakers), China, and South Africa (representing Zulu, Sotho, and English speakers).
We included adults (aged 18-78 years) who gave informed consent to participate in the study and spoke the tested language as their native language. Because the validation was conducted on healthy adult subjects, specific approval from an ethical committee was waived according to European rules (Directive 2001/20/EC of the European Parliament and of the Council of April 4, 2001; available at http://www.eortc.be/services/doc/clinical-eu-directive-04-april-01.pdf).
The development of the BeautyQoL questionnaire followed a classic 3-phase validation process using a codevelopment approach by which surveys are conducted in parallel in all participating countries. This approach was favored over a sequential approach by which cross-cultural validation is conducted in sequence, one country after another. Because of the level of management and resources needed for coordinating simultaneous validation in multiple countries, codevelopment approaches are rarely used in the field of QoL assessment. However, this approach was preferred to complete the international validation of this new QoL instrument in a reasonable amount of time.
For the main validation survey, subjects underwent evaluation at inclusion, and 25% underwent retesting 8 days later. The self-administered survey materials, which the subjects completed alone, included the tested questionnaire (BeautyQoL) and the validated generic QoL 36-Item Short Form Health Survey (SF-36).14 The SF-36 consists of 36 items describing the following 8 dimensions: physical functioning, social functioning, role-physical problems, role-emotional problems, mental health, vitality, bodily pain, and general health. Each dimension score ranges from 0 to 100, with a higher score indicating a better perceived state of health. In addition, sociodemographic data and 1 questionnaire regarding skin condition were collected.
The initial item-generation phase included face-to-face semistructured interviews of 309 subjects. The interviews were conducted by trained clinical psychologists simultaneously in the following 10 countries: France (n = 32), the United Kingdom (n = 18), Germany (n = 46), Spain (n = 27), Sweden (n = 19), Russia (n = 16), the United States (n = 53), Brazil (n = 32), Japan (n = 48), and China (n = 18). These interviews aimed at identifying recurrent themes and were used to generate individual questions for the questionnaire. The interviews addressed the effect of cosmetic products and physical appearance on the individuals' QoL. The interviews were also used to determine the wording to be used in question stems and the types and ranges of possible answers. Interviews were conducted in each of the target countries until no new ideas emerged from the content analysis performed in real time. Although local languages were used throughout the interviews, statements were translated to English by a bilingual clinical psychologist and compiled into a standardized interview report for each country. Final semantic content analysis was then performed and complemented by a computerized text-mining analysis (Alceste software; Image, France).
During the second phase of the questionnaire development, an acceptability study was performed on 874 subjects from the original 10 countries and 3 new ones (Italy, India, and South Africa). Together, these 13 countries represented 16 languages: France, the United Kingdom, Germany, Spain, Sweden, Italy, Russia, the United States, Brazil, Japan, India (representing Hindi and English speakers), China, and South Africa (representing Zulu, Sotho, and English speakers). Subjects were asked to comment on any aspects of the questionnaire (ie, content, wording, and response choices) that they felt were irrelevant or needed improvement. The items that were ambiguous, misunderstood, or rarely answered were excluded or reworded. This acceptability study ensured content validity and guaranteed that the questionnaire was a true reflection of the subjects' perspective in the 16 languages represented. An item reduction was then performed using specific statistical techniques for tracking potential statistical links between questions. We performed κ tests on each question vs all others. A κ:κ value ratio greater than 50% suggested a statistical relation between the questions. Kendall correlations were performed to confirm the κ test results. Correlation coefficients greater than 0.7 suggested a statistical link between the questions. Finally, principal component analyses (PCAs) were used to compare vector distances between questions.
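The Kendall correlation screen described above can be illustrated with a minimal sketch. The article does not state which Kendall variant was used; the code below assumes tau-b (which adjusts for ties, common with Likert-type answers), and the item names, answers, and helper functions are hypothetical.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation between two equal-length answer lists."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both items: contributes to no count
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0

def flag_linked_items(items, threshold=0.7):
    """Return item pairs whose tau-b exceeds the threshold (0.7 in the article)."""
    return [
        (a, b)
        for a, b in combinations(items, 2)
        if kendall_tau_b(items[a], items[b]) > threshold
    ]

# Hypothetical answers of 7 subjects to 3 candidate questions (1-5 scale assumed).
answers = {
    "q1": [1, 2, 3, 4, 5, 3, 2],
    "q2": [1, 2, 3, 4, 5, 3, 2],  # duplicates q1, so it should be flagged
    "q3": [5, 1, 4, 2, 3, 1, 5],
}
linked = flag_linked_items(answers)
```

In this toy data set only the pair (q1, q2) exceeds the 0.7 threshold, so one of the two questions would be a candidate for removal or rewording.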
In the third phase, 3231 subjects were recruited in the 13 target countries to fill out the 4 following questionnaires: the BeautyQoL questionnaire, a clinical checklist for the skin (face and body skin characteristics, eg, type, tone, elasticity, and wrinkles, and potential minor problems, eg, spots, scars, broken veins, and being subject to sun reactions or allergies), the SF-36 questionnaire, and a short sociodemographic questionnaire. Subjects were selected from existing general population panels according to socioeconomic criteria specific to each participating country and managed by local survey agencies. A retest of the BeautyQoL questionnaire only was performed 8 days later on a subgroup of 652 subjects (about 40 subjects per target language). The whole database, including all answers provided in a total of 16 languages, was split randomly in 2 subsamples. For 1 subsample, the multidimensional structure of the questionnaire was identified studying interitem, item-dimension, and interdimensional correlations (Pearson correlation tests) and PCA.15 Varimax rotations applied to the PCA identified the main axes composed by a subgroup of questions, in which the component for 1 particular axis would be greater than 0.515 (each subgroup of questions representing 1 dimension). For each potential dimension scale, internal consistency reliability was assessed by the Cronbach α coefficient. A Cronbach α coefficient of at least 0.70 was expected for each scale.16 Within each dimension, the items for which deletion would lead to an increase in the α value of at least 0.02 were candidates for deletion. The unidimensionality of each dimension was assessed using Rasch analyses. For the second subsample, confirmatory factor analysis17 was used to assist selection by testing the various candidate scale structures according to different potential item selection patterns. 
The more meaningful and psychometrically sound construct was kept to produce the final version of the BeautyQoL questionnaire.
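As an illustration of the internal consistency criteria described above, the following minimal sketch computes the Cronbach α coefficient and lists the items whose deletion would raise α by at least 0.02. The data, the function names, and the use of sample variances are assumptions made for the example, not details taken from the study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def deletion_candidates(rows, min_gain=0.02):
    """Indexes of items whose removal increases alpha by at least min_gain."""
    baseline = cronbach_alpha(rows)
    candidates = []
    for i in range(len(rows[0])):
        reduced = [r[:i] + r[i + 1:] for r in rows]
        if cronbach_alpha(reduced) - baseline >= min_gain:
            candidates.append(i)
    return candidates

# Hypothetical scale: items 0-2 move together, item 3 is unrelated noise.
responses = [
    [1, 1, 2, 5],
    [2, 2, 2, 1],
    [3, 3, 4, 4],
    [4, 4, 4, 2],
    [5, 5, 5, 3],
    [2, 3, 2, 5],
]
alpha_all = cronbach_alpha(responses)
to_delete = deletion_candidates(responses)
```

With these made-up responses, α for the 4-item scale falls below the expected 0.70, and only the noisy fourth item is returned as a deletion candidate; removing it brings α well above 0.70.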
To explore external validity, relationships were investigated between the dimensions of the BeautyQoL instrument and the generic SF-36 instrument. The underlying assumption was that dimension scores of the BeautyQoL questionnaire would be more strongly correlated with scores on similar dimensions from the other instruments than with dissimilar ones (convergent validity). The discriminant validity of the BeautyQoL questionnaire was determined by dimension mean scores across subject groups that were expected to differ in their sociodemographic features (eg, age and sex) or clinical features (eg, skin status) using analysis of variance, the Mann-Whitney test, and Spearman correlation tests. Reproducibility was analyzed through test-retest reliability using intraclass correlation coefficients between the 2 successive assessments of subjects selected for retesting.18 Sensitivity of the BeautyQoL questionnaire was assessed in the frame of a randomized clinical trial conducted in France, which compared the impact of the following 2 camouflage products: cream (high-coverage foundation) vs powder (high-coverage loose powder) used for 3 weeks by 88 subjects with facial cosmetic imperfections.
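The test-retest analysis can be sketched as follows. The article does not specify which form of the intraclass correlation coefficient was used; this example assumes the one-way random-effects, single-measurement form, often written ICC(1,1), and the scores and function name are hypothetical.

```python
def icc_oneway(test, retest):
    """One-way random-effects ICC(1,1) for two measurements per subject."""
    n = len(test)
    k = 2  # number of assessments per subject (test and retest)
    grand_mean = (sum(test) + sum(retest)) / (n * k)
    subject_means = [(a + b) / k for a, b in zip(test, retest)]
    # Between-subject mean square: spread of subject means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square: spread of each assessment around its subject mean.
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2
        for a, b, m in zip(test, retest, subject_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical dimension scores (0-100) at inclusion and at day 8.
day0 = [40, 55, 70, 85]
day8 = [42, 53, 71, 84]
icc = icc_oneway(day0, day8)
```

Values close to 1 indicate that differences between subjects dominate the differences between the 2 assessments of the same subject, which is what satisfactory reproducibility requires.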
Among the 13 countries representing 16 languages, we found no significant difference in sex distribution. Significant differences were found among the countries in family status (>72% were living as part of a couple in Russia and Hindi-speaking India vs 34.0% in Japan or 21.7% in Zulu-speaking South Africa), educational level (99.5% had a tertiary educational level in Brazil vs <40% in South Africa), labor status (88.0% were employed in China vs <60% in France, Spain, India, and the United Kingdom), housing (about 90% lived in their own home in India vs 29.5% in Germany), and primary residence (≥98% of the population lived in urban areas in Japan, Brazil, China, English-speaking India, Russia, and South Africa vs <70% in the United Kingdom, Sweden, the United States, France, and Germany) (Table 1).
Five axes were identified by the PCA, representing the following 5 dimensions explaining 76.7% of the total variance: social life, self-confidence, mood, energy, and attractiveness. Internal consistency was high (Cronbach α coefficients, 0.93-0.98). Reproducibility at 8 days was satisfactory in all dimensions. External validity testing revealed weak but statistically significant correlations between BeautyQoL scores and all SF-36 dimensions except for the physical function dimension; the lack of correlation with that dimension was expected because of the poor link between physical function and appearance (Pearson correlation coefficient, −0.02). These results suggest that BeautyQoL dimensions would capture specific perceptions not covered by generic instruments such as the SF-36.
The second item-reduction analysis led to the final version of the BeautyQoL questionnaire consisting of 42 questions. The international English version 3.0 is presented in Table 2 with the items correlated with the 5 dimensions. Finally, the clinical trial conducted in France confirmed the ability of the BeautyQoL questionnaire to discriminate subclinical changes in facial cosmetic imperfections. The QoL global score for the group receiving the high-coverage foundation was significantly higher than the global score of the group receiving the high-coverage loose powder at days 7 and 21 (Mann-Whitney test, P < .05).
General acceptability was high according to the very low proportion of missing data (<1%). The average time of completion ranged from 3 minutes in Japan to 16 minutes in Hindi-speaking India, which is fully compatible with clinical practice (completion time in China was not taken into consideration because of local survey briefing issues).
The scoring procedure of the BeautyQoL instrument comprised 3 levels of analysis. Level 1 consisted of a descriptive analysis on a question-by-question basis. Level 2 consisted of a classic “algorithmic scoring” that led to 1 score per dimension (profile) and 1 overall score (index). The scoring procedure was based on the mean score per dimension linearly transformed to a scale of 0 to 100, with 100 indicating the best possible level of QoL and 0 indicating the worst. Table 3 presents the algorithmic scores of the 5 dimensions in each of the 16 participating languages, highlighting various differences.
For example, subjects in India and especially in South Africa had, on average, much more positive views of their social life and mood, as well as energy and attractiveness (about 70%-80%), compared with people in other countries. In addition, a global index score was computed as the mean of the dimension scores. Negatively worded item scores were reversed so that higher scores indicated a higher level of QoL. Algorithmic scores of the 5 BeautyQoL dimensions are presented in Table 4 by sex, age group (scores decreased significantly when age increased [Kruskal-Wallis test, P < .05]), family status (single people presented significantly higher BeautyQoL scores than people living as part of a couple or with relatives [Kruskal-Wallis test, P < .05]), labor force status (retired/pensioner category had significantly lower BeautyQoL scores than other categories, and students had significantly higher scores than other categories [Kruskal-Wallis test, P < .05]), housing status (the renting category had significantly lower BeautyQoL scores than other categories [Kruskal-Wallis test, P < .05]), level of education achieved, and primary residence. Table 5 presents the BeautyQoL score by facial skin type and skin tone. Mann-Whitney tests confirmed statistically significant differences (P < .05) in all dimensions of BeautyQoL scores according to facial skin type and skin tone. Level 3 analysis was based on vector projections using the PCA. This innovative approach led to vectorial scores that complemented the classic algorithmic scores and was developed specifically in the frame of the BeautyQoL initiative to significantly improve the sensitivity of the scoring procedure. For example, the classic algorithmic scoring analysis (as proposed in level 2 analysis) could not discriminate between 2 opposite profiles because the mean scores were similar. By extension, several profiles of answers could generate the same score. 
Level 3 analysis proposed using the data set of the 3231 subjects as a reference corpus for the PCA and analyzing subjects from any other studies as additional subjects. The projection of answers on the main axis defined vectors, which could be linearly transformed from 0 to 100 to generate vectorial scores. We tested various answering profiles that led to similar algorithmic scores of 20, 40, and 60 against the vectorial scoring approach and established that the new approach assigned different vectorial scores to all of these profiles, significantly improving the sensitivity of the instrument. The Pearson correlation between algorithmic scores and vectorial scores was 1.00, confirming an affine relationship between the 2 indicators.
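The difference between the 2 scoring levels can be illustrated with a simplified sketch. It assumes a 1-to-5 response scale and replaces the PCA-derived axis with a fixed vector of hypothetical positive weights; neither the response scale nor the actual loadings are given in this article, so the numbers below are for illustration only.

```python
def reverse_item(answer, low=1, high=5):
    """Reverse a negatively worded item so that higher means better QoL."""
    return low + high - answer

def algorithmic_score(answers, low=1, high=5):
    """Level 2 sketch: mean item score linearly transformed to 0-100."""
    mean = sum(answers) / len(answers)
    return 100 * (mean - low) / (high - low)

def global_index(dimension_scores):
    """Global index computed as the mean of the dimension scores."""
    return sum(dimension_scores) / len(dimension_scores)

def vectorial_score(answers, weights, low=1, high=5):
    """Level 3 sketch: projection on one weighted axis, rescaled to 0-100.

    The weights are a hypothetical stand-in for PCA loadings and must be
    positive so that all-lowest answers map to 0 and all-highest to 100.
    """
    projection = sum(w * a for w, a in zip(weights, answers))
    lowest = low * sum(weights)
    highest = high * sum(weights)
    return 100 * (projection - lowest) / (highest - lowest)

# Two opposite answer profiles for a hypothetical 4-item dimension.
profile_a = [5, 5, 1, 1]
profile_b = [1, 1, 5, 5]
axis_weights = [0.4, 0.3, 0.2, 0.1]  # illustrative values, not real loadings

same_mean = algorithmic_score(profile_a) == algorithmic_score(profile_b)
score_a = vectorial_score(profile_a, axis_weights)
score_b = vectorial_score(profile_b, axis_weights)
```

Both profiles receive an algorithmic score of 50, whereas the weighted projection separates them (about 70 vs 30 with these made-up weights), which is the kind of discrimination between profiles that the level 3 analysis is intended to provide.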
One of the unique strengths of the BeautyQoL research initiative is the global approach involving 13 countries representing 16 languages worldwide. Such breadth allowed us to generate a fully validated instrument, available internationally, that captures the cultural variability between highly dissimilar countries and differences between people's attitudes within a single country. A multistep iterative process was used to generate subjects' responses and to test their validity. The initial semidirected interviews conducted in 10 countries established potentially relevant categories, and an acceptability study completed with more than 800 subjects in 13 countries representing 16 languages guided the development of the categories for a full-scale assessment. After more than 3200 men and women contributed to the full-scale validation assessment, their responses were analyzed and a subset of categories was created that captured most of the variability in subjects' BeautyQoL attitudes. A final follow-up assessment was used on a subset of the subjects to confirm their reproducibility. The 1-week interval was selected for the test-retest reliability to minimize the likelihood of changes in the subjects' mood while allowing sufficient time between testing and retesting to avoid the subjects recalling their previous answers. Most of the reviewed studies addressing cosmetic QoL improvement measures are limited by sample size. A small sample size prevents investigators from establishing valid QoL measures owing to a lack of variability in subjects' responses. As a result, few relevant QoL categories can be established, and the resultant instrument has limited interest for potential cross-validation use in a broader international audience. We are confident that the resulting BeautyQoL instrument reflects universal categories of cosmetic QoL attitudes and a range of unique cultural characteristics. 
Future assessments using the BeautyQoL instrument can be deployed across a large spectrum of potential scientific and market research applications.
Another notable strength of the BeautyQoL instrument is its simultaneous codevelopment in 16 languages. This simultaneous development of the instrument permitted the creation of a standard assessment tool in a relatively short time compared with the sequential strategy. When an instrument is created in one country and later adapted to another within the frame of sequential cross-cultural validations, the process takes a very long time and the subsequent versions may not have the structural equivalence and thus will lose validity in comparisons across multiple studies and cultures. Reliability in subsequent studies also becomes an important problem. The simultaneous construction of a multicountry assessment tool, the BeautyQoL questionnaire, solves these structural problems and allows for reliable studies across cultures.
Particular efforts have been deployed to generate relevant items compared with a number of existing QoL instruments. Most recent QoL instruments are based on expert opinions or adapted from other questionnaires, an approach that leads to a limited set of assessment criteria that experts or previous investigators found most relevant. We elected to create a comprehensive instrument that reflected a thorough understanding of the general public's opinion and QoL attitudes.
The international codevelopment approach has a number of advantages but also creates some limitations. First, international standardization of the selection of relevant questions excludes a few important aspects of physical appearance according to some cultures. Second, the 1-week interval between testing and retesting can be considered too short compared with the time frame necessary to observe final changes in physical appearance after some cosmetic interventions or product use in the long term. Third, sensitivity to change is expected to vary between countries and can be considered a limitation for conducting international studies if not further investigated in the frame of comparative surveys.
Nonetheless, we are confident that the resulting instrument contains the most relevant and cross-culturally valid categories that reflect the general population's attitudes toward physical appearance. For example, subjects in highly industrialized European countries (and the United States and Japan) seem to be much less satisfied with their QoL, as established by their BeautyQoL scores (about 40%). Such discrepancy may appear paradoxical, given that these countries have greater access to consumer goods and health care. This finding may be explained by behavioral economics demonstrating that, when people have many choices in consumer products, they are less likely to make a purchase and their satisfaction is lower than when they have fewer options.19 Some sociological studies found significant ethnic differences in positive body image, with African American women scoring higher than white American women.20 Given that weight has been traditionally regarded as a symbol of prosperity and status in rural Africa, we should expect the cultural differences found in our study, with South African women having the most positive body image. This interpretation is consistent with the fact that subjects with very fair skin had lower BeautyQoL scores, and the scores increased with darkness of the skin (Table 5). This finding confirms the role of racial differences when assessing QoL.21 Calvert et al22 assessed the QoL of 5408 subjects from the South Asian and African Caribbean communities in the United Kingdom and reported significantly higher QoL in these minority ethnic groups compared with UK normative data, confirming a difference of perception of QoL according to ethnic communities. An additional factor in positive self-image is lack of exposure to popular media beauty standards, that is, women's attitudes toward their self-image tend to worsen during the introduction of television and popular culture beauty standards. 
This explanation is not entirely satisfactory, however: we expected that people from rural populations would score higher in their assessment of social life, mood, energy, and attractiveness than urban populations, yet urban populations scored higher than rural ones. In addition, having a secondary rather than a primary or a tertiary education appears to be correlated with higher QoL scores as measured by the BeautyQoL instrument. This argument can be supported given that the urban population is more highly educated in general according to a report from the United Nations Educational, Scientific, and Cultural Organization23 (Table 4). Another counterintuitive finding is that subjects with oily skin appear to have the highest QoL scores (Table 5). One interpretation is that dry skin conditions are often not considered important by the subjects and even by health care professionals. As a result, problems with untreated dry skin can lead to a variety of issues that affect QoL, such as pruritus or ichthyosis vulgaris, which can be distressing, whereas eczema and psoriasis can lead to more serious consequences, such as fissures and infections.24,25
In conclusion, although the results of this study confirm the interest and international validity of the BeautyQoL instrument, the next step in our research agenda is to stimulate further studies that will use the BeautyQoL instrument to assess the QoL of subjects, such as in clinical trials, case-control studies, and cohort studies. Another step will be to perform additional validation studies in more countries, because further data collection would allow a more nuanced analysis of cultural differences.
Correspondence: Ariel Beresniak, MD, MPH, PhD, Data Mining International, Route de l’Aeroport, 29-31, CP221, CH-1215, Geneva 15, Switzerland (email@example.com).
Accepted for Publication: May 25, 2012.
Author Contributions: Dr Beresniak has full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Beresniak, de Linares, Krueger, Talarico, Duru, and Berger. Acquisition of data: Beresniak, de Linares, and Krueger. Analysis and interpretation of data: Beresniak, Krueger, Talarico, and Tsutani. Drafting of the manuscript: Beresniak, Tsutani, and Berger. Critical revision of the manuscript for important intellectual content: Beresniak, de Linares, Krueger, Talarico, Duru, and Berger. Statistical analysis: Beresniak. Obtained funding: Beresniak and de Linares. Administrative, technical, and material support: de Linares, Talarico, and Tsutani. Study supervision: Beresniak, de Linares, Krueger, Duru, and Berger.
Conflict of Interest Disclosures: Ms de Linares is employed by L’Oréal Research.
Funding/Support: This study was supported by an unrestricted grant from L’Oréal Research.
Additional Contributions: The international BeautyQoL Study Group contributed to the data collection.