eTable 1. Characteristics of the Included Study Populations
eTable 2. Results of Studies on Measurement Properties
eAppendix 1. Search Strategy
eAppendix 2. Guidance for Evaluating Hypothesis Testing
Hopkins ZH, Thiboutot D, Homsi HA, Perez-Chada LM, Barbieri JS. Patient-Reported Outcome Measures for Health-Related Quality of Life in Patients With Acne Vulgaris: A Systematic Review of Measure Development and Measurement Properties. JAMA Dermatol. 2022;158(8):900–911. doi:10.1001/jamadermatol.2022.2260
What are the measurement properties of existing patient-reported outcome measures that assess health-related quality of life in patients with acne?
In this systematic review of 21 patient-reported outcome measures, 2 measures met standards to be recommended for use in acne clinical studies—the Acne-Q and CompAQ. All measures were lacking data on content validity or some measurement properties.
Patient-reported outcome measures used for research or in clinical settings should have sufficient content validity and other measurement properties; these results suggest that while 2 measures can be currently recommended, further understanding of these and other measures will be critical for patient-reported outcome measure use among patients with acne.
Multiple patient-reported outcome measures (PROMs) for health-related quality of life (HRQoL) exist for patients with acne. However, little is known about the content validity and other measurement properties of these PROMs.
To systematically review PROMs for HRQoL in adults or adolescents with acne.
Eligible studies were extracted from PubMed and Embase (OVID).
Full-text articles published in English or Spanish on development, pilot, or validation studies for acne-specific, dermatology-specific, or generic HRQoL PROMs were included. Development studies included original development studies, even if not studied in acne patients per Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) recommendations. If a study included several diagnoses, the majority (ie, over 50%) of patients must have acne or acne-specific subgroup analyses must be available. Abstract and full-text screening was performed by 2 independent reviewers.
Data Extraction and Synthesis
Two independent reviewers assessed study quality applying the COSMIN checklist and extracted and analyzed the data. For each distinctive PROM, quality of evidence was graded by measurement property.
Main Outcomes and Measures
PROM properties (target population, domains, recall period, development language), PROM development and pilot studies, content validity (relevance, comprehensiveness, comprehensibility), and remaining measurement properties (structural validity, internal consistency, cross-cultural validity, reliability, measurement error, criterion validity, construct validity, and responsiveness). Quality of evidence was assigned for each measurement property of included PROMs. An overall recommendation level was assigned based on content validity and quality of the evidence of measurement properties.
We identified 54 acne PROM development or validation studies for 10 acne-specific PROMs, 6 dermatology-specific PROMs, and 5 generic PROMs. Few PROMs had studies for responsiveness. The only acne-specific PROMs with sufficient evidence for content validity were the CompAQ and Acne-Q. Based on available evidence, the Acne-Q and CompAQ can be recommended for use in acne clinical studies.
Conclusions and Relevance
Two PROMs can currently be recommended for use in acne clinical studies: the Acne-Q and CompAQ. Evidence on content validity and other measurement properties was lacking for all PROMs; further research investigating the quality of remaining acne-specific, dermatology-specific, and generic HRQoL PROMs is required to recommend their use.
Acne is a common inflammatory skin condition with a profound impact on quality of life.1 Patient-reported outcome measures (PROMs) are a valuable tool to assess health-related quality of life (HRQoL), and their use can complement clinical assessments, such as lesion counts and global assessments.2,3 Guidelines from the European Academy of Dermatology and Venereology recommend including PROMs assessing HRQoL in both clinical trials and routine clinical practice.4,5
As the breadth of available acne-specific, dermatology-specific, and generic HRQoL PROMs continues to grow, heterogeneity in outcome reporting in clinical research may result.1,6 This heterogeneity can stifle efforts to synthesize research across studies by limiting the ability to conduct systematic reviews and meta-analyses. One way to combat this issue is to establish core outcome sets for clinical studies, including trials.7 In 2017, international stakeholders identified key domains for acne to move toward standardizing outcomes in acne research.8 These domains included satisfaction with appearance, extent of dark marks and scars, long-term acne control, signs and symptoms, satisfaction with treatment, and HRQoL.
To determine which HRQoL PROMs are most appropriate for acne research and clinical practice, it is important to evaluate the content validity and other measurement properties of these measures.9,10 The purpose of this review was to rigorously examine HRQoL PROMs for use in adolescents and adults with acne, including: (1) identifying all PROM development and validation studies; (2) evaluating the methodological quality of these studies and the quality of the measurement properties of PROMs using the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) criteria; (3) evaluating the quality of evidence for the summarized measurement property ratings according to the adapted Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) framework; and (4) providing recommendations for PROM use in patients with acne based on synthesized evidence.
The review protocol was registered on PROSPERO (CRD42021234975). This study did not require ethics approval.
We included any full-text article published in English or Spanish that investigated development, pilot studies, or evaluation of 1 or more measurement properties for a PROM assessing HRQoL in patients with acne. In studies investigating multiple dermatologic conditions, acne patients had to comprise 50% or more of the patients or subgroup analyses on acne-specific data had to be available. Study population could include children, adolescents, or adults. Studies that only used the PROM as an outcome measurement or where PROMs were included to validate a new or other PROM were excluded. Generic HRQoL instruments were defined as those developed for use in patients with any variety of medical conditions and not specific to any one condition or organ system. Dermatology-specific HRQoL instruments were defined as those developed in patients with dermatologic conditions to measure cutaneous disease-mediated HRQoL. Lastly, acne-specific instruments were defined as those developed for use only among patients with acne and measuring acne-mediated HRQoL status.
This review followed the COSMIN reporting guideline for systematic reviews of PROMs.11 Bibliographic databases used for identifying articles included PubMed and Embase (OVID) (eAppendix 1 in the Supplement). Search results were uploaded into Covidence to facilitate screening of abstracts and full-text articles for final data extraction. All identified abstracts were screened by 2 independent reviewers. In cases of disagreement or if an abstract was deemed relevant by only 1 reviewer, the full-text article was retrieved and screened. Full-text screening and data extraction were performed by 2 independent reviewers. In cases of disagreement, the reviewers discussed the case, and if needed, a third reviewer was queried.
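The screening rules described above can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical and are not taken from Covidence or from the review protocol; the sketch only encodes the stated logic, under the assumption that each reviewer's judgment can be reduced to a yes/no vote.

```python
from typing import Optional


def advance_abstract(reviewer_a_relevant: bool, reviewer_b_relevant: bool) -> bool:
    """An abstract moves to full-text screening if either reviewer flags it.

    This covers both stated cases: disagreement, and relevance judged by
    only 1 reviewer.
    """
    return reviewer_a_relevant or reviewer_b_relevant


def resolve_full_text(
    reviewer_a_include: bool,
    reviewer_b_include: bool,
    consensus_after_discussion: Optional[bool] = None,
    third_reviewer_include: Optional[bool] = None,
) -> bool:
    """Resolve a full-text decision: agreement, then discussion, then a third reviewer."""
    if reviewer_a_include == reviewer_b_include:
        return reviewer_a_include
    # Reviewers disagree: use the outcome of their discussion if one was reached.
    if consensus_after_discussion is not None:
        return consensus_after_discussion
    # Discussion did not settle it: a third reviewer is queried.
    if third_reviewer_include is None:
        raise ValueError("unresolved disagreement: a third reviewer's vote is needed")
    return third_reviewer_include
```

Treating an abstract as advancing when either reviewer flags it errs toward retrieving more full texts, which matches the inclusive rule described in the text.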
Extracted information for each study included study characteristics (author, year, country of origin, language, and study design), characteristics of the PROM (construct[s] being measured, target population, number of items, and response options), and measurement properties of the instruments. In accordance with COSMIN guidelines, we assessed the following properties for each PROM development or validation study: content validity, internal consistency, structural validity, construct validity, cross-cultural validity, reliability, measurement error, and responsiveness.12 The criteria for evaluating results of studies using hypothesis testing to assess construct validity were developed by the study team prior to conducting the review (eAppendix 2 in the Supplement). Spanish translation, if required, was performed by 2 coauthors (J.S.B. or L.M.P.C.).
The COSMIN Risk of Bias checklist was used to evaluate the methodological quality of the included studies.12,13 Each study could be rated as very good, adequate, doubtful, or inadequate. Disagreements were discussed by the group until consensus was reached. For structural validity and internal consistency, the instruments’ measurement model (reflective vs formative) was considered. Reflective scales “reflect” the latent construct, ie, changes in HRQoL caused changes in the item scores measured. Formative (sometimes called “causal”) models measure items that directly cause changes in HRQoL.14 We characterized each instrument’s original description as reflective or formative. However, when a description was not available, this determination was made by the authors using guidance such as the “thought test.”15 Structural validity and internal consistency were not evaluated for formative instruments.15-17 If the instrument contained a mix of reflective and formative items and structural validity and internal consistency were reported, the instrument was assumed to be based on a reflective model and such measurement properties were evaluated.12
The results of each study on a measurement property were extracted and evaluated using the Criteria for Good Measurement Properties proposed by the COSMIN guidelines. Accordingly, each result was rated as sufficient, insufficient, or indeterminate.11 Results from individual studies were then qualitatively summarized per measurement property per PROM. The summarized result was also compared against the same criteria and rated as sufficient, insufficient, indeterminate, or inconsistent.
For each of the included PROMs, evidence quality for the resulting summary scores was estimated. The quality of the evidence was rated as high, moderate, low, or very low according to modified GRADE guidelines.11,12 These ratings are based on 4 factors: risk of bias (ie, quality of the studies), consistency of results from studies, directness (different populations, interventions or outcomes than those of interest to the review), and precision (width of confidence intervals).11
Each reviewed PROM was assigned to 1 of the 3 standardized COSMIN recommendation for use categories.11 These include can be recommended for use (signified by an A rating), has the potential to be recommended for use but requires further validation (B rating), and should not be recommended for use (C rating). An A-level recommendation for use is defined as the PROM having evidence for sufficient content validity (any level of evidence quality) and at least low-quality evidence for sufficient internal consistency. A C-level recommendation is defined as the PROM having high-quality evidence demonstrating insufficient measurement criteria, and a B-level recommendation is defined as any PROM not meeting criteria for an A-level or C-level recommendation.
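The three recommendation categories can be read as a simple decision rule. The following Python sketch is illustrative only: the `PromEvidence` fields and the `recommend` function are hypothetical names, and checking the C criterion before the A criterion is an assumption, since the text does not say which takes precedence if both applied.

```python
from dataclasses import dataclass
from typing import Optional

# GRADE levels that count as "at least low-quality evidence".
AT_LEAST_LOW = {"low", "moderate", "high"}


@dataclass
class PromEvidence:
    content_validity: str                        # e.g. "sufficient", "insufficient", "inconsistent"
    internal_consistency: str                    # same rating vocabulary
    internal_consistency_quality: Optional[str]  # "very low", "low", "moderate", "high", or None
    high_quality_insufficient: bool              # high-quality evidence of an insufficient property


def recommend(evidence: PromEvidence) -> str:
    """Return "A", "B", or "C" following the category definitions in the text."""
    # Assumption: high-quality evidence of insufficiency is checked first.
    if evidence.high_quality_insufficient:
        return "C"
    if (
        evidence.content_validity == "sufficient"  # any level of evidence quality
        and evidence.internal_consistency == "sufficient"
        and evidence.internal_consistency_quality in AT_LEAST_LOW
    ):
        return "A"
    # Everything that is neither A nor C.
    return "B"
```

For example, a hypothetical instrument with sufficient content validity and sufficient internal consistency backed by low-quality evidence would be graded A, while the same instrument with only very low-quality evidence for internal consistency would fall to B.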
A systematic literature search was performed on February 11, 2021. We identified 1200 abstracts; after removing duplicates, 983 potential abstracts remained. After screening, 47 reports met criteria for inclusion into the review, and an additional 7 reports on PROM development were included based on searching cited references of the included studies (Figure).18-69
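The study-flow counts reported above can be cross-checked with a few lines of arithmetic. This is a sketch using only the numbers stated in the text; the variable names are illustrative, and the derived exclusion count does not distinguish between abstract-stage and full-text-stage exclusions because the text does not report them separately.

```python
# Counts stated in the text.
abstracts_identified = 1200
after_deduplication = 983
included_from_screening = 47
added_from_reference_search = 7

# Derived quantities.
duplicates_removed = abstracts_identified - after_deduplication
not_included_after_screening = after_deduplication - included_from_screening
total_included = included_from_screening + added_from_reference_search

print(f"Duplicates removed: {duplicates_removed}")                      # 217
print(f"Not included after screening: {not_included_after_screening}")  # 936
print(f"Total included reports: {total_included}")                      # 54
```

The total of 54 agrees with the 54 development or validation studies reported in the abstract.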
Ten different acne-specific PROMs were identified: Acne Disability Index (ADI), Acne Impact on Adult Daily Life (AI-ADL), Acne Severity and Impact Scale (ASIS), Acne-Q, Acne Quality of Life Scale (AQOL), Acne-QoL, Acne Quality of Life Index (Acne-QOLI), Assessment of the Psychological and Social Effects of Acne (APSEA), Cardiff Acne Disability Index (CADI), and CompAQ (Table 1; eTable 1 in the Supplement). All PROMs were multi-item instruments and included adolescents and adults in their target populations. Six instruments were targeted for patients with facial and/or truncal acne, and 2 (Acne-QoL and ASIS) were targeted for facial acne alone. The Acne-QOLI and APSEA did not specify the type of acne they were designed for, only using the general term acne. Acne-related quality of life was largely made up of emotional functioning, physical functioning, and social functioning domains. Two measures were developed using Rasch Measurement Theory, the Acne-Q and the ASIS, and the remainder used classical test theory.
We identified 6 dermatology-specific PROMs that have been assessed in acne: the Dermatology Life Quality Index (DLQI), the Children’s Dermatology Life Quality Index (CDLQI), the Dermatology-Specific Quality of Life (DSQL), the Oily Skin Self-assessment Scale (OSSAS), the Oily Skin Impact Scale (OSIS), and the Skindex-29. Five generic PROMs that have been studied in acne were identified (UK Sickness Profile, EuroQol 5-Dimension, Short Form-36 [SF-36], Patient Benefit Index [PBI], and Patient-Reported Outcomes Measurement Information System–Anxiety).
For dermatology-specific or generic HRQoL PROMs, development studies were evaluated despite having few or no acne patients, in accordance with COSMIN guidance (Table 2).11 Among acne-specific PROMs, overall development was rated adequate for the Acne-Q. The AI-ADL, AQOL, Acne-QOLI, ADI, and CADI were rated as inadequate. The remaining acne-specific PROMs were rated doubtful. Overall content validity was rated as sufficient for the Acne-Q (evidence quality was judged to be moderate) and the CompAQ (moderate). The ADI, AQOL, and CADI were rated insufficient (very low, low, and very low, respectively). Evidence quality for content validity tended to be low, with no study having higher than a moderate level of evidence for any component of content validity.
No studies evaluated measurement error, criterion validity, or cross-cultural validity or measurement invariance, with the exception of the ASIS, which had a single study finding no differential item functioning between White patients and patients of other races and ethnicities (cross-cultural validity; rated as inadequate) (Table 3; eTable 2 in the Supplement). No study explicitly specified whether the PROM under evaluation was based on reflective or formative models. We judged that most instruments were based on reflective models. However, instruments that had symptom scores or questions about symptomatology (ASIS, Acne-Q, Acne-QoL, and CompAQ) were considered mixed, as some questions behaved as effect indicators of HRQoL (ie, reflective model) and others behaved as causal indicators of HRQoL (ie, formative model).15,17 Since these instruments had mixed scales, structural validity and internal consistency were assessed for this review, although their results may be affected by this mixed structure. Structural validity was rated as sufficient for 7 of the 10 acne-specific PROMs and could not be rated in the remaining 3 (ADI, APSEA, and CADI). GRADE quality of evidence for these measures was high for 2 PROMs (AI-ADL and CompAQ). Internal consistency was rated as sufficient for 9 of 10 acne-specific PROMs with a GRADE score of high for 2 of 10 PROMs. Construct validity was graded as sufficient for all 10 PROMs, but GRADE quality ranged from low (Acne-Q, AQOL, APSEA) to high (Acne-QoL, CADI, CompAQ). Responsiveness was assessed for 3 PROMs (Acne-QoL, APSEA, and CADI), but the quality of these data ranged from very low for the CADI to moderate for Acne-QoL. Findings were similar for dermatology-specific PROMs, but quality of evidence tended to be lower. These outcomes were less often evaluated for generic PROMs and quality of evidence was low or very low in most cases.
Among acne-specific instruments, 2 instruments (Acne-Q and CompAQ) received a recommendation grading of A, or recommended for use. No instruments were graded as C, which requires high quality evidence of insufficient measurement criteria. The remaining instruments were graded as B for not having high-quality evidence for insufficient measurement properties or enough evidence for recommendation. All acne-specific instruments had high enough evidence of internal consistency except the APSEA, where this evidence was only assessed in a conference abstract that did not meet inclusion criteria for this review.70 However, evidence for sufficient content validity was infrequent, with only the Acne-Q and CompAQ meeting the COSMIN criteria for sufficient content validity.
We found that 2 instruments, the Acne-Q and the CompAQ, had sufficient evidence for content validity and internal structure (ie, structural validity and internal consistency) to receive an A recommendation for use. These instruments were also appropriate for both facial and truncal acne in both adults and adolescents. This is important because truncal acne is common and has a significant impact on quality of life.71 Despite the A ratings, there are missing areas of study for these 2 measures. The Acne-Q has not been assessed for responsiveness, and evidence for reliability and construct validity remains low. The CompAQ similarly has not been assessed for responsiveness or reliability. Interpretability studies have not been conducted for either PROM.
The Acne-Q has a total of 63 items, but these are split into separately graded scales for facial acne (15 items), chest and back acne (10 items), facial skin (12 items), appearance-related distress (10 items), symptoms (6 items), and scars (10 items), so scales can be used or omitted depending on patient situation. The Acne-Q was also developed using Rasch Measurement Theory, which may have some advantages over measures developed using classical test theory.72
The CompAQ has 5 domains with a total of 20 items (psychological/emotional [4 items], social-judgment [4 items], social-interactions [4 items], treatment concerns [4 items], and symptoms [4 items]). The CompAQ also has an available short form for use in busier clinical settings; however, content and measurement properties have not been assessed for this shortened version. Additionally, classical test theory, rather than more modern item response theory and Rasch Measurement Theory, was used for item selection and scale generation, which may affect construct theory validity and scale interpretation.73
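The item counts reported for the Acne-Q and the CompAQ can be tallied as a quick consistency check, and the same structure shows how respondent burden changes when only some Acne-Q scales are administered. This is a minimal Python sketch; the dictionary keys paraphrase the scale names given in the text, and `items_administered` is a hypothetical helper, not part of either instrument.

```python
# Item counts per scale or domain, as stated in the text.
ACNE_Q_SCALES = {
    "facial acne": 15,
    "chest and back acne": 10,
    "facial skin": 12,
    "appearance-related distress": 10,
    "symptoms": 6,
    "scars": 10,
}

COMPAQ_DOMAINS = {
    "psychological/emotional": 4,
    "social-judgment": 4,
    "social-interactions": 4,
    "treatment concerns": 4,
    "symptoms": 4,
}


def items_administered(scales: dict, selected: list) -> int:
    """Total number of items a patient answers for the selected scales."""
    return sum(scales[name] for name in selected)


acne_q_total = sum(ACNE_Q_SCALES.values())    # 63
compaq_total = sum(COMPAQ_DOMAINS.values())   # 20

# Hypothetical example: a patient with facial acne only, assessed on 2 Acne-Q scales.
facial_only = items_administered(ACNE_Q_SCALES, ["facial acne", "symptoms"])  # 21
```

Because the Acne-Q scales are graded separately, a subset such as the one above requires far fewer items than the full 63.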
In this review, no short-form HRQoL instruments for acne met our criteria for recommendation. This is an important shortcoming. Instruments that can be rapidly deployed in a clinical setting and which are less likely to induce survey fatigue in patients over multiple visits are critical for clinical implementation.74
While concepts like informativity and discrimination (ie, floor and/or ceiling effects) are not evaluated within the COSMIN framework, these may be affected by differences in response options. Items with more available options or long scales may improve discrimination, reliability, validity, and informativity.75,76 Together, while both the Acne-Q and the CompAQ are well-validated instruments, they assess different aspects of acne-related quality of life and have different instrument characteristics, which may be important when deciding on which is best for clinical or research applications. For instance, the available short forms for the CompAQ may improve feasibility of use in a routine clinical practice environment, while the detailed subscales of the Acne-Q may be valuable in a clinical trial setting.
The remaining measures were found to have a B rating, suggesting that while there was not enough evidence for sufficient content validity and internal structure for an A rating, we did not find high-quality evidence for insufficient measurement properties (which would result in a C score). Of note, the Skindex-29 was the only dermatology-specific instrument with sufficient overall content validity (with very low evidence quality). However, acne-specific data on the internal structure was not available for evaluation, hampering its consideration for an A grade.
Content validity is considered a key component of instrument validation.12 Although the Acne-QoL is commonly used and has some of the strongest data to support sufficient measurement properties, it had inconsistent ratings with respect to content validity, which precluded it from being given an A rating. Similar issues were identified for the ASIS. Further studies are needed to examine the content validity of these measures. Although the ADI and CADI are frequently used to assess HRQoL in acne studies, these were found to have inadequate PROM development. Likewise, these instruments received overall scores of insufficient for content validity. The APSEA was also found to have inadequate PROM development. Additionally, the AQOL, while receiving a grade of doubtful for development design, did not have a pilot study testing items and received an overall content validity score of insufficient with low evidence. While these instruments did not meet the rather high standard for a C grading, based on these data we believe the evidence for the ADI, CADI, AQOL, and APSEA is more limited and that these instruments should not be used without further investigation.
One 2021 systematic review77 evaluating PROMs for patient treatment satisfaction in acne found only 1 study with category B evidence (ie, not yet to be recommended). Although we identified 2 acne-specific instruments that can be recommended for use, more data are needed to establish core outcome sets for acne clinical research. Importantly, even for instruments categorized as A, evidence on key measurement properties was missing. In particular, few studies evaluated responsiveness, which is a critical measurement property to assess whether patients have improved with treatment in the setting of a clinical trial or routine clinical care. In addition, measurement error, criterion validity, and cross-cultural validity or measurement invariance were almost never evaluated. Measures for use in clinical research should meet the criteria designated by the OMERACT group, including truth, discrimination, and feasibility.78 There is a need for future validation studies to address these evidence gaps.
This study had several limitations. Only aspects of studies that were reported could be assessed. Limits on publication space may not have allowed prior studies to report on all items in the COSMIN risk of bias checklists. Likewise, prior to the COSMIN initiative in 2005, certain details of methodology may not have been reported, even if performed. Multiple aspects of the risk of bias checklists are relatively subjective (eg, assumedly performed vs doubtful). For example, multiple reports noted following COSMIN checklists or FDA recommendations for instrument development despite missing or not reporting items in the risk of bias checklist. These cases may fall somewhere between assumable (because of mention of the COSMIN checklists) and doubtful or even inadequate (because items were not reported). Furthermore, the FDA guidance documents focus on HRQoL instrument design from a clinical trials perspective, so important design considerations from a clinical perspective may not be included. We attempted to mitigate reviewer bias by discussing and formalizing assumptions for these situations, having multiple reviewers extract data, and conducting group discussions regarding difficult cases. Additionally, although multiple search terms with various interactions were used, it is possible that important development, pilot, or validation studies were missed. As instrument validation work is actively ongoing, future updates are important to guide decisions on instrument use and quality.
Since no studies reported on measurement error or cross-cultural validity, these properties were not included in overall assessments. However, they are important for interpretability. Likewise, related instrument characteristics such as smallest detectable change, limits of agreement, or minimal important change were beyond the scope of this review. Subjectively, in our review of the evidence, we did not find evidence of these being described for most measures besides the Acne-QoL.
The COSMIN framework has been criticized for its focus on classical test theory and relative lack of modern measurement theory benchmarks, subjectivity and dependency on reviewer expertise, lack of evidence surrounding risk of bias checklist items and grading procedures, and poor interreviewer reliability.73,79,80 Despite these important limitations, the COSMIN guidelines do offer a formal, Delphi methods–based framework for systematically reviewing PROMs and assessing development and validation quality. Therefore, our recommendations should be considered conservative baselines and further research investigating construct theory, scale design, and instrument will be important for further validation.73,79,81
In this systematic review, the Acne-Q and CompAQ were found to be validated to a sufficient standard to support recommendation for measuring acne-associated quality of life. The Acne-QoL and ASIS could also be considered if additional evaluation of content validity is performed. Additionally, important measurement properties have not been studied, or have been studied insufficiently, for all instruments. Further research is needed to better define content validity, other measurement properties such as responsiveness, and interpretability of PROMs used to assess HRQoL in patients with acne.
Accepted for Publication: April 27, 2022.
Published Online: June 22, 2022. doi:10.1001/jamadermatol.2022.2260
Corresponding Author: John S. Barbieri, MD, MBA, Department of Dermatology, Brigham and Women’s Hospital, 41 Avenue Louis Pasteur, 317A, Boston, MA 02115 (email@example.com).
Author Contributions: Drs Barbieri and Hopkins had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Thiboutot, Pérez-Chada, Barbieri.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Hopkins, Barbieri.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Pérez-Chada, Barbieri.
Obtained funding: Barbieri.
Administrative, technical, or material support: Hopkins, Thiboutot, Homsi, Barbieri.
Supervision: Pérez-Chada, Barbieri.
Conflict of Interest Disclosures: Dr Thiboutot reported participating in the development of the Acne Quality of Life instrument. Dr Barbieri reported holding patent licensed to CompAQ. Drs Barbieri and Thiboutot reported being members of the Acne Core Outcomes Research Network (ACORN), which owns the copyright to the CompAQ. Dr Thiboutot reported being involved in the development of the Acne Quality of Life and CompAQ. No other disclosures were reported.
Funding/Support: Support to use the Covidence platform for abstract screening was provided by International Dermatology Outcome Measures.
Role of the Funder/Sponsor: The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: Dr Barbieri is an Associate Editor and Evidence-based Practice Editor of JAMA Dermatology, but he was not involved in any of the decisions regarding review of the manuscript or its acceptance.
Additional Contributions: We would like to thank Melanie Cedrone (Librarian, University of Pennsylvania) for her help running the literature searches.