Mean FACE-Q scale score for pretreatment and posttreatment participants.
eTable 1. Number (%) from 96 post-operative blepharoplasty patients to report being bothered for each item
eTable 2. FACE-Q Satisfaction with Eyes Interpretation Table
Klassen AF, Cano SJ, Grotting JC, et al. FACE-Q Eye Module for Measuring Patient-Reported Outcomes Following Cosmetic Eye Treatments. JAMA Facial Plast Surg. 2017;19(1):7–14. doi:10.1001/jamafacial.2016.1018
How were the FACE-Q scales, designed to measure patient-reported outcomes following cosmetic eye treatments, developed and validated?
The psychometric analysis provided evidence of the reliability and validity of the 4 (eyes, upper and lower eyelids, and eyelashes) FACE-Q Eye Module scales. In the Rasch Measurement Theory analysis, each scale’s items had ordered thresholds and good item fit, and Person Separation Index and Cronbach α were greater than or equal to 0.83.
The FACE-Q can be used for the collection of evidence-based information about cosmetic eye treatments from the patient perspective.
Aesthetic eye treatments can dramatically change a person’s appearance, but outcomes are rarely measured from the patient perspective. The patient perspective could be measured using an eye-specific patient-reported outcome measure.
To describe the development and psychometric evaluation of FACE-Q scales and an adverse effect checklist designed to measure outcomes following cosmetic eye treatments.
Design, Setting, and Participants
Pretreatment and posttreatment patients 18 years and older who had undergone facial aesthetic procedures were recruited from plastic surgery clinics in the United States and Canada and completed FACE-Q scales between June 6, 2010, and July 14, 2014. We used Rasch Measurement Theory, a modern psychometric approach, to refine the scales and to examine psychometric properties.
Main Outcomes and Measures
The FACE-Q Eye Module has 4 scales that measure appearance of the eyes, upper and lower eyelids, and eyelashes. Scale scores range from 0 (worst) to 100 (best). The module also includes a checklist measuring postblepharoplasty adverse effects.
Overall, 233 patients (81% response rate) 18 years and older participated. Adverse effects included being bothered by eyelid scars, dry eyes, and eye irritation. In Rasch Measurement Theory analysis, each scale’s items had ordered thresholds and good item fit. Person Separation Index and Cronbach α were greater than or equal to 0.83. Higher scores on the eye scales correlated with fewer adverse effects (range, −0.26 to −0.36). In the pretreatment group, older age correlated with lower scores (range, −0.42 to −0.51) on the scales measuring appearance of the eyes and upper and lower eyelids. Compared with the pretreatment group, posttreatment participants reported significantly better scores on the scales measuring appearance of eyes overall, as well as upper and lower eyelids.
Conclusions and Relevance
The FACE-Q Eye Module can be used in clinical practice, research and quality improvement to collect evidence-based outcomes data.
Level of Evidence
Blepharoplasty performed on the upper and/or lower eyelids can improve appearance, and in some cases, eyelid function, by removing excess skin from the upper eyelids and bagginess from the lower eyelids. Modern concepts of periorbital rejuvenation also include volume replacement in the aging orbit using fillers or fat grafting. In the United States, 203 934 blepharoplasty procedures were performed in 2015.1 Eyelid surgery was the third most common cosmetic procedure for women, and the second most common procedure for men.1 In the United Kingdom in 2014, blepharoplasty was the second most common cosmetic operation next to breast augmentation for women, and the most common operation for men, with 7752 procedures performed in total.2 Despite the popularity of blepharoplasty, outcomes of the procedure and other appearance-enhancing eye treatments (eg, eyelash treatment) are rarely evaluated from the patient’s perspective.3
An important limitation in the ability to measure outcomes from the patient perspective has been the lack of a psychometrically sound patient-reported outcome measure (PROM). PROMs are questionnaires that measure concepts of interest important to patients, such as appearance, health-related quality-of-life, and symptoms.4 In 2013, a literature review funded by the UK Department of Health to identify PROMs for cosmetic surgery identified 9 cosmetic surgery-specific PROMs developed with patient input that demonstrated at least adequate psychometric properties.3 The Blepharoplasty Outcomes Evaluation5,6 was the only eye-specific PROM identified by the search, but this 6-item instrument was excluded because its development process did not include patient input, which is considered essential.3,4 The reviewers concluded that research dedicated to the evaluation of PROMs in cosmetic surgery is urgently required.
PROMs are currently being used in many countries to inform clinical practice, comparative effectiveness research, discussions with regulatory bodies and an evidence-based approach to treatment.7 The United Kingdom was the first nation to formally mandate the collection of PROM data at the health system level. Starting in 2009, PROM data were collected for all National Health Service patients treated with the following 4 procedures: hernia repair, hip and knee replacement, and varicose veins treatments.8
Most recently in the United Kingdom, PROM data collection has been extended to include cosmetic surgery procedures. Following the publication of Sir Bruce Keogh’s Review of the Regulations of Cosmetic Interventions,9 the UK Royal College of Surgeons set up the Cosmetic Surgery Interspecialty Committee to address the review’s recommendations. This committee recommended a minimum data set to enable collection of clinical quality and outcome indicators.10 Specifically, all cosmetic surgery providers are expected to collect and submit a minimum data set, which includes PROM data, to the Private Healthcare Information Network.11 The Cosmetic Surgery Interspecialty Committee recognized satisfaction with appearance as a key outcome for people seeking aesthetic treatments from plastic surgeons. Therefore, PROMs for 6 of 7 targeted cosmetic surgery procedures—abdominoplasty, augmentation mammoplasty, blepharoplasty, liposuction, rhinoplasty and rhytidectomy—measure appearance from the patient perspective using a subset of scales from the BREAST-Q,12,13 FACE-Q14-16 and BODY-Q.17,18 In addition, 2 adverse effect checklists from the FACE-Q15 have been recommended.
The FACE-Q14-16,19-21 is a PROM developed to address the lack of instruments for facial aesthetic procedures. In the Oxford review of cosmetic surgery PROMs,3 the FACE-Q was singled out as one of only 3 PROMs that met international recommendations for how such tools should be developed and validated. The FACE-Q includes over 40 independently functioning scales and checklists that measure appearance (of the face, specific facial areas, and rhytides), health-related quality of life, adverse effects of treatment, and the patient experience of care. The aim of this article is to describe the development and psychometric evaluation of the set of scales and checklist that can be used to evaluate cosmetic eye treatments. Specifically, we describe the FACE-Q Eye Module, which includes 4 appearance scales (ie, eyes overall, upper eyelids, lower eyelids, and lashes) and a checklist measuring adverse effects following cosmetic eye treatments.
Prior to starting the study, institutional review board approval was obtained through the New School in New York City (United States) and through the University of British Columbia behavioral research ethics board in Vancouver (Canada). The FACE-Q was developed following internationally recommended guidelines for the development of a PROM.4,22-25 Our mixed methods approach to develop the FACE-Q has previously been described in detail.14-16,19-21,26 Briefly, qualitative interviews with 50 surgical and/or nonsurgical patients who had undergone facial aesthetic procedures and input from 26 experts in the field were used to develop the FACE-Q conceptual framework, scales, and checklists. These scales and checklists were further refined through cognitive interviews with 35 patients who had undergone facial aesthetic procedures. All FACE-Q scales were developed with 4 response options in keeping with best practice for scale development.27 Instructions ask respondents to answer in relation to the past week.
Among both the qualitative and cognitive interview samples, patients having cosmetic eye treatments were well represented (qualitative = 25, cognitive interviews = 19). The 4 eye scales each contain 7 items that ask about appearance. The scales measuring eyes overall and eyelashes evaluate satisfaction (Very Dissatisfied, Somewhat Dissatisfied, Somewhat Satisfied, Very Satisfied). The scales measuring upper and lower eyelids and the adverse effect checklist measure the extent to which someone is bothered by their appearance or an adverse effect (Not At All, A Little, Moderately, Extremely). Each item in each scale includes a unique descriptor that is used to measure a different aspect of appearance. Together, the items of a scale map out a clinical hierarchy for each concept of interest. For example, the eyelash scale measures appearance using the following 7 descriptors: nice, feminine, dark, long, attractive, thick, and full.
For validation purposes, all participants were also asked to complete the FACE-Q 10-item Satisfaction with Facial Appearance scale. In addition, some clinics included the FACE-Q 10-item Psychological Function scale and 8-item Social Function scale. All 3 of these FACE-Q scales previously demonstrated reliability, validity, and the ability to detect change.14,20 Participants were also asked questions that would allow us to characterize the sample, including sex and race/ethnicity.
Inclusion criteria were any patient aged 18 years or older who was pretreatment or posttreatment for 1 or more facial aesthetic treatments. Patients were recruited from 11 plastic surgery clinics and 3 dermatology clinics in the United States (n = 11) and Canada (n = 3). In 11 clinics, patients were recruited when they checked in for an appointment. In 3 clinics, patients were invited to participate via a postal survey. The survey included a personalized letter from the relevant physician along with the FACE-Q booklet, with up to 3 reminders mailed as necessary. All potential participants were provided a $5 gift certificate to a coffee shop to thank them for their time. As this was a questionnaire survey study, completion of the FACE-Q booklet implied consent. Recruitment took place between June 6, 2010, and July 14, 2014.
For the checklist, we computed the proportion of postoperative blepharoplasty patients that endorsed each of the 4 available response options for the 6 adverse effects.
For the appearance scales, we used Rasch Measurement Theory,28 a modern psychometric approach, within RUMM2030 software.29 Rasch Measurement Theory analysis examines the difference between observed and predicted responses for each item in a scale to determine if the data for a scale fit a mathematical model.30-32 A set of graphical and statistical tests was examined, with the results considered together to make decisions about the overall quality of each scale.32 The following tests, which are described more fully elsewhere,31 were conducted:
Thresholds for item response options: We examined thresholds between response options (eg, between Very Satisfied and Somewhat Satisfied) to determine if response categories scored with successive integers increased for the construct measured.
Item fit statistics: Three indicators of fit were examined to determine if each scale’s items measured a unidimensional construct in the form of a clinical hierarchy: (1) log residuals (item-person interaction); (2) χ2 values (item-trait interaction); and (3) item characteristic curves (ICC). Fit residuals should be between −2.5 and +2.5, and χ2 values should be nonsignificant after Bonferroni adjustment.
Targeting: Person and item locations were examined to determine if the scales’ items were evenly spread over a reasonable range that matched the range of the construct experienced by the sample.
Dependency: We examined residual correlations between items to identify correlations above 0.30, as high correlations can artificially inflate a scale’s reliability. A subtest can be performed to determine how much correlated items affect scale reliability.
Person separation index (PSI): We computed the PSI for each scale, which is a measure of the error associated with the measurement of people in a sample. The PSI is comparable to the Cronbach α. Higher values indicate greater reliability.
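The fit and dependency criteria listed above can be written as simple screening rules. The following Python sketch is illustrative only: the function names, the family-wise significance level of .05, and the choice to divide it by the number of items in a scale are assumptions made for the example, not a description of how RUMM2030 computes or reports these statistics.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni adjustment."""
    return alpha / n_tests


def flag_item(fit_residual, chi2_p, n_items, alpha=0.05):
    """Return the reasons (if any) an item falls outside the stated criteria."""
    reasons = []
    # Fit residuals are expected to lie between -2.5 and +2.5.
    if not -2.5 <= fit_residual <= 2.5:
        reasons.append("fit residual outside -2.5 to +2.5")
    # Chi-square P values should be nonsignificant after adjustment
    # (assumption: one test per item in the scale).
    if chi2_p < bonferroni_alpha(alpha, n_items):
        reasons.append("chi-square significant after Bonferroni adjustment")
    return reasons


def dependent_pairs(residual_correlations, cutoff=0.30):
    """Return item pairs whose residual correlation exceeds the cutoff."""
    return [pair for pair, r in residual_correlations.items() if r > cutoff]
```

For a 7-item scale, the adjusted threshold in this sketch is .05 / 7, or roughly .007, so an item with a χ2 P value of .02 would not be flagged.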
In addition to the Rasch Measurement Theory analysis outlined above, we computed Cronbach α,33 missing data, floor and ceiling effects, and the grade reading level for each scale.
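For readers unfamiliar with Cronbach α, the standard formula is α = k / (k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. A minimal Python sketch follows; the function name and the input layout (one list of item scores per respondent, complete cases only) are assumptions for illustration, and the study's own computation may have handled missing data differently.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of per-respondent item-score lists.

    Assumes every respondent answered every item (complete cases).
    """
    k = len(responses[0])                      # number of items
    item_columns = list(zip(*responses))       # scores grouped by item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)
```

Calling `cronbach_alpha([[1, 1], [2, 2], [3, 3]])` returns 1.0, because the two items rank respondents identically.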
The Rasch logit score for each participant’s pattern of responses to a scale was transformed into a score from 0 (worst) to 100 (best). The scoring algorithm is available by contacting the corresponding author. To aid in the interpretation of the meaning of scores, we computed the implied range of scores for each response option based directly on the threshold plots produced through the Rasch analysis.29,34
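The FACE-Q scoring algorithm itself is not reproduced in this article. As a general illustration of how a logit estimate can be placed on a 0 to 100 scale, the sketch below applies a plain linear rescaling between the lowest and highest logit values a scale can produce. The function name, the bounds, and the linear form are all assumptions; the actual FACE-Q conversion may differ.

```python
def rescale_logit(logit, min_logit, max_logit):
    """Linearly map a logit onto 0 (worst) to 100 (best).

    min_logit and max_logit are hypothetical scale bounds, not FACE-Q values.
    """
    if max_logit <= min_logit:
        raise ValueError("max_logit must be greater than min_logit")
    # Clip values that fall outside the assumed bounds.
    clipped = min(max(logit, min_logit), max_logit)
    return 100 * (clipped - min_logit) / (max_logit - min_logit)
```

With hypothetical bounds of −5 and +5 logits, a person located at 0 logits would receive a score of 50.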
Using the 0 to 100 scores, we computed Pearson correlations to examine associations between scores and independent samples t tests to test for differences between means. We computed the number of posttreatment adverse effects experienced following a blepharoplasty and predicted that a higher number of adverse effects would correlate with lower scores on the 4 eye scales. We also predicted that fewer adverse effects and higher scores on the 4 eye scales would correlate with higher scores on the FACE-Q Satisfaction with Facial Appearance, Psychological Function, and Social Function scales. For patients in the pretreatment group, we expected that older participants would report lower scores on the 4 eye scales compared with younger patients. Finally, we predicted that pretreatment participants would report lower scores on all FACE-Q scales compared with posttreatment participants. In these analyses, P values less than .05 were considered statistically significant.
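The two statistics used in these analyses can be sketched with the Python standard library. The function names are illustrative, and the t statistic below assumes the pooled-variance (equal variances) form of the independent samples test, which the article does not specify. Converting t to a P value additionally requires the t distribution, which is not included here.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

A negative `pearson_r` between the number of adverse effects and a scale score would correspond to the predicted direction: more adverse effects, lower scores.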
Overall, 233 of 287 patients (81% response rate) participated. The response rate for face-to-face recruitment (n = 169 of 172 [98%]) was higher than that of mail-out with reminders (n = 64 of 115 [56%]). Table 1 shows sample characteristics. Female participants (n = 192) composed 82% of the sample.
The adverse effects checklist was completed by 96 patients a mean (range) of 11 (0.5-54.0) months after blepharoplasty surgery. The most common adverse effects reported on the checklist included being bothered by eyelid scars (38%), dry eyes (35%), eye irritation (33%), excessive tearing (25%), eyes looking hollowed out (10%), and difficulty closing eyes (4%). eTable 1 in the Supplement shows the proportion of patients who reported each adverse effect for each response option.
The Rasch Measurement Theory analysis supported the reliability and validity of the 4 eye scales. Each of the scales’ 7 items had ordered thresholds. This finding provides evidence that each scale’s 4 response options worked as a continuum for the construct measured. Table 2 shows the items for each scale sorted by the item locations; 25 of the 28 items had fit residuals within −2.5 to +2.5, which is the recommended range. None of the 28 items was significant in terms of Bonferroni-adjusted χ2 P values. Item residual correlations were above 0.30 for 6 pairs of items from 3 of the scales. We performed subtests on the pairs of items, which revealed a marginal effect on scale reliability for 2 scales (Satisfaction with Eyes and Eyelashes; 0.01 and 0.03 difference in PSI) but a larger effect for the scale measuring appearance of the upper eyelids (ie, 0.08 decrease in PSI to 0.80).
The P values for fit to the Rasch model were not significant for 3 scales (eyes overall, lower eyelids, eyelashes), providing support for the data satisfying the requirements of the Rasch model. For the remaining scale (upper eyelids), the P value was .01. Person Separation Index and Cronbach α were 0.83 or greater. Other scale-level findings are shown in Table 3. The 4 scales were easy for participants to comprehend and had minimal missing data. eTable 2 in the Supplement provides a FACE-Q interpretation table as an example. This table shows the implied range of scores for each of the possible responses for the 7-item overall eyes scale.
Table 4 shows the correlation findings. A higher number of adverse effects following a blepharoplasty correlated with lower scores on the 4 eye scales as well as the Satisfaction with Facial Appearance, Psychological Function, and Social Function scales. Higher scores on the 4 eye scales were significantly correlated with higher scores on the Satisfaction with Facial Appearance scale. Higher scores on the 3 eye scales (no data for eyelash scale) correlated with higher scores on the Psychological Function and Social Function scales.
In the pretreatment group, older age correlated with lower scores on the eyes overall (R = −0.42; P = .001), upper eyelids (R = −0.51; P < .001), and lower eyelids (R = −0.42; P < .001) scales as well as the Satisfaction with Facial Appearance scale (R = −0.35; P = .001). In the posttreatment group, the correlation between older age and appearance was significant for the lower eyelid scale (R = 0.21; P = .01) and for the Satisfaction with Facial Appearance scale (R = 0.21; P = .01). Here, older age was associated with reporting more satisfaction with appearance.
The Figure shows the mean scores for pretreatment and posttreatment participants for the 4 eye scales, as well as the Satisfaction with Facial Appearance, Psychological Function, and Social Function scales. For 6 of the 7 scales (exception, eyelash scale), the P values for independent samples t tests were significant (P ≤ .002).
The FACE-Q is currently the only PROM developed following international recommendations that measures appearance and other concepts of interest important to patients who have undergone facial aesthetic procedures. The psychometric analyses described in this article provide evidence of reliability and validity of the 4 scales that compose the FACE-Q Eye Module. For blepharoplasty patients, appearance of the eyes was found to correlate with the number of postoperative adverse effects experienced. Pretreatment patients reported lower scores for appearance of the eyes and upper and lower eyelids, satisfaction with facial appearance overall, and psychological and social function compared with patients who underwent a cosmetic treatment. Prospective studies of clinical change are now needed to determine the magnitude of change for cosmetic eye treatments such as blepharoplasty.
As cosmetic surgery providers in the United Kingdom proceed to collect PROM data on a national level for the first time ever, normative data on important patient outcomes for a range of cosmetic surgery procedures will be compiled. In the United Kingdom quality initiative, the recommended PROM to measure outcomes following blepharoplasty is the FACE-Q 7-item satisfaction with eyes overall, which should be administered before and after treatment. In addition, the FACE-Q adverse effects checklist for eyes was recommended for use postoperatively. Such data can be used to empower patients, inform decision making, identify patients most likely to respond to treatment, and support quality improvement. For example, findings about the incidence of bothersome adverse effects for different cosmetic surgical procedures could be used by plastic surgeons to assure that patients are properly educated prior to surgery.
The FACE-Q study has certain limitations that have been previously reported.14-16,19-21 Specifically, the study sample described here varied by age, sex, timing of assessment, and type of treatment. These factors limit our ability to report findings beyond instrument development and validation. Cosmetic surgery is sought mainly by women and white patients.1 As such, our sample in this FACE-Q study had more women than men and primarily white participants. Additionally, it is possible that the office staff recruiting patients for the field test sample were selective in whom they recruited, which may have introduced bias. Finally, as mentioned above, although we included pretreatment and posttreatment patients in the study, the number of patients who provided data before and after treatment was too small to compute change scores. Elsewhere, in a sample of close to 1000 cosmetic patients, we showed that the mean scores on the FACE-Q Satisfaction with Appearance scale for 5 treatments (botulinum toxin type A, filler, rhinoplasty, facelift, or blepharoplasty) were significantly higher among those who underwent treatment compared with those who did not.35 Responsiveness research is now needed to document the benefits of cosmetic eye treatments.
As the cosmetic surgery industry continues to expand worldwide, collection of evidence-based information regarding patient outcomes is essential. The FACE-Q Eye Module, developed and validated using state-of-the-art qualitative and quantitative psychometric research methods, is a valuable new tool that can help researchers, clinicians, and regulatory bodies accomplish this goal using eye-specific PROMs.
Corresponding Author: Anne Klassen, DPhil, McMaster University, 3N27, 1200 Main St W, Hamilton, ON L8N 3Z5, Canada (email@example.com).
Accepted for Publication: June 27, 2016.
Published Online: September 15, 2016. doi:10.1001/jamafacial.2016.1018
Author Contributions: Drs Klassen and Pusic had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Klassen, Cano, Van Laeken, Sykes, Schwitzer, Pusic.
Acquisition, analysis, or interpretation of data: Klassen, Cano, Grotting, Baker, Carruthers, Carruthers, Schwitzer, Pusic.
Drafting of the manuscript: Klassen, Cano, Schwitzer, Pusic.
Critical revision of the manuscript for important intellectual content: All Authors.
Statistical analysis: Klassen, Cano, Schwitzer, Pusic.
Obtaining funding: Pusic.
Administrative, technical, or material support: Baker, Van Laeken, Sykes, Schwitzer, Pusic.
Study supervision: Carruthers, Pusic.
Provide patients for the study, aid in study design: Van Laeken.
Editorial critique of manuscript: Grotting.
Conflict of Interest Disclosures: The FACE-Q is owned by Memorial Sloan Kettering Cancer Center. Drs Cano, Klassen and Pusic are codevelopers of the FACE-Q and, as such, receive a share of any license revenues as royalties based on Memorial Sloan Kettering Cancer Center’s inventor sharing policy. Financial disclosure (all other relationships): Dr Cano is cofounder of Modus Outcomes, an outcomes research and consulting firm that provides services to pharmaceutical, medical device, and biotechnology companies. Drs Alastair Carruthers and Jean Carruthers are consultants and investigators for Allergan, Merz, Kythera and Alphaeon.
Funding/Support: This study was supported by a grant from the Plastic Surgery Foundation. Dr Pusic received support from the NIH/NCI Cancer Center (support grant No. P30 CA008748).
Role of the Funder/Sponsor: The funders/sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.