eTable. Descriptive Statistics for SNOT-22 Items Pre- and Post-Surgery
DeConde AS, Bodner TE, Mace JC, Smith TL. Response Shift in Quality of Life After Endoscopic Sinus Surgery for Chronic Rhinosinusitis. JAMA Otolaryngol Head Neck Surg. 2014;140(8):712-719. doi:10.1001/jamaoto.2014.1045
Patient-reported measures are designed to detect a true change in outcome, but they are also subject to change from biases inherent to self-reporting: changing internal standards, changing priorities, and changing interpretations of a given instrument. These biases are collectively known as “response shifts” and can obscure true change after medical interventions.
To determine the presence of response shifts in patients with chronic rhinosinusitis (CRS) after endoscopic sinus surgery.
Design, Setting, and Participants
Multisite, prospective, observational cohort study conducted at academic tertiary care centers between February 2011 and May 2013. Study participants comprised a population-based sample of 514 adults (age ≥18 years) with CRS, who elected surgical intervention for continuing medically refractory symptoms.
Endoscopic sinus surgery.
Main Outcomes and Measures
Preoperative and postoperative data from the 22-item Sinonasal Outcome Test (SNOT-22) survey instrument were characterized using exploratory factor analysis. Subsequent longitudinal structural equation models were estimated to test structure, potential response shifts, and true change in the SNOT-22 scores.
A total of 339 participants (66.0%) provided survey evaluations at baseline and 6-month follow-up. Factor analysis of the SNOT-22 revealed 5 correlated, yet distinguishable, underlying factors. Endoscopic sinus surgery had a differential impact across these factors, with the largest effect sizes in rhinologic symptoms (mean [SD] SNOT-22 scores before and after surgery, 13.18 [5.11] and 7.37 [5.48], respectively; d = −1.13 [P < .001]) and extranasal rhinologic symptoms (8.31 [3.46] and 4.83 [3.68], respectively; d = −1.00 [P < .05]) (d is an effect size measure defined as the difference in means divided by the presurgery SD). Endoscopic sinus surgery had a smaller, yet significant, effect size on the remaining 3 factors: ear/facial symptoms (7.32 [4.6] and 3.90 [4.07], respectively; d = −0.74 [P < .001]), psychological dysfunction (11.90 [7.21] and 6.50 [6.69], respectively; d = −0.75 [P < .05]), and sleep dysfunction (10.12 [5.59] and 5.88 [5.37], respectively; d = −0.76 [P < .001]). Participants were found to undergo recalibration, reprioritization, and reconceptualization of symptoms after intervention; however, the magnitude of these response shifts was small and not clinically significant.
Conclusions and Relevance
The SNOT-22 measures 5 distinct factors, not a single construct. Reporting of individual subscale scores may improve sensitivity of this instrument in future studies. Participants undergoing endoscopic sinus surgery experience only clinically insignificant response shifts, validating assessment of change through use of presurgery and postsurgery SNOT-22 responses.
clinicaltrials.gov Identifier: NCT01332136
Medical outcomes research is predicated on interval changes of self-reported quality of life (QOL) after interventions. These patient-reported measures are designed to detect a true change in outcome, but they are also subject to change from the biases inherent to self-reporting: changing internal standards (recalibration), changing priorities (reprioritization), and changing interpretations (reconceptualization) of a given instrument. These 3 unmeasured dynamic internal biases can result in a change in the meaning of the QOL instrument, and this change is termed a response shift.1
Response shifts are particularly important in health-related QOL studies using repeated measures, where efficacy is determined as the change from a pretreatment baseline after an intervention. The response shift has been identified in a wide range of medical conditions and can both positively and negatively affect the detection of treatment effects.2-4 Types and magnitudes of response shift are unique to each intervention and disease process. To date, no one to our knowledge has investigated to what degree interval measurements of QOL after endoscopic sinus surgery (ESS) for chronic rhinosinusitis (CRS) reflect a true change in QOL or if they merely reflect a change in the instrument used to make that measurement. For example, theoretically, a patient with longstanding nasal obstruction may adapt to this as their “normal state” and report “no problem” for this question preoperatively but postoperatively find an unexpected improvement that would still be reported as “no problem.” This hypothetical example illustrates a recalibration response shift that would mask a true change in QOL. The goal of this analysis was to investigate the direction and magnitude of this response shift in a cohort of patients who underwent ESS for medically refractory CRS through secondary statistical analysis with a previously described and applied technique using confirmatory factor analysis and structural equation modeling (SEM).5,6
Before enrollment, written informed consent was obtained for all participants. The institutional review board at each of the 4 sites monitored and approved all investigational protocols. The institutional review board at Oregon Health & Science University provided comprehensive oversight and review for the entire study as the coordinating center. Adult patients (age ≥18 years) with CRS were enrolled into an ongoing prospective, observational cohort investigation using 4 academic, tertiary, rhinology practices (Oregon Health & Science University, Portland; Medical University of South Carolina, Charleston; Stanford University, Palo Alto, California; and University of Calgary, Calgary, Alberta, Canada). Preliminary findings from this cohort study have been previously reported.7-9 Inclusion criteria consisted of a current diagnosis of symptomatic refractory CRS as defined by the 2007 Adult Sinusitis Guidelines10; prior treatment with oral, broad-spectrum, or culture-directed antibiotics (≥2 weeks); and either topical nasal corticosteroid sprays (≥3 weeks) or a 5-day trial of systemic steroid therapy. Patients deemed surgical candidates who elected ESS were enrolled and required to complete the 22-item Sinonasal Outcome Test (SNOT-22) at both baseline and a 6-month follow-up visit. The SNOT-22 is a 22-item, validated, treatment outcome measure applicable to chronic sinonasal conditions (score range, 0-110).11 Lower total scores on the SNOT-22 suggest better QOL and lower symptom severity.
Preliminary analyses tested for differences across the 4 surgery locations. Because no significant differences in scale and item scores across locations were found, all reported analyses ignored location. Prior to testing for response shifts, a reasonably well-fitting factor model was needed. This analysis seeks to identify the unique factors, or “constructs” (ie, the aspects of health-related QOL that each individual question measures), that the SNOT-22 captures by examining correlated groups of questions. Although exploratory factor analyses have been conducted on the SNOT-20,12,13 to our knowledge, exploratory factor analysis methods have not been used on the SNOT-22. Thus, analysis began with exploring and testing the SNOT-22 factor structure before surgery, prior to building measurement models as previously described.12,13
On defining the underlying factors of the SNOT-22, a series of longitudinal structural equation models were then estimated to evaluate any changes in this factor structure and to clarify response shifts and true change in the SNOT-22. Given the skewed nature of the item response distributions, robust estimation procedures were used in these models. The use of robust estimation procedures complicates model comparisons using the χ² difference test; thus, the recommended procedure based on scaled likelihoods was used for model comparisons.14 Across models, we followed the 4 steps for detecting response shifts outlined by Oort and colleagues5,6 and described in the following subsections. Deviations from the recommended procedures are also discussed. Statistical analyses were conducted using SPSS version 22.0 software (IBM Corporation), and SEM was conducted using Mplus version 4.2 (Muthén & Muthén).
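A scaled difference test of this kind can be sketched as follows. This is a minimal illustration that assumes the widely used Satorra-Bentler scaled χ² difference formula, which may differ in detail from the procedure cited above. The function name and the input values are hypothetical, because the scaling correction factors for the models in this study are not reported in the text.

```python
def scaled_chi2_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference statistic (assumed formula).

    t0, c0, d0: scaled chi-square, scaling correction factor, and degrees of
                freedom of the nested (more constrained) model.
    t1, c1, d1: the same quantities for the less constrained comparison model.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    # Multiplying each scaled statistic by its correction factor recovers the
    # unscaled statistic; the difference is then rescaled by cd.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1


# Hypothetical inputs, chosen only to show the arithmetic.
stat, df = scaled_chi2_difference(t0=120.0, c0=1.2, d0=50,
                                  t1=100.0, c1=1.1, d1=45)
print(f"scaled difference = {stat:.2f} on {df} df")
```

The resulting statistic is referred to a χ² distribution with the returned degrees of freedom; a simple subtraction of the two scaled χ² values would not have that distribution.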
Following the recommendation by Oort and colleagues,5,6 an initial model for the SNOT-22 measurement structure was tested without any across-time parameter constraints. The model (model L0) specification was based on the results of the exploratory factor analysis model where factor loadings were determined by the primary loading in the presurgery exploratory factor analysis model results. This structure was extended longitudinally to the 6-month postsurgery measurement occasion. As is typical in longitudinal structural equation models, factors were permitted to correlate over time (eg, the rhinologic symptoms factor before and after surgery) and item residual variances were permitted to correlate over time (eg, the residual variances for item 1 before and after surgery).
Similar to Oort and colleagues,5,6 successive models place or release constraints on the factor loadings, means, variances, and correlations over time, as well as place constraints on the item intercepts and item residual variances. In this step, to provide an overall test of response shifts, invariance constraints across the 2 periods were applied to the item intercepts, factor loadings, and the residual variances.
In this step, the invariance constraints were lifted one at a time to test their impact on model fit. Lifted across-time invariance constraints that improve model fit were retained in the final model. Secondarily, modification indices were inspected for the postsurgery part of the model to identify potential changes in the measurement structure; such additional factor loadings were retained if their presence increased model fit and made theoretical sense.
In step 4, attention turned to changes in factor means and covariances across time. Because measurement error and changes in the measurement structure over time were accounted for in the prior steps, significant changes in factor means over time indicate true change.
Between February 2011 and May 2013, 514 participants who met inclusion criteria and gave informed consent were enrolled into this ongoing cohort, among whom 339 (66.0%) had provided both a baseline and 6-month follow-up SNOT-22 survey for analysis. Of the 339 participants (overall mean [SD] age, 51.0 [15.0] years), 151 (44.5%) were male, 178 (52.5%) reported a history of sinus surgery, 123 (36.3%) had polyps, 118 (34.8%) had asthma, 124 (36.6%) tested positive for allergies, 58 (17.1%) were depressed, and 29 (8.6%) had aspirin sensitivity.
Descriptive statistics for the SNOT-22 items were calculated before and after surgery (eTable in the Supplement). For all items, descriptively, item means and SDs decreased from before to after surgery and item positive skew increased from before to after surgery (ie, a mix of positive and negative skew before surgery is uniformly positive after surgery).
Eigenanalysis indicated 5 factors with eigenvalues greater than 1.0. Table 1 provides the estimated factor loadings for each of the SNOT-22 items as well as the factor correlations after Promax rotation using the presurgery data. Two features of this solution warrant discussion. First, the rhinologic symptom items based on earlier research were partitioned into 2 dimensions, the second of which we have called extranasal rhinologic symptoms. Second, 4 of the items do not load uniquely onto a single dimension (sneezing, thick nasal discharge, waking up tired, and fatigue). These 4 items with “cross-loadings” are kept in mind because model modifications are entertained based on the confirmatory factor analysis results.
Scale scores were created and tested for change from before to after surgery. These scale scores included the overall SNOT-22 score (ie, the sum of responses to the 22 items) and 5 subscale scores corresponding to the subdimensions identified in the exploratory factor analysis. Table 2 presents the results of paired t tests comparing mean scale scores from before to after surgery. All scale score tests indicated significant reductions in symptoms and dysfunctions; inspection of effect sizes indicated that all reductions would be characterized as large in a standardized metric.
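The standardized effect size used for these comparisons (the difference in means divided by the presurgery SD) can be illustrated with a short sketch. The means and SDs below are the subscale values reported in this article; the function and variable names are assumptions made for illustration. Because the inputs are rounded summary values, a recomputed d can differ from the published value in the last digit.

```python
def effect_size_d(pre_mean, post_mean, pre_sd):
    """Difference in means (post minus pre) divided by the presurgery SD."""
    return (post_mean - pre_mean) / pre_sd


# (presurgery mean, postsurgery mean, presurgery SD) for each subscale.
subscales = {
    "rhinologic symptoms": (13.18, 7.37, 5.11),
    "extranasal rhinologic symptoms": (8.31, 4.83, 3.46),
    "ear/facial symptoms": (7.32, 3.90, 4.6),
    "psychological dysfunction": (11.90, 6.50, 7.21),
    "sleep dysfunction": (10.12, 5.88, 5.59),
}

for name, (pre_mean, post_mean, pre_sd) in subscales.items():
    d = effect_size_d(pre_mean, post_mean, pre_sd)
    print(f"{name}: d = {d:.2f}")
```

Negative values indicate a reduction in symptom scores after surgery.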
A summary of the fit of the various models to the data appears in Table 3. Model L0 fit reasonably well (χ²(835) = 2088.28, P < .001, root mean square error of approximation [RMSEA] = 0.067, standardized root mean residual [SRMR] = 0.064). Given the 4 identified salient cross-loadings in the exploratory factor analysis, we next fit a model (model L1) that permitted these 4 cross-loadings (mentioned in the previous subsection and appearing in Table 3) both before and after surgery. Model L1 also fit reasonably well (χ²(827) = 1844.61, P < .001, RMSEA = 0.060, SRMR = 0.060), with an RMSEA value approaching 0.05, which is indicative of a “close fit.” Furthermore, model L1 fit significantly better than model L0 (Δχ²(8) = 87.61, P < .001). These results confirm the SNOT-22’s correlated 5-factor structure; model L1 serves as the base model for tests of response shifts in the subsequent step.
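For readers less familiar with the RMSEA, the relationship between the χ² statistic, its degrees of freedom, and the sample size can be sketched as follows. This is a minimal illustration that assumes the common point-estimate formula with N − 1 in the denominator; SEM software may apply additional corrections under robust estimation, so this is not necessarily the exact computation behind Table 3. The function name is hypothetical. The sample size is the 339 participants analyzed.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


N = 339  # participants with baseline and 6-month follow-up data

# (chi-square, degrees of freedom) as reported for models L0 and L1.
models = {"L0": (2088.28, 835), "L1": (1844.61, 827)}

for name, (chi2, df) in models.items():
    print(f"model {name}: RMSEA = {rmsea(chi2, df, N):.3f}")
```

Under these assumptions, the computed values round to the RMSEA values reported for models L0 and L1.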
Invariance constraints across the 2 periods were applied to the item intercepts, factor loadings, and residual variances. A significant reduction in model fit between this model and model L1 from step 1 indicates the presence of some type of response shift. However, given the nature of the SNOT-22 data, we deviate slightly from the recommendation by Oort and colleagues.6 Inspection of the estimated residual variances before and after surgery in model L1 indicated that 21 of the 22 residual variances were smaller after surgery than before, with a mean reduction of 30%. Although this pattern could be indicative of a nonuniform recalibration response shift per Oort and colleagues,6 we believe this reduction may be due in part to the positively skewed nature of item responses after surgery caused by floor effects on the response scale (ie, many more scores of 0 indicating “no problem”). From this perspective, any reduction in symptoms after surgery would reduce the mean item response as well as the variance of the item response as observed (eTable in Supplement). Thus, we do not apply invariance constraints on the residual variances and therefore will not test for nonuniform recalibration response shifts.
Table 3 presents the fit of a longitudinal structural equation model with invariance constraints across time on the item intercepts and factor loadings (ie, model L2). The fit of model L2 might be considered adequate (χ²(870) = 2023.67, P < .001, RMSEA = 0.063, SRMR = 0.068). However, this model fits significantly worse than model L1 (Δχ²(43) = 87.74, P < .001). This evidence suggests that some of these invariance constraints are reducing model fit and suggests the presence of response shifts; the change in the RMSEA and SRMR fit indices suggests, however, that the magnitude of these response shifts may be small.
Because the first part of step 3 is tedious, releasing 48 individual invariance constraints separately, we present the end result of this process as model L3. Model L3 released the across-time factor loading invariance constraints on items 8, 9, and 10 for the ear/facial symptoms factor and on item 22 for the rhinologic symptoms factor and the across-time item intercept invariance constraints for items 1, 9, 10, and 22. Model L3 fits significantly better than model L2 (Δχ²(8) = 79.03, P < .001). Despite a large number of across-time invariance constraints, model L3 did not fit significantly worse than model L1 (Δχ²(35) = 23.22, P = .94). Thus, any response shifts appear to be localized to these particular items. Inspection of the model modification indices for model L3 suggested the addition of only 1 additional factor loading after surgery, ie, item 18 loading onto extranasal rhinologic symptoms. Model L4, which specified this additional factor loading, fit the data reasonably well (χ²(861) = 1871.35, P < .001, RMSEA = 0.059, SRMR = 0.062), and significantly better than model L3 (Δχ²(1) = 7.97, P = .005). From Table 3, the model bayesian information criteria (BICs), which balance model fit and parsimony, support the superiority of model L4. Table 4 and Table 5 provide select model parameters from model L4.
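The degrees of freedom for the nested model comparisons reported so far can be checked with simple bookkeeping, sketched below. The degrees of freedom for models L0, L1, L2, and L4 are taken from the text; the value for model L3 is not reported directly and is derived here under the assumption that model L4 differs from it by exactly 1 freed loading. The variable and function names are hypothetical.

```python
# Degrees of freedom reported in the text for each model.
model_df = {"L0": 835, "L1": 827, "L2": 870, "L4": 861}

# Assumption: model L4 adds 1 free loading to model L3, so L3 has 1 more df.
model_df["L3"] = model_df["L4"] + 1


def df_difference(constrained, free):
    """Degrees of freedom for comparing a constrained model with a freer one."""
    return model_df[constrained] - model_df[free]


# (more constrained model, less constrained model, df reported for the test)
comparisons = [
    ("L0", "L1", 8),
    ("L2", "L1", 43),
    ("L2", "L3", 8),
    ("L3", "L1", 35),
    ("L3", "L4", 1),
]

for constrained, free, reported in comparisons:
    computed = df_difference(constrained, free)
    status = "matches" if computed == reported else "differs from"
    print(f"{constrained} vs {free}: computed df = {computed} ({status} reported {reported})")
```

Each difference equals the number of parameters freed or constrained between the two models, which is why the 8 released constraints in model L3 correspond to a test on 8 degrees of freedom.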
We next turn to an interpretation of the 9 identified response shifts in model L4. Any shifts in the factor loading patterns from before to after surgery indicate a reconceptualization in the underlying factors. The addition of item 18 (ie, frustrated/restless/irritable) onto the extranasal rhinologic symptoms factor after surgery suggests that responses to this item are affected by one’s standing on this factor, unlike before surgery, perhaps indicating differing levels of postsurgery frustration for this factor. A comparison of this factor loading before (λ = 0.00) and after (λ = 0.20) surgery in Table 4, however, suggests that the level of reconceptualization is small in magnitude.
Shifts in the magnitude of factor loadings over time indicate a reprioritization of the importance of that item as it relates to the underlying factor. Items 10 (facial pain/pressure, λ = 0.87 and λ = 1.01 before and after surgery, respectively) and 22 (blockage/congestion of the nose, λ = 0.87 and λ = 1.26 before and after surgery, respectively) demonstrated increases in the size of the factor loadings over time onto the ear/facial symptoms and rhinologic symptoms factors, respectively, suggesting that these items more strongly indicate their underlying factors after surgery. In contrast, items 8 (dizziness, λ = 1.00 and λ = 0.74 before and after surgery, respectively) and 9 (ear pain, λ = 1.12 and λ = 0.89 before and after surgery, respectively) demonstrated decreases in the size of the factor loadings over time onto the ear/facial symptoms factor, suggesting that these items less strongly indicate their underlying factors after surgery.
Shifts in item intercepts across time indicate a (uniform) recalibration of item responses relative to the underlying factors. Items 1 (need to blow nose, τ = 2.72 and τ = 2.80 before and after surgery, respectively) and 9 (ear pain, τ = 1.28 and τ = 1.39 before and after surgery, respectively) demonstrated increases in the item intercepts after surgery, indicating that patients rate these symptoms as more problematic, relative to before surgery and on average, than implied by their standing on the underlying factors. In contrast, items 10 (facial pain/pressure, τ = 2.53 and τ = 2.05 before and after surgery, respectively) and 22 (blockage/congestion of nose; τ = 3.50 and τ = 3.09 before and after surgery, respectively) demonstrated decreases in item intercepts after surgery, indicating that patients rate these symptoms as less problematic, relative to before surgery and on average, than implied by their standing on the underlying factors. Given item response options ranging from 0 to 5, these item intercept shifts may be considered small.
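The practical meaning of these loading and intercept shifts can be illustrated with the linear measurement model that underlies this kind of analysis, in which the expected item response is the intercept plus the loading times the factor score (τ + λη). The sketch below is a simplified illustration, not the estimated model itself. The intercepts and loadings are the item 22 estimates quoted in the text; the factor scores and all names are hypothetical.

```python
def expected_item_response(tau, lam, eta):
    """Expected item response under a linear factor model: tau + lam * eta."""
    return tau + lam * eta


# Item 22 (blockage/congestion of nose): (intercept tau, loading lambda).
ITEM_22 = {"before": (3.50, 0.87), "after": (3.09, 1.26)}

# Hypothetical factor scores, in presurgery SD units of the factor.
for eta in (-1.0, 0.0, 1.0):
    before = expected_item_response(*ITEM_22["before"], eta)
    after = expected_item_response(*ITEM_22["after"], eta)
    print(f"eta = {eta:+.1f}: expected response before = {before:.2f}, after = {after:.2f}")
```

In this sketch, the lower postsurgery intercept lowers the expected response at a factor score of 0, and the larger postsurgery loading makes the expected response change more steeply with the factor score.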
Model L5 specifies invariance constraints on the factor means from before to after surgery. Model L5 fits significantly worse than model L4 (Δχ²(5) = 66.60, P < .001). Follow-up test results indicate that all 5 factor means differ from before to after surgery. Table 5 presents the factor means before and after surgery from model L4. Given the scaling of the factor variances to 1.00, the postsurgery means are equivalent to a standardized mean difference in the factor means. The standardized mean differences are similar to those reported using observed scale scores in Table 2. Inspection of the factor correlations before and after surgery in Table 5 suggests that the correlations among factors are stronger after surgery than before. Model L6 specifies across-time invariance constraints on the within-time factor correlations. Model L6 fits significantly worse than model L4 (Δχ²(10) = 42.38, P < .001), confirming this descriptive comparison. Thus, the factors underlying the SNOT-22 appear to represent a more unitary set of symptoms and dysfunctions after surgery.
Accurate and sensitive measures of how interventions affect QOL are critically important for our subspecialty. The rationing of national health care resources is inevitable: QOL measures are already used by the National Health Service in the United Kingdom, and the recent Patient Protection and Affordable Care Act in the United States invests in comparative clinical outcomes research. Accurately capturing the impact of an intervention on our patients will be essential in guiding individual and societal decisions on the value of any given intervention. Establishing to what extent a response shift plays a role in a given intervention may preserve the value of an intervention or accurately guide us to another, more effective treatment.
Response shifts have the potential to misinform comparative clinical outcomes research. For example, in edentulous patients, response shift completely masks improvement in QOL after denture rehabilitation.2 Patients undergoing cholecystectomy have greater improvements in gastrointestinal QOL when response shift is considered.3 A psychosocial intervention for cancer survivors appeared to worsen QOL based on change in pretreatment baseline, but evaluation of the response shift in fact demonstrated a positive effect that was not identified by a recalibration of the QOL toward that of healthy controls.4
A range of methods for detecting and quantifying response shifts has been described. In general, these methods take 1 of 3 forms: (1) additional administration of a test questionnaire to retrospectively evaluate baseline (eg, a then-test), (2) additional evaluation of the target outcome (eg, interviews, direct assessments of values or preferences), and (3) post hoc statistical analysis.4 Retrospective analyses of baselines (ie, then-tests) are limited by recall bias and may be confounded by alternative explanations such as implicit theories of change; that is, patients suffered through an intervention and therefore are invested in its success and recall an artificially worsened baseline.15 Additional evaluation of the target outcome is labor intensive and not feasible on data sets already collected. Statistical methods of detecting a response shift can be applied post hoc to data, require no additional measurements, and require a minimum of only 2 longitudinal time points (eg, baseline and posttreatment scores).
The research in the present article makes several contributions to the literature on outcomes of rhinologic surgery. To our knowledge, this is the first article testing the factor structure of the SNOT-22 in a confirmatory manner. The results of this analysis indicate 5 correlated yet distinguishable factors underlying the SNOT-22, providing a new level of discrimination to this instrument. The longitudinal structural equation models testing for response shifts did indeed find evidence of response shifts; however, the magnitude of these shifts may be considered small and unimportant for clinical practice. Perhaps most important is the finding of the invariance of most model parameters from before to after surgery. This result provides statistical and measurement evidence validating the comparison of SNOT-22 item responses or scale scores before and after surgery to quantify changes in symptoms and dysfunctions. Had larger degrees of noninvariance been found, the factors underlying the SNOT-22 before and after surgery would have had differing meanings and interpretations, making any assessment of change questionable.
Detection of 5 distinguishable factors of the SNOT-22 offers a new resolution to this instrument and has potential to better characterize the impacts of interventions and comorbidities on CRS in future studies. Prior factor analysis of the SNOT-20 revealed 4 separate constructs: rhinologic symptoms, ear/facial symptoms, sleep function, and psychological function,12,13 but the present study reveals a fifth construct that uniquely captures “cough” and “postnasal discharge.” Total SNOT-22 scores are often used to investigate the impact of interventions across populations, but aggregate scores lack the resolution of reporting of the individual domains identified in the present study. For example, aggregate scores cannot detect symmetrically divergent changes in separate domains of health. Similarly, aggregate QOL scores are an abstract concept, whereas patients and clinicians are faced with specific symptoms that they are attempting to improve. Knowledge of what domains are captured by the SNOT-22 and how these domains are changed by ESS will aid in patient-oriented clinical decision making. Further investigation using these domains could help explain the clinical end points achieved by patients with comorbid depression, fibromyalgia, and migraine.16-18 Patients with these comorbidities experience comparable overall gains to the general population but have diminished baselines and postoperative QOL. The present factor analysis provides tools to further investigate the prior observation that comorbid depression, fibromyalgia, and migraine have an impact on SNOT-22 pretreatment and posttreatment QOL measurements.
There are several important limitations to the present study. By using SEM to detect a response shift, these results can only be applied at the population level. Conceivably individuals may undergo equal and opposite response shifts that would not be detected at this population level. Similarly, unless a significant portion of a study population experiences a response shift, it may not appear in a model because the response shift is averaged across the group.5,19 This limitation could be addressed through future studies using another method to detect response shifts, such as a then-test, allowing for cross-validation of these results or through a subgroup analysis of participants with different types of CRS or other comorbidities. Another concern is that our sample size may not have been adequate to detect response shifts; however, some conventional guidelines are available to help make this determination. Kline summarizes research on SEM practices and notes that a typical sample size is approximately 200 participants.20 Thus, our sample size of 339 would be considered larger than average against this benchmark. Outside of SEM, population surveys typically consist of approximately 1000 participants to represent populations of 100 million people (eg, the population of registered voters who intend to vote in a US presidential election) with great success. Because the population of patients experiencing rhinological symptoms warranting rhinological surgery is far less than this number, we find some solace in the size of our sample. Finally, we report BIC values in Table 3. As noted in the article, BICs permit model comparisons that balance the fit of the model with the complexity of the model (ie, all else being equal, complex models tend to fit better). Unfortunately, more complex models also tend to replicate, generalize, and cross-validate less well. Our favored model (model L4) has the lowest BIC value.
Importantly, another common fit statistic, the expected cross-validation index, preserves the ordering of model favorability based on the BIC values. Thus, our model with the best BIC value also is the best model with respect to the expected cross-validation index and therefore the expected degree of cross-validation and generalizability. Thus, although a larger sample size is always appreciated, we are cautiously optimistic about the generalizability of our results.
The results of this analysis identified 5 correlated but distinguishable factors underlying the SNOT-22, which carries important implications for future QOL outcomes research in CRS. The longitudinal structural equation models testing for response shifts reveal response shifts; however, the magnitude of these shifts may be considered small and unimportant for clinical practice. This result provides statistical and measurement evidence validating the comparison of SNOT-22 item responses or scale scores before and after surgery to quantify changes in symptoms and dysfunctions.
Submitted for Publication: March 11, 2014; final revision received April 21, 2014; accepted May 7, 2014.
Corresponding Author: Timothy L. Smith, MD, MPH, Division of Rhinology and Sinus/Skull Base Surgery, Oregon Sinus Center, Department of Otolaryngology–Head and Neck Surgery, Oregon Health & Science University, 3181 SW Sam Jackson Park Rd, Mail Code PV-01, Portland, OR 97239 (email@example.com).
Published Online: June 26, 2014. doi:10.1001/jamaoto.2014.1045.
Author Contributions: Drs DeConde and Bodner had full access to all the data in the study and take responsibility for the integrity of the data and accuracy of the data analysis.
Study concept and design: DeConde, Bodner.
Acquisition, analysis, or interpretation of data: Bodner, Mace, Smith.
Drafting of the manuscript: DeConde, Bodner, Mace, Smith.
Critical revision of the manuscript for important intellectual content: DeConde, Bodner, Mace.
Statistical analysis: Bodner, Mace.
Obtained funding: Mace, Smith.
Administrative, technical, or material support: Mace, Smith.
Study supervision: DeConde, Smith.
Conflict of Interest Disclosures: Dr Smith and Mr Mace receive partial support from the National Institute on Deafness and Other Communication Disorders. Dr Smith is also a consultant for IntersectENT Inc, which is not affiliated with this investigation. Dr Bodner is supported by grants from the National Institute of Child Health & Human Development, the National Heart, Lung, and Blood Institute and Kaiser Permanente, the National Institute for Occupational Safety and Health, and the US Department of Defense, none of which are associated with funding or publishing this study. No other disclosures are reported.
Funding/Support: This study was supported by grant R01 DC005805 from the National Institute on Deafness and Other Communication Disorders (Principal Investigator/Project Director: Dr Smith).
Role of Funder/Sponsor: This funding organization did not contribute to the design or conduct of this study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit this manuscript for publication.