Skirko JR, James KT, Garrison LP, Weaver EM. Development of a Sleep Apnea–Specific Health State Utility Algorithm. JAMA Otolaryngol Head Neck Surg. 2020;146(3):270–277. doi:10.1001/jamaoto.2019.4469
Can preference-weighted health state utility be measured with a sleep apnea–specific patient-reported outcome?
In this cohort study of 500 participants with sleep apnea, log-transformed subscale scores produced the best model for mapping the preference-weighted health state utility from the SF-6D to the Symptoms of Nocturnal Obstruction and Related Events (SNORE-25). The association was maintained in an independent sample.
The preference-weighted scoring of the SNORE-25 might improve the accessibility and use of economic evaluations in patients with obstructive sleep apnea.
With the increasing emphasis on economic evaluations, there is a need for additional methods of measuring patient utility in the obstructive sleep apnea population.
To develop and validate a utility scoring algorithm for a sleep apnea–specific quality-of-life instrument.
Design, Setting, and Participants
Development and validation were conducted at 2 tertiary referral sleep centers and associated sleep clinics and included patients with newly diagnosed obstructive sleep apnea from a randomized clinical trial and an associated observational cohort study. Baseline participants were randomly divided into a model development group (60%) and a cross-validation group (40%).
Main Outcomes and Measures
Utility scoring of the Symptoms of Nocturnal Obstruction and Related Events (SNORE-25) was mapped from the SF-6D utility index through multiple linear regression in the development sample using the Akaike information criterion to determine the best model.
A total of 500 participants (development, n = 300; validation, n = 200) were enrolled and analyzed; the sample included 295 men (59%), and the mean (SD) age was 48.6 (12.8) years, with a range of 18 to 90 years. The mean (SD) SF-6D utility among participants with untreated sleep apnea was 0.61 (0.08; range, 0.40-0.85), with similar utility across sleep apnea severity groups. The best-fit model (the SNORE Utility Index) used natural log–transformed instrument subscales (R2 = 0.32 in the development sample). The SNORE Utility Index retained this association in the validation sample (R2 = 0.33).
Conclusions and Relevance
The SNORE Utility Index provides a validated, disease-specific, preference-weighted utility instrument that can be used in future studies of patients with obstructive sleep apnea.
Cost-effectiveness analyses are becoming increasingly important in evaluating health care interventions. Most cost-effectiveness analyses require a preference-based measure of health state utility to determine the cost per quality-adjusted life-year (QALY) gained. Health state utility is a measure of preference for a health outcome or health state, scored from 0 to 1, with 0 representing the worst health state/death and 1 representing the best. Although there are several methods for measuring utility, mapping a utility score from an existing preference-weighted instrument onto a disease-specific instrument allows for easier measurement during clinical studies. This method has been used to map utility onto generic quality-of-life instruments1,2 as well as disease-specific quality-of-life measures for disorders such as angina, Crohn disease, and obesity.3-5 Symptoms of Nocturnal Obstruction and Related Events (SNORE-25) is an instrument designed to measure disease-specific quality of life in people with obstructive sleep apnea (OSA). Disease-specific quality-of-life scores such as the SNORE-25 are important, but because they are not preference weighted, they are of limited use in economic evaluations.
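As a worked illustration of "cost per QALY gained," the sketch below computes QALYs as a utility weight multiplied by years spent in a constant health state, and an incremental cost-effectiveness ratio (ICER) as the difference in costs divided by the difference in QALYs. It is a minimal sketch that assumes constant utility over the time horizon and no discounting; the function names, costs, durations, and the post-treatment utility in the example are hypothetical.

```python
def qalys(utility: float, years: float) -> float:
    """QALYs accrued while in a single health state: utility weight x years."""
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility must lie between 0 (death) and 1 (best health)")
    if years < 0:
        raise ValueError("years must be non-negative")
    return utility * years


def icer(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> float:
    """Incremental cost-effectiveness ratio: additional cost per QALY gained."""
    qaly_gain = qaly_new - qaly_old
    if qaly_gain == 0:
        raise ZeroDivisionError("no QALY difference, so the ICER is undefined")
    return (cost_new - cost_old) / qaly_gain


if __name__ == "__main__":
    # Hypothetical example: baseline utility 0.61 (untreated OSA) vs an ASSUMED
    # post-treatment utility of 0.71, each held constant for 5 years, with a
    # hypothetical extra treatment cost of 10 000 currency units.
    ratio = icer(12_000, 2_000, qalys(0.71, 5), qalys(0.61, 5))
    print(f"cost per QALY gained: {ratio:,.0f}")
```

In this hypothetical case a 0.10 utility gain sustained for 5 years yields 0.5 QALY, so a 10 000-unit cost difference corresponds to about 20 000 per QALY gained.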
Obstructive sleep apnea syndrome is characterized by symptomatic repeated upper airway obstructions during sleep and is diagnosed in 4% to 5% of the adult US population6; as many as 1 in 4 adults are at high risk of having this disorder, most without a formal diagnosis.7,8 Patients with OSA experience lower quality of life and have an increased risk of cardiovascular disease and early mortality.9-12 Treatments for OSA improve many of these outcomes,13-16 and some have been shown to be cost-effective.17-19 Cost-effectiveness analyses of OSA treatments thus far have relied on utility studies with small sample sizes that primarily used the standard gamble technique and generic health status instruments.20-22 An alternative way of obtaining a preference-based measure is the development of a mapping algorithm. Mapping preference-weighted utility onto an OSA-specific instrument would allow future studies of sleep apnea treatments to measure the effect on QALYs gained, thus facilitating future cost-effectiveness analyses comparing alternative interventions. This study seeks to conduct a limited subscale validation of the SNORE-25 and to map the preference-based SF-6D utility index onto the SNORE-25, creating a utility scoring algorithm for this OSA-specific quality-of-life instrument.
This study used a convenience sample of participants with newly diagnosed and untreated OSA enrolled in a randomized clinical trial and an associated observational cohort study. Study participants were enrolled from 2 related studies: (1) a randomized clinical trial evaluating treatment with radiofrequency inferior turbinate reduction in patients with OSA (the TURBO trial [NCT00503802]) and (2) a prospective observational cohort study23 of participants evaluated for suspected OSA (the Seattle Sleep Cohort). Although both studies drew from a population with untreated, newly diagnosed OSA, the randomized clinical trial had more restrictive inclusion and exclusion criteria. Both studies enrolled adults with suspected OSA who underwent diagnostic polysomnography at the University of Washington Sleep Medicine Center at Harborview Medical Center in Seattle. This included a subset of participants who were evaluated for suspected OSA but who did not have OSA diagnosed on polysomnography (n = 13). The randomized clinical trial additionally enrolled participants with newly diagnosed sleep apnea identified at the Virginia Mason Sleep Disorders Center and 4 of its satellite sleep clinics in metropolitan King County, Washington. Inclusion criteria for both studies were age 18 years or older and fluency in spoken and written English. An additional inclusion criterion for the randomized clinical trial was persistent inferior turbinate hypertrophy identified by anterior nasal examination. Participants were excluded from both samples if they had a prior diagnosed sleep disorder, did not have a telephone, or planned to move during the 12 months after enrollment. Additional exclusion criteria for the randomized clinical trial related to ensuring suitable surgical candidates and included history of previous turbinate surgery, history of coagulopathy, severe psychiatric disorder, and history of life-threatening disease (American Society of Anesthesiologists class IV or V).
This study was reviewed and approved by the University of Washington Institutional Review Board. Written informed consent was obtained from all participants included in the study (randomized clinical trial and cohort).
Study participants from the randomized clinical trial completed baseline computer-based questionnaires after polysomnography, prior to exposure to continuous positive airway pressure treatment and before randomization to active or placebo turbinate reduction. Participants from the observational cohort study completed baseline computer-based questionnaires immediately prior to polysomnography. Computer-based questionnaires were self-administered and were formatted to match the paper versions of the instruments. All participants had the option of completing paper questionnaires if they preferred.
The SNORE-25 is a self-reported, 25-item, OSA-specific quality-of-life instrument. It was originally developed and validated as a 32-item instrument24,25 with content derived from focus groups and structured interviews with patients with OSA.26 Analysis of the psychometric properties resulted in eliminating 7 items to obtain the current SNORE-25. It contains 5 subscales: sleep problems, awake problems, medical problems, emotional and personal problems, and occupational problems. Each item is presented with a 6-point Likert-type scale response format that ranges from 0 (no problem) to 5 (problem as bad as it can be). The total score is the mean of all items in the instrument, and the subscale scores are the mean for the items in the respective subscale with a possible range of 0 to 5; higher values represent worse quality of life.26 The SNORE-25 instrument has been shown to be responsive to treatment.27
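The scoring rule described above (total score as the mean of all 25 items; each subscale score as the mean of its items; all on the 0 to 5 scale) can be sketched as follows. The item-to-subscale assignment is not given in this section, so `HYPOTHETICAL_SUBSCALE_ITEMS` is a placeholder for illustration only and is not the published SNORE-25 item mapping. The sketch also assumes complete responses, since missing-item handling is not described here.

```python
from statistics import mean

# PLACEHOLDER mapping for illustration only; NOT the published SNORE-25 item assignment.
HYPOTHETICAL_SUBSCALE_ITEMS = {
    "sleep": [1, 2, 3, 4, 5],
    "awake": [6, 7, 8, 9, 10],
    "medical": [11, 12, 13, 14, 15],
    "emotional": [16, 17, 18, 19, 20],
    "occupational": [21, 22, 23, 24, 25],
}


def score_snore25(responses, subscale_items):
    """Return (total score, {subscale: score}) for one respondent.

    responses: dict mapping item number (1-25) to a Likert response 0-5,
    where 0 = no problem and 5 = problem as bad as it can be.
    Higher scores indicate worse quality of life.
    """
    if set(responses) != set(range(1, 26)):
        raise ValueError("expected responses to all 25 items")
    for item, value in responses.items():
        if value not in range(6):
            raise ValueError(f"item {item}: response {value!r} is outside 0-5")
    total = mean(responses.values())  # mean of all items
    subscales = {
        name: mean(responses[i] for i in items)  # mean of the subscale's items
        for name, items in subscale_items.items()
    }
    return total, subscales
```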
The SF-6D is a 6-dimension, preference-weighted, health state classification derived from a subset of items of the 36-Item Short-Form Health Survey, a validated, widely used general health status instrument that assesses a person’s health across 8 domains.28 The SF-6D uses items from 6 domains: physical functioning, role participation (a combination of role-physical and role-emotional), social functioning, bodily pain, mental health, and vitality.29 The resulting SF-6D index ranges from 0.0 to 1.0, with 0.0 representing the worst health state and 1.0 representing the best health state. The SF-6D index can be used in calculating QALY changes needed in cost-utility analysis, which is the recommended and most widely used form of cost-effectiveness analysis applied to health interventions.30 This preference-weighted utility index has been used in several studies to create utility scoring algorithms for other general and disease-specific quality-of-life instruments.4,5,31,32 The SF-6D is thought to be sensitive in groups with mild to moderate health problems,33 which makes it suitable for use in people with OSA.
Baseline SNORE-25 responses were analyzed for internal consistency using Cronbach α for the total instrument score as well as the 5 subscales. Confirmatory factor analysis was conducted to evaluate subscale construct validity to support the validity of using subscale scores in the utility model.34 Factors were assessed by scree plot to determine the number of factors to be retained in the final rotation. The number of factors in the final model was confirmed by retaining factors with an eigenvalue greater than 1. Eigenvalues estimate the variation in the total sample that each factor accounts for; an eigenvalue of 1 is the amount of variance that a single item would be expected to account for, after item standardization.35 Orthogonal varimax rotation was then used with the number of factors identified in the scree plot. Factor loadings of items after rotation, which express the association of each item with the underlying factor, were compared with the hypothesized subscales, with factor loadings greater than 0.4 considered meaningful.36
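Two of the quantities described above can be computed directly from a respondents-by-items matrix: Cronbach α, using the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the summed score), and the number of eigenvalues of the item correlation matrix that exceed 1. This NumPy sketch is illustrative only; it does not reproduce the scree plot or the varimax rotation, and the function names are our own.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) array of item scores."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    sum_item_var = x.var(axis=0, ddof=1).sum()   # sum of the k item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of the summed score
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


def kaiser_factor_count(items):
    """Count eigenvalues > 1 of the item correlation matrix; also return all eigenvalues."""
    corr = np.corrcoef(np.asarray(items, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh is ascending; reverse to descending
    return int(np.sum(eigenvalues > 1.0)), eigenvalues
```

Because each standardized item contributes a variance of 1, an eigenvalue above 1 marks a factor that accounts for more variance than a single item would.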
The SF-6D utility score was calculated for each study participant with descriptive statistics presented. The association of sleep apnea severity, determined using the Apnea-Hypopnea Index and oxygen saturation level, with the SF-6D utility score was measured with the Pearson correlation coefficient. Utility by SF-6D for each category of sleep apnea severity was also calculated. Although there are several measures of sleep apnea severity, we used the most common categorization, the Apnea-Hypopnea Index: 0.0 to 4.9 events per hour is considered normal, 5.0 to 14.9 events per hour is considered mild, 15.0 to 29.9 events per hour is considered moderate, and 30 or more events per hour is considered severe.
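The Apnea-Hypopnea Index cut points listed above map onto a small classification function. This sketch uses half-open intervals (for example, 14.95 events per hour counts as mild), which agrees with the stated ranges for values reported to one decimal place; how values falling between the printed bounds were handled is an assumption, and the function name is hypothetical.

```python
def ahi_severity(ahi: float) -> str:
    """Classify OSA severity from the Apnea-Hypopnea Index (events per hour)."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5.0:
        return "normal"    # 0.0-4.9 events/h
    if ahi < 15.0:
        return "mild"      # 5.0-14.9 events/h
    if ahi < 30.0:
        return "moderate"  # 15.0-29.9 events/h
    return "severe"        # 30 or more events/h
```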
The utility score mapping analysis was designed to follow the recommendations of the National Institute for Health and Care Excellence Decision Support Unit for mapping preference-weighted health utility to disease-specific quality-of-life instruments.37 Participants were randomly divided into a model development group (60% of sample) and a cross-validation group (40% of sample). Data from the cross-validation group were not used in any of the model development analyses. Randomizing samples for developing and validating the scoring algorithm helps to ensure that the final model is robust and provides validation in an independent data set within a single study.
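A 60/40 random split of this kind can be sketched with NumPy's random generator. The function name and seed are hypothetical, and simple (unstratified) assignment of a fixed 60% of participants is an assumption; the section does not say whether the split was stratified or how group sizes were fixed.

```python
import numpy as np


def split_development_validation(participant_ids, dev_fraction=0.6, seed=None):
    """Randomly assign participants to a development group and a validation group."""
    ids = np.asarray(participant_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(ids)               # random order, no repeats
    n_dev = int(round(dev_fraction * ids.size))
    return shuffled[:n_dev], shuffled[n_dev:]     # (development, validation)
```

With 500 participants and `dev_fraction=0.6`, this yields groups of 300 and 200, the sizes reported for this study.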
The association of the SF-6D utility index with the SNORE-25 was assessed using linear regression with SF-6D as the dependent variable and various forms of the SNORE-25 as independent predictor variables. Models of the independent variables began with simple models (such as SNORE-25 total score as the independent variable) and progressed to more complex models, including but not limited to subscales, individual items, transformed subscales, and categorical conversions.
The Akaike information criterion (AIC) was used to assess the relative goodness of fit and select a final model. The AIC estimates the expected divergence between a candidate model and the true model; as such, the smaller the AIC, the closer a model is to a hypothesized true model and the smaller the estimation or prediction error.38 Although the AIC of any one model has no meaning on its own because the true model is unknown, the relative differences between models allow them to be ranked according to their expected divergence from the true model. Because the AIC includes a penalty for the number of estimated parameters, it does not reward overparameterization, whereas R2 never decreases when parameters are added. Using the AIC for model determination is recommended by the National Institute for Health and Care Excellence Decision Support Unit.37 In addition to the AIC, we used R2 and plotted different models of the SNORE Utility Index against the SF-6D utility index to gain a fuller understanding of model fit. The AIC was computed using the Stata postestimation command estat ic.
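To make the AIC comparison concrete, the sketch below fits an ordinary least squares model and computes AIC = −2 ln L + 2k from the maximized Gaussian log-likelihood, alongside R2. It counts only the regression coefficients (including the intercept) as k; counting the residual variance as well, as some software does, adds the same constant to every model and so leaves the ranking unchanged. This is an illustrative sketch, not the Stata computation itself, and the usage data in the test are simulated.

```python
import numpy as np


def fit_ols(X, y):
    """OLS with an intercept; returns coefficients, R^2, and a Gaussian-likelihood AIC.

    X: (n, p) predictor array, (n,) for a single predictor, or None for intercept-only.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    columns = [np.ones(n)] if X is None else [np.ones(n), np.asarray(X, dtype=float)]
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)                        # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())          # total sum of squares
    k = design.shape[1]                               # coefficients, including intercept
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    aic = -2.0 * log_lik + 2.0 * k
    return {"beta": beta, "r2": 1.0 - rss / tss, "aic": aic, "k": k}
```

Adding a predictor always lowers (or leaves unchanged) the residual sum of squares and so raises R2, but it lowers the AIC only if the gain in log-likelihood exceeds the 2-point penalty per added coefficient.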
Models with SNORE parameters as the independent variables and the SF-6D utility index as the dependent variable were evaluated using participants from the model development group. Models included subscales, a variety of categorical subscales, and all items separately as well as models with stepwise inclusion of individual variables. Models were additionally adjusted for age and sex to determine if this significantly improved the models. Four models are shown in detail here.
Model 1 used the SNORE-25 total score as the independent variable. This model assumes that equal preference is given to each item in the instrument and to each increment within each item. Model 2 used the linear SNORE subscales as independent variables. This model allows each subscale score to have a different association with SF-6D but retains the assumption of a linear relationship for each subscale score. Model 3 used categorical SNORE subscales as independent variables. Categorizing the subscales allows the association with SF-6D to differ across levels of each subscale, so this model does not assume equal preference between subscale health states, but it requires more parameters. Model 4 used natural log–transformed SNORE subscales as independent variables and allows for a nonlinear association between subscale increments (ie, moving from no problem to mild problem is not the same preference as moving from moderate problem to severe problem).
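The 4 model specifications above differ only in how SNORE-25 scores enter the design matrix. The sketch below builds each one from an array of the 5 subscale means and the total score. The cut points used to categorize subscales for model 3 are hypothetical (they are not reported in this section), and the 0.1 offset for model 4 follows the constant the authors report adding before log transformation.

```python
import numpy as np

SUBSCALES = ("sleep", "awake", "medical", "emotional", "occupational")


def snore_design_matrices(subscale_means, total_score, cut_points=(1.0, 2.0, 3.0)):
    """Predictor matrices for models 1-4 (the intercept is added when fitting).

    subscale_means: (n, 5) array of subscale means (0-5), columns ordered as SUBSCALES.
    total_score: (n,) array of SNORE-25 total scores (0-5).
    cut_points: HYPOTHETICAL category boundaries for model 3.
    """
    sub = np.asarray(subscale_means, dtype=float)
    if sub.ndim != 2 or sub.shape[1] != len(SUBSCALES):
        raise ValueError("expected an (n, 5) array of subscale means")
    model1 = np.asarray(total_score, dtype=float).reshape(-1, 1)  # total score, linear
    model2 = sub                                                   # 5 linear subscales
    # Model 3: category index per subscale (0 = below the first cut point, the
    # reference level), expanded to one indicator column per non-reference level.
    categories = np.digitize(sub, cut_points)
    model3 = np.column_stack([
        (categories[:, j] == level).astype(float)
        for j in range(sub.shape[1])
        for level in range(1, len(cut_points) + 1)
    ])
    model4 = np.log(sub + 0.1)                                     # log-transformed subscales
    return {"model1": model1, "model2": model2, "model3": model3, "model4": model4}
```

Model 3 uses 15 indicator columns against 5 columns for models 2 and 4, which is the extra parameter cost that the AIC penalizes.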
The models were additionally adjusted for age and sex to determine if this improved the goodness of fit. The best model from the development group, determined as described previously, is shown adjusted for age and sex. The best model from the development group was tested on the cross-validation sample with Pearson correlation and scatterplot of SF-6D to the modeled utility index. The R2 value for the final model was additionally calculated on the cross-validation sample and compared with that of the model development sample.
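Applying development-sample coefficients to the cross-validation sample can be sketched as below. The section does not say whether the cross-validation R2 was the squared Pearson correlation between predicted and observed utility or 1 − SSres/SStot using the fixed development coefficients, so this sketch returns both; they differ when predictions are systematically shifted, because the squared correlation ignores that bias. The function name is hypothetical.

```python
import numpy as np


def cross_validate(beta, X_val, y_val):
    """Score a fitted linear model (intercept first in beta) on held-out data."""
    y = np.asarray(y_val, dtype=float)
    design = np.column_stack([np.ones(y.size), np.asarray(X_val, dtype=float)])
    predicted = design @ np.asarray(beta, dtype=float)   # coefficients are NOT refitted
    r = float(np.corrcoef(predicted, y)[0, 1])
    ss_res = float(((y - predicted) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return {"pearson_r": r, "r2_squared_corr": r ** 2, "r2_holdout": 1.0 - ss_res / ss_tot}
```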
All analyses were performed with Stata/SE, version 15 (StataCorp). Statistics were presented with 95% CIs and effect sizes wherever possible. The 95% CIs for correlations were calculated using bootstrapping with 1000 iterations.
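The bootstrap CIs for correlations can be sketched as follows: participants are resampled with replacement 1000 times, the Pearson r is recomputed on each resample, and the 2.5th and 97.5th percentiles form the 95% CI. The percentile method is an assumption; the section does not say which bootstrap interval (percentile, bias-corrected, or normal based) was reported, and the function name is hypothetical.

```python
import numpy as np


def bootstrap_corr_ci(x, y, n_boot=1000, level=0.95, seed=None):
    """Pearson r with a percentile bootstrap confidence interval (resampling pairs)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    rng = np.random.default_rng(seed)
    n = x.size
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample participants with replacement
        replicates[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    tail = 100 * (1 - level) / 2
    # nanpercentile skips the rare resample in which one variable is constant (r undefined)
    lower, upper = np.nanpercentile(replicates, [tail, 100 - tail])
    return float(np.corrcoef(x, y)[0, 1]), (float(lower), float(upper))
```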
The analyzed sample of 500 participants included 295 men (59.0%); the mean (SD) age was 48.6 (12.8) years, with a range of 18 to 90 years. The mean (SD) Apnea-Hypopnea Index was 41 (28) events per hour, with a range of 1 to 131. This resulted in 13 participants (2.6%) without OSA, 82 (16.4%) with mild OSA, 128 (25.6%) with moderate OSA, and 271 (54.2%) with severe OSA. Randomization of participants to development and cross-validation samples was assessed by comparing the 95% CIs and was found to be adequate (Table 1).
Participants with untreated OSA (excluding those without OSA) had a mean (SD) SF-6D utility score of 0.61 (0.08) with a range of 0.40 to 0.85. The SF-6D utility score was not associated with the Apnea-Hypopnea Index (r = 0.02; 95% CI, −0.03 to 0.06), nor was a significant association found with lowest oxygen saturation level on polysomnography (r = 0.05; 95% CI, 0-0.09). Mean (SD) SF-6D utility was similar among those with mild OSA (0.60 [0.09]), moderate OSA (0.61 [0.08]), and severe OSA (0.60 [0.08]).
Reliability testing with Cronbach α showed good internal consistency for the total instrument (Cronbach α = 0.91). Subscale internal consistency was adequate for sleep problems (Cronbach α = 0.67) and good for awake problems (Cronbach α = 0.80), medical problems (Cronbach α = 0.72), emotional problems (Cronbach α = 0.88), and occupational problems (Cronbach α = 0.80). Combining the sleep and awake problems subscales resulted in good internal consistency (Cronbach α = 0.83).
A scree plot of the eigenvalues showed a leveling off at 4 factors. The fifth factor had an eigenvalue of 1.03, and after varimax rotation, only 1 item loaded greater than 0.4 on this factor. A 4-factor solution was used for final rotation and explained 55% of the variance in the SNORE items. Orthogonal varimax rotation of the 4-factor solution resulted in factor loadings that largely followed the hypothesized subscales (Table 2). Items from the awake problems subscale and some of the items from the sleep problems subscale loaded on the same factor. Item 1 (waking during sleep; inability to get a good night’s sleep) and item 3 (restless during sleep) loaded at approximately 0.4 on factor 1. Item 2 (loud/excessive snoring) did not load greater than 0.4 on any of the 4 factors. Most items in the SNORE-25 loaded primarily on 1 factor, except item 22 (feeling that the future is hopeless), which loaded equally on factors 2 and 4. Construct validation with factor analysis supported the use of subscale scores in the utility model analysis.
Several models with SNORE parameters as the independent variables and the SF-6D utility index as the dependent variable were evaluated as described previously. The coefficients (β), β SE, R2, and AIC for each of the 4 models shown here can be found in Table 3.
Model 1 explained 29% of the variance in the SF-6D (R2 = 0.29) and had the most restrictive assumptions. Model 2 explained 32% of the variance with a smaller AIC value, indicating a better fit to the data (Table 3). A scatterplot of the predicted utility index from model 2 vs the SF-6D index (Figure 1) shows good correlation (r = 0.58; 95% CI, 0.51-0.64) but a ceiling effect in the predicted utility. This indicates that model 2 underestimates utility at higher SF-6D values, which was confirmed in a predicted vs fitted plot. Models 1 and 3, as well as many other linear transformations, had the same limitation, with predicted utility not surpassing 0.7. Model 3 explains the same amount of variance as model 2 but with a larger AIC value, indicating that it does not fit the data as well.
Model 4 (natural log–transformed SNORE subscales as independent variables) also explains 32% of the variance in the SF-6D (R2 = 0.32) and has an AIC value equal to that of model 2, showing a good fit to the data. A scatterplot of the predicted utility from model 4 vs SF-6D utility showed elimination of the ceiling effect (Figure 2). In model 4, the coefficients of the sleep problems and medical problems subscales were not statistically significant, but they had the hypothesized direction of association, so they were retained in the final model. Goodness of fit was not improved when the models were adjusted for age and sex. For illustration, model 4 was adjusted for age and sex (model 4a) and was found to explain more variance of SF-6D but with a larger AIC. Model 4 was found to be the best fit in the development group.
Analysis of this model of predicted utility (model 4) in the independent cross-validation sample found no shrinkage in R2 from the development group (R2 = 0.33). A scatterplot of predicted utility from model 4 vs SF-6D in the cross-validation group also showed continued elimination of the ceiling effect, and predicted utility showed good correlation with the SF-6D (r = 0.57; 95% CI, 0.50-0.69). The coefficients from model 4, applied to the SNORE subscale scores, were used to calculate a utility score (hereafter, the SNORE Utility Index) for each participant in the validation group. Because the natural log of 0 is undefined (Stata returns a missing value for ln(0) and treats missing values as larger than any number), 0.1 was added to each subscale mean before log transformation:
SNORE Utility Index = 0.61 − 0.0080 × ln(Sleep Mean + 0.1) − 0.022 × ln(Awake Mean + 0.1) − 0.00071 × ln(Medical Mean + 0.1) − 0.024 × ln(Emotional Mean + 0.1) − 0.0088 × ln(Occupational Mean + 0.1).
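The scoring equation above can be applied to a participant's 5 subscale means as in the following Python sketch, which transcribes the published intercept, coefficients, and 0.1 offset; the function and variable names are our own.

```python
import math

# Intercept and coefficients transcribed from the SNORE Utility Index equation.
# Each log term is SUBTRACTED, so higher (worse) subscale means give lower utility.
INTERCEPT = 0.61
LOG_COEFFICIENTS = {
    "sleep": 0.0080,
    "awake": 0.022,
    "medical": 0.00071,
    "emotional": 0.024,
    "occupational": 0.0088,
}
OFFSET = 0.1  # added to each subscale mean so that ln() is defined at a score of 0


def snore_utility_index(subscale_means):
    """Map the 5 SNORE-25 subscale means (each 0-5) to the SNORE Utility Index."""
    utility = INTERCEPT
    for name, coefficient in LOG_COEFFICIENTS.items():
        score = subscale_means[name]
        if not 0.0 <= score <= 5.0:
            raise ValueError(f"{name} subscale mean {score} is outside 0-5")
        utility -= coefficient * math.log(score + OFFSET)
    return utility
```

By this arithmetic, a respondent reporting no problem on every item (all subscale means 0) scores about 0.76, and one reporting the worst response on every item (all means 5) scores about 0.51, consistent with index scores remaining below 0.9.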
The present study provided further reliability testing and limited validation of the SNORE-25 OSA-specific quality-of-life instrument. This OSA-specific instrument was shown to have good internal consistency overall and adequate to good internal consistency for the 5 previously defined subscales. Good internal consistency has previously been defined as a Cronbach α of 0.70 to 0.95.39 Factor analysis confirmed the construct validity of the hypothesized subscales, although there may be some overlap between the sleep problems and awake problems subscales. This reliability and subscale construct validation supports our use of subscale scores (or transformations of subscale scores) in the preference-weighted utility scoring of the SNORE instrument. Further reliability testing with test-retest reliability analysis as well as confirmation of the subscale factor loadings would be needed before considering modifying the scoring algorithm for the original SNORE-25 instrument itself. Additionally, allowing the sleep problems and awake problems subscales to vary independently in the mapping analysis with the current subscale breakdown (separate awake problems and sleep problems subscales) is more conservative than assuming that the 2 subscales will have the same associations in future patient samples.
The present study provides, to our knowledge, the first rigorously developed preference-weighted utility instrument for use in the sleep apnea population, allowing future studies in patients with OSA to estimate preference-weighted utility ratings. The modeling undertaken follows the recommendations of the National Institute for Health and Care Excellence Decision Support Unit.37 The SNORE Utility Index (log-transformed model) appears to function well with retained correlation (and R2) with the SF-6D in the cross-validation sample. It also did not have the ceiling effect that was seen in our other models. The SNORE Utility Index scores remained less than 0.9 likely because of the small proportion of subjects with health state utility greater than 0.9. This may limit the improvement in SNORE Utility Index with treatment. The R2 of the SNORE Utility Index and SF-6D shows that OSA symptoms (as measured by the SNORE instrument) account for approximately 32% of the variance in these participants’ overall utility. The R2 for the SNORE Utility Index is within the range of similar studies and is appropriate for mapping a disease-specific instrument to a generic preference-weighted instrument.40
In defining the inclusion and exclusion criteria for this study, we elected to include participants in the model development and validation who presented to the sleep medicine clinic with symptoms concerning for OSA but who did not have OSA verified by a sleep study. We elected to include these participants to attempt to obtain more variability in SNORE-25 scores and the SF-6D utility scores. For example, participants found not to have OSA would potentially have better health state utility and better SNORE-25 scores. This variability would be helpful in allowing the utility to improve as the SNORE-25 improved with treatments.
The mean utility in baseline (untreated OSA) participants by SF-6D was 0.61 (95% CI, 0.60-0.61). This is similar to the utility identified in previous studies of untreated sleep apnea. For instance, Tousignant et al20 found a utility of 0.63 (SD = 0.29) using the standard gamble technique in a sample of 19 patients with sleep apnea. Chakravorty et al21 found a utility of 0.32 (SD = 0.17) in 71 patients with untreated sleep apnea using the standard gamble technique and 0.73 (SD = 0.18) using the EuroQol-derived (general quality of life) utility. Schmidlin et al22 applied several methods to measure utility among 66 patients with sleep apnea and found higher utility by standard gamble than those previously mentioned (median, 0.97; interquartile range, 0.89-0.99) but similar utility by SF-6D (median, 0.75; interquartile range, 0.69-0.85). The present study provides a larger sample (n = 487) than these previous studies; however, it potentially draws from a population with more severe sleep apnea who were referred to a tertiary sleep center. Analysis of utility among the sample of participants with untreated sleep apnea found that utility was not associated with measures of sleep apnea severity. The lack of association between polysomnography indices and quality-of-life scores has previously been shown.41
Notably, OSA is a severely debilitating condition, with utility levels similar to those found for patients receiving maintenance hemodialysis. A systematic literature review and meta-analysis by Liem et al42 reported a mean utility based on time trade-off measures for hemodialysis of 0.61 (95% CI, 0.54-0.68). Renal dialysis has historically been used as a benchmark of societal willingness to pay for QALY gains because, in the absence of transplantation, the alternative was certain death.43 The relevant economic question is how much interventions such as continuous positive airway pressure improve this baseline, and at what cost.
Our patient population included patients with newly diagnosed OSA. While this provided some benefits for measuring associations and understanding health state utility in patients with untreated OSA, it may have also led to the ceiling effects seen in the modeling. Although we were able to improve the modeling using log transformation, a ceiling effect remained. This may limit the ability of the SNORE utility scoring algorithm to measure improvement with treatments. Future studies examining the association between the SF-6D and SNORE-25 in other patient samples may help clarify how this ceiling effect influences the current scoring algorithm.
The SNORE utility instrument provides a validated disease-specific, preference-weighted utility scoring algorithm that can be used in future studies of OSA treatments. Further studies of change in SNORE utility with OSA treatment are needed to test the responsiveness of this measure and provide information for future comparative effectiveness and cost-effectiveness studies.
Accepted for Publication: November 28, 2019.
Corresponding Author: Jonathan R. Skirko, MD, MPH, Division of Otolaryngology–Head & Neck Surgery, Department of Surgery, University of Utah, 100 N Mario Capecchi Dr, Salt Lake City, UT 84113 (email@example.com).
Published Online: January 30, 2020. doi:10.1001/jamaoto.2019.4469
Author Contributions: Drs Skirko and Weaver had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Skirko, Garrison, Weaver.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Skirko.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Skirko, Garrison.
Obtained funding: Weaver.
Administrative, technical, or material support: Skirko, James, Weaver.
Study supervision: Weaver.
Conflict of Interest Disclosures: Dr Weaver reported receiving grants (K23 HL068849, R01 HL084139, T32 DC000018, UL1 RR025014) from the National Institutes of Health and other support from the US Department of Veterans Affairs during the conduct of the study. No other disclosures were reported.
Funding/Support: This study was funded by the National Institutes of Health (grants K23 HL068849, R01 HL084139, R01 HL084139-03S1, DC00018-26, and 1UL1 RR025014-01) and also supported by resources from the Veterans Affairs Puget Sound Health Care System, Seattle, Washington.
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The contents of this article do not represent the views of the US Department of Veterans Affairs or the US government.