Evidence reviews for the US Preventive Services Task Force (USPSTF) use an analytic framework to visually display the key questions that the review will address to allow the USPSTF to evaluate the effectiveness and safety of a preventive service. The questions are depicted by linkages that relate interventions and outcomes. A dashed line indicates health outcomes that follow an intermediate outcome. Further details are available from the USPSTF procedure manual.19,20
ᵃ Insufficient information to assess risk of bias.
Effect sizes in the figure are hazard ratios. Size of data markers reflects their relative contributions to the pooled effect size (largely because of their sample sizes). Obstructive sleep apnea (OSA) severity category definitions follow those provided in Table 1. For the meta-analysis of severe OSA, 2 studies were included that provided data for participants with severe OSA combined with some or all participants with moderate OSA (Marshall et al153 and Gooneratne et al147). EDS indicates excessive daytime sleepiness.
Jonas DE, Amick HR, Feltner C, et al. Screening for Obstructive Sleep Apnea in Adults: Evidence Report and Systematic Review for the US Preventive Services Task Force. JAMA. 2017;317(4):415–433. doi:10.1001/jama.2016.19635
Importance
Many adverse health outcomes are associated with obstructive sleep apnea (OSA).
Objective
To review primary care–relevant evidence on screening adults for OSA, test accuracy, and treatment of OSA, to inform the US Preventive Services Task Force.
Data Sources
MEDLINE, Cochrane Library, EMBASE, and trial registries through October 2015, references, and experts, with surveillance of the literature through October 5, 2016.
Study Selection
English-language randomized clinical trials (RCTs); studies evaluating accuracy of screening questionnaires or prediction tools, diagnostic accuracy of portable monitors, or association between apnea-hypopnea index (AHI) and health outcomes among community-based participants.
Data Extraction and Synthesis
Two investigators independently reviewed abstracts and full-text articles. When multiple similar studies were available, random-effects meta-analyses were conducted.
Main Outcomes and Measures
Sensitivity, specificity, area under the curve (AUC), AHI, Epworth Sleepiness Scale (ESS) scores, blood pressure, mortality, cardiovascular events, motor vehicle crashes, quality of life, and harms.
Results
A total of 110 studies were included (N = 46 188). No RCTs compared screening with no screening. In 2 studies (n = 702), the multivariable apnea prediction score followed by home portable monitor testing had AUCs of 0.80 (95% CI, 0.78 to 0.82) and 0.83 (95% CI, 0.77 to 0.90), respectively, for detecting severe OSA syndrome (AHI ≥30 and ESS score >10), but the studies oversampled high-risk participants and those with OSA and OSA syndrome. No studies prospectively evaluated screening tools to report calibration or clinical utility for improving health outcomes. Meta-analysis found that continuous positive airway pressure (CPAP) compared with sham was significantly associated with reduction of AHI (weighted mean difference [WMD], −33.8 [95% CI, −42.0 to −25.6]; 13 trials, 543 participants), excessive sleepiness assessed by ESS score (WMD, −2.0 [95% CI, −2.6 to −1.4]; 22 trials, 2721 participants), diurnal systolic blood pressure (WMD, −2.4 points [95% CI, −3.9 to −0.9]; 15 trials, 1190 participants), and diurnal diastolic blood pressure (WMD, −1.3 points [95% CI, −2.2 to −0.4]; 15 trials, 1190 participants). CPAP was associated with modest improvement in sleep-related quality of life (Cohen d, 0.28 [95% CI, 0.14 to 0.42]; 13 trials, 2325 participants). Mandibular advancement devices (MADs) and weight loss programs were also associated with reduced AHI and excessive sleepiness. Common adverse effects of CPAP and MADs included oral or nasal dryness, irritation, and pain, among others. In cohort studies, there was a consistent association between AHI and all-cause mortality.
Conclusions and Relevance
There is uncertainty about the accuracy or clinical utility of all potential screening tools. Multiple treatments for OSA reduce AHI, ESS scores, and blood pressure. Trials of CPAP and other treatments have not established whether treatment reduces mortality or improves most other health outcomes, except for modest improvement in sleep-related quality of life.
Obstructive sleep apnea (OSA) (Table 1) has been associated with an increased risk of many adverse health outcomes, including motor vehicle crashes,7-9 cognitive impairment,10,11 cardiovascular events,12-14 atrial fibrillation,15 stroke,14,16 and mortality.8,13,14,17 However, there is controversy in the literature regarding the extent to which OSA independently contributes to various outcomes beyond the contributions of age, body mass index (BMI), and other potential confounders. OSA is common, with prevalence around 15% in men and 5% in women (ages 30-70 years), based on either an apnea-hypopnea index (AHI) of 15 or greater or an AHI of 5 or greater plus symptoms of disturbed sleep.17,18
Screening to identify unrecognized OSA followed by appropriate treatment might improve sleep quality and normalize the AHI and oxygen saturation levels to prevent adverse health outcomes. Potential screening strategies include questionnaires and clinical prediction tools that comprise combinations of subjective and objective findings. For people who screen positive, diagnostic polysomnography in a sleep facility or home-based testing with a portable monitor could be used to determine whether they have OSA.
To inform a recommendation by the US Preventive Services Task Force (USPSTF), the evidence on test accuracy and benefits and harms of screening and treatment for OSA in populations and settings relevant to US primary care was reviewed.
Detailed methods are available in the full evidence report at https://www.uspreventiveservicestaskforce.org/Page/Document/final-evidence-review152/obstructive-sleep-apnea-in-adults-screening. Additional subgroup analyses (by OSA severity, baseline sleepiness, and baseline blood pressure) and sensitivity analyses conducted to explore heterogeneity or robustness of findings are available in the full evidence report. Figure 1 shows the analytic framework and key questions that guided the review.
We searched PubMed/MEDLINE, the Cochrane Library, and EMBASE for English-language articles published through October 2015. ClinicalTrials.gov and the World Health Organization International Clinical Trials Registry Platform were also searched for unpublished literature. The search strategies for PubMed and Cochrane databases are detailed in the eMethods in the Supplement. To supplement electronic searches, the reference lists of pertinent articles were reviewed, as well as all studies suggested by reviewers or comments received during public commenting periods. Since October 2015, we conducted ongoing surveillance through article alerts and targeted searches of high-impact journals to identify major studies published in the interim that may affect the conclusions or understanding of the evidence and therefore the related USPSTF recommendation. The last surveillance was conducted on October 5, 2016.
Two investigators independently reviewed titles, abstracts, and full-text articles to determine eligibility using prespecified criteria for each key question (KQ) (eTable 1 in the Supplement). Disagreements were resolved by discussion. The review included English-language studies of adults conducted in countries categorized as “very high” on the Human Development Index. Only studies rated as good or fair quality using predefined criteria and definitions developed by the USPSTF and adapted for this topic (eTable 2 in the Supplement)20 were included. The review excluded studies of people with acute conditions (eg, stroke) that can trigger onset of OSA and studies focused on screening, diagnosis, or treatment of OSA among persons with rare conditions (eg, acromegaly) for whom testing for OSA would be considered part of management for their disease.
For the overarching question regarding direct evidence that screening improves health outcomes (KQ1) and the question on accuracy of clinical prediction tools or screening questionnaires (KQ2), studies were required to enroll asymptomatic adults or persons with unrecognized symptoms of OSA; referral populations were not eligible. For KQ1, randomized clinical trials (RCTs) comparing screened with nonscreened groups were eligible. For KQ2, studies that evaluated screening questionnaires or clinical prediction tools (alone or followed by home-based portable monitoring) compared with overnight polysomnography conducted in a sleep laboratory were eligible. Studies of people referred to sleep laboratories because of concern for OSA were excluded, and studies in which only a subgroup (usually the highest-risk group) underwent polysomnography were excluded because of concern for verification bias. Clinical prediction tools were required to include multiple factors.
For diagnostic test accuracy (KQ3) and harms associated with screening and diagnostic tests (KQ7), referral populations were also eligible (in addition to the populations eligible for KQ1 and KQ2). For KQ3, good-quality, recent systematic reviews comparing portable monitors (Table 2 describes the types of monitors) with polysomnography conducted in a sleep laboratory were eligible. Multiple good-quality, recent, and relevant systematic reviews for KQ3 were identified; primary studies published after the search cutoffs of the most recent systematic reviews were also included. For KQ7, studies eligible for KQ1, KQ2, or KQ3 that reported false-positive results leading to unnecessary treatment, anxiety, condition-specific distress, or stigma were eligible.
For benefits and harms of treatment (KQ4, KQ5, and KQ8), RCTs enrolling people with a confirmed diagnosis of OSA were eligible; studies could include asymptomatic adults, symptomatic adults, or both. Studies evaluating continuous positive airway pressure (CPAP), mandibular advancement devices (MADs), surgery, and weight loss programs were included; other treatments were not eligible (eg, oropharyngeal exercises). For KQ8, prospective cohort studies with at least 100 participants that reported harms of surgical interventions were also eligible.
For the association between AHI and health outcomes (KQ6), prospective cohort studies that followed up participants for at least 1 year were included. Studies were excluded that focused primarily on central sleep apnea, enrolled patients hospitalized for acute events, enrolled patients in a periprocedural period, or did not address potential confounding.
For each included study, one investigator extracted information about the populations, tests or treatments, comparators, outcomes, settings, and designs, and a second investigator reviewed for completeness and accuracy. Two independent investigators assessed the quality of studies as good, fair, or poor. Disagreements were resolved by discussion.
Findings for each question were summarized in tabular and narrative form. To determine whether meta-analyses were appropriate, the clinical and methodological heterogeneity of the studies was assessed following established guidance.22 When multiple similar studies were available, quantitative synthesis was conducted with random-effects models using the inverse-variance weighted method (DerSimonian and Laird) to estimate pooled effects.23 For all quantitative syntheses, the I2 statistic was calculated to assess statistical heterogeneity in effects between studies.24,25 Quantitative analyses were conducted using Comprehensive Meta-Analysis version 3.3 (Biostat Inc) and Stata version 14 (StataCorp). Statistical significance was assumed when 95% CIs of pooled results did not cross the null (ie, 0 or 1, depending on the effect measure). All testing was 2-sided. This review covered a wide range of outcome measures and instruments; key measures and questionnaires are summarized in eTable 3 in the Supplement.
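As an illustration of the random-effects approach described above, the following Python sketch implements the DerSimonian and Laird estimator together with Cochran's Q and the I2 statistic. It is not the authors' code (the analyses were run in Comprehensive Meta-Analysis and Stata); the function name and the normal-approximation multiplier of 1.96 for the 95% CI are assumptions of this sketch.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    effects   -- per-study estimates (eg, mean differences or log HRs)
    variances -- per-study sampling variances (SE squared)
    Returns (pooled, se, ci_low, ci_high, tau2, i2_percent).
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least 2 studies with matching variances")
    # Fixed-effect (inverse-variance) weights and estimate
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted dispersion around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Method-of-moments between-study variance, truncated at 0
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each study's own variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = 1.96  # two-sided 95% CI, normal approximation
    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, pooled - z * se, pooled + z * se, tau2, i2
```

Under the review's rule, a pooled mean difference would be judged statistically significant when the returned interval excludes 0; for ratio measures pooled on the log scale, the log-scale interval must exclude 0 (ie, 1 after exponentiation).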
For KQ4 and KQ5 the weighted mean difference (WMD) between intervention and control was calculated for continuous outcomes; when multiple scales were combined in a single meta-analysis (for sleep-related quality of life), we used the standardized mean difference, Cohen d. For Cohen d, a value of 0.20 is often interpreted as a small effect size, 0.50 as a medium effect size, and 0.80 as a large effect size.26 For meta-analyses of CPAP and MAD treatments, pooled estimates were calculated separately for studies using sham controls and those using other controls. Parallel trials and crossover trials were combined, but subgroup analyses were conducted to explore whether findings differed by this design feature.
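When scores from different instruments (here, SAQLI and FOSQ) are combined, each trial's result is first placed on a common scale as Cohen d. The sketch below computes d from group means and SDs using the pooled SD, an approximate sampling variance for inverse-variance weighting, and the conventional size labels quoted above. Function names are hypothetical, and because the text does not say whether a small-sample (Hedges g) correction was applied, none is applied here.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference (treatment minus control) using the pooled SD."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def d_variance(d, n_t, n_c):
    """Approximate sampling variance of d, used as the inverse-variance weight basis."""
    return (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))


def label_effect(d):
    """Conventional labels: 0.20 small, 0.50 medium, 0.80 large."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"
```

Under these conventions, the pooled estimate of 0.28 reported for CPAP and sleep-related quality of life falls in the small range, consistent with its description as a modest improvement.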
For KQ6, we conducted meta-analyses of adjusted hazard ratios (HRs) and 95% confidence intervals for all-cause mortality. The HRs were converted to a log scale, and standard errors of the log HRs were calculated to normalize distributions and stabilize variances. The metan command with the eform option in Stata was then used to estimate pooled HRs. Analyses were by AHI thresholds corresponding to OSA severity categories.
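The log transformation described above can be sketched as follows: the SE of log(HR) is recovered from the reported 95% CI (which is symmetric on the log scale), studies are weighted by inverse variance on the log scale, and the pooled value and limits are exponentiated back to HRs, analogous to what Stata's eform option reports. For brevity these hypothetical helpers use fixed-effect (inverse-variance) weights; the review's pooled estimates came from random-effects models, which would add a between-study variance term to each weight.

```python
import math

Z95 = 1.96  # normal-approximation multiplier for a two-sided 95% CI


def log_hr_and_se(hr, ci_low, ci_high):
    """Return log(HR) and its SE, using ln(upper) - ln(lower) = 2 * 1.96 * SE."""
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return log_hr, se


def pool_hrs_fixed(studies):
    """Inverse-variance pooling of (hr, ci_low, ci_high) tuples on the log scale.

    Fixed-effect simplification. Returns (pooled_hr, ci_low, ci_high) after
    exponentiating back to the HR scale.
    """
    weights, weighted = [], []
    for hr, lo, hi in studies:
        log_hr, se = log_hr_and_se(hr, lo, hi)
        w = 1.0 / se ** 2
        weights.append(w)
        weighted.append(w * log_hr)
    pooled_log = sum(weighted) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z95 * pooled_se),
            math.exp(pooled_log + Z95 * pooled_se))
```

Working on the log scale is what makes the CI symmetric and the weights meaningful; averaging raw HRs directly would give ratios above and below 1 unequal influence.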
A total of 110 studies (127 articles) with N = 46 188 participants were included (Figure 2). Individual study quality ratings are reported in eTables 4 through 12 in the Supplement. The main results for each key question are summarized below; additional details and analyses are available in the full evidence report.27
Key Question 1a. Does screening for OSA in adults improve health outcomes?
Key Question 1b. Does the evidence differ for subgroups defined by age, sex, BMI, or OSA severity?
No eligible studies were identified.
Key Question 2a. What is the accuracy of currently existing clinical prediction tools or screening questionnaires in identifying persons in the general population who are more or less likely to have OSA?
Key Question 2b. What is the accuracy of multistep screening approaches, such as using a questionnaire or prediction tool followed by overnight home-based testing, in identifying persons in the general population who are more or less likely to have OSA?
Three studies were included (Table 3).28-30 One evaluated the Berlin Questionnaire,28 and 2 evaluated the Multivariable Apnea Prediction (MVAP) score, alone and when followed by in-home portable monitoring.29,30 Details of the questions and scoring are reported in the eBackground in the Supplement.
The study evaluating the Berlin Questionnaire randomly sampled Norwegians from the National Population Register (55% response rate: 16 302/29 258).28 Of those completing the questionnaire, 24% were classified as high risk and 518 had undergone in-hospital polysomnography. Of those 518, mean age was 48 years, 45% were female, mean BMI was 28 (calculated as weight in kilograms divided by height in meters squared), and median AHI was 6.4. Although the group undergoing polysomnography oversampled high-risk participants (70% were high risk), the analyses adjusted for bias in the sampling to report estimated screening properties for the general population. The study found suboptimal screening properties (for AHI ≥5: sensitivity, 37.2%; specificity, 84%; for AHI ≥15: sensitivity, 43%; specificity, 79.7%) (Table 4). The unadjusted analyses showed much better sensitivity but worse specificity (for AHI ≥5: sensitivity, 79.4%; specificity, 40.5%; for AHI ≥15: sensitivity, 82.8%; specificity, 34.9%), likely reflecting spectrum bias.
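The text does not specify how the sampling adjustment was done. One standard method when the chance of undergoing polysomnography depends only on the screening result is the Begg-Greenes correction, sketched below as an assumption rather than as the study's method. It estimates the probability of OSA separately among verified screen-positives and screen-negatives, then reweights those estimates by the screen-positive rate in the full screened sample. Function names and all numbers in the usage example are hypothetical.

```python
def begg_greenes(n_pos_total, n_neg_total, verified_pos, verified_neg):
    """Correct sensitivity/specificity for partial verification (Begg-Greenes).

    n_pos_total, n_neg_total -- all screened people by screening result
    verified_pos -- (with_osa, without_osa) among verified screen-positives
    verified_neg -- (with_osa, without_osa) among verified screen-negatives
    Assumes verification depends only on the screening result.
    """
    p_pos = n_pos_total / (n_pos_total + n_neg_total)
    p_neg = 1.0 - p_pos
    # P(OSA | screen result), estimated within the verified subsample
    d_given_pos = verified_pos[0] / sum(verified_pos)
    d_given_neg = verified_neg[0] / sum(verified_neg)
    # Bayes' rule reweights to the full screened population
    sens = d_given_pos * p_pos / (d_given_pos * p_pos + d_given_neg * p_neg)
    spec = ((1 - d_given_neg) * p_neg /
            ((1 - d_given_neg) * p_neg + (1 - d_given_pos) * p_pos))
    return sens, spec


def naive(verified_pos, verified_neg):
    """Unadjusted sensitivity/specificity computed only among verified people."""
    tp, fp = verified_pos
    fn, tn = verified_neg
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical: 240 of 1000 screen positive; positives oversampled for PSG
    print(naive((40, 30), (10, 20)))
    print(begg_greenes(240, 760, (40, 30), (10, 20)))
```

In this hypothetical example the unadjusted estimates are 80% sensitivity and 40% specificity, whereas the corrected estimates are about 35% and 83%, illustrating how oversampling screen-positives for verification can inflate unadjusted sensitivity and depress unadjusted specificity.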
Both studies assessing the MVAP included highly selected patients.29,30 One study evaluated Medicare recipients (n = 452), most (74%) of whom had daytime sleepiness.29 The percentage with OSA was not reported, but 27% had obstructive sleep apnea syndrome (OSAS), defined as AHI 5 or greater and Epworth Sleepiness Scale (ESS) score greater than 10. The other study evaluated patients with hypertension (n = 250).30 Eighty percent of participants had OSA (AHI ≥5); of those, 22% had moderate and 25% had severe OSA; 25% of all participants had OSAS. Mean age was 71 years in the Medicare study29 and 53 years in the hypertension study;30 across the 2 studies, 60% to 64% of participants were nonwhite, and mean BMIs ranged from 30 to 32. Key quality limitations included concern for attrition bias30 and moderate concern for selection bias or spectrum bias (with high prevalence of OSA, OSAS, and/or daytime sleepiness among those undergoing polysomnography).29,30
Both studies reported operating characteristics of MVAP to predict severe OSAS (AHI ≥30 and ESS score >10) (Table 4). The study of Medicare recipients reported reasonable discrimination (area under the curve [AUC], 0.78 [95% CI, 0.71 to 0.85]), whereas the other study found inadequate discrimination (AUC, 0.68 [95% CI, 0.67 to 0.70]). An AUC less than 0.70 has been considered to indicate inadequate discrimination.31,32 Both studies also reported measures of discrimination for the MVAP score followed by in-home portable monitoring (Table 4).29,30 The studies by Morales et al29 and Gurubhagavatula et al30 reported characteristics to predict severe OSAS using different portable monitor–based AHI cutoffs (15 and 18, respectively). Both found better operating characteristics when using MVAP followed by in-home portable monitoring (AUC, 0.80-0.83) than when using MVAP alone.29,30
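An AUC can be read as the probability that a randomly chosen person with the condition (here, severe OSAS) receives a higher score than a randomly chosen person without it. The minimal Python sketch below computes the AUC in that Mann-Whitney form and applies the 0.70 cutoff for inadequate discrimination cited above. Function names are hypothetical, and any scores passed in are illustrative, not MVAP values from either study.

```python
def auc_mann_whitney(case_scores, noncase_scores):
    """AUC as P(case score > non-case score); tied pairs count as 0.5."""
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def discrimination_label(auc, threshold=0.70):
    """Flag AUC below the 0.70 cutoff as inadequate discrimination."""
    return "inadequate" if auc < threshold else "adequate"
```

An AUC of 0.5 corresponds to a score that ranks cases and non-cases no better than chance, which is why values only modestly above it (such as 0.68) are considered inadequate.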
The study of participants with hypertension also reported operating characteristics of MVAP and MVAP followed by in-home portable monitoring to predict any OSAS (AHI ≥5 and ESS score >10).30 It found inadequate discrimination (Table 4).
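The case definitions used in these screening studies can be written as simple rules. This sketch encodes the AHI thresholds used throughout the text (5, 15, and 30 events per hour for any, moderate, and severe OSA) and the OSAS definitions (an AHI threshold plus an ESS score greater than 10). Function names are hypothetical, and Table 1 remains the authoritative source for the severity definitions.

```python
def osa_severity(ahi):
    """Severity category from AHI (events per hour) using the text's thresholds."""
    if ahi >= 30:
        return "severe"
    if ahi >= 15:
        return "moderate"
    if ahi >= 5:
        return "mild"
    return "none"


def has_osas(ahi, ess, min_ahi=5):
    """OSA syndrome: AHI at or above min_ahi plus an ESS score greater than 10."""
    return ahi >= min_ahi and ess > 10


def has_severe_osas(ahi, ess):
    """Severe OSAS as used for the MVAP studies: AHI >= 30 and ESS > 10."""
    return has_osas(ahi, ess, min_ahi=30)
```

Note that an ESS score of exactly 10 does not meet the OSAS definition, since the criterion is strictly greater than 10.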
Key Question 3a. What is the accuracy and reliability of diagnostic tests for OSA?
Key Question 3b. Do the accuracy and reliability of diagnostic tests for OSA differ for subgroups defined by age, sex, or BMI?
We included 3 studies evaluating type II portable monitors, 1 systematic review and 2 subsequent studies evaluating type III portable monitors, and 1 systematic review and 14 subsequent studies evaluating type IV portable monitors. Study participants were generally those referred to sleep units for suspected sleep apnea. No studies were found that identified participants via screening to provide evidence on asymptomatic patients or those with unrecognized symptoms, although detailed reporting of reasons for referral was generally limited. Details of individual study characteristics and results are provided in the eResults and eTables 13 through 22 in the Supplement.
Table 5 summarizes the range of sensitivities, specificities, and AUCs by type of portable monitor for AHI thresholds of 5, 15, and 30. The best evidence comes from systematic reviews that reported sensitivities of 93% (pooled estimate from in-home studies) and 96% (pooled estimate from in-laboratory studies) for type III portable monitors and at least 85% for type IV portable monitors for detecting any OSA (AHI ≥5).21 Corresponding specificities were 60% and 76% for in-home and in-laboratory type III portable monitors, respectively, and ranged from 50% to 100% for type IV portable monitors.21 Sensitivities decreased and specificities increased for detecting moderate or greater OSA (AHI ≥15) or severe OSA (AHI ≥30). The ranges of sensitivity and specificity reported across studies for type IV monitors were wide.
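Each sensitivity and specificity in Table 5 comes from cross-classifying portable monitor results against in-laboratory polysomnography at a given AHI threshold. The sketch below shows that computation for paired AHI values. It assumes, for simplicity, that the same cutoff is applied to both the monitor and polysomnography, although some studies use different monitor cutoffs (as the MVAP studies did). The function name and any data passed to it are hypothetical.

```python
def accuracy_at_threshold(pairs, threshold):
    """Sensitivity and specificity of a portable monitor against polysomnography.

    pairs     -- (monitor_ahi, psg_ahi) for each participant
    threshold -- AHI cutoff applied to both measurements (eg, 5, 15, or 30)
    """
    tp = fp = fn = tn = 0
    for monitor_ahi, psg_ahi in pairs:
        test_pos = monitor_ahi >= threshold
        has_osa = psg_ahi >= threshold  # polysomnography is the reference standard
        if test_pos and has_osa:
            tp += 1
        elif test_pos:
            fp += 1
        elif has_osa:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Hypothetical paired AHI values (monitor, polysomnography)
    data = [(6, 7), (4, 8), (10, 3), (2, 1), (20, 35), (16, 10)]
    for cutoff in (5, 15, 30):
        print(cutoff, accuracy_at_threshold(data, cutoff))
```

Because the reference-standard threshold changes which participants count as cases, the same monitor can show different sensitivity and specificity at AHI 5, 15, and 30, which is why Table 5 reports them separately.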
Key Question 4a. How much does treatment with CPAP, MADs, surgery, or weight loss programs improve intermediate outcomes (AHI, blood pressure, or daytime sleepiness) in persons with OSA?
Key Question 4b. Do the benefits of treatment (for intermediate outcomes) differ for subgroups defined by age, sex, BMI, or OSA severity?
Included were 76 RCTs: 56 trials evaluated CPAP (eTables 23 and 24 in the Supplement),53-112 10 trials evaluated MADs (eTable 25 in the Supplement),98,105,113-122 6 trials evaluated surgical interventions (eTable 26 in the Supplement),123-128 and 6 trials evaluated weight loss, diet, and exercise programs (eTable 27 in the Supplement).129-138 Two trials evaluated both CPAP and MADs,98,105 so the category counts sum to more than 76. None of the trials focused on participants who were screen-detected in primary care settings.
Most studies identified participants from sleep clinics or referrals. Duration of treatment ranged from 1 week to 4 years. Most trials lasted for 12 weeks or less, but 5 trials treated participants for 24 weeks or longer,70,96,97,99,107 including 2 that followed up participants for 52 weeks96,107 and 1 that did so for a median of 4 years.97 Mean age was in the 40s to 50s in most studies (range, 42-71 years). The majority of participants in most trials were men, with 44 trials reporting that less than one-third of participants were women. Mean BMI was 30 to 35 in most trials (range, 27-39). Mean or median baseline AHI (or similar measure) was in the severe OSA range (AHI ≥30) for more than 75% of trials; 8 trials reported it in the moderate OSA range,75,76,80,87,98,103,105,107 and 4 reported it in the mild OSA range.91,99,101,108 Mean baseline ESS score was 10 or more in 33 trials, indicating excessive daytime sleepiness. Ten trials reported a mean baseline ESS score less than 10,55,59,63,72,87,96,97,99,103,106 and 13 trials did not report baseline ESS score.
For AHI, trials reporting sufficient data for meta-analysis followed up patients for 12 weeks or less. The meta-analyses found that CPAP was associated with reduction of AHI compared with sham CPAP (WMD, −33.8 [95% CI, −42.0 to −25.6]; 13 trials, 543 participants) and other controls (WMD, −25.8 [95% CI, −34.2 to −17.5]; 6 trials, 294 participants) (eFigures 1 and 2 in the Supplement). All individual studies reported end-point AHI values of 10 or less for CPAP-treated groups, and most were normal (<5).
Thirty-four trials reported sufficient ESS data to include in meta-analyses. Most were 12 weeks or less in duration; 5 followed up participants for 24 weeks,70,99 48 to 52 weeks,96,107 or longer.97 The meta-analyses found that CPAP was associated with reduction of ESS scores compared with sham CPAP (WMD, −2.0 [95% CI, −2.6 to −1.4]; 22 trials, 2721 participants) and other controls (WMD, −2.2 [95% CI, −2.8 to −1.6]; 12 trials, 2488 participants) (eFigures 3 and 4 in the Supplement). Among the 27 trials with mean or median baseline ESS scores of 10 or greater (mean baseline ESS score was 12.7 among them) or those that provided subgroup analyses for the participants with excessive sleepiness, the subgroup analysis found a similar result (WMD, −2.4 [95% CI, −2.9 to −1.9]) (eFigure 5 in the Supplement). Twenty-three of those 27 trials reported mean end-point ESS scores less than 10 for the CPAP group (mean end-point ESS score for the 23 trials was <8).
Twenty-nine trials reported sufficient blood pressure data to include in meta-analyses. Blood pressure outcomes were reported in a variety of ways, most commonly as diurnal systolic and diurnal diastolic blood pressure. Most trials were 12 weeks or less in duration; 3 followed up participants for 24 to 52 weeks.96,99,107 The meta-analyses found that CPAP was associated with reduction of diurnal systolic blood pressure by 2 to 3 points (WMD, −2.4 [95% CI, −3.9 to −0.9]; 15 trials, 1190 participants) (eFigure 6 in the Supplement) and diurnal diastolic blood pressure by more than 1 point (WMD, −1.3 [95% CI, −2.2 to −0.4]; 15 trials, 1190 participants) (eFigure 7 in the Supplement) compared with sham CPAP. Reduction in 24-hour mean arterial pressure was about 2 points with CPAP compared with sham CPAP (WMD, −2.1 [95% CI, −3.2 to −1.0]; 5 trials, 621 participants) (eFigure 8 in the Supplement).
Among the 6 studies that provided results for participants with uncontrolled hypertension,60,62,66,87,96,106 the subgroup analysis found similar but slightly larger effect sizes (eFigures 9 and 10 in the Supplement): reductions of 2.5 points for diurnal systolic blood pressure, 2.1 points for diurnal diastolic blood pressure, and 2.7 points for 24-hour mean arterial pressure.
Six of the 10 included RCTs compared MADs with sham devices.113-117,120 Comparators used in other RCTs were a placebo tablet,98 no treatment,121,122 and conservative management with weight loss.105 All studies recruited participants with known or suspected OSA from specialty clinics. Treatment durations ranged from 4 to 12 weeks for most studies, but 1 lasted only 1 week121 and 1 lasted 24 weeks.114 Mean age of participants ranged from 45 to 59 years. The majority of participants in all trials were men, with women comprising 17% to 25% of participants in the 9 trials reporting sex. All studies included participants with mild to moderate OSA, and 6 also included participants with severe OSA.105,113,116,117,120,121 Mean baseline ESS scores ranged from 11 to 14.
The meta-analyses found that MADs were associated with greater improvement in AHI than sham devices (−12.6 [95% CI, −15.5 to −9.7]; 6 trials, 307 participants) and other controls (−8.2 [95% CI, −13.9 to −2.5]; 5 trials, 358 participants) (eFigures 11 and 12 in the Supplement). MADs were also associated with reduction of ESS scores compared with sham devices (−1.5 [95% CI, −2.8 to −0.2]; 5 trials, 267 participants) and other controls (−1.7 [95% CI, −2.2 to −1.2]; 5 trials, 358 participants) (eFigures 13 and 14 in the Supplement).
Five trials reported sufficient blood pressure data for meta-analysis.105,113,115,116,119 The meta-analyses found no statistically significant differences between MADs and comparators for any of the blood pressure measures (eFigures 15 through 20 in the Supplement).
Six trials each evaluated a different surgical technique, including radiofrequency surgery of the soft palate,123 temperature-controlled radiofrequency tissue ablation,128 uvulopalatopharyngoplasty,124 laser-assisted uvulopalatoplasty,126 septoplasty,127 and bariatric surgery125 (eTable 26 in the Supplement). Sample sizes ranged from 32 (radiofrequency surgery of the soft palate)123 to 67 (uvulopalatopharyngoplasty).124 Overall, the trials provided limited evidence and found no significant reduction in AHI, ESS scores, or blood pressure, with the exception of the trials of uvulopalatopharyngoplasty124 and laser-assisted uvulopalatoplasty,126 which found greater reductions in AHI for surgery than for no treatment (−26.4 [95% CI, −36.2 to −16.6] and −10.5 [95% CI, −16.9 to −4.1], respectively). Further details of the characteristics and results of trials that evaluated surgical interventions are provided in the eResults and eFigures 21 and 22 in the Supplement.
Six trials evaluated weight loss programs (eTable 27 in the Supplement).129-138 Each trial evaluated a different intervention and control: 2 interventions focused primarily on exercise,129,133 2 focused primarily on diet,132,136 and 2 used multicomponent lifestyle interventions (exercise, diet, and psychoeducation).130,135 Sample sizes ranged from 26 (an exercise-focused trial)129 to 264 (a multicomponent lifestyle trial).130 Participants were generally identified from sleep clinics, referrals, and advertisements. Duration of follow-up was 4 to 26 weeks for 4 of the trials; the other 2 trials followed up participants for 4 or 5 years.130,136 Mean age ranged from 47 to 61 years. Mean BMI ranged from 30 to 40. Mean AHI was in the moderate to severe OSA range for 4 of the trials, in the mild range for 1 trial,136 and was moderate to severe but controlled with CPAP use in 1 trial.135 Mean baseline ESS score was 10 or more in 2 trials,129,136 less than 10 in 3 trials,132,133,135 and not reported for 1 trial.130 The weight loss achieved by intervention groups was very limited in 1 trial (−0.3 kg),133 modest in another (−2.3 kg),135 and larger in the rest (−5 kg to −20 kg).130,132,138
Four of the 5 trials129,130,132,133,138 reporting AHI found statistically significant reductions, ranging from −5.8 (95% CI, −9.7 to −1.9) to −23 (95% CI, −30.1 to −15.9) (eFigure 23 in the Supplement). The trial reporting the largest reduction in AHI (a reduction nearing that achieved by CPAP) also reported a much larger weight reduction than other trials (−20 kg over 9 weeks from a very low energy diet).132 The meta-analysis for AHI found a WMD of −12.4 (95% CI, −19.4 to −5.5). Three of the 4 trials129,132,133,138 reporting ESS scores found statistically significant reductions, ranging from −3 to −7. The meta-analysis found that weight loss interventions were associated with improvement in ESS scores compared with controls (−3.4 [95% CI, −5.9 to −1.0]; 4 trials, 213 participants) (eFigure 24 in the Supplement). Three trials reported blood pressure outcomes and found no significant differences between treatment and control groups.134-136
Key Question 5a. Does treatment with CPAP, MADs, surgery, or weight loss programs improve health outcomes in persons with OSA?
Key Question 5b. Do the benefits of treatment (for health outcomes) differ for subgroups defined by age, sex, BMI, or OSA severity?
Included were 50 RCTs (eTables 25 through 31 in the Supplement) that reported at least 1 eligible health outcome (47 of these were included in KQ4). Most were short-term RCTs (12 weeks or less) that reported zero or few deaths. None focused on screen-detected patients from primary care settings. The main findings are summarized below; additional outcomes for which there were limited data are shown in eTables 29 through 31 in the Supplement and are summarized in the full report.
Thirty-five RCTs compared CPAP with sham CPAP53,55,62-65,67,70,72,75,76,79,80,82,86-89,91,93,97,139 or another control.95,97-103,105,107-109,140,141 Most trials followed up participants for 12 weeks or less; 4 trials measured outcomes over 24 weeks or longer,70,97,99,107 including 1 that followed up participants for a median of 4 years.97 Most enrolled populations with a mean age in the 40s to 50s (range, 42-71 years). Mean BMI was 30 to 35 in most trials (range, 27-37). Mean or median baseline AHI (or similar measure) was in the severe OSA range (AHI ≥30) for more than half of trials; 9 trials reported it in the moderate OSA range,75,76,80,87,98,103,105,107,140 and 5 reported it in the mild OSA range.91,99,101,108,141
Thirty-one RCTs reported on mortality (eTable 29 in the Supplement); most (29 RCTs) reported mortality rates at 12 weeks or less. Most (27 RCTs, 2211 total participants) reported no deaths in any study group.53,55,62,64,65,67,72,75,76,79,80,82,87-89,91,95,98,100-103,105,108,109,140,141 Two trials (462 total participants) each reported 1 death: one in the CPAP group99 and the other in the sham CPAP group at 12 weeks.63 Two RCTs assessed mortality over a longer duration.70,97 One (n = 1105) reported 2 deaths in each study group over 24 weeks.70 The other (n = 723) reported 8 deaths in the CPAP group and 3 in the control group over about 4 years (incidence density ratio, 2.6 [95% CI, 0.70 to 11.8]; P = .16).97
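Under the review's significance rule, a ratio whose 95% CI includes 1 is not statistically significant, as with the incidence density ratio of 2.6 (95% CI, 0.70 to 11.8) above. The sketch below computes an incidence rate ratio with a log-scale Wald CI from event counts and person-time. The trial's person-time is not given here, so the usage example assumes equal, hypothetical follow-up in both groups; the Wald approximation is also crude with so few events and is not expected to reproduce the published interval exactly. The function name is hypothetical.

```python
import math


def rate_ratio(events_a, persontime_a, events_b, persontime_b, z=1.96):
    """Incidence rate ratio (group A vs group B) with a log-scale Wald CI.

    The SE of log(IRR) is approximated by sqrt(1/events_a + 1/events_b),
    which is crude for small counts; exact Poisson methods give other limits.
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("Wald CI is undefined with zero events in a group")
    irr = (events_a / persontime_a) / (events_b / persontime_b)
    se = math.sqrt(1.0 / events_a + 1.0 / events_b)
    return irr, irr * math.exp(-z * se), irr * math.exp(z * se)


if __name__ == "__main__":
    # 8 vs 3 deaths; equal person-time is a hypothetical assumption
    print(rate_ratio(8, 1000.0, 3, 1000.0))
```

With equal person-time, this gives an IRR of about 2.67 with an interval of roughly 0.71 to 10.1. The interval includes 1 and spans more than an order of magnitude, showing how few events leave the mortality comparison imprecise.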
Twenty-two RCTs reported a variety of quality-of-life measures (eTable 29 in the Supplement). The meta-analysis found no difference between CPAP and comparators in the change from baseline 36-Item Short Form Health Survey (SF-36) mental component score (WMD, 1.2 [95% CI, −0.8 to 3.2]; 8 trials, 1039 participants) (eFigure 25 in the Supplement). The meta-analysis found that CPAP was associated with improved SF-36 physical component score compared with sham CPAP over 12 weeks or less (WMD, 2.3 [95% CI, 0.2 to 4.4]; 7 trials, 648 participants) (eFigure 26 in the Supplement).
Thirteen RCTs assessed sleep-related quality of life—6 using the Sleep Apnea Quality of Life Index (SAQLI)89,93,99,105,107,142 and 7 using the Functional Outcomes of Sleep Questionnaire (FOSQ).55,76,79,86,91,98,102 Most reported outcomes at 12 weeks or less; 2 reported outcomes at 24 weeks (or 6 months)99,142 and 1 at 52 weeks.107 The meta-analysis (combining SAQLI and FOSQ scores) found that CPAP was associated with improved sleep-related quality-of-life scores compared with controls (standardized mean difference, 0.28 [95% CI, 0.14 to 0.42]; 13 trials, 2325 participants) (eFigure 27 in the Supplement). The sensitivity analysis including only studies with mean or median baseline ESS scores of 10 or greater found a similar effect size (0.33 [95% CI, 0.17 to 0.50]; 9 trials, 1709 participants) (eFigure 28 in the Supplement).
Eight RCTs reported on the incidence of 1 or more cardiovascular and cerebrovascular events (eTable 29 in the Supplement).63,70,76,93,97,99,103,107 Overall, too few cardiovascular and cerebrovascular events were observed to draw conclusions.
Included were 6 RCTs assessing the effect of MADs on health outcomes (eTable 30 in the Supplement).98,105,114,116,121,122 Treatment durations ranged from 4 to 12 weeks for most studies, while 1 lasted for only 1 week121 and 1 for 24 weeks.114 All studies included participants with mild to moderate OSA, and 3 also included participants with severe OSA.105,116,121
Among the 4 trials that reported on mortality over 1 to 12 weeks,98,116,121,122 3 reported no deaths in any participants and 1 reported 1 death in the group that received no treatment.116 Five included trials reported at least 1 quality-of-life measure.98,105,114,116,122 All 5 used the SF-36; 2 also used the SAQLI,105,122 and 2 also used the FOSQ.98,122 Overall, results were mixed: some studies found no significant benefits of MADs for improving quality of life,105,114 some reported possible benefits for some measures or subscales but not others,98,116 and 1 reported benefits for some overall quality-of-life scores.122 Because of inconsistency, imprecision, and heterogeneity of reporting, the evidence was insufficient to draw conclusions about the potential benefits of MADs for improving quality of life.
Although 5 of the 6 RCTs included in KQ4 that evaluated surgical treatments reported some information about at least 1 health outcome, the trials provided limited evidence to determine whether treatments improve health outcomes. The RCT (n = 60) that compared bariatric surgery with a conventional weight loss program in people with severe OSA125 reported greater improvement in quality of life measured by the SF-36 physical component score for those randomized to bariatric surgery at 2 years (between-group difference, 9.3 [95% CI, 0.5 to 18.0]); however, there was no significant difference between groups in the change from baseline SF-36 mental component score (between-group difference, −0.3 [95% CI, −5.3 to 4.8]).125 Further details on the results of trials that evaluated surgical interventions are provided in the eResults in the Supplement.
Six RCTs evaluated weight loss programs (eTable 27 in the Supplement).129-138 Four RCTs (with a total of 45 participants) assessed mortality; 3 reported no deaths in any group over 9 to 208 weeks,130,132,133 and 1 reported 1 death at 52 weeks.136 Four RCTs assessed quality of life (eTable 31 in the Supplement).129,133,135,136 Overall, findings were mixed, and too few studies reported results for the same intervention and comparison using similar outcome measures to draw conclusions.
Key Question 6. Is there an association between AHI and health outcomes?
Included were 11 prospective cohort studies (described in 12 articles) that assessed the association between AHI and health outcomes (eTable 32 in the Supplement).12,143-153 All of them focused on community-based participants; 1 also enrolled some participants from a sleep clinic.12 Three studies analyzed participants from the Sleep Heart Health Study,148,149,151 a cohort of men and women 40 years or older recruited from other cohort studies between 1995 and 1998. Two studies evaluated the Wisconsin Sleep Cohort Study,145,150 a random sample of state-employed adults 30 to 60 years of age. Two articles reported data from the Busselton Health Study for different durations of follow-up.152,153
Six studies reported the association with all-cause mortality,144,145,147,150-153 3 with cardiovascular mortality,12,150,151 2 with cardiovascular events,12,148 and 1 each with cancer-related mortality,145 stroke,149 cognitive decline,143 and cognitive impairment or dementia.146 Nine of 11 were conducted in the United States. Most studies followed up patients for 8 to 14 years; follow-up ranged from a mean of 3.4 years144 to 22 years.145 Three studies included only men; half of the studies included between 45% and 56% women. Mean BMI ranged from 26 to 30 in most studies. Participants were generally untreated for OSA, or analyses were run to exclude those who were treated.
Six studies evaluated AHI as a predictor of all-cause mortality.144,145,147,150-153 Sample sizes ranged from 289 participants147 to 6294 participants.151 Mean duration of follow-up ranged from 3.4 years144 to 20 years.153 Mean age ranged from 48 years150 to 78 years.147 In multivariable analyses, all 6 studies reported that participants with severe or moderate to severe OSA at baseline had a higher risk of death. Variables included in the models are detailed in eTable 33 in the Supplement. Briefly, all included age and some medical conditions in the final model; all considered BMI (although it did not remain in the final model in 1 study); most included smoking, sex, race, hypertension or blood pressure, and diabetes. Comparing mortality for patients with severe or moderate to severe OSA vs controls, meta-analysis found an HR of 2.07 (95% CI, 1.48 to 2.91) (Figure 3). Two studies150,151 assessed whether moderate (AHI 15 to <30) or mild (AHI 5 to <15) OSA levels are associated with mortality; neither found a statistically significant association (Figure 3).
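Hazard ratios are pooled on the log scale. When a study reports only an HR and its 95% CI, the standard error can be recovered from the width of the log-transformed interval. The sketch below shows that back-calculation, applied to the reported pooled HR, followed by a fixed-effect inverse-variance pool. The fixed-effect model and the two study inputs are assumptions for illustration, not the review's stated method or data.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_se_from_ci(lower, upper, z=Z95):
    """Standard error of log(HR) recovered from a CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def pool_log_hr(hrs, cis):
    """Fixed-effect inverse-variance pool of hazard ratios on the log scale."""
    weights, logs = [], []
    for hr, (lower, upper) in zip(hrs, cis):
        se = log_se_from_ci(lower, upper)
        weights.append(1 / se ** 2)
        logs.append(math.log(hr))
    total = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, logs)) / total
    se_pooled = math.sqrt(1 / total)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z95 * se_pooled),
            math.exp(pooled_log + Z95 * se_pooled))


# SE implied by the reported pooled HR of 2.07 (95% CI, 1.48 to 2.91).
se = log_se_from_ci(1.48, 2.91)
print(f"SE(log HR) = {se:.3f}; z = {math.log(2.07) / se:.1f}")

# HYPOTHETICAL two-study pool, for illustration only.
print(pool_log_hr([1.8, 2.5], [(1.1, 2.9), (1.4, 4.5)]))
```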
Two studies reported evidence for subgroups: one by sex and age151 and one by presence of sleepiness.147 The former used the Sleep Heart Health Study data (n = 6294) and reported that the association between an AHI of 30 or greater and mortality was statistically significant for men 70 years or younger (adjusted HR, 2.09 [95% CI, 1.31 to 3.33]) but not for men older than 70 years (HR, 1.27 [95% CI, 0.86 to 1.86]) or for women of any age (HR, 1.40 [95% CI, 0.89 to 2.22]).151 The latter found that, compared with a reference group with an AHI less than 20 and no excessive daytime sleepiness, an AHI of 20 or greater was significantly associated with death among participants with excessive daytime sleepiness (HR, 2.28 [95% CI, 1.46 to 3.57]) but not among those without it (HR, 0.74 [95% CI, 0.39 to 1.38]). Excessive daytime sleepiness was determined by self-report of having a problem with feeling sleepy or struggling to stay awake during the daytime more than 3 or 4 times a week.
Three studies evaluated the association between AHI and cardiovascular mortality.12,150,151 Sample sizes ranged from 1522 participants150 to 6294 participants.151 Mean duration of follow-up ranged from 8.2 years151 to 13.8 years.150 Mean age ranged from 48 years150 to 63 years.151 In multivariable analyses, all 3 studies reported that participants with severe or moderate to severe OSA at baseline had a higher risk of cardiovascular death (eFigure 29 in the Supplement), with HRs of 1.7 (95% CI, 1.1 to 2.5) (for men only in the Sleep Heart Health Study),151 2.9 (95% CI, 1.1 to 7.3), and 5.9 (95% CI, 2.6 to 13.3).150 Variables included in the models are detailed in eTable 33 in the Supplement. Briefly, all of them included age, BMI, smoking, and multiple medical conditions or used matching for age and BMI. Two of 3 included alcohol use, blood pressure, and cholesterol level.
A single included study evaluated the association between AHI and the incidence of each of the following outcomes: cancer-related mortality,145 nonfatal cardiovascular events,12 heart failure,148 coronary heart disease,148 stroke,149 cognitive impairment or dementia,146 and cognitive decline143 (eFigure 30 in the Supplement). Overall, findings for these outcomes were imprecise, consistency was unknown (with a single study for each), and evidence was often limited by risk of bias (especially risk of residual confounding).
Key Question 7a. Are there harms associated with screening or diagnostic testing for OSA?
Key Question 7b. Do the harms of screening or diagnostic testing differ for subgroups defined by age, sex, or BMI?
Key Question 8a. Are there harms associated with treatment of OSA?
Key Question 8b. Do the harms of treatment differ for subgroups defined by age, sex, BMI, or OSA severity?
Reporting of harms in the included studies was sparse. Twenty-two of the RCTs included in KQ4 reported harms associated with treatments for OSA: 9 trials of CPAP,66,70,75,88,91,92,101,105,108 8 of MADs,105,114-122 1 of a very low energy diet,132 4 of airway surgical treatments,123,124,126,128 and 1 of bariatric surgery (eTables 35 through 38 in the Supplement).125
Of the 9 included RCTs of CPAP, most enrolled fewer than 100 participants; 1 trial91 enrolled 281, and the Apnea Positive Pressure Long-term Efficacy Study (APPLES)70 enrolled 1098. Most of the studies followed up patients for 8 to 12 weeks. Across trials reporting any harms, the proportion of participants with a specific adverse event while using CPAP ranged from 2% to 47%. In general, harms were likely short-lived and could be alleviated with discontinuation of CPAP or additional interventions. These harms included oral or nasal dryness, eye or skin irritation, rash, epistaxis, and pain.
Of the 8 included RCTs of MADs,105,114-116,118,120-122 study durations ranged from 4 to 24 weeks. Across studies that reported any discontinuation because of adverse events, 7% of patients using MADs discontinued use, compared with 1% of control patients.105,116,122 The most commonly reported symptoms that occurred more often in active MAD study groups were oral dryness,105,114,115,122 excess salivation105,114,115,117,122 (although 1 study reported a higher rate of excessive salivation in the sham MAD group than in the active MAD group115), and oral mucosal, dental, or jaw symptoms.
Five trials reported harms of surgical treatment: 1 each of single-session soft palate radiofrequency surgery,123 temperature-controlled radiofrequency tissue ablation,128 uvulopalatopharyngoplasty,124 laser-assisted uvulopalatoplasty,126 and bariatric surgery.125 Reported harms included postoperative bleeding; rehospitalization; difficulty speaking, breathing, drinking, opening the mouth, and swallowing; change in vocal quality; hematomas; ulcerations; infections; temporary nasal regurgitation; pain; and rehospitalization after bariatric surgery because of an acute proximal gastric pouch dilation that required additional surgery (eTable 38 in the Supplement).
The single weight loss study that reported harms compared a very low energy diet with usual diet over 9 weeks.132 In the very low energy diet group, fewer than 10% of patients reported each of the following: constipation, dizziness, gout, and dry lips.
The summary of findings is presented in Table 6. No eligible studies directly evaluated the effectiveness or adverse outcomes of screening compared with no screening. Potential harms include overdiagnosis and overtreatment for asymptomatic people (with AHI ≥5) who would never have developed symptoms of or problems from OSA, costs, and additional testing (eg, future polysomnographies to follow up patients over time). Furthermore, no eligible studies were found that evaluated the effect of screening on psychological outcomes such as distress due to labeling or stigma.
Very few eligible studies evaluated the accuracy of questionnaires or prediction tools for distinguishing people in the general population who are more or less likely to have OSA. The only screening approach with at least 2 included studies suggesting possible accuracy was the MVAP score followed by in-home portable monitoring for detecting severe OSAS. Although this approach may have potential for screening, the evidence was limited by potential spectrum bias,154-158 with oversampling of high-risk participants and those with OSA and OSAS, which may substantially overestimate the accuracy that would be achieved in the general population. Spectrum bias occurs when heterogeneity of test performance exists across subgroups and studies preferentially sample (intentionally or unintentionally) from a subgroup of the target population. None of the included studies evaluating MVAP prospectively measured calibration, often assessed by plotting the predicted risk vs an observed event rate,31 and none assessed clinical utility for improving health outcomes.
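Calibration asks whether predicted risks match observed event rates. The sketch below produces the usual tabulation behind a calibration plot. It assumes that predictions are sorted and split into equal-size groups, and the number of groups is a free choice. The predictions and outcomes are hypothetical and do not come from any MVAP study.

```python
def calibration_table(predicted, observed, n_bins=5):
    """Group cases by predicted risk; return (mean predicted, observed rate, n) per group.

    predicted: predicted probabilities; observed: 0/1 outcomes of the same length.
    Groups are equal-size slices after sorting by predicted risk.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(o for _, o in chunk) / len(chunk)
        table.append((mean_pred, obs_rate, len(chunk)))
    return table


# HYPOTHETICAL predicted risks and observed OSA status (1 = present).
preds = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.60, 0.75, 0.90]
outcomes = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
for mean_pred, obs_rate, count in calibration_table(preds, outcomes):
    print(f"predicted {mean_pred:.2f} vs observed {obs_rate:.2f} (n = {count})")
```

When the observed rate departs systematically from the mean predicted risk across groups, the tool is miscalibrated. The included MVAP studies did not prospectively assess this property.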
This review included fewer studies evaluating questionnaires or clinical prediction tools than some previously published reviews and guidelines,1,21,159 primarily because of the requirement that studies enroll asymptomatic adults or persons with unrecognized symptoms of OSA; referral populations (eg, to sleep clinics) were not eligible. The focus of previous reviews and guidelines was generally on diagnostic testing (of adults with symptoms suggestive of disordered sleep) rather than on screening (of asymptomatic people or those with unrecognized symptoms). Nevertheless, those reviews and guidelines generally reported low overall quality or strength of evidence for questionnaires and prediction tools.
Related to accuracy of diagnostic tests, there was limited evidence evaluating type II portable monitors. For type III and IV monitors, existing literature revealed some inconsistency, with wide ranges of sensitivity and specificity, especially for single-channel type IV monitors for detecting moderate to severe OSA. Nevertheless, many studies reported moderate to high positive likelihood ratios (>5) and moderate to low negative likelihood ratios (<0.2), and previous reviews and guidelines concluded that moderate-quality evidence shows that type III and IV monitors are “generally accurate to diagnose OSA, but have a wide and variable bias in estimating the actual AHI.”21,159 Evidence for type IV portable monitors is limited by inconsistency and imprecision. In addition, unlike other types of portable monitors, type IV monitors are limited by their inability to differentiate obstructive and central events.
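Likelihood ratios summarize how much a monitor result shifts the odds of OSA, and they follow directly from sensitivity and specificity. The sketch below computes them and labels each result against the >5 and <0.2 cutoffs cited above. The sensitivity and specificity values are hypothetical and are not estimates for any particular monitor type.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0 <= sensitivity <= 1 and 0 < specificity <= 1):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1]")
    # LR+ = sens / (1 - spec); infinite when specificity is perfect
    lr_pos = math.inf if specificity == 1 else sensitivity / (1 - specificity)
    # LR- = (1 - sens) / spec
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# HYPOTHETICAL portable-monitor accuracy figures.
for sens, spec in [(0.90, 0.88), (0.80, 0.70)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"sens {sens:.2f}, spec {spec:.2f}: "
          f"LR+ {lr_pos:.1f} ({'>5' if lr_pos > 5 else '<=5'}), "
          f"LR- {lr_neg:.2f} ({'<0.2' if lr_neg < 0.2 else '>=0.2'})")
```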
This review found consistent evidence from RCTs that CPAP effectively reduces AHI to normal (<5) or near-normal levels, reduces excessive sleepiness, and reduces blood pressure. However, the clinical significance of mean reductions of 2 points on the ESS and 2 to 3 mm Hg in blood pressure measures is somewhat uncertain. For sleepiness, the data suggest a clinically significant reduction in most included trials: among trials in the ESS meta-analysis with mean baseline ESS scores of 10 or greater (indicating excessive daytime sleepiness), 85% reported mean end-point ESS scores for the CPAP groups in the normal range160,161 of less than 10 (mean end-point ESS score across studies was <8). However, the threshold for a clinically significant change in ESS score is itself uncertain. Although a recent review noted that experts consider a 1-point change clinically significant,21 other sources suggest a higher threshold of at least 3 or 4 points. For example, some trials that use ESS score as an outcome have considered a change of 4 or more points to be clinically significant for their sample size calculations or interpretation of findings.162-164 Regardless of what constitutes a clinically significant change, potential bias from the subjective nature of the ESS remains, and some authors have raised concerns about its construct validity (ie, uncertainty regarding whether it is an accurate measure of sleepiness).165-167
For blood pressure reduction, some authors suggest that a difference of more than 9/10 (systolic/diastolic) mm Hg is clinically meaningful for individuals.168-170 However, across a population, guidelines have suggested that reductions of 2 to 3 mm Hg for systolic blood pressure could result in a significant reduction in cardiovascular mortality (by 4%-5% for coronary heart disease and 6%-8% for stroke).171
MADs and weight loss programs also reduce AHI and excessive sleepiness, although the magnitudes of effects were generally less than with CPAP, and blood pressure reduction was not established. Although this review did not evaluate head-to-head studies (eg, directly comparing MADs with CPAP), previous comparative effectiveness reviews examining head-to-head trials reported smaller effect sizes for MADs than for CPAP for reducing AHI.21 Evidence on surgical treatments was limited by unknown consistency and imprecision, because only a single RCT evaluated each surgical technique studied.
Evidence on most health outcomes was limited; too few RCTs reported them or too few events occurred to draw conclusions about effectiveness in reducing mortality, cardiovascular events, or motor vehicle crashes. However, the meta-analysis for sleep-related quality of life found a significant benefit for CPAP, albeit with a small effect size.
Reporting of harms from treatment in the included studies was sparse. In general, the adverse events related to CPAP treatment were likely short-lived and could be alleviated with discontinuation of CPAP or additional interventions. No included studies reported on psychosocial harms of treatment, such as marital stress due to disruption of partner sleeping (eg, because of the noise of CPAP).
Adverse effects may limit adherence to treatment. Reported adherence to CPAP usage recommendations varies widely, from about 30% to 85%.172 A systematic review reported that cohort studies with multivariable analyses for predictors of nonadherence showed that 14% to 32% of patients discontinued CPAP over 4 years and that patients used CPAP for an average of 5 hours per night; data were too limited to provide adherence rates for MADs.21 A recent Cochrane systematic review of 33 studies (2047 participants) found low- to moderate-quality evidence that 3 types of interventions can increase CPAP usage in CPAP-naive participants.172 However, its authors noted that the trials did not assess people who have struggled to adhere, and the effect of improved CPAP usage on health outcomes remains unclear.
Consistent evidence from prospective cohort studies that focused on community-based participants supports the association between AHI and all-cause mortality. People with severe (AHI ≥30) or moderate to severe (AHI ≥15) OSA had a hazard ratio for death of 2.07 compared with controls when pooling data from multivariable analyses. There was also consistent evidence showing that people with severe or moderate to severe OSA have increased cardiovascular mortality. The cohort studies controlled for many potential confounders, but residual confounding is possible from health-related factors that are associated with OSA and were generally not accounted for (eg, physical activity, diet).
This review had limitations. Direct evidence on the effectiveness or harms of screening could not be assessed because no studies comparing screened and unscreened populations were identified. Therefore, literature was reviewed that might establish an indirect chain of evidence, across multiple key questions, linking screening to health outcomes. For the first question in that indirect pathway (KQ2), there was limited evidence that one screening approach might be useful to screen for severe OSAS, but the evidence was limited by potential spectrum bias, and no studies prospectively assessed calibration or clinical utility for improving health outcomes. In addition, this review did not evaluate the accuracy of individual physical examination findings. Questionnaires or clinical prediction tools were required to have multiple factors because previous systematic reviews have found limited utility of individual findings. A recent review of clinical examination accuracy, which was not limited to asymptomatic patients or those with unrecognized symptoms, found that (among individual symptoms or signs) the most useful observation for identifying patients with OSA was nocturnal choking or gasping, imparting a small increase in the likelihood of disease (summary likelihood ratio, 3.3 [95% CI, 2.1 to 4.6] when the diagnosis was established by AHI ≥10).1 The review found that many symptoms and signs provide limited information in determining the likelihood of OSA.1
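A likelihood ratio changes the odds of disease, not the probability. The worked sketch below converts a pretest probability to a post-test probability using the summary LR of 3.3 for nocturnal choking or gasping and its CI bounds. The 20% pretest probability is a hypothetical value chosen for illustration, not an estimate from the review.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pretest odds x LR."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# HYPOTHETICAL pretest probability of 20%; LRs are the summary estimate and its CI bounds.
for lr in (2.1, 3.3, 4.6):
    print(f"LR {lr}: post-test probability {post_test_probability(0.20, lr):.0%}")
```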
There is uncertainty about the accuracy or clinical utility of all potential screening tools. Multiple treatments for OSA reduce AHI, ESS scores, and blood pressure. Trials of CPAP and other treatments have not established whether treatment reduces mortality or improves most other health outcomes, except for modest improvement in sleep-related quality of life.
Corresponding Author: Daniel E. Jonas, MD, MPH, 5034 Old Clinic Bldg, Chapel Hill, NC 27599 (email@example.com).
Correction: This article was corrected online on March 28, 2017, for an incorrect value in Figure 2.
Author Contributions: Dr Jonas had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Jonas, Amick, Feltner, Palmieri Weber, Harris.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Jonas, Amick, Feltner, Palmieri Weber, Arvanitis, Stine, Harris.
Critical revision of the manuscript for important intellectual content: Jonas, Amick, Palmieri Weber, Arvanitis, Lux, Harris.
Statistical analysis: Jonas, Amick, Feltner.
Obtained funding: Jonas.
Administrative, technical, or material support: Amick, Feltner, Palmieri Weber, Stine, Lux.
Supervision: Jonas, Harris.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.
Funding/Support: This research was funded under contract HHSA-290-2012-00015-I, Task Order 4, from the Agency for Healthcare Research and Quality (AHRQ), US Department of Health and Human Services, under a contract to support the USPSTF.
Role of the Funder/Sponsor: Investigators worked with USPSTF members and AHRQ staff to develop the scope, analytic framework, and key questions for this review. AHRQ had no role in study selection, quality assessment, or synthesis. AHRQ staff provided project oversight, reviewed the report to ensure that the analysis met methodological standards, and distributed the draft for peer review. Otherwise, AHRQ had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript findings. The opinions expressed in this document are those of the authors and do not reflect the official position of AHRQ or the US Department of Health and Human Services.
Additional Contributions: We gratefully acknowledge the following individuals for their contributions to this project, including AHRQ Staff (Tina Fan, MD, Robert McNellis, PA, and Tracy Wolff, MD), Evelyn Whitlock, MD (former Kaiser Permanente Research Affiliates EPC Director), and RTI International/University of North Carolina EPC Staff (Meera Viswanathan, PhD, Carol Woodell, BSPH, Christiane Voisin, MSLS, Jennifer Cook Middleton, PhD, Sharon Barrell, MA, and Loraine Monroe). The USPSTF members, expert consultants, peer reviewers, and federal partner reviewers did not receive financial compensation for their contributions. Dr Lohr, Ms Woodell, Ms Voisin, Dr Middleton, Ms Barrell, and Ms Monroe received compensation for their role in this project.
Additional Information: A draft version of the full evidence report underwent external peer review from 5 content experts (Ethan M. Balk, MD, Brown University; Indira Gurubhagavatula, MD, University of Pennsylvania; Jon-Erik C. Holty, MD, Stanford University; David Hostler, MD, Tripler Army Medical Center; Paul E. Peppard, PhD, University of Wisconsin-Madison) and 6 federal partner reviewers from the National Institutes of Health, the US Department of Veterans Affairs, and the Food and Drug Administration. Comments from reviewers were presented to the USPSTF during its deliberation of the evidence and were considered in preparing the final evidence review.
Editorial Disclaimer: This evidence report is presented as a document in support of the accompanying USPSTF Recommendation Statement. It did not undergo additional peer review after submission to JAMA.