Evidence reviews for the US Preventive Services Task Force (USPSTF) use an analytic framework to visually display the key questions that the review will address to allow the USPSTF to evaluate the effectiveness and safety of a preventive service. The questions are depicted by linkages that relate interventions and outcomes. A dashed line depicts a health outcome that follows an intermediate outcome. For additional information, see the USPSTF Procedure Manual.12 AHI indicates apnea/hypopnea index; MAD, mandibular advancement device; OSA, obstructive sleep apnea; PAP, positive airway pressure.
The sum of the number of studies per key question (KQ) exceeds the total number of studies because some studies were applicable to multiple KQs. USPSTF indicates US Preventive Services Task Force.
ESS indicates Epworth Sleepiness Scale; OSA, obstructive sleep apnea; PAP, positive airway pressure.
Feltner C, Wallace IF, Aymes S, et al. Screening for Obstructive Sleep Apnea in Adults: Updated Evidence Report and Systematic Review for the US Preventive Services Task Force. JAMA. 2022;328(19):1951–1971. doi:10.1001/jama.2022.18357
Obstructive sleep apnea (OSA) is associated with adverse health outcomes.
To review the evidence on screening for OSA in asymptomatic adults or those with unrecognized OSA symptoms to inform the US Preventive Services Task Force.
PubMed/MEDLINE, Cochrane Library, Embase, and trial registries through August 23, 2021; surveillance through September 23, 2022.
English-language studies of screening test accuracy, randomized clinical trials (RCTs) of screening or treatment of OSA reporting health outcomes or harms, and systematic reviews of treatment reporting changes in blood pressure and apnea-hypopnea index (AHI) scores.
Data Extraction and Synthesis
Dual review of abstracts, full-text articles, and study quality. Meta-analysis of intervention trials.
Main Outcomes and Measures
Test accuracy, excessive daytime sleepiness, sleep-related and general health–related quality of life (QOL), and harms.
Eighty-six studies were included (N = 11 051). No study directly compared screening with no screening. Screening accuracy of the Multivariable Apnea Prediction score followed by unattended home sleep testing for detecting severe OSA syndrome (AHI ≥30 and Epworth Sleepiness Scale [ESS] score >10) measured as the area under the curve in 2 studies (n = 702) was 0.80 (95% CI, 0.78 to 0.82) and 0.83 (95% CI, 0.77 to 0.90). Five studies assessing the accuracy of other screening tools were heterogeneous and results were inconsistent. Compared with inactive control, positive airway pressure was associated with a significant improvement in ESS score from baseline (pooled mean difference, −2.33 [95% CI, −2.75 to −1.90]; 47 trials; n = 7024), sleep-related QOL (standardized mean difference, 0.30 [95% CI, 0.19 to 0.42]; 17 trials; n = 3083), and general health–related QOL measured by the 36-Item Short Form Health Survey (SF-36) mental health component summary score change (pooled mean difference, 2.20 [95% CI, 0.95 to 3.44]; 15 trials; n = 2345) and SF-36 physical health component summary score change (pooled mean difference, 1.53 [95% CI, 0.29 to 2.77]; 13 trials; n = 2031). Use of mandibular advancement devices was also associated with a significantly larger ESS score change compared with controls (pooled mean difference, −1.67 [95% CI, −2.09 to −1.25]; 10 trials; n = 1540). Reporting of other health outcomes was sparse; no included trial found significant benefit associated with treatment on mortality, cardiovascular events, or motor vehicle crashes. In 3 systematic reviews, positive airway pressure was significantly associated with reduced blood pressure; however, the difference was relatively small (2-3 mm Hg).
Conclusions and Relevance
The accuracy and clinical utility of OSA screening tools that could be used in primary care settings were uncertain. Positive airway pressure and mandibular advancement devices reduced ESS score. Trials of positive airway pressure found modest improvement in sleep-related and general health–related QOL but have not established whether treatment reduces mortality or improves most other health outcomes.
Obstructive sleep apnea (OSA) is a sleep disorder marked by episodes of narrowing and obstruction of the upper airway during sleep, resulting in reduction or cessation in breathing.1 OSA is defined as more than 5 events per hour of partial (hypopnea) or total (apnea) upper airway obstruction despite efforts to breathe.2 Apnea is defined as total airway obstruction (>90%) for more than 10 seconds, and hypopnea as a partial airway obstruction (>30%) sufficient to cause at least a 3% reduction in blood oxygen saturation or sleep arousals.3 The apnea-hypopnea index (AHI) is used to define the severity of OSA: mild (5-15 events per hour), moderate (16-30 events per hour), and severe (>30 events per hour). Standardized prevalence estimates using the 2012 American Academy of Sleep Medicine (AASM) scoring criteria were 33.2% for any OSA (AHI ≥5) and 14.5% for moderate to severe OSA (AHI ≥15).4 Risk factors for OSA include male sex,5 postmenopausal status,6 increasing age (40-70 years),7,8 and higher body mass index (BMI).5 A variety of adverse health outcomes are associated with untreated OSA, including cardiovascular disease events, coronary heart disease, heart failure, atrial fibrillation, and stroke. Severe OSA (AHI ≥30) is associated with increased all-cause mortality.9,10
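The severity bands described above can be written as a small lookup. The following Python sketch is illustrative only: the function name `classify_osa_severity` and the label for values below 5 are hypothetical, and assigning non-integer AHI values between 15 and 16 to the moderate band is an assumption, because the text gives integer ranges.

```python
def classify_osa_severity(ahi: float) -> str:
    """Map an apnea-hypopnea index (events per hour) to a severity band.

    Bands follow the ranges quoted in the text: mild 5-15, moderate 16-30,
    severe >30. Treating values in (15, 16) as moderate is an assumption.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "below OSA threshold"  # hypothetical label, not from the text
    if ahi <= 15:
        return "mild"
    if ahi <= 30:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    # Arbitrary example values chosen to fall in each band.
    for value in (3.0, 6.4, 22.0, 31.2):
        print(value, classify_osa_severity(value))
```

Other passages in this review use AHI ≥15 and AHI ≥30 as the moderate and severe cutoffs, so the boundary handling would need to match whichever convention a given study applies.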
In 2017, the US Preventive Services Task Force (USPSTF) concluded that the evidence was insufficient to assess the balance of benefits and harms of screening for OSA in asymptomatic adults (I statement).11 This updated review assessed the current evidence on OSA screening in individuals and settings relevant to US primary care and was used to update the USPSTF recommendation.
Figure 1 shows the analytic framework and key questions (KQs) that guided the review. Detailed methods are available in the full evidence review.13 In addition to the KQs, this review looked for evidence related to 2 contextual questions that focused on barriers to undergoing diagnostic testing for OSA and the association between AHI and health outcomes (eContextual Questions in the Supplement).
PubMed/MEDLINE, the Cochrane Library, and Embase were searched for English-language articles published through August 23, 2021 (eMethods in the Supplement). ClinicalTrials.gov was searched for unpublished studies. The searches were supplemented by reviewing reference lists of pertinent articles, studies suggested by peer reviewers, and comments received during public commenting periods. From August 23, 2021, through September 23, 2022, ongoing surveillance was conducted through article alerts and targeted searches of journals to identify major studies published in the interim that may affect the conclusions or understanding of the evidence and the related USPSTF recommendation.
Two investigators independently reviewed titles, abstracts, and full-text articles using prespecified eligibility criteria (eTable 1 in the Supplement). Disagreements were resolved by consensus. For all KQs, English-language studies of adults 18 years or older conducted in countries categorized as “very high” on the Human Development Index14 and rated as fair or good quality were included.
For KQ1 and KQ3 (direct evidence of benefits and harms of screening) and KQ2 (accuracy of screening tools), studies of asymptomatic adults with OSA or persons with unrecognized OSA symptoms were included. For KQ1 and KQ3, randomized clinical trials (RCTs) comparing screened groups with nonscreened groups and reporting on health outcomes were eligible. For KQ2, prospective cohort studies and cross-sectional studies assessing the accuracy of screening questionnaires or clinical prediction tools (alone or followed by an unattended home sleep test) compared with polysomnography conducted in a sleep laboratory were eligible. For KQ2, studies limited to persons referred to sleep laboratories for suspected OSA were excluded. For KQ3 (harms of screening), studies eligible for KQ1 or KQ2 that reported harms of screening or diagnostic tests (eg, false-positive results leading to unnecessary treatment, anxiety, distress, or stigma) were eligible.
For KQs 4 through 6 (benefits and harms of treatment), studies were limited to those of interventions considered first-line treatment for persons diagnosed with OSA (positive airway pressure or mandibular advancement devices [MADs]) compared with inactive control; other interventions (eg, weight loss interventions, oral surgical procedures) were excluded. For KQ4 (benefit of treatment for improving intermediate outcomes), good-quality, recent (within 5 years) systematic reviews comparing positive airway pressure or MADs with an inactive control and reporting on changes in blood pressure or AHI were included. For KQs on the benefits of treatment for improving health outcomes (KQ5) and on the harms of treatment (KQ6), RCTs of adults with a confirmed diagnosis of OSA were eligible.
For each study, 1 investigator extracted information about populations, tests or interventions, comparators, outcomes, settings, and designs, and a second investigator reviewed the information for completeness and accuracy. Two investigators independently assessed the quality of included studies using criteria defined by the USPSTF adapted for this topic supplemented with criteria from the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2)15 and from A Measurement Tool to Assess Systematic Reviews (AMSTAR)16 (eTables 2-8 in the Supplement). Each study was assigned a final quality rating of good, fair, or poor; disagreements were resolved by discussion and consensus.
Findings for each KQ were summarized in tables, figures, and narrative format. To determine whether meta-analyses were appropriate, the clinical and methodological heterogeneity of the studies were assessed following established guidance.17 For KQ5, random-effects restricted maximum likelihood models were conducted on continuous measures of sleepiness, general health–related quality of life (QOL), and sleep-related QOL associated with positive airway pressure and MAD use when at least 3 similar studies were available, analyzing the mean difference in change from the baseline score or the standardized mean difference (SMD). The meta command in Stata version 16 was used to conduct all quantitative analyses.18 The I² statistic was used to assess the statistical heterogeneity in effects between studies.19-21 Statistical significance was assumed when 95% CIs of pooled results did not cross the null. All testing was 2-sided.
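To make the pooling step concrete, the following Python sketch shows one way a random-effects pooled mean difference, its 95% CI, and a heterogeneity percentage can be computed. It is not the review's analysis: the review used restricted maximum likelihood in Stata, whereas this sketch swaps in the simpler DerSimonian-Laird moment estimator because it has a closed form, so its results would generally differ from the reported ones. The function and class names are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class PooledResult(NamedTuple):
    estimate: float
    ci_low: float
    ci_high: float
    tau2: float
    i2_percent: float
    significant: bool


def pool_random_effects(effects: Sequence[float],
                        variances: Sequence[float]) -> PooledResult:
    """Pool study-level mean differences with a DerSimonian-Laird model.

    Assumption: DerSimonian-Laird is used here for illustration only;
    the review itself reports restricted maximum likelihood estimates.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least 2 studies with matching variances")
    k = len(effects)
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted dispersion of study effects around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    sum_ws = sum(w_star)
    estimate = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_ws
    se = math.sqrt(1.0 / sum_ws)
    low, high = estimate - 1.96 * se, estimate + 1.96 * se
    # Significant when the 95% CI excludes the null value of 0.
    significant = low > 0 or high < 0
    return PooledResult(estimate, low, high, tau2, i2, significant)


if __name__ == "__main__":
    # Hypothetical study-level mean differences and variances, not trial data.
    print(pool_random_effects([-2.0, -3.0, -1.5], [0.25, 0.36, 0.16]))
```

The `significant` flag mirrors the rule stated above, that a pooled result is treated as statistically significant when its 95% CI does not cross the null.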
A total of 86 studies (reported in 101 articles; N = 11 051) were included (Figure 2) in the review. Individual study quality ratings are reported in eTables 2 through 8 in the Supplement.
Key Question 1. Does screening for OSA in adults improve health outcomes, including for specific subgroups of interest?
No eligible studies addressed this question.
Key Question 2. What is the accuracy of screening questionnaires, clinical prediction tools, and multistep screening approaches (eg, using a questionnaire followed by home-based oximetry/testing) in identifying persons in the general population who are more or less likely to have OSA, including for specific subgroups of interest?
Seven fair-quality studies (n = 2589)22-28 assessing clinical prediction tools or screening questionnaires compared with facility-based polysomnography were included, 4 of which were new to this review (Table 1).25-28 Two evaluated the Berlin Questionnaire,22,25 4 evaluated the STOP-BANG (snoring, tiredness, observed apnea, high blood pressure, BMI, age, neck circumference, gender) questionnaire,25-28 and 2 evaluated the Multivariable Apnea Prediction (MVAP) score—alone and when followed by an unattended home sleep test.23,24
The Berlin Questionnaire includes 10 questions about snoring, tiredness, and blood pressure and gathers information on age, sex, height, and weight to classify OSA risk.29 Two included studies of the Berlin Questionnaire enrolled different populations. One sampled Norwegians from the National Population Register.22 Of those who responded, 24% were classified as high risk on the Berlin Questionnaire. The final sample enrolled a population with a mean age of 48 years, 45% were women, the mean BMI was 28 (calculated as weight in kilograms divided by height in meters squared), and the median AHI was 6.4. Although the group receiving polysomnography oversampled high-risk participants (70% were high risk), the authors’ analyses adjusted for bias in the sampling procedure to report estimated screening properties for the general population. In contrast, the second study assessing the Berlin Questionnaire25 included a small (n = 43) but unselected sample of adults with type 2 diabetes recruited from a US general internal medicine clinic. A majority (53%) were female, the mean BMI was 38.3, and the mean AHI was 31.2.
The study enrolling Norwegian participants22 found suboptimal screening accuracy for AHI 5 or greater (sensitivity, 37%; specificity, 84%) and for AHI 15 or greater (sensitivity, 43%; specificity, 80%) (Table 2). The study enrolling US participants with type 2 diabetes from a general internal medicine clinic assessed accuracy for mild (AHI 5-14), moderate (AHI 15-29), and severe (AHI ≥30) OSA.25 Specificity of the Berlin Questionnaire was suboptimal for all categories of OSA severity (mild, 0%; moderate, 31%; severe, 26%). Sensitivity was higher for moderate OSA (89%) and for severe OSA (93%) but was lower for mild OSA (80%).
The STOP-BANG questionnaire includes 8 dichotomous items (snoring, tiredness, observed apnea, blood pressure, BMI, age, neck circumference, and gender).30,31 The 4 studies assessing the accuracy of the STOP-BANG questionnaire enrolled diverse populations and used different scoring criteria and additional variables to determine a positive screen result.25-28 Detailed characteristics of each study are reported in Table 1.
The heterogeneity of studies and scoring approaches limits the ability to assess consistency of results. Overall, estimates varied, and no study found both high sensitivity and high specificity (Table 2). One study enrolling US adults with type 2 diabetes found good sensitivity for detecting mild (87%), moderate (93%), and severe (94%) OSA but very low specificity for the same subgroups (mild, 0%; moderate, 19%; severe, 15%).25 In contrast, the study enrolling Spanish adults with Alzheimer disease found modest sensitivity (61%) and somewhat better specificity (76%) for severe OSA.26 The study of Korean adults found moderate sensitivity (62%) and specificity (64%) for detecting mild through severe OSA.27 The study of adults receiving opioids for chronic pain provided accuracy data for the STOP-BANG questionnaire alone as well as for the STOP-BANG questionnaire plus resting daytime Spo2 (first stage). Results for various cutoffs are reported in Table 2; across all screening approaches, sensitivity for the STOP-BANG to detect moderate to severe OSA was very good, but specificity was limited.
The MVAP score combines symptoms of snoring, choking, and witnessed apnea events with BMI, age, and sex.32 It rates apnea risk between 0 and 1, with 0 representing the lowest risk and 1 representing the highest risk. The 2 included studies assessing the MVAP were conducted by the same research group from Philadelphia.23,24 One study evaluated Medicare recipients (n = 452) from the city’s greater metropolitan area, most of whom (74%) had daytime sleepiness.23 The percentage with OSA was not reported, but 27% had OSA syndrome (OSAS) defined as AHI 5 or greater and Epworth Sleepiness Scale [ESS] score greater than 10. The second study evaluated patients with hypertension from internal medicine practices at a Veterans Affairs medical center and a university-based hypertension clinic (n = 250).24 Eighty percent of participants had OSA (AHI ≥5); of those, 22% had moderate OSA and 25% had severe OSA. Twenty-five percent of all participants had OSAS. The mean ages of participants were 71 years23 and 53 years,24 60% to 64% were non-White, and the mean BMIs were 30 to 32. The study of Medicare recipients included 70% women23; the other study included 20% women.24 Key quality limitations included concern for attrition bias24 and moderate concern for selection bias or spectrum bias (with high prevalence of OSA, OSAS, and/or sleepiness among those receiving polysomnography) (eTables 2 and 3 in the Supplement).23,24
Both studies reported operating characteristics of MVAP to predict severe OSAS (AHI ≥30 and ESS score >10) using MVAP cutoff scores of 0.48 to 0.49 (Table 2). Sensitivity was 90%23 and 92%,24 with specificity of 64% and 44%, respectively (95% CIs not reported). The study of Medicare recipients reported reasonable discrimination (area under the curve [AUC], 0.78 [95% CI, 0.71-0.85]), whereas the other study found inadequate discrimination (AUC, 0.68 [95% CI, 0.67-0.70]). An AUC less than 0.70 is thought to indicate inadequate discrimination.33,34 Calibration, which is often assessed by plotting the predicted risk vs the observed rate,33 was not reported.
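For readers less familiar with these operating characteristics, the following Python sketch shows how sensitivity and specificity are derived from a 2×2 table of screening results against a reference standard such as polysomnography. The function name and the counts in the example are hypothetical and are not taken from any included study.

```python
def screening_accuracy(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute sensitivity and specificity from 2x2 table counts.

    tp/fn: persons with the condition who screened positive/negative.
    tn/fp: persons without the condition who screened negative/positive.
    """
    if min(tp, fn, tn, fp) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one person with and one without the condition")
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(screening_accuracy(tp=40, fn=10, tn=70, fp=30))
```

A tool with high sensitivity but low specificity, the pattern reported for several questionnaires in this review, detects most true cases but also flags many persons without the condition.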
The study of patients with hypertension24 also reported operating characteristics of MVAP to predict any OSAS (AHI ≥5 and ESS score >10) using an MVAP cutoff score of 0.559. That study reported a sensitivity of 69.4%, a specificity of 56.5%, and an AUC of 0.61.
The same 2 studies described in the previous section also reported measures of discrimination for the MVAP score followed by an unattended home sleep test compared with in-laboratory polysomnography (Table 1).23,24 Both reported characteristics to predict severe OSAS (AHI ≥30 and ESS score >10) using different home sleep test AHI cutoffs: 1 used 15,23 and the other used 18.24 Both studies found better operating characteristics with MVAP followed by a home sleep test than with MVAP alone (sensitivity, 88%-91%; specificity, 72%-76%; AUC, 0.80-0.83).
The study of patients with hypertension also reported operating characteristics of MVAP to predict any OSAS (AHI ≥5 and ESS score >10) using a home sleep test AHI cutoff of 13.5. It reported a sensitivity of 81%, a specificity of 54%, and an AUC of 0.67.
Key Question 3. What are the harms associated with screening or subsequent diagnostic testing for OSA, including for specific subgroups of interest?
No eligible study addressed this question.
Key Question 4. How effective is treatment with positive airway pressure or MADs for improving intermediate outcomes (ie, the AHI or blood pressure) in persons with OSA, including for specific subgroups of interest?
Four systematic reviews comparing positive airway pressure or MADs with inactive control for reducing AHI or blood pressure were included (eTable 9 in the Supplement).35-38 For blood pressure outcomes, 1 review of MADs found benefit associated with treatment compared with inactive control (by 1-2 mm Hg); however, differences between groups were imprecise and not statistically significant (eTable 9 in the Supplement).35 For positive airway pressure, pooled estimates from 1 review found benefit associated with positive airway pressure compared with control for reducing mean 24-hour blood pressure (−2.63 mm Hg [95% CI, −3.86 to −1.39]; 8 trials; n = 994); pooled results for measures of daytime systolic blood pressure and diastolic blood pressure were also significantly lower with positive airway pressure vs control, ranging from −2.76 to −1.98 mm Hg, respectively (eTable 9 in the Supplement). Results from 2 additional reviews focused on specific populations, including participants with treatment-resistant hypertension, are reported in eTable 9 in the Supplement.
Two reviews of positive airway pressure reported on the difference between groups in change from baseline AHI.37,38 One found a greater reduction in AHI associated with positive airway pressure than with controls (pooled mean difference, −23.41 events per hour [95% CI, −28.51 to −18.30]; 11 trials; n = 832).37 The second review—which limited inclusion to studies of asymptomatic adults with OSA or studies of minimally symptomatic, nonsleepy adults—found consistent but imprecise pooled estimates (eTable 9 in the Supplement).38
Key Question 5. How effective is treatment with positive airway pressure or MADs for improving health outcomes in persons with OSA, including for specific subgroups of interest?
This review included 73 good- or fair-quality RCTs (reported in 87 articles) that reported at least 1 eligible health outcome among groups treated with positive airway pressure or MADs compared with inactive control.
Sixty-three RCTs (74 articles) comparing positive airway pressure with sham positive airway pressure (29 RCTs, 33 articles)39-71 or another inactive control (34 RCTs, 41 articles)72-112 reported at least 1 eligible health outcome. Most trials identified participants from sleep clinics or referrals, and none focused on persons who were screen detected in primary care settings. Detailed characteristics are reported in eTables 10 and 11 in the Supplement.
Most trials (53) followed up participants for 12 weeks or less; 10 trials followed up participants over a longer duration (16 to 24 weeks [5 trials],53,78,87,105,111 52 weeks [3 trials],74,96,108 a median of 4 years [1 trial],75 and a median of 4.7 years [1 trial]).97 The mean age of enrolled populations ranged from 44 to 78 years, and most trials enrolled populations with a mean age of 40 to 59 years; 7 enrolled populations with a mean age of 65 years or older.43,61,79,93,96,97,100 The majority of participants in most trials were men; 1 trial limited enrollment to women,77 and 3 enrolled a majority of women.104,109,113 Most trials did not describe the race and ethnicity of enrolled populations, and those that did (14 trials) used heterogeneous categories and varying levels of detail (eTables 10 and 11 in the Supplement). The mean BMI in most trials was 30 to 36 (range, 25-47). The mean or median baseline AHI (or similar measure) for most trials was in the severe OSA range (AHI ≥30); 13 trials reported mean baseline AHI in the moderate OSA range (AHI 16-30),43,58,61,66,76,80,89,96,97,105,108,109,111 and 6 reported a mean baseline AHI in the mild OSA range (AHI 5-15).69,78,81,83,101,107 The severity of OSA for participants enrolled in trials most frequently ranged from moderate to severe (29 trials) or from mild to severe (16 trials). Seventeen trials limited participants to more narrow ranges: mild only,83,107 mild to moderate or moderate only,58,69,76,97,100,101,105 or severe only.40,59,79,91-94,104 One trial did not report sufficient data to determine the range of OSA severity of participants.78 The mean or median baseline ESS score was 10 or greater in most trials, indicating excessive daytime sleepiness (EDS). Eighteen trials reported a mean baseline ESS score of less than 10,40,43,46,66,73-75,78,79,85,87,92,97,100,104,108,109,111 and 6 trials did not report a baseline ESS score.
Thirty-one RCTs reported on the number of deaths during the study period (eTable 12 in the Supplement). The majority (28 RCTs) reported mortality rates at 24 weeks or less, and most of these (25 RCTs) reported no deaths in any study group (eTable 12 in the Supplement). Two reported on mortality over a median duration of 4 to 5 years; 1 (n = 723) reported 8 deaths in the positive airway pressure group and 3 in the control group (incidence density ratio, 2.6 [95% CI, 0.70-11.8]; P = .16),75 and the second (n = 364) found a similar number of deaths among the positive airway pressure and control groups (8% vs 7%, respectively).97
Twenty RCTs reported on QOL using the 36-Item Short Form Health Survey (SF-36)40,46,50,59,60,67-69,76,78,83,86,89,94,96,105,107,108,111,112; most trials reported changes on the SF-36 physical component summary score and the mental component summary score. Pooled estimates in change from baseline SF-36 mental component summary score found a significantly greater improvement associated with positive airway pressure compared with inactive control (2.20 [95% CI, 0.95-3.44]; 15 trials; n = 2345).40,46,50,60,67-69,78,86,94,105,107,108,111,112 Similarly, pooled estimates for change in SF-36 physical component summary score from baseline found significantly greater improvement associated with positive airway pressure than with control (1.53 [95% CI, 0.29-2.77]; 13 trials; n = 2031 participants) (Table 3; eFigure 1 in the Supplement).40,46,50,60,67-69,86,94,107,108,111,112 The pooled estimates for change from baseline SF-36 mental component summary score and SF-36 physical component summary score associated with positive airway pressure were smaller than the range considered a minimal clinically important difference (MCID), which is 4 to 7 for both SF-36 component summary scores.116,117 Few studies reported on other QOL measures; overall, results were mixed (eTable 12 in the Supplement).
Seventeen RCTs assessed sleep-related QOL: 6 using the Sleep Apnea Quality of Life Index (SAQLI),54,67,70,78,89,96 10 using the Functional Outcomes of Sleep Questionnaire (FOSQ),40,58-60,65,69,76,84,94,107,111 and 1 using the Quebec Sleep Questionnaire.79 The meta-analysis (combining all measures) found that positive airway pressure was associated with a small but statistically significant improvement in sleep-related QOL compared with controls (SMD, 0.30 [95% CI, 0.19-0.42]; 17 trials; n = 3083) (eFigure 2 in the Supplement). The subgroup analysis by mean baseline ESS score found a similar but slightly larger effect size in trials with a mean ESS score of 10 or greater (SMD, 0.35 [95% CI, 0.22-0.49]; 11 trials, n = 2228). In studies with a mean baseline ESS score of less than 10, the effect size was smaller and the pooled estimate was not statistically significant (eFigure 4 in the Supplement). Results shown as a mean difference in scores for each sleep-related QOL measure are provided in eFigure 3 in the Supplement and summarized in Table 3. For both the SAQLI and FOSQ, the pooled mean difference falls below the range considered an MCID.
Forty-seven trials reported sufficient ESS data to include in meta-analyses. Most were 12 weeks or less in duration; 7 followed up participants for 24 weeks,53,105,111 48 to 52 weeks,74,96,108 or longer.75 The meta-analyses found that positive airway pressure reduced mean ESS scores more than controls (pooled mean difference, −2.33 [95% CI, −2.75 to −1.90]; 47 trials; n = 7024) (Figure 3). The pooled mean difference is within the range considered an MCID for the ESS (−2 to −3).114,115 These analyses found substantial statistical heterogeneity that may be due to variation in positive airway pressure devices, participant characteristics (eg, baseline ESS score), treatment adherence, study duration, or chance; however, no clear explanation was found. As shown in Figure 3, heterogeneity is lower in subgroups defined by narrow ranges of OSA severity (severe only and mild only or mild to moderate vs mild to severe). However, the meta-analyses by OSA severity subgroup (4 categories: mild to severe, mild only and mild to moderate, moderate only and moderate to severe, and severe only) did not find a clear difference by OSA severity. Differences in mean score change were −2.61 for mild to severe, −1.91 for mild only and mild to moderate, −2.21 for moderate only and moderate to severe, and −3.08 for severe only, and CIs overlapped; the analysis still found considerable statistical heterogeneity within the mild to severe group and the moderate only or moderate to severe group (Figure 3).
Fewer studies reported on other health outcomes (eTable 12 in the Supplement). Three RCTs reported on the incidence of motor vehicle crashes over 12 to 52 weeks, and none found a statistically significant difference between groups.53,85,96 Ten reported on the incidence of 1 or more heterogeneous cardiovascular outcomes.46,53,58,70,75,78,85,96,97,111 Six RCTs (1773 total participants) reported on the incidence of myocardial infarction; in 4 of these, a total of 1 myocardial infarction occurred (combined) in either group over 3 weeks to 1 year.58,78,85,96 Two RCTs reported on outcomes over a median of 4 to 5 years; 1 (n = 723) reported 2 myocardial infarctions in the positive airway pressure group and 8 in the control group,75 and the second (n = 244) found a similar number of myocardial infarctions in the positive airway pressure and control groups (9% vs 7%, respectively).97
RCTs reporting on other health outcomes (eg, angina, transient ischemic attacks, measures of cognitive impairment) are shown in eTable 12 in the Supplement. Overall, too few events occurred to draw conclusions.
Twelve RCTs (15 articles) evaluated the benefit of MADs for improving health outcomes (eTable 13 in the Supplement).76,89,120-132 Four studies compared MADs with sham devices that did not advance the mandible,120,121,130-132 1 compared a MAD with a placebo tablet,76 2 compared MADs with no treatment,123,129 and 1 compared a MAD with conservative management of OSA with weight loss.89 All studies recruited participants with known or suspected OSA from specialty clinics, such as sleep medicine or otolaryngology. Treatment durations ranged from 4 to 12 weeks for most studies; however, 1 lasted for only 1 week123 and 1 for 24 weeks.120,121 The mean age of enrolled participants ranged from 46 to 58 years. In 11 trials reporting on sex, the majority of participants were men. No study reported the percentage of minority participants. Almost all studies included participants with mild to moderate OSA, and 6 also included participants with severe OSA.89,122,123,125,128,132
Four trials reported on deaths in each group over 1 to 12 weeks of follow-up,76,123,129,132 3 reported no participant deaths, and 1 reported a single death in the control group.132
Six RCTs reported on at least 1 QOL measure.76,89,120,121,129,131,132 Overall, results were mixed, with some studies finding no significant improvement in QOL from using MADs,89,120,121,131 some reporting possible benefits for some measures or subscales but not for others,76,132 and some reporting benefits for some overall QOL scores.129 Further details and specific data are provided in eTable 14 in the Supplement. Because of inconsistency, imprecision, and heterogeneity of reporting, findings are insufficient to make conclusions about the potential benefits of using MADs for improving QOL.
Ten RCTs of MADs provided sufficient data on change from ESS scores from baseline to be included in pooled estimates76,89,122-125,128-130,132; MADs were associated with significantly greater reduction from baseline ESS scores than controls (−1.67 [95% CI, −2.09 to −1.25]; 10 trials; n = 1540 participants) (eFigure 5 in the Supplement). The pooled mean difference, however, falls below the range considered an MCID for the ESS.114,115
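The distinction between statistical significance and clinical importance for the ESS can be illustrated with a short check. The Python sketch below uses the pooled ESS estimates reported in this review for positive airway pressure and MADs; the function name is hypothetical, and using 2 points (the smaller end of the −2 to −3 MCID range) as the threshold is an assumption made for illustration.

```python
def interpret_ess_change(estimate: float, ci_low: float, ci_high: float,
                         mcid_threshold: float = 2.0) -> tuple:
    """Return (statistically_significant, reaches_mcid) for an ESS mean difference.

    Negative values indicate a larger reduction in sleepiness than control.
    Assumption: the MCID is reached when the point estimate is at least
    `mcid_threshold` points below zero.
    """
    significant = ci_high < 0 or ci_low > 0   # 95% CI excludes the null
    reaches_mcid = estimate <= -mcid_threshold
    return significant, reaches_mcid


if __name__ == "__main__":
    # Pooled estimates as reported in the text.
    print("PAP:", interpret_ess_change(-2.33, -2.75, -1.90))
    print("MAD:", interpret_ess_change(-1.67, -2.09, -1.25))
```

Both pooled estimates exclude the null, but only the positive airway pressure estimate reaches the 2-point threshold, which matches the interpretation given in the text.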
This review included 1 trial assessing each of the following outcomes for participants using MADs over 6 to 12 weeks: cognitive impairment,76 motor vehicle crashes,129 cardiovascular events,129 and headaches.131 Specific data are provided in eTable 14 in the Supplement. Because of unknown consistency, imprecision, and very small numbers of events, findings were insufficient to make conclusions about the potential benefits of MADs for these outcomes.
Key Question 6. What are the harms associated with treatment of OSA using positive airway pressure or MADs, including for specific subgroups of interest?
Reporting of harms in the included RCTs was sparse, and most did not report information on harms. Nineteen RCTs (reported in 24 articles) reported on harms associated with treatment of OSA, including 9 trials of positive airway pressure,49,53,54,68,69,83,89,101,105,113,133,134 9 of MADs,89,120,121,123-132 and 1 of positive airway pressure and MADs.89 Characteristics and detailed results of all 19 studies reporting harms are provided in eTables 10, 11, 13, 15, and 16 in the Supplement.
Of the 10 included RCTs of positive airway pressure, 6 compared positive airway pressure with a sham device,49,53,54,68,69,113,133,134 and 4 compared positive airway pressure with another control (eg, oral placebo, usual care).83,89,101,105 Most enrolled fewer than 100 persons; 1 trial, the Apnea Positive Pressure Long-term Efficacy Study,53,54 enrolled more than 1000 participants. The majority of participants were men, the mean age ranged from 42 to 62 years, and most participants were overweight or obese (mean BMI, 27-39). Most of the studies followed up patients for 8 to 12 weeks, and 2 lasted 24 weeks.53,54,105
Outcomes reported were heterogeneous, and detailed results are reported in eTable 15 in the Supplement. In general, harms related to positive airway pressure treatment were likely short-lived and could have been alleviated by discontinuing treatment with positive airway pressure or by supplementing positive airway pressure with additional interventions. Overall, 1% to 47% of participants in trials of positive airway pressure reporting any harms had specific adverse events while using positive airway pressure, including claustrophobia, oral or nasal dryness, eye or skin irritation, rash, nosebleeds, and pain.
Ten RCTs reported harms related to MAD use.89,120,121,123-132 Most RCTs (6) lasted 4 to 8 weeks, 1 lasted a single week,123 1 lasted 10 weeks,89 1 lasted 12 weeks,124 and 1 lasted 24 weeks.120,121 Across 3 studies that reported any discontinuation of treatment because of adverse events, 7% of patients in the active MAD group discontinued MAD use because of harms compared with 1% of patients in the control group.89,129,132 In 4 RCTs, rates of oral dryness ranged from 5% to 33% in the active MAD group compared with 0% to 3% in the control group.89,120,121,124,129 Six studies reported rates of excess salivation.89,120,121,124-127,129,131 Four trials reported significantly higher rates of excessive salivation associated with MAD use than with sham MAD or no treatment.89,120,121,129 In 7 studies, adverse oral mucosal, dental, or jaw symptoms ranged from 17% to 74% in MAD groups compared with 0% to 17% in the sham group, no-treatment group, or conservative management group. Two studies reported that there was a statistically significant difference only in the percentage who experienced jaw discomfort and tooth tenderness in the MAD group compared with that in the sham group.125-127,131
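To put the discontinuation figures in more familiar terms, the crude percentages (7% vs 1%) can be converted to an absolute risk difference and an approximate number needed to harm. This calculation is not reported in the review; it is a rough sketch that treats the percentages pooled across 3 short trials as simple risks, ignores between-study differences, and provides no confidence interval. The function names are hypothetical.

```python
def risk_difference(risk_treated: float, risk_control: float) -> float:
    """Absolute difference in event risk between treated and control groups."""
    return risk_treated - risk_control


def number_needed_to_harm(rd: float) -> float:
    """Reciprocal of a positive risk difference (left unrounded)."""
    if rd <= 0:
        raise ValueError("risk difference must be positive to compute an NNH")
    return 1 / rd


# Crude discontinuation rates from the text: 7% (active MAD) vs 1% (control).
rd = risk_difference(0.07, 0.01)
print(f"risk difference: {rd:.2f}")
print(f"approximate number needed to harm: {number_needed_to_harm(rd):.1f}")
```

Read this way, roughly 1 additional discontinuation because of harms would be expected for about every 17 patients given an active MAD over the trial durations studied, with the caveats noted above.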
This systematic review synthesized evidence relevant to screening for OSA in adults. Table 4 summarizes findings, including an assessment of the strength of evidence for each KQ. To date, there is no direct evidence from trials on the benefits and harms of screening for OSA vs no screening. Potential harms of routine screening include overdiagnosis and overtreatment for asymptomatic persons with OSA (AHI ≥5) who never had symptoms of OSA or adverse health outcomes from OSA. Other potential harms include costs associated with referrals and additional testing (eg, future polysomnography for follow-up care).
This review identified few eligible studies evaluating the accuracy of questionnaires or prediction tools for distinguishing persons in the general population who are more or less likely to have OSA. No included screening approach was assessed by more than 2 included studies, which limits the ability to draw conclusions about the accuracy of screening tools in primary care. The screening approach evaluated by 2 studies, the MVAP score followed by an unattended home sleep test for detecting severe OSAS (AHI ≥30 and ESS score >10), may have promise for screening, but the evidence was limited by potential spectrum bias135-139 due to oversampling of high-risk participants and of those with OSA and OSAS, which may substantially overestimate the accuracy of using the MVAP score to screen for OSA in the general population. The included studies evaluating MVAP enrolled populations with a high prevalence of OSAS (≥25%),23,24 OSA (AHI ≥5 for 80% of participants),24 and sleepiness (74%).23
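One consequence of evaluating a screening tool in a high-prevalence sample can be sketched with Bayes' rule. The example below does not model spectrum bias itself, which concerns sensitivity and specificity shifting with the severity mix of enrolled participants. It only shows that, even with fixed sensitivity and specificity, the positive predictive value drops as prevalence falls. The sensitivity and specificity values are hypothetical and are not taken from the included MVAP studies; the 25% prevalence echoes the OSAS prevalence noted above, and the 5% figure is an arbitrary lower-prevalence comparison.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Probability of disease given a positive test, by Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, for illustration only.
HYPOTHETICAL_SENSITIVITY = 0.90
HYPOTHETICAL_SPECIFICITY = 0.80

# 0.25 mirrors the study prevalence noted in the text; 0.05 is an assumed
# lower-prevalence setting chosen only for contrast.
for prevalence in (0.25, 0.05):
    ppv = positive_predictive_value(
        HYPOTHETICAL_SENSITIVITY, HYPOTHETICAL_SPECIFICITY, prevalence
    )
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

If sensitivity and specificity were also inflated by oversampling severe cases, performance in a general primary care population could be lower still.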
This review included fewer studies evaluating questionnaires or clinical prediction tools than some previously published reviews and guidelines,9,140,141 primarily because of the requirement to include studies that enrolled asymptomatic adults or adults with unrecognized symptoms of OSA; referral populations (eg, to sleep clinics) were not eligible. Previous reviews and guidelines focused generally on diagnostic testing (of adults with symptoms suggestive of disordered sleep) rather than on screening (of asymptomatic persons with OSA or those with unrecognized symptoms of OSA). Nevertheless, these reviews and guidelines generally reported low overall quality and strength of evidence for questionnaires and prediction tools.
This review found consistent evidence from good- and fair-quality RCTs that positive airway pressure reduces excessive daytime sleepiness and may improve general health–related and sleep-related QOL. However, the benefit associated with positive airway pressure for both general health–related and sleep-related QOL measures falls short of the range considered an MCID (Table 3), and the clinical significance of the 2-point mean reduction on the ESS is somewhat uncertain. For excessive sleepiness, the data suggest a clinically significant reduction in most included trials because 85% of the trials in the meta-analysis for ESS with mean baseline ESS scores of 10 or greater (indicating excessive daytime sleepiness) reported mean end point ESS scores in the normal range (less than 10)142,143 for the positive airway pressure groups (mean end point ESS score <8). However, the threshold for a clinically significant change in ESS score is somewhat uncertain. Although recent systematic reviews noted that experts consider a 1-point change in ESS score clinically significant,9 other sources suggest that a 2- to 3-point change114,115 or a 3- to 4-point change should be the clinically significant threshold for sample size calculations or interpretation of findings.144-146 Also, the American College of Chest Physicians’ outcome experts evaluating the ESS informally stated that a clinically significant change in the ESS score probably is at least 3 points and cited a specific example that a reduction of 1 point (eg, from 3 [high] to 2 [moderate]) on 2 of 7 ESS domains was unlikely to be clinically relevant (Jon-Erik C. Holty, MD, MS, Stanford University, email, October 2015).
Regardless of the clinically significant threshold level, the subjective nature of the ESS creates potential bias in trials of treatment (eg, overreporting of improvements in sleepiness after receiving treatment), and some authors have raised concerns about its construct validity (ie, authors have expressed uncertainty regarding whether it is an accurate measure of sleepiness).147-149
For blood pressure reduction (KQ4), recent systematic reviews found that MAD and positive airway pressure are associated with a reduction in blood pressure of 2 to 3 mm Hg, and 1 review limited to populations with resistant hypertension found a slightly higher mean reduction (5 mm Hg). Some experts suggest that a difference of more than 9 mm Hg systolic/10 mm Hg diastolic is clinically meaningful for patients.150-152 However, guidelines have suggested that across a population, a smaller reduction in systolic blood pressure (2-3 mm Hg) could result in a clinically significant reduction in cardiovascular mortality (4%-5% for coronary heart disease and 6%-8% for stroke).153 Even though MADs and positive airway pressure have been shown to reduce mean blood pressure, no trial to date has shown a significant reduction in mortality or cardiovascular disease.
Evidence on most health outcomes was limited (ie, too few RCTs reported on outcomes or too few events occurred to evaluate the effectiveness of positive airway pressure for reducing mortality or motor vehicle crashes). As summarized in the eContextual Questions in the Supplement, a relatively large body of observational evidence supports an association between severe OSA and increased risk of many adverse health outcomes, including cardiovascular events, mortality, and cognitive impairment. Some observational studies suggest that the risk of such outcomes increases with each level of OSA severity, which may indicate a dose-response effect; however, this finding is not consistent across all studies or outcomes. In addition, findings of increased risk associated with severe OSA are the strongest among male populations; however, it is difficult to assess whether these relationships do not hold for female populations or reflect sparse evidence on female populations compared with male populations. Observational studies focused on this association are limited, however, primarily owing to potential confounding.
Reporting of harms from treatment in the included studies was sparse. In general, the adverse events related to positive airway pressure treatment were likely short-lived and could have been alleviated by discontinuing treatment with positive airway pressure or by supplementing positive airway pressure with additional interventions. Common adverse events included oral or nasal dryness, eye or skin irritation, and rash. Common adverse effects from MADs included oral or nasal dryness, excessive salivation, and jaw discomfort.
Evidence included in the current review suggests several important research needs. To better understand the potential effectiveness of screening for OSA, RCTs of asymptomatic persons (or those with unrecognized symptoms of OSA) that directly compare screening with no screening and assess health outcomes are needed. To better determine the accuracy of screening questionnaires and clinical prediction tools when used in the general population (related to KQ2), additional studies are needed; such studies should aim to include a representative community population, to avoid spectrum bias, and to further evaluate promising screening approaches (eg, MVAP followed by an unattended home sleep test) as well as other approaches assessed in similar populations for which there were few studies, such as the Berlin Questionnaire and STOP-BANG questionnaire. Trials of treatment (positive airway pressure and MAD) that enroll participants who are screen-detected in primary care settings are needed; results of trials that enroll participants referred for OSA symptoms and other sleep issues may not be applicable to populations who are screen-detected.
This review has several limitations. First, studies of screening accuracy were required to have used in-laboratory polysomnography as the reference standard. This is similar to the approach used in previous systematic reviews. Second, studies that focused on the benefits and harms of treatment were limited to studies of interventions considered first-line treatment for persons with newly detected OSA (positive airway pressure and MAD); studies of interventions primarily offered to persons who do not benefit from or tolerate positive airway pressure or MAD were excluded. Third, some of the meta-analyses of RCTs evaluating the benefits of positive airway pressure (KQ5) found substantial statistical heterogeneity. Although a clear explanation for all statistical heterogeneity was not found, possible explanations include variation in enrolled populations, positive airway pressure devices (eg, machines, masks, humidifiers, filters, cushions), apnea and hypopnea definitions, adherence, study duration, study methods, or chance.
The accuracy and clinical utility of OSA screening tools that could be used in primary care settings were uncertain. Positive airway pressure and mandibular advancement devices reduced ESS score. Trials of positive airway pressure found modest improvement in sleep-related and general health–related QOL but have not established whether treatment reduces mortality or improves most other health outcomes.
Corresponding Author: Cynthia Feltner, MD, MPH, Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, 725 Martin Luther King Jr Blvd, CB#7295, Chapel Hill, NC 27599 (email@example.com).
Accepted for Publication: September 19, 2022.
Author Contributions: Dr Feltner had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Feltner, Wallace, Hicks, Voisin, Jonas.
Acquisition, analysis, or interpretation of data: Feltner, Wallace, Aymes, Cook Middleton, Hicks, Schwimmer, Baker, Balio, Moore, Jonas.
Drafting of the manuscript: Feltner, Wallace, Aymes, Cook Middleton, Hicks, Schwimmer, Baker, Moore, Voisin, Jonas.
Critical revision of the manuscript for important intellectual content: Feltner, Wallace, Hicks, Balio, Jonas.
Statistical analysis: Feltner, Wallace, Aymes, Hicks.
Obtained funding: Feltner, Jonas.
Administrative, technical, or material support: Feltner, Cook Middleton, Schwimmer, Baker, Moore, Voisin, Jonas.
Supervision: Feltner, Jonas.
Conflict of Interest Disclosures: Dr Aymes reported receiving a Health Resources and Services Administration Preventive Medicine Training Grant. No other disclosures were reported.
Funding/Support: This research was funded under contract HHSA-75Q80120D00007, Task Order 01, from the Agency for Healthcare Research and Quality (AHRQ), US Department of Health and Human Services, under a contract to support the US Preventive Services Task Force (USPSTF).
Role of the Funder/Sponsor: Investigators worked with USPSTF members and AHRQ staff to develop the scope, analytic framework, and key questions for this review. AHRQ had no role in study selection, quality assessment, or synthesis. AHRQ staff provided project oversight, reviewed the evidence review to ensure that the analysis met methodological standards, and distributed the draft for public comment and review by federal partners. Otherwise, AHRQ had no role in the conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript findings. The opinions expressed in this document are those of the authors and do not reflect the official position of AHRQ or the US Department of Health and Human Services.
Additional Contributions: We gratefully acknowledge the following individuals for their contributions to this project, including AHRQ staff (Justin Mills, MD, MPH, and Tracy Wolff, MD, MPH) and RTI International–University of North Carolina–Chapel Hill Evidence-based Practice Center (EPC) staff (Carol Woodell, BSPH, Roberta Wines, MPH, Staci Rachman, BA, Sharon Barrell, MA, Loraine Monroe, and Teyonna Downing). The USPSTF members, expert reviewers, and federal partner reviewers did not receive financial compensation for their contributions. Ms Woodell, Ms Wines, Ms Rachman, Ms Barrell, Ms Monroe, and Ms Downing received compensation for their role in this project.
Additional Information: A draft version of the full evidence review underwent external peer review from 3 content experts (Sean M. Caples, DO, MS, Mayo Clinic; Jon-Erik C. Holty, MD, MS, Stanford University; Paul E. Peppard, PhD, MS, University of Wisconsin-Madison) and 5 federal partner reviewers (Centers for Disease Control and Prevention; National Institute of Dental and Craniofacial Research; National Heart, Lung, and Blood Institute; National Institute on Minority Health and Health Disparities; and National Institutes of Health Office of Research on Women’s Health). Comments from reviewers were presented to the USPSTF during its deliberation of the evidence and were considered in preparing the final evidence review. USPSTF members and peer reviewers did not receive financial compensation for their contributions.
Editorial Disclaimer: This evidence review is presented as a document in support of the accompanying USPSTF recommendation statement. It did not undergo additional peer review after submission to JAMA.