Screening for Gestational Diabetes: Updated Evidence Report and Systematic Review for the US Preventive Services Task Force

RESULTS A total of 76 studies were included (18 randomized clinical trials [RCTs] [n = 31 241], 2 nonrandomized intervention studies [n = 190], 56 observational studies [n = 261 678]). Direct evidence on benefits of screening vs no screening was limited to 4 observational studies with inconsistent findings and methodological limitations. Screening was not significantly associated with serious or long-term harm. In 5 RCTs (n = 25 772), 1-step (International Association of Diabetes and Pregnancy Study Group) vs 2-step (Carpenter and Coustan) screening was significantly associated with increased likelihood of gestational diabetes (11.5% vs 4.9%) but no improved health outcomes. At or after 24 weeks of gestation, oral glucose challenge tests with 140- and 135-mg/dL cutoffs had sensitivities of 82% and 93%, respectively, and specificities of 82% and 79%, respectively, against Carpenter and Coustan criteria, and a test with a 140-mg/dL cutoff had sensitivity of 85% and specificity of 81% against the National Diabetes Data Group criteria. Fasting plasma glucose tests with cutoffs of 85 and 90 mg/dL had sensitivities of 88% and 81% and specificities of 73% and 82%, respectively, against Carpenter and Coustan criteria. Based on 8 RCTs and 1 nonrandomized study (n = 3982), treatment was significantly associated with decreased risk of primary cesarean deliveries (relative risk [RR], 0.70 [95% CI, 0.54-0.91]; absolute risk difference [ARD], 5.3%), shoulder dystocia (RR, 0.42 [95% CI, 0.23-0.77]; ARD, 1.3%), macrosomia (RR, 0.53 [95% CI, 0.41-0.68]; ARD, 8.9%), large for gestational age (RR, 0.56 [95% CI, 0.47-0.66]; ARD, 8.4%), birth injuries (odds ratio, 0.33 [95% CI, 0.11-0.99]; ARD, 0.2%), and neonatal intensive care unit admissions (RR, 0.73 [95% CI, 0.53-0.99]; ARD, 2.0%). The association with reduction in preterm deliveries was not significant (RR, 0.75 [95% CI, 0.56-1.01]).

Gestational diabetes is diabetes that develops during pregnancy. 1,2 The prevalence of gestational diabetes in the US has typically been estimated at 5.6% to 9.2% when measured from 2007 to 2016 3-6 but may be up to 3-fold higher depending on the diagnostic criteria used. 7,8 Gestational diabetes is usually asymptomatic but is associated with increased risk for several pregnancy and neonatal complications. 9 In 2014, the US Preventive Services Task Force (USPSTF) recommended screening for gestational diabetes in asymptomatic pregnant women after 24 weeks of gestation (B recommendation). 10 The USPSTF found that evidence was insufficient to screen before 24 weeks of gestation (I statement). This evidence report was conducted to update the 2012 review 11 to inform updated USPSTF recommendations.

Scope of the Review
Detailed methods and additional study details are available in the full evidence report. 9 Figure 1 shows the analytic framework and key questions (KQs) that guided the review. KQ5 is addressed only in the full report. KQ3, comparing different screening strategies, was added for this update. This review did not address screening for preexisting or overt diabetes in early pregnancy.

Data Sources and Searches
Ovid MEDLINE and EMBASE, and CINAHL via EBSCOhost, were searched from 2010 to May 22, 2020 (eMethods 1 in the Supplement). Clinical trial registries and reference lists (including the 2012 review) were reviewed. Ongoing surveillance was conducted to identify major studies published since May 2020 that may affect the conclusions or understanding of the evidence and the related USPSTF recommendation.

Study Selection
Two investigators independently reviewed titles and abstracts, then full-text articles using predefined eligibility criteria (eMethods 2 in the Supplement). The population for screening and test accuracy was pregnant women without known preexisting diabetes mellitus. For treatment, the population was women with gestational diabetes or hyperglycemia. For benefits and harms of screening, comparative effectiveness of screening approaches, and screening test accuracy, studies using 1-step (diagnostic test only) or 2-step (diagnostic test in women with a positive screening test result) screening strategies at any time during pregnancy were included (eMethods 3 in the Supplement). In 2-step strategies, the screening test was measurement of fasting plasma glucose level, a 50-g oral glucose challenge test (OGCT), a risk factor-based tool, or glycated hemoglobin (HbA 1c ) concentration. For benefits of screening and treatment, comparisons were against no screening or treatment, respectively. For harms of screening, studies comparing outcomes before and after a gestational diabetes diagnosis or comparing women with gestational diabetes aware of their diagnosis vs those unaware were included. To evaluate potential labeling harms, studies on receipt of delivery and perinatal interventions among women diagnosed with gestational diabetes vs those without a diagnosis were included. For accuracy, the reference standard was a currently recommended oral glucose tolerance test (OGTT), mainly using Carpenter and Coustan, the National Diabetes Data Group, or the International Association of Diabetes and Pregnancy Study Group (IADPSG) Consensus Panel diagnostic criteria. Intermediate and health outcomes are listed in Figure 1. Studies had to be published on or after 1995 and conducted in settings applicable to primary care.
Randomized clinical trials (RCTs) and nonrandomized controlled intervention studies were included for screening and treatment; for screening vs no screening, controlled observational studies were also included because of anticipated lack of intervention studies and to assess potential harms. For screening test accuracy, prospective cohort studies in which at least a sample of screen-negative women underwent the reference standard were included. Studies on risk factor strategies or models had to examine a validation cohort.

Data Extraction and Quality Assessment
One reviewer abstracted data from the studies; a second reviewer verified accuracy and completeness. Outcomes related to hypertension in pregnancy were classified as preeclampsia, gestational hypertension, or hypertensive disorders in pregnancy (mixed). For cesarean delivery, primary (first) cesarean deliveries were prioritized, but total and emergency cesarean rates were also evaluated. Two reviewers independently assessed the methodological quality of eligible studies using design-specific tools (eMethods 4 in the Supplement). 13-16 Disagreements were resolved by consensus and, if necessary, consultation with a third reviewer. Studies were rated as "good," "fair," or "poor," based on the seriousness of methodological shortcomings. 12

Data Synthesis and Analysis
For intervention effects using relative risks (RRs), meta-analyses used random-effects models in Review Manager version 5.1 (The Cochrane Collaboration). When moderate or greater statistical heterogeneity (I² ≥ 40%) was observed, sensitivity analysis was performed using the profile likelihood method in Stata version 14.2 (StataCorp); these analyses did not change any of the conclusions, but results are available in the full report 9 (and for KQ3 are reported in eTables 3 and 4 in the Supplement). Pooled absolute risk differences (ARDs) were calculated when RRs were statistically significant and for all analyses with at least 1 zero-event study. Heterogeneity was explored with sensitivity analyses using predefined variables (eg, study quality, setting, differing outcome definitions); findings of within-study subgroup analyses were extracted.
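The random-effects pooling described above can be sketched in a few lines. The following is a minimal illustration, assuming the DerSimonian and Laird estimator applied to log relative risks; it is not the Review Manager implementation. The study counts are hypothetical and are not data from this review, and the function names `log_rr` and `dersimonian_laird` are introduced here only for illustration.

```python
import math

def log_rr(events_t, n_t, events_c, n_c):
    """Return (log relative risk, standard error) for one two-arm study.

    Assumes no zero-event arms; a continuity correction would be needed otherwise.
    """
    rr = (events_t / n_t) / (events_c / n_c)
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return math.log(rr), se

def dersimonian_laird(effects, ses):
    """Pool study effects with the DerSimonian-Laird estimator.

    Returns (pooled effect, its standard error, tau^2, I^2 in percent).
    """
    w = [1 / se ** 2 for se in ses]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # heterogeneity, percent
    return pooled, se_pooled, tau2, i2

# Hypothetical counts: (events_treated, n_treated, events_control, n_control)
studies = [(30, 500, 60, 500), (12, 150, 20, 150), (8, 100, 9, 100)]
effects, ses = zip(*(log_rr(*s) for s in studies))
pooled, se_pooled, tau2, i2 = dersimonian_laird(effects, ses)

rr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled))
print(f"Pooled RR {rr:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), I2 = {i2:.0f}%")
```

In this sketch, an I² value at or above 40% would correspond to the threshold that triggered the profile likelihood sensitivity analysis, which is not reproduced here.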
For diagnostic accuracy, analyses were stratified by timing of the index test in pregnancy and comparison, including different test thresholds. If more than 3 studies were included for a particular comparison, sensitivities and specificities were pooled using bivariate analysis (metandi program in Stata version 14.2) with construction of hierarchical summary receiver operator characteristic curves.
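As a rough illustration of the per-study quantities that feed such a pooled analysis, the sketch below computes sensitivity and specificity from a single 2x2 table of index test results against the reference standard. The counts are hypothetical, the function name `accuracy_from_counts` is an assumption made for this example, and the bivariate model itself is not reproduced.

```python
def accuracy_from_counts(tp, fp, fn, tn):
    """Sensitivity and specificity of an index test against a reference standard.

    tp/fn: reference-positive women with a positive/negative index test.
    fp/tn: reference-negative women with a positive/negative index test.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical 2x2 table: 50-g OGCT (index test) vs OGTT (reference standard)
sens, spec = accuracy_from_counts(tp=41, fp=90, fn=9, tn=410)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```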
The aggregate strength of evidence was assessed for each outcome, using the Agency for Healthcare Research and Quality methods guidance, based on the number, quality, and size of studies and the consistency and precision of results between studies. 17 Significance testing was 2-tailed; P ≤ .05 was considered statistically significant.
[Figure 1. Analytic framework and key questions. Key question text recoverable from the figure: the association between diagnosis of gestational diabetes and outcomes in women meeting more inclusive but not less inclusive diagnostic criteria for gestational diabetes (see full report for details); the harms of screening for and diagnosis of gestational diabetes to the mother, fetus, or neonate; and the harms of treatment of gestational diabetes, including severe maternal and fetal/neonatal hypoglycemia, delivery of neonates who are small for gestational age, and poor long-term growth and development outcomes in the child.]
[95% CI, -12.3% to -5.2%]) compared with historical controls. Prespecified analyses found screening in first trimester was significantly associated with decreased likelihood of NICU admissions vs second-trimester screening but with no significant difference for other outcomes. Both new studies were susceptible to confounding and selection bias.
Key Question 2. What are the harms of screening for and diagnosis of gestational diabetes to the mother, fetus, or neonate?
All 7 studies 42-48 identified for KQ2 were new to this update (eTable 2 in the Supplement). No significant differences were found in anxiety and depressive symptoms before and after screening for those with negative or false-positive results in 2 cohort studies (n = 1015). 44,48 One study (n = 100) 42 found that anxiety symptoms scores were slightly higher (6 points on 60-point scale; P = .007) for women with vs without gestational diabetes immediately after receiving results but not significantly higher at gestational week 36 or 6 weeks postpartum.
One good-quality cohort study (n = 3778) 46 found that the association between macrosomia and cesarean delivery in women with normoglycemia or untreated borderline gestational diabetes was not observed in those with treated gestational diabetes, suggesting that a gestational diabetes diagnosis may have increased the propensity to perform cesarean deliveries. Three large US studies (n = 161 182) 43,45,47 found some differences in hospital experiences (eg, adjusted OR, 0.55 [95% CI, 0.36 to 0.85] for fewer newborns staying in mother's room) potentially related to labeling because of a gestational diabetes diagnosis. However, there were unmeasured potentially confounding factors such as rates of neonatal hypoglycemia, breastfeeding intentions, and varying hospital policies.
Key Question 3. What is the comparative effectiveness of different screening strategies for gestational diabetes on health and intermediate outcomes? Does comparative effectiveness vary according to prespecified subgroups?

IADPSG vs Carpenter and Coustan Screening
Five RCTs (n = 25 772) 20-24 examined universal screening at 24 to 28 weeks of gestation with the 1-step IADPSG vs 2-step Carpenter and Coustan criteria (Table 1). Three trials were rated fair quality and 2 22,24 good quality. In the largest trial (n = 23 792), 20 25% of women allocated to 1-step screening crossed over to 2-step screening, although results remained similar in an intention-to-treat analysis adjusted for gestational diabetes and adherence. Of the women in this trial's 2-step group, 1.4% received treatment despite having no diagnosis (only an isolated fasting glucose level ≥95 mg/dL), but the authors' sensitivity analysis for the outcome of large for gestational age showed no evidence that this reclassification affected results. Data from another trial (n = 786) 23 were obtained from a systematic review 94 and could not be verified.
One-step vs 2-step screening was significantly associated with identification of gestational diabetes in 11.5% vs 4.9% of participants but was not significantly associated with differences in any pregnancy or fetal/neonatal outcome (eTables 3 and 4, eFigures 1-3 in the Supplement). There was statistical heterogeneity in some analyses in which a fair-quality trial 23 found significant associations favoring 1-step screening, whereas findings between 1 good-quality trial 24 and the largest trial (fair quality) 20 were similar. In the largest trial, 1-step screening significantly increased risk for neonatal hypoglycemia vs 2-step screening, although this may have been in part due to the routine surveillance of neonates with risk factors including diagnosis of maternal gestational diabetes (eFigure 2 in the Supplement). In 1 trial (n = 921) 24 in which all women randomized to 2-step screening underwent the 100-g OGTT (to assist with blinding), 2-step screening was associated with significantly more testing-related adverse events than 1-step screening (eg, reactive hypoglycemia, vomiting, nausea). However, these findings overestimated harms of 2-step screening in clinical practice, in which only women with an abnormal 50-g OGCT result would undergo the 100-g OGTT.

Early vs Usual Timing for Carpenter and Coustan Screening
One good-quality RCT (n = 922) 19 enrolling obese women found early (14 to 20 weeks) vs usual timing of screening with Carpenter and Coustan criteria potentially associated with increased risk of preeclampsia, but the difference was not statistically significant (RR, 1.42 [95% CI, 0.99 to 2.05]; ARD, 4.0% [95% CI, 0.0% to 8.0%]). There were no significant differences for other outcomes, although some estimates were imprecise (eTables 3 and 4 in the Supplement).
Across 45 prospective cohort studies on diagnostic accuracy, mean sample size was 500 (range, 42-24 854), mean age was 28.8 years (range, 25-32.7), and mean body mass index (BMI, calculated as weight in kilograms divided by height in meters squared) from 22 studies was 24.6 (range, 21.1-28.1). Studies were conducted in 25 countries. Seventeen studies (38%) were rated good quality and 28 (62%) fair quality. No study reported how accuracy varied according to patient characteristics.

50-g Oral Glucose Challenge Test
eFigure 4 in the Supplement shows findings for the OGCT; results from pooled analyses are summarized in eTable 5 in the Supplement. Against Carpenter and Coustan criteria, at a 140-mg/dL cutoff, the pooled sensitivity and specificity (8 studies, n = 6190) 53 71 ; at a 140-mg/dL cutoff, specificity in those 2 studies was 81% and 93%.

Risk-Based Screening
Single studies found different risk-based tools (some in combination with measurement of fasting plasma glucose level) associated with sensitivities of 83% to 98% against Carpenter and Coustan (n = 341), 53 National Diabetes Data Group (n = 3131), 81 or IADPSG (n = 258) 62 criteria; however, specificity was highly variable (17% to 80%).

Benefits and Harms of Treatment
Key Question 6. Does treatment of gestational diabetes during pregnancy reduce poor health and intermediate outcomes? Does effectiveness vary according to maternal subgroup characteristics?

Treatment at 24 to 28 Weeks of Gestation
Like the prior USPSTF review, 2 large good-quality RCTs (n = 1958) 27,32 contributed a substantial proportion (40%-90%) of the events for many analyses. Four new studies were added 28,31,35,36 and 6 new publications 95-100 for 1 large previously included trial 32 provided data for long-term outcomes or subgroup analyses. Based on trial inclusion criteria, findings are most applicable to adult women identified using 2-step screening, though there were some differences across trials in eligibility criteria, baseline glycemia, and treatment protocols (Table 2). Apart from 1 trial 29 that did not report data, weeks of gestation at delivery was similar between groups in all trials.
Treatment of gestational diabetes was significantly associated with lower risk of primary cesarean deliveries vs no treatment ( (Figure 3). There was no significant association but marked inconsistency for preeclampsia (Figure 4; 6 studies 25,28,31,32,35,36 ) and hypertensive disorders in pregnancy (3 trials 27,32,35 ); findings appeared sensitive to inclusion of a trial 35 from a country not rated as "very high" on the Human Development Index (Figure 4). Treatment was not significantly associated with reduced risk of gestational hypertension (2 trials 32,35 ; some imprecision), total cesarean deliveries (8 trials 25
Long-term follow-up of 1 trial 32,97 found no significant association between treatment for gestational diabetes vs no treatment and maternal impaired fasting glucose, obesity, metabolic syndrome, or type 2 diabetes at 5 to 10 years. No study measured effects of treatment on long-term quality of life, cardiovascular outcomes, or mortality or major morbidity from type 2 diabetes. Regarding long-term child outcomes, treatment of mothers for gestational diabetes was not significantly associated with reduced risk of overweight/obesity at 4 to 7 years (2 trials), 27,32,99,101 obesity at 7 to 9 years (2 trials), 29,32,99,102 impaired glucose tolerance (median, 9 years [1 trial]) 29,102 or impaired fasting glucose (median, 7-9 years [2 trials]). 29,32,99,102 Evidence from 2 RCTs 29,32,99,102 on long-term risk of type 2 diabetes in children was too sparse to determine effect of treatment.
Subgroup analyses from 1 trial 32 found no significant differences in effects of gestational diabetes treatment for several maternal and fetal outcomes based on timing of treatment initiation, 100 race/ethnicity, 95 severity of dysglycemia, 98 or BMI. 96 Across trials, differences in gestational diabetes diagnostic criteria did not appear to affect findings or explain inconsistency.

Early Treatment vs Usual Care
Findings from 4 small trials (n = 21-95) 30,33,34,37 of treatment for gestational diabetes in early pregnancy (using HbA1c concentration or IADPSG criteria before 14 to 15 weeks of gestation) were highly imprecise.
No trial reported on the association between treatment and poor long-term growth and development outcomes in childhood. Findings from small RCTs of early treatment vs usual care were imprecise or did not report harms (eg, maternal hypoglycemia).

Discussion
The findings in this evidence report are summarized in Table 3. Direct evidence on the benefits of screening vs no screening remains limited and consists of observational studies with methodological limitations. Few studies reported on harms from screening or a diagnosis of gestational diabetes, and those available were limited by imprecision and methodological limitations. There were no significant associations between screening using 1-step IADPSG vs 2-step Carpenter and Coustan criteria and pregnancy or fetal/neonatal outcomes, but some statistical heterogeneity was present (especially for neonatal hypoglycemia) and estimates were heavily weighted by 1 large trial 20 that accounted for 92% of patients.
Treatment vs no treatment was associated with reduced risk for some pregnancy and several neonatal/fetal outcomes. Findings are most applicable for hyperglycemia identified using 2-step screening approaches and to adult (vs adolescent) women with singleton pregnancies and without chronic hypertension or previous gestational diabetes. Most of the treatment interventions relied on frequent self-monitoring of blood glucose levels and clinic visits to monitor glucose targets, which could reduce applicability of findings to women with limited or no insurance coverage, health care access, or ability to perform self-monitoring. Results for cesarean delivery and labor induction are difficult to interpret because of differences in delivery practices. Findings are sparse for long-term health outcomes from treatment and for all outcomes from early treatment. No trial of treatment at 24 weeks of gestation or after used oral medications; therefore, potential medication harms would not have been captured.
This review differs from the 2012 USPSTF review 11 by including additional evidence on potential harms of screening and gestational diabetes diagnosis; evaluating comparative effectiveness of different screening strategies; and relying on more rigorous inclusion criteria and applicable comparisons for test accuracy. Although findings were generally consistent with those from the prior review, there are some differences. New evidence resulted in increased certainty regarding the accuracy of fasting plasma glucose and HbA1c levels as screening tests and the association between treatment and improved outcomes, including reduced risk of NICU admissions. Additional information on preeclampsia and NICU admissions was obtained from authors of 1 trial, 27 enhancing handling of these data. Several publications from one of the larger treatment trials 32 provided new evidence regarding lack of effect for several subgroups and long-term outcomes. For the new KQ on comparative effectiveness, several trials were located including 3 large trials 19,20,24 and 1 very large trial 20 from the US examining highly applicable comparisons. The greater prevalence of gestational diabetes diagnosis resulting from 1-step IADPSG vs 2-step Carpenter and Coustan screening, without associated benefits, suggests potential overdiagnosis and overtreatment. In addition, the 1-step approach requires additional resources related to having all women undertake a 2-hour OGTT and provision of counseling and treatment to more women.
Evaluating the effectiveness of screening vs no screening remains heavily reliant on indirect evidence about test accuracy and treatment effects. Although evidence on diagnostic accuracy is useful for assessing which screening tests may be most useful in a 2-step approach, reliance on these tests alone would result in a high number of false-positive results (especially using lower cutoffs with high sensitivity), particularly in general-prevalence populations (eTables 6 and 7 in the Supplement). In addition, the applicability of treatment trials to women diagnosed with gestational diabetes using the OGCT as a stand-alone test is uncertain. Ongoing trials of treatment for women with positive OGCT screening results but not gestational diabetes, 103 and for those with gestational diabetes by IADPSG criteria but excluding those with 2 abnormal glucose values, 104 would be useful to further inform assessment of treatment benefits among women with lesser degrees of dysglycemia.
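The point about false-positive results when a screening test is used alone can be made concrete with simple arithmetic. The sketch below uses the OGCT sensitivities and specificities reported in this review against Carpenter and Coustan criteria (82%/82% at 140 mg/dL and 93%/79% at 135 mg/dL); the 5% prevalence is an assumption chosen only for illustration, and the function name `screen_results_per_1000` is hypothetical. It does not reproduce eTables 6 and 7.

```python
def screen_results_per_1000(prevalence, sensitivity, specificity, n=1000):
    """Expected screening results for n women, given prevalence and test accuracy."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity          # true positives
    fn = diseased - tp                   # missed cases
    fp = healthy * (1 - specificity)     # false positives
    tn = healthy - fp                    # true negatives
    ppv = tp / (tp + fp)                 # positive predictive value
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "ppv": ppv}

ASSUMED_PREVALENCE = 0.05  # illustrative assumption, not a figure from this review

# (cutoff in mg/dL, sensitivity, specificity) as reported against Carpenter and Coustan
for cutoff, sens, spec in [(140, 0.82, 0.82), (135, 0.93, 0.79)]:
    r = screen_results_per_1000(ASSUMED_PREVALENCE, sens, spec)
    print(f"{cutoff} mg/dL: {r['tp']:.0f} true positives, "
          f"{r['fp']:.0f} false positives per 1000 (PPV {r['ppv']:.0%})")
```

Under this assumed prevalence, positive results are dominated by false positives at either cutoff, which is consistent with the concern about relying on the screening test without a confirmatory OGTT.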

Limitations
This review had several limitations. First, only English-language studies were included. 105 Second, graphical and statistical tests for small-sample effects were not conducted because all analyses included fewer than 10 trials. 106 Third, the DerSimonian and Laird random-effects model was used to pool studies, which may result in CIs that are too narrow, particularly when heterogeneity is present. 107 However, results were similar when analyses were repeated using the profile likelihood method. Fourth, the observational studies included for KQs for which trials were lacking were susceptible to unmeasured confounding and other methodological limitations.
Fifth, some studies were conducted in countries in which screening and treatment for gestational diabetes, as well as management of pregnancy, may differ from that in the US. However, this review focused on screening and diagnostic criteria used in the US, and results appeared consistent across geographic settings. Sixth, data on how the effects of screening and treatment varied according to patient characteristics such as race/ethnicity, age, and other socioeconomic factors were very limited. Seventh, studies that applied older definitions for gestational diabetes or that did not screen for preexisting diabetes 2 may have included some women with overt diabetes, who are expected to have worse outcomes. 108

Conclusions
Direct evidence on screening vs no screening remains limited. One- vs 2-step screening was not significantly associated with improved health outcomes. At or after 24 weeks of gestation, treatment of gestational diabetes was significantly associated with improved health outcomes.