aPopulation characteristics include sex, age, race/ethnicity, comorbid conditions, and new-onset depression vs recurrent depression.
aDetails about reasons for exclusion are as follows. Aim: study aim not relevant. Setting: study was not conducted in a setting or country relevant to US primary care. Comparative effectiveness: study did not have a control group. Instrument: study did not use an included screening instrument. Outcomes: study did not have relevant outcomes or had incomplete outcomes. Population: study was not conducted in a pregnant or postpartum population or was limited to a narrow population not broadly representative of primary care. Intervention: study used an excluded intervention or screening approach. Design: study did not use an included design. For review for key question 2 (KQ2), design included >2 weeks between screening and reference test, or reference test was not applied to full range of screening results or could not adjust for partial verification. Quality: study did not meet criteria for fair or good quality (ie, it was poor quality) using study design–specific criteria developed by the US Preventive Services Task Force for randomized clinical trials,16 the Quality Assessment of Diagnostic Accuracy Studies 2 for diagnostic accuracy studies,17 the Newcastle-Ottawa Scale18 for observational studies, or A Measurement Tool to Assess Systematic Reviews (AMSTAR) for systematic reviews.19 The criteria and definitions of good, fair, poor are provided in eTable 1 in the Supplement. Language: study was published in a non-English language. Instrument not brief: study included a screening instrument that was not brief (ie, exceeded 15 minutes to complete). Study included in systematic evidence review (SER): study was included in existing SER that was included as evidence.
EPDS indicates Edinburgh Postnatal Depression Scale. Error bars indicate 95% confidence interval.
EPDS indicates Edinburgh Postnatal Depression Scale; PHQ-9, 9-item Patient Health Questionnaire. Error bars indicate 95% confidence interval.
Error bars indicate 95% confidence interval.
aMorrell et al (2009)22 did not report sufficient data to extrapolate the number of false positives and true negatives; therefore, specificity could not be calculated.
bData were extrapolated from partial verification.
cBunevicius et al (2009a)34 and Bunevicius et al (2009b)35 did not report the number of false positives or true negatives; therefore, specificity could not be calculated.
PHQ indicates Patient Health Questionnaire. Error bars indicate 95% confidence interval.
BDI indicates Beck Depression Inventory; CBT, cognitive behavioral therapy; EPDS, Edinburgh Postnatal Depression Scale; LQ, Leverton Questionnaire; MDD, major depressive disorder; PHQ, Patient Health Questionnaire; RR, relative risk; SCID, Structured Clinical Interview for Depression. Error bars indicate 95% confidence interval.
aHours of contact were estimated based on planned number and length of sessions.
bNondirective therapy involves empathic, reflective listening rather than advice or direction in behavior change.
Some studies did not provide sufficient data to calculate the 95% confidence interval; these are indicated by a data marker with no error bars on the forest plot and NA (not available) in the data columns. BDI indicates Beck Depression Inventory; CBT, cognitive behavioral therapy; EPDS, Edinburgh Postnatal Depression Scale; MADRS, Montgomery-Asberg Depression Rating Scale; PHQ, Patient Health Questionnaire. Error bars indicate 95% confidence interval.
aNondirective therapy involves empathic, reflective listening rather than advice or direction in behavior change.
eMethods. Literature search strategies
eTable 1. Quality assessment criteria
eTable 2. Summary of adjusted results of maternal and infant harms with second generation antidepressant use during pregnancy (Key Question 5)
eFigure 1. Diagnostic Accuracy of the Edinburgh Postnatal Depression Scale Relative to a Diagnostic Interview (Key Question 2)
eFigure 2. Diagnostic Accuracy of the Patient Health Questionnaire Relative to a Diagnostic Interview (Key Question 2)
eFigure 3. Funnel plot with pseudo 95% confidence limits of included studies of cognitive behavioral therapy (Key Question 4)
eFigure 4. Benefits of Depression Treatment, Depression Symptoms (Key Question 4)
Elizabeth O’Connor, Rebecca C. Rossom, Michelle Henninger, Holly C. Groom, Brittany U. Burda. Primary Care Screening for and Treatment of Depression in Pregnant and Postpartum Women: Evidence Report and Systematic Review for the US Preventive Services Task Force. JAMA. 2016;315(4):388–406. doi:10.1001/jama.2015.18948
Depression is a source of substantial burden for individuals and their families, including women during the pregnant and postpartum period.
To systematically review the benefits and harms of depression screening and treatment, and accuracy of selected screening instruments, for pregnant and postpartum women. Evidence for depression screening in adults in general is available in the full report.
MEDLINE, PubMed, PsycINFO, and the Cochrane Collaboration Registry of Controlled Trials through January 20, 2015; references; and government websites.
English-language trials of benefits and harms of depression screening, depression treatment in pregnant and postpartum women with screen-detected depression, and diagnostic accuracy studies of depression screening instruments in pregnant and postpartum women.
Data Extraction and Synthesis
Two investigators independently reviewed abstracts and full-text articles and extracted data from fair- and good-quality studies. Random-effects meta-analysis was used to estimate the benefit of cognitive behavioral therapy (CBT) in pregnant and postpartum women.
Main Outcomes and Measures
Depression remission, prevalence, symptoms, and related measures of depression recovery or response; sensitivity and specificity of selected screening measures to detect depression; and serious adverse effects of antidepressant treatment.
Among pregnant and postpartum women 18 years and older, 6 trials (n = 11 869) showed 18% to 59% relative reductions with screening programs, or 2.1% to 9.1% absolute reductions, in the risk of depression at follow-up (3-5 months) after participation in programs involving depression screening, with or without additional treatment components, compared with usual care. Based on 23 studies (n = 5398), a cutoff of 13 on the English-language Edinburgh Postnatal Depression Scale demonstrated sensitivity ranging from 0.67 (95% CI, 0.18-0.96) to 1.00 (95% CI, 0.67-1.00) and specificity consistently 0.87 or higher. Data were sparse for Patient Health Questionnaire instruments. Pooled results for the benefit of CBT for pregnant and postpartum women with screen-detected depression showed an increase in the likelihood of remission (pooled relative risk, 1.34 [95% CI, 1.19-1.50]; No. of studies [K] = 10, I2 = 7.9%) compared with usual care, with absolute increases ranging from 6.2% to 34.6%. Observational evidence showed that second-generation antidepressant use during pregnancy may be associated with small increases in the risks of potentially serious harms.
Conclusions and Relevance
Direct and indirect evidence suggested that screening pregnant and postpartum women for depression may reduce depressive symptoms in women with depression and reduce the prevalence of depression in a given population. Evidence for pregnant women was sparser but was consistent with the evidence for postpartum women regarding the benefits of screening, the benefits of treatment, and screening instrument accuracy.
Major depressive disorder (MDD) is the leading cause of disease-related disability in women around the world.1 In a study of US women assessed in 2005, 9.1% of pregnant women and 10.2% of postpartum women met criteria for a major depressive episode.2 Maternal depression can affect offspring as well, leading to lower-quality interactions with the mother,3 higher rates of emotional and behavioral problems, worse social competence with peers, and poorer adjustment to school.4- 6 In 2009, the US Preventive Services Task Force (USPSTF) recommended screening adults for depression when staff-assisted depression care supports are in place to ensure accurate diagnosis, effective treatment, and follow-up (B recommendation).7 The USPSTF recommended against routinely screening adults for depression when such support is not in place but acknowledged there may be considerations that support screening for depression in an individual patient (C recommendation).7 These recommendations were based on a combination of results from the 2002 USPSTF review,8 which included very little evidence related to pregnant and postpartum women, and a targeted update published in 2009, which excluded studies limited to pregnant and postpartum women.9 We undertook the current review to help the USPSTF update its recommendation on depression screening and expand it to include evidence related to pregnant and postpartum women.
Detailed methods are available in the full evidence report at http://www.uspreventiveservicestaskforce.org/Page/Document/final-evidence-review144/depression-in-adults-screening1.10 Evidence related to general and older adults was only minimally changed from the previous review and is also presented in the full report. In this article, the focus is on the direct and indirect evidence for depression screening of pregnant and postpartum women, where most new evidence was found. The analytic framework and key questions (KQs) to guide the portion of our review related to pregnant and postpartum women are shown in Figure 1.
An initial search was conducted for existing synthesized literature and guidelines related to depression screening and treatment in MEDLINE/PubMed, the Database of Abstracts of Reviews of Effects, Cochrane Database of Systematic Reviews, BMJ Clinical Evidence, Institute of Medicine, the National Institute for Health and Clinical Excellence, PsycINFO, the Agency for Healthcare Research and Quality, the American Psychiatric Association, the American Psychological Association, the Campbell Collaboration, the Canadian Agency for Drugs and Technologies in Health, the National Health Services’ Health Technology Assessment Programme, and the Centre for Reviews and Dissemination, from 2008 through October 3, 2013. The search strategies are listed in the eMethods in the Supplement.
For pregnant and postpartum women, abstracts and full-text articles were systematically evaluated to identify existing systematic reviews to incorporate into the review, based on an approach outlined by Whitlock et al.11 Three good-quality reviews were identified that served as foundational reviews for 1 or more KQs. These reviews were chosen based on relevance (ie, inclusion and exclusion criteria that were at least as inclusive as our review), having conducted a good-quality search, having reported good-quality article evaluation methods, and recency.12- 14 For the question of harms of antidepressants (KQ5), 1 of the foundational reviews was of sufficient quality, and the evidence base was so extensive, that this review was used directly as evidence in the report and individual studies included in this review were not reevaluated.14 The other 2 foundational reviews were used for study identification, and then a search was conducted for additional original research published after the search windows of these foundational reviews.12,13 All studies included in each of these 2 foundational reviews were evaluated against our a priori inclusion/exclusion criteria.
We searched for newly published literature in the following databases: MEDLINE/PubMed, PsycINFO, and the Cochrane Central Register of Controlled Trials through January 20, 2015. The bridge search started from January 1, 2012, because there was at least 1 foundational review with a search period for each KQ that extended into 2012. Reference lists of other relevant publications were reviewed to identify additional potentially relevant studies that were not identified by the literature searches or foundational reviews.
Since January 2015, we continued to conduct ongoing surveillance through article alerts and targeted searches of high-impact journals to identify major studies published in the interim that may affect the conclusions or understanding of the evidence and therefore the related USPSTF recommendation. The last surveillance was conducted on December 9, 2015, and identified no new studies.
Two investigators independently reviewed 6536 titles and abstracts and 478 full-text articles against prespecified inclusion criteria (Figure 2). Disagreements were resolved through discussion or consultation with other investigators. We included English language fair- and good-quality studies involving women who were 18 years and older and pregnant or postpartum (within 1 year of birth at enrollment) and living in “very high-developed” countries according to the World Health Organization.15 Studies limited to persons with other medical or mental health conditions were excluded; however, studies that included some persons with such conditions were not excluded, as long as it was not a requirement of participation.
For benefits and harms of depression screening (KQ1, KQ3), we included randomized or nonrandomized clinical trials conducted in primary care settings, including obstetrics/gynecology or, for postpartum women, pediatrics. To allow determination of the full population effect of screening programs, studies that included some participants who already had a medical record diagnosis of depression or were being treated for depression were not excluded. Studies of depression screening could also include additional treatment elements, as long as the screening test results were given to the primary care clinician. A requirement was that the control group either was not screened (KQ1) or did not have screening test results sent to their clinician (KQ1a). Outcomes had to be reported at a minimum of 6 weeks after randomization.
For diagnostic accuracy (KQ2), we examined studies of the Patient Health Questionnaire (PHQ) or Edinburgh Postnatal Depression Scale (EPDS) compared with a valid reference standard, which was defined as a structured or semistructured diagnostic interview with a trained interviewer or a nonbrief (>5 minutes) unstructured interview with a mental health clinician. Studies that gave the reference test only to a subset of participants had to make appropriate adjustments to their analysis or provide sufficient data to allow statistically adjusted analysis. Studies had to report sensitivity or specificity or the raw data to allow their calculation. The time between the index and reference tests could not exceed 2 weeks on average. In addition, these studies had to include patients comprising a wide spectrum of symptom severity, comparable with what would occur in typical primary care settings, including those without symptoms, those with subclinical symptomatology, and those with diagnostic-level symptomatology (ie, case-control designs were excluded). Studies of non-English versions of the instruments were included as long as the study was published in English.
For studies of the benefits of antidepressants and behavioral-based treatments (KQ4), trials were included that had a minimum of 6 weeks’ follow-up after randomization that took place in primary or specialty care settings or online. Trials had to use population-based screening to identify eligible patients. Studies were considered to include population-based screening if they attempted to recruit all or a consecutive or a random subset of women in a specific setting or population during the study’s recruitment window, with individual outreach to potential participants for depression screening as part of determination of study eligibility. Thus, studies were excluded in which recruitment was based on referral, recruitment was from populations of known or likely depressed patients (eg, persons identified as depressed in their medical records), or volunteers were recruited through media or other advertising. Control groups could include usual care, no intervention, waitlist, attention control, or a minimal intervention (eg, ≤15 minutes of information, not intended to be a therapeutic dose).
These same studies were also examined for harms of treatment (KQ5). For serious harms of antidepressants in general populations of pregnant and postpartum women (not limited to screen-detected, KQ5b), systematic reviews, randomized or nonrandomized clinical trials, and large comparative observational studies were included. Maternal harms included suicidality, serotonin syndrome, cardiac effects, seizures, bleeding, cardiometabolic effects, miscarriage, and preeclampsia. Infant harms included neonatal death, major malformations, small for gestational age and low birth weight, seizures, serotonin withdrawal syndrome, neonatal respiratory distress, cardiopulmonary effects, and other serious events requiring medical attention. Comparative cohort studies had to have a minimum of 10 cases in each exposure group and include appropriate controls who were not taking antidepressants.
Two investigators independently assessed the quality of the included studies by using criteria defined by the USPSTF16 and supplemented with criteria from the Quality Assessment of Diagnostic Accuracy 2 (QUADAS-2)17 for diagnostic accuracy studies, the Newcastle-Ottawa Scale (NOS)18 for observational studies, and A Measurement Tool to Assess Systematic Reviews (AMSTAR) for systematic reviews (eTable 1 in the Supplement).19 Each study was assigned a final quality rating of good, fair, or poor; disagreements between the investigators were resolved through discussion. We rated and excluded studies as poor quality if there was a major “fatal flaw” (eg, attrition was >40%, differential attrition >20%) or multiple important limitations that could invalidate the results.
One investigator abstracted data from the included studies, and a second investigator checked the data for accuracy. We abstracted study design characteristics, population demographics, baseline history of depression and other mental health conditions, screening and intervention details, depression outcomes, other health outcomes (eg, suicidality, mortality, quality of life, functioning, health status, infant outcomes, emergency department visits, inpatient stays), adverse events, and diagnostic accuracy statistics.
We created summary tables of study characteristics, population characteristics, intervention characteristics, and outcomes separately for each KQ. These tables and forest plots of the results were used to examine the consistency, precision, and relationship of effect size with key potential modifiers. We had a sufficient number of trials with acceptable comparability to conduct a meta-analysis of trials examining the benefits of cognitive behavioral therapy (CBT) and related approaches. Because this analysis included 10 studies with low statistical heterogeneity, as assessed by the I2 statistic, and fairly comparable sample sizes, a random-effects model was used (DerSimonian and Laird),20 with a sensitivity analysis using a restricted maximum likelihood model with the Knapp-Hartung modification for small samples.21 Funnel plots and the Egger test were used to examine the risk of small study effects. For the studies of instrument accuracy (KQ2), sensitivity and specificity with Jeffreys confidence intervals were calculated, using data from 2 × 2 tables that included true positives, false positives, false negatives, and true negatives.
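As an illustration of the interval calculation for sensitivity and specificity, the following is a minimal Python sketch of a Jeffreys interval for a binomial proportion, that is, the equal-tailed interval from a Beta(x + 0.5, n − x + 0.5) distribution. It is not the code used in the review (which used R and Stata). The function names, the continued-fraction evaluation of the incomplete beta function, and the boundary convention (lower limit 0 when x = 0, upper limit 1 when x = n) are assumptions made for this example, and the counts are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-13):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_quantile(p, a, b):
    """Invert the beta CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def jeffreys_interval(successes, n, level=0.95):
    """Equal-tailed Jeffreys interval; boundary convention is an assumption."""
    alpha = 1.0 - level
    a, b = successes + 0.5, n - successes + 0.5
    lower = 0.0 if successes == 0 else beta_quantile(alpha / 2.0, a, b)
    upper = 1.0 if successes == n else beta_quantile(1.0 - alpha / 2.0, a, b)
    return lower, upper


# Hypothetical 2 x 2 table: 40 true positives, 10 false negatives.
tp, fn = 40, 10
print("sensitivity", tp / (tp + fn), jeffreys_interval(tp, tp + fn))
```

When every case is detected (x = n), the upper limit is fixed at 1.00 under this convention and only the lower limit depends on the number of cases, which is why small studies can show a sensitivity of 1.00 with a wide interval.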
Several studies only verified a negative screening result in a random sample of participants below a predetermined threshold (which was lower than the typical cutoffs for a positive screener in all cases).22- 24 For these studies, the proportion with a depressive disorder according to the reference standard was applied to the full sample of those below the threshold, and sensitivity and specificity were calculated based on these extrapolated results.25 In all cases, there were no false negatives, so sensitivity did not change, but specificity increased with extrapolation, although we were unable to accurately determine the number of noncases for 1 study and so did not calculate specificity.22 Side-by-side plots of sensitivity and specificity were created in R version 3.2.2 (R Foundation); all other analyses were conducted in Stata version 13.1 (StataCorp). All significance testing was 2-sided, and results were considered statistically significant if the P value was .05 or less.
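The extrapolation for partial verification can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the review's actual procedure or data: all counts are hypothetical, the function name and arguments are invented for the example, and the case rate observed in the verified random sample is simply applied to everyone below the sampling threshold.

```python
def accuracy_with_extrapolation(tp, fp, fn, tn, n_below, n_sampled, cases_in_sample):
    """Sensitivity and specificity before and after extrapolation.

    tp, fp, fn, tn: counts among fully verified participants (at or above
        the sampling threshold), classified at the chosen screening cutoff.
    n_below: all participants who scored below the sampling threshold.
    n_sampled: how many of those received the reference test.
    cases_in_sample: how many of the sampled participants had the disorder.
    """
    # Unadjusted: count only the participants who were actually verified.
    naive_fn = fn + cases_in_sample
    naive_tn = tn + (n_sampled - cases_in_sample)
    naive = (tp / (tp + naive_fn), naive_tn / (naive_tn + fp))

    # Extrapolated: apply the sampled case rate to the full low-scoring group.
    case_rate = cases_in_sample / n_sampled
    extra_fn = fn + case_rate * n_below
    extra_tn = tn + (1.0 - case_rate) * n_below
    extrapolated = (tp / (tp + extra_fn), extra_tn / (extra_tn + fp))
    return naive, extrapolated


# Hypothetical study: 400 women scored below the threshold, 100 of them
# were interviewed, and none met criteria for a depressive disorder.
naive, extrapolated = accuracy_with_extrapolation(
    tp=40, fp=30, fn=10, tn=20, n_below=400, n_sampled=100, cases_in_sample=0
)
print("verified only:", naive)
print("extrapolated: ", extrapolated)
```

With no cases in the verified sample, the extrapolation adds only true negatives, so sensitivity is unchanged and specificity rises, which is the pattern described above.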
This article focused only on the evidence related to pregnant and peripartum women, which covers most of the new evidence since the previous review and omits coverage of some sub-KQs that had no or minimal evidence, specifically key questions related to variation in results by population characteristics (KQs 1b, 2a, 3a, 4a, and 5a).
Key Question 1. Do primary care depression screening programs in pregnant and postpartum women result in improved health outcomes (decreased depressive symptomatology; decreased suicide deaths, attempts, or ideation; improved functioning; improved quality of life; or improved health status)?
Key Question 1a. Does sending depression screening test results to clinicians (with or without additional care management supports) result in improved health outcomes?
One good-quality and 5 fair-quality trials were included that examined the benefits of screening for pregnant and postpartum depression (n = 11 869)22,26- 30 with or without additional clinician training or treatment components (Table 1; trials are arranged in increasing order of the extensiveness of the treatment components in addition to screening). Five trials focused on postpartum women,22,26- 28,30 and the sixth focused on pregnant women.29 All trials studied women identified in health care settings and included all study-eligible women regardless of screening test results.22,26- 30 Two trials included unscreened control groups27,28 (KQ1), and 4 screened all participants but sent results to only the intervention group’s clinicians (KQ1a).22,26,29,30 Trials screened women at week 25 of gestation29 or 4 to 8 weeks postpartum.22,26- 28,30 Only 1 trial was conducted in the United States.26 Both of the individually randomized trials excluded women who were currently being treated for depression26,27; however, the trials that randomized at the level of a midwife or medical practice all had very broad inclusion criteria and did not exclude women being treated for depression.22,28- 30 All studies used the EPDS for screening; cutoffs for screening positive ranged from 10 to 13. While 1 trial focused narrowly on the benefit of adding the EPDS to the usual clinical evaluation,22 others provided a wide range of components in addition to the screening intervention, such as clinician training and support, person-centered counseling, or redesigned follow-up care.
At follow-up, which ranged from 1.5 to 16 months, 5 of 6 trials reported the proportion of women scoring above a specified cutoff on the EPDS, which we refer to as depression prevalence (Figure 3). In pregnant and postpartum women, there were relative reductions of 18% to 59% in the risk of depression at follow-up compared with usual care, which translated to 2.1% to 9.1% absolute reductions in depression prevalence, according to a variety of EPDS cutoff definitions. For example, depression prevalence (defined as an EPDS score ≥10) was 13% in the screened group in the Hong Kong–based screening-only intervention in the near term (4 months) but 22.1% in the nonscreened group.27 However, this effect was not sustained at 16 months.27 In the study of pregnant women that included feedback of screening results and a 1-afternoon depression training session for midwives, the effect size was smaller and not statistically significant, with 9.5% of women in the intervention group reporting EPDS scores of 12 or more at follow-up, compared with 11.6% of women in usual care.29 In the 3 studies that reported outcomes similar to remission (ie, no longer screened positive) or treatment response (ie, showed a predetermined level of improvement on a scale score) in postpartum women, there was a 21% to 33% increase in the likelihood of remission or response at 4.5 to 12 months (6-14 months postpartum), ranging from 10.0% to 33.8% absolute increases in the likelihood of remission or response (Figure 4).22,26,30 The effect was even larger in the trial of pregnant women, but last follow-up was only at 2.75 months.29
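The relationship between the absolute and relative reductions can be checked from the prevalence figures quoted above. The short Python sketch below uses only the percentages reported in the text; the function name is an assumption for this example, and the calculation is unadjusted, so trial-reported effect estimates that account for clustering or covariates may differ slightly.

```python
def risk_reductions(risk_intervention, risk_control):
    """Return (absolute risk reduction, relative risk reduction)."""
    absolute = risk_control - risk_intervention
    relative = absolute / risk_control
    return absolute, relative


# Prevalence figures as quoted in the text (proportions, not percentages).
trials = {
    "Hong Kong, EPDS >=10 at 4 months": (0.130, 0.221),
    "Pregnant women, EPDS >=12 at follow-up": (0.095, 0.116),
}

for label, (screened, usual_care) in trials.items():
    absolute, relative = risk_reductions(screened, usual_care)
    print(f"{label}: absolute {absolute:.1%}, relative {relative:.0%}")
```

These two trials correspond to the ends of the reported absolute range (9.1 and 2.1 percentage points), and the second gives the lower end of the relative range (about 18%).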
The results most applicable to US primary care come from a fair-quality US trial of screening plus clinician supports.30 Forty-five percent of intervention participants reported a 5-point or greater drop in their PHQ-9 scores, the a priori definition of clinically meaningful benefit, whereas 34% of those receiving usual care reported such a drop (odds ratio [OR], 1.74 [95% CI, 1.05-5.86], adjusted for depression history, marital status, income, education, age, and degree of parenting stress). This trial was rated as fair primarily because attrition was greater than 25% in both groups, which was a common problem in the studies on the benefits of screening for depression.
Key Question 2. What is the test performance of the most commonly used depression screening instruments in pregnant and postpartum women in primary care?
We identified 23 studies22- 24,31- 50 (n = 5398) that examined the accuracy of the EPDS and 3 studies that examined the PHQ (n = 777)51- 53 relative to a diagnostic interview (Table 2, EPDS studies are arranged in the order of decreasing proportion meeting the reference standard diagnosis, separately for English-version EPDS and non–English-version EPDS; PHQ studies are ordered by the PHQ versions reported). Eight of the included studies used the English-language version of the EPDS (n = 1905).22,23,32,39,41,42,48,49 Six of the English-language EPDS studies assessed postpartum women, usually between 6 and 12 weeks postpartum,24,26,28,34,37,39 1 assessed pregnant women,48 and 1 assessed women at any point during pregnancy and up to 26 weeks postpartum.42 We focused on the English-language EPDS and standard cutoff scores of 10 (indicating moderate-level symptoms) and 13 (indicating probable depressive disorder) for the EPDS.
At a cutoff score of 13 for identifying MDD, the sensitivity of the English-language EPDS ranged from 0.67 (95% CI, 0.18-0.96) to 1.00 (95% CI, 0.67-1.00), with most of the results between 0.75 and 0.82 (Figure 5 and eFigure 1 in the Supplement). The largest of these studies,22 from the United Kingdom, reported a sensitivity of 0.79 (95% CI, 0.72-0.85), which was very similar to that seen in a relatively recent US-based study with low-income African American women with a high rate of depression (0.81 [95% CI, 0.64-0.93]).42 The specificity of the English-language EPDS was 0.87 or greater in all studies. Sensitivity for detecting depressive disorders, including both major and minor depression, using the cutoff of 10 or greater ranged from 0.63 (95% CI, 0.44-0.79)23 to 0.84.42,49 At a cutoff score of 10, the study of low-income African American women42 reported sensitivity of 0.84 (95% CI, 0.69-0.94) and specificity of 0.81 (95% CI, 0.70-0.89) for identifying major or minor depression in pregnant and postpartum women combined. The estimates were very similar for pregnant and postpartum women.42
The PHQ studies covered 3 different versions of the PHQ (PHQ-2,51- 53 PHQ-8,53 PHQ-951) and 3 different scoring methods for the PHQ-2 (Figure 6 and eFigure 2 in the Supplement). Sensitivities and specificities were fairly wide-ranging across different versions, scoring methods, diagnostic comparators, and cutoffs, and no single method was reported in more than 1 study.
Key Question 3. What are the harms associated with primary care depression screening programs in pregnant and postpartum women?
Among the trials addressing the benefits of screening (KQ1), 1 trial reported that there were no adverse effects of depression screening in postpartum women (n = 462; Table 1)27; the remaining 5 trials did not report on harms.
Key Question 4. Does treatment (psychotherapy, antidepressants, or collaborative care) result in improved health outcomes (decreased depressive symptomatology; decreased suicide deaths, attempts, or ideation; improved functioning; improved quality of life; or improved health status) in pregnant and postpartum women who screen positive for depression in primary care?
We identified 2 good-quality and 16 fair-quality trials (n = 1638) that examined the benefits of interventions in pregnant and postpartum women who had screened positive for depression in a primary care or community setting, generally compared with usual care54- 71 (Table 3, trials are arranged in increasing order of estimated contact hours with the intervention). One trial combined treatment in depressed women and prevention in women who were not depressed, but we only included results related to the depressed subgroup (n = 324).63 Fifteen of 18 trials recruited women during the postpartum period (≤22 weeks) and 3 during their pregnancy.63,64,71 All 18 trials reported outcomes during the postpartum period. Time to follow-up varied widely, from 6 weeks59,69 to 18 months.56 Furthermore, trials varied in time between end of treatment and follow-up assessment, with 7 trials conducting follow-up assessment within 2 weeks of when treatment ended,55,57,62,65,66,69,71 while the remaining had a lag of 1 to 7 months between end of treatment and follow-up assessment. The most common behavioral interventions were CBT or related interventions that included traditional CBT components, such as stress management, goal setting, and problem solving, including 2 trials conducted with pregnant women.63,64 Other intervention approaches included fluoxetine,55 a health care system–level stepped-care intervention,57 nondirective counseling,56,60,69 psychodynamic therapy,58 an information-only intervention,59 and 2 different approaches to improving the mother-infant relationship.58,62
Of 18 trials, 15 reported an outcome similar to depression remission (usually the proportion below a specified cut point on a depression symptom scale) at follow-up ranging from 1.5 to 18 months (Figure 7, only outcomes within 1 year shown).54,56- 61,63- 67,69- 71 All 10 trials that used CBT or related interventions showed an increased likelihood of remission with treatment in the short-term, although not all results were statistically significant.54,56,61,63- 67,70,71 Effect sizes were similar for pregnant and postpartum women for CBT. Pooled results that used only the longest follow-up (within 1 year) showed an increase in the likelihood of remission with CBT (DerSimonian and Laird pooled relative risk [RR], 1.34 [95% CI, 1.19-1.50]; No. of studies (K) = 10; I2 = 7.9%) compared with usual care, with absolute increases ranging from 6.2% to 34.6%. Results were almost identical in sensitivity analyses using a more conservative pooling method, with even lower statistical heterogeneity (restricted maximum likelihood pooled RR, 1.34 [95% CI, 1.17-1.53]; K = 10, I2 = 0%). Increased hours of contact might be associated with larger effect sizes, but because contact hours, sample size, control group remission rates, and time to follow-up were all confounded with each other, conclusions could not be drawn about their relative importance. The funnel plot (eFigure 3 in the Supplement) suggested an increased risk of small studies bias, consistent with increased risk of publication bias; the Egger test did not identify a statistically significant small studies bias, but power was limited. The possibility of correlation between sample size and effect size raises the concern that the pooled effect may overestimate the true effect.
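For readers unfamiliar with the pooling method named above, the following is a minimal Python sketch of DerSimonian and Laird random-effects pooling of log relative risks, including the Q statistic and I2. It is an illustration only: the trial counts are hypothetical (not the data from the 10 CBT trials), the function names are invented for this example, and the review's own analyses were run in Stata. The restricted maximum likelihood sensitivity analysis with the Knapp-Hartung modification is not shown.

```python
import math


def log_relative_risk(events_t, n_t, events_c, n_c):
    """Log relative risk and its approximate variance for one trial."""
    rr = (events_t / n_t) / (events_c / n_c)
    variance = 1.0 / events_t - 1.0 / n_t + 1.0 / events_c - 1.0 / n_c
    return math.log(rr), variance


def dersimonian_laird(studies, z=1.96):
    """Pool trials given as (events_t, n_t, events_c, n_c) tuples."""
    effects = [log_relative_risk(*s) for s in studies]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]

    # Fixed-effect (inverse-variance) weights are needed to compute Q.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Method-of-moments estimate of between-study variance, truncated at 0.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "rr": math.exp(pooled),
        "ci_low": math.exp(pooled - z * se),
        "ci_high": math.exp(pooled + z * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical remission counts: (remitted, n) for treatment, then control.
hypothetical_trials = [(40, 100, 20, 100), (20, 100, 20, 100), (60, 200, 30, 200)]
print(dersimonian_laird(hypothetical_trials))
```

When the estimated between-study variance is 0, the random-effects weights reduce to the fixed-effect weights, so with low heterogeneity the two models give nearly the same pooled estimate.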
Results for the outcome of continuous symptom score showed a similar pattern (Figure 8 and eFigure 4 in the Supplement), although only 7 of the trials were available for pooling.54,61,64-67,71 All of the trials showed greater symptom reduction in the intervention groups. Results were not statistically significant in 3 trials64,66,67; however, unadjusted mean differences were statistically significant in 1 of these.67 With usual care, EPDS scores declined by an average of 2 to 6 points, compared with 5 to 10 points in intervention groups. The pooled standardized mean difference in change between groups was −0.82 (95% CI, −1.10 to −0.54; K = 7, I2 = 35.4%), consistent with a medium to large effect size according to Cohen’s suggested convention.72 Average baseline EPDS scores were generally at or above the cutoff of 13 (cutoff for identifying MDD), and at follow-up most CBT group averages were below 10 (cutoff for identifying minor or major depressive disorder), which put them in the mild depressive symptom range, on average. Some studies showed average EPDS scores below 10 at follow-up in both the intervention and usual care groups64,67; in other trials, the usual care groups remained above 10 while the intervention groups fell below that cutoff54,70 or showed mixed results over time.56 Other instruments showed comparable results.
The 1 trial that examined pharmacotherapy (n = 87) reported a 10-point reduction in the EPDS with fluoxetine after 12 weeks, compared with a 7-point reduction in those taking a placebo (P < .05). Results were similar for 2 other continuous measures of depression symptom severity, but this trial did not report a dichotomous remission-related outcome.
Because non-CBT approaches, including fluoxetine, were highly variable in their effects and were limited by lack of replication, firm conclusions about those approaches could not be drawn.
Key Question 5. What are the harms of treatment in pregnant and postpartum women who screen positive for depression in primary care?
Key Question 5b. What is the prevalence of other selected serious harms of treatment with antidepressants in the general (ie, not limited to primary care) population of pregnant and postpartum women?
The examination of harms of antidepressants was limited to second-generation agents: selective serotonin reuptake inhibitors (SSRIs), selective norepinephrine reuptake inhibitors, bupropion, nefazodone, trazodone, and mirtazapine. Ten of the included studies on harms of treatment for depression were of good quality, and 4 were of fair quality (Table 4; studies are ordered by study design, then by primary reported outcome). Of the trials that addressed benefits of treatment, which all involved screen-identified patients, only the trial of fluoxetine also reported on harms of treatment.55 At 12 weeks of follow-up, 1 of 43 women (2.3%) taking fluoxetine and 3 of 44 women (6.8%) taking the placebo had discontinued treatment because of adverse effects.
Considering studies not limited to women with screen-detected depression, a good-quality systematic review published in 201314 identified 15 observational studies providing evidence of the harms of antidepressants at unknown dosages in pregnant depressed women. The review included an additional 109 observational studies that provided evidence of the harms of antidepressants in pregnant women in whom depression status in either or both treatment groups was unknown. When available, data limited to depressed women were our focus.
An additional 12 fair- or good-quality large observational studies were identified that were published between 2012 and 2014 and that examined the harms of antidepressants in pregnant or postpartum women (n = 4 759 735).73-84 Three were case-control studies82-84; the remainder were cohort studies that used national registers or administrative health data to examine exposures and outcomes retrospectively in women who had been pregnant. Five studies provided evidence of outcomes in pregnant women with known depression who were or were not exposed to antidepressants.74-76,78,84 The remaining 7 studies compared outcomes in exposed vs unexposed pregnant women with unknown depression status, although most of these analyses adjusted for presence of depression79 or conducted some analyses that were restricted to depressed women.77,80,81
Detailed results of the harms of treatment are shown in eTable 2 in the Supplement. There was evidence that use of some antidepressants during pregnancy, particularly SSRIs and venlafaxine, is associated with increased risk of preeclampsia, postpartum hemorrhage, and miscarriage as well as a number of adverse infant outcomes, including neonatal or postneonatal death, preterm birth, small for gestational age, neonatal seizures, serotonin withdrawal syndrome, neonatal respiratory distress, pulmonary hypertension, or major congenital malformations. The absolute increase in risk for most infant outcomes was very small, given the rarity of the events, and sometimes occurred only with higher levels of exposure. For example, a large retrospective cohort study reported a more than doubling of seizure occurrence in infants of depressed women who had been provided 3 or more prescription fills for antidepressants of any kind (but primarily SSRIs). However, the absolute risk remained quite small (0.66% among exposed infants vs 0.28% in unexposed infants; unadjusted OR, 2.39 [95% CI, 1.57-3.64]).75 In that study, there was no similar association among women with 1 or 2 prescription fills for antidepressants.
More common outcomes showed potentially important absolute increases. One study in the 2013 review14 reported neonatal respiratory distress among 7.8% of infants not exposed to SSRIs in utero, compared with 13.9% of exposed infants, and a pooled estimate combining 3 studies showed an increased odds of respiratory distress with SSRI exposure (pooled OR, 1.91; 95% CI, 1.63-2.24; I2 = 0%).14 As another example, a large US-based cohort study found development of preeclampsia among 8.9% of depressed women exposed to venlafaxine compared with 5.4% of unexposed women.81 However, because these are observational studies, causality cannot be determined; it is not possible to control for all possible confounders related to depression, particularly the fact that women with more severe depression may be more likely to take antidepressants during pregnancy.
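The contrast between relative and absolute measures in the preceding two paragraphs can be made concrete with a short calculation. The Python sketch below derives crude (unadjusted) contrasts from the rounded percentages quoted in the text; the helper name `crude_contrast` is hypothetical. Because the inputs are rounded, the crude odds ratio for seizures comes out at roughly 2.4 and need not match the published 2.39 exactly. These figures describe association only and do not imply causation.

```python
def crude_contrast(p_exposed, p_unexposed):
    """Return crude absolute and relative contrasts between two risks.

    Both arguments are proportions between 0 and 1 (exclusive).
    """
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return {
        "risk_difference": p_exposed - p_unexposed,  # absolute increase
        "risk_ratio": p_exposed / p_unexposed,       # relative increase
        "odds_ratio": odds_exposed / odds_unexposed,
    }

# Rounded percentages quoted in the text, expressed as proportions.
outcomes = {
    "neonatal seizures (3+ fills)": (0.0066, 0.0028),
    "neonatal respiratory distress (SSRIs)": (0.139, 0.078),
    "preeclampsia (venlafaxine)": (0.089, 0.054),
}

for name, (exposed, unexposed) in outcomes.items():
    c = crude_contrast(exposed, unexposed)
    print(
        f"{name}: difference = {c['risk_difference'] * 100:.2f} percentage points, "
        f"risk ratio = {c['risk_ratio']:.2f}, odds ratio = {c['odds_ratio']:.2f}"
    )
```

A rare outcome can show a large relative increase with a risk difference well under 1 percentage point, whereas a more common outcome with a smaller relative increase can show a difference of several percentage points.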
We examined recent information on the benefits and harms of depression screening and treatment and the accuracy of selected screening instruments for pregnant and postpartum women to support the USPSTF updated recommendation on these topics. Evidence suggested that programs to screen pregnant and postpartum women, with or without additional treatment-related supports, reduced the prevalence of depression and increased remission or treatment response (Table 5). Most of the screening trials included in this review provided treatment elements beyond screening, such as clinician training and supports, treatment protocols, or counseling with specially trained clinicians. Sensitivity of the English-language version of the EPDS was estimated to be approximately 0.80 and specificity approximately 0.90, using a cutoff of 13 to detect postpartum MDD. Further, evidence suggested that CBT improved depression in women with postpartum depression. In addition, the use of second-generation antidepressants during pregnancy may be associated with increased risk of some serious harms.
Evidence primarily focused on postpartum women, except for harms of antidepressants, but the little evidence among pregnant women suggested effects comparable to those in postpartum women. Important limitations were noted for all bodies of evidence, including a relatively small number of studies, few trials with good applicability to primary care in the United States, and many studies with very small sample sizes.
The direct evidence of effects of screening for depression suggested that programs that include screening reduce the overall prevalence of depression and increase the likelihood of remission or treatment response in postpartum women. Results in pregnant women were consistent with those in postpartum women, although they came from only a single, smaller study. The direct (KQ1 and KQ1a) evidence base is relatively small (6 trials, most with fairly short follow-up) but included almost 12 000 women. Only 1 of these trials was conducted in the United States. Two trials provided minimal additional components beyond screening: one demonstrated reduced prevalence of depression27 and the other increased response to treatment.29 The results of this evidence report are consistent with 2 recent comprehensive reviews of depression identification in pregnant and postpartum women, which included overlapping (but not identical) evidence bases.12,13 One review concluded that its included studies showed that using the EPDS had beneficial effects, but the authors could not disentangle the effects of using an identification strategy from the effects of subsequent interventions provided.13 The other review concluded that screening was associated with modest improvement in depression across a variety of low-intensity interventions.12
One concern about the trials of screening programs is that 4 of the 6 studies did not exclude women who were previously known to be depressed. Because depression is often inadequately treated,7,8 however, it may also be important for persons who are still depressed despite previous treatment efforts to be identified so their clinician can continue to help them until they are able to find a successful treatment. While this falls outside the traditional definition of screening, it is nevertheless a potentially important side benefit of depression screening programs. Further, depression screening presents an opportunity to query suicidal ideation among those who screen positive. While the USPSTF has not recommended routine screening for suicide risk, they did note that “primary care clinicians should be aware of psychiatric problems in their patients and should consider asking these patients about suicidal ideation and referring them” for treatment.85 Thus, pragmatically, identifying incompletely treated patients could be considered an added benefit of routine depression screening, although this falls more in the realm of depression management than prevention through early detection, which is the traditional definition of screening.
In addition to the direct evidence, we also considered indirect evidence on screening accuracy and the benefits and harms of treatment for depression in pregnant and postpartum women. While the ranges of sensitivities and specificities were quite wide for the English-language version of the EPDS, the largest studies and the study most applicable to the US health care system reported sensitivities around 0.80 and specificities of 0.87 and higher at a cutoff of 13 to detect MDD, primarily in postnatal women. This body of evidence was fairly large (K = 23), but only 8 studies addressed the English-language version of the EPDS and only 2 of these were conducted in the United States. Furthermore, the literature on the English-language version was limited by small study sizes. However, the broad use of the EPDS and the relatively acceptable results despite the various languages and country populations can be seen as reassuring for its applicability to a diverse US pregnant and postpartum population. Evidence on the accuracy of the PHQ for pregnant and postpartum women was very limited. Other reviews drew similar conclusions and included additional screening instruments.12,13
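To illustrate what a sensitivity of about 0.80 and a specificity of about 0.87 could mean for an individual screening result, the Python sketch below applies Bayes' rule to obtain predictive values. The prevalence of 10% is a hypothetical input chosen only for illustration and is not an estimate from this review; the function name `predictive_values` is likewise an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test via Bayes' rule.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)  # P(disorder | positive screen)
    npv = true_neg / (true_neg + false_neg)  # P(no disorder | negative screen)
    return ppv, npv

# Sensitivity and specificity as reported for the EPDS at a cutoff of 13;
# the 10% prevalence is hypothetical.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.87, prevalence=0.10)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Because predictive values depend strongly on prevalence, the same instrument would yield a different share of true positives among positive screens in populations with a higher or lower burden of depression.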
Cognitive behavioral therapy and related behaviorally based approaches reduced the symptoms of postpartum depression and increased the likelihood of remission compared with usual care among depressed pregnant and postpartum women identified through screening. There were insufficient data to determine whether the use of other treatment modalities was beneficial in either pregnant or postpartum women, including only a single small trial of pharmacotherapy. Results were mixed in the studies conducted in the United States: 1 found benefit at both the 4.5- and 7.5-month follow-ups,54 but the other did not find statistically significant group differences.71 Effect sizes in CBT trials were very similar between the 2 trials of pregnant women and the trials of postpartum women. Although not limited to studies of women with screen-detected depression, other reviews have also concluded that behaviorally based treatment of depression is beneficial during the postpartum period and that data are lacking on the use of antidepressants.86,87
The generalizability of clinical trial treatment results may be reduced by restrictive inclusion and exclusion criteria. For example, excluding persons with greater disease severity and comorbidities may overestimate the effects of treatment.88,89 The treatment studies in our review generally excluded women with the greatest disease severity (such as history of psychosis, current suicidal ideation, and need for crisis management). Furthermore, bias related to small sample sizes has been reported in the psychotherapy literature90,91 and was a possible issue in our included studies, although one of those reports suggested that the statistical significance of pooled results was only minimally affected by this bias.91 Limiting trials to those that used screening for case-finding, rather than including trials with referral-based and self-selected entry, likely limited the degree of overestimation in this review. Trials that recruit through screening generally have smaller effect sizes than those enrolling self-selected volunteers from broad-based community recruitment through media ads and other means.92
There was very little evidence related to the harms of behaviorally based treatment in pregnant and postpartum women and no evidence that these treatments could be harmful. Data on the harms of antidepressant use in postpartum women were insufficient, with only a single small 12-week trial of fluoxetine. Evidence on harms of antidepressants was almost entirely limited to pregnant women, in contrast to the other bodies of literature in this review. The imbalance of evidence of benefits and harms on antidepressants is likely due to the difficulty of conducting randomized clinical trials in pregnant and breastfeeding women, yet observational studies are feasible and have the best chance of identifying rare harms, for which studies with very large sample sizes are needed. Results did suggest possible risk of harm. While these data were limited to observational studies, many were very large, population-based studies that controlled for depression status in some way. Nevertheless, causality could not be definitively determined from these studies. Pragmatically, CBT is not an option for every depressed woman because some will not want such therapy, some will not have access to trained CBT clinicians, and some may not respond fully to CBT treatment. For women with more severe depression who are not interested in or able to participate in CBT, further research is needed on the risks vs benefits of antidepressant therapy in order to guide shared decision making.
The evidence we included in this analysis targeted primarily postpartum women (except for harms of antidepressants, which pertained to prenatal women only). However, the little evidence found regarding pregnant women suggested comparable effects with postpartum women for benefits of screening, accuracy of the EPDS, and benefits of CBT.
Important limitations to the evidence reviewed were noted for all bodies of evidence, including the number of studies, study size, inconsistency in the specific outcomes reported, and applicability of trials to primary care in the United States. In addition, the scope of this review excluded areas of research that may be pertinent to depression screening in pregnant and postpartum women. For example, examination of screening instrument accuracy was limited to only 2 instruments, the PHQ and the EPDS. Nontrial evidence related to harms of screening or behaviorally based treatment was excluded, although the risks of these interventions are likely to be minimal. Furthermore, evidence of using antidepressants was limited to a prespecified list of serious harms; we did not examine other harms that, even if not life-threatening, might be clinically important, such as developmental outcomes (eg, autism) and behavioral outcomes (eg, crying or sleeping issues) in infants. Also, we did not review the effectiveness in pregnant and postpartum women of interventions that are widely available but generally offered outside of the health care setting (eg, yoga, exercise, or light therapy). As the scope of this review was limited to adults, studies focused on pregnant or postpartum females younger than 18 years were not included. In addition, a potential methodological limitation is reliance on other reviews to identify evidence for some years and, for harms of antidepressants, reliance on the synthesized work of previous reviewers. Although we assessed the pertinent sections of these reviews’ methods as being of good quality, it is nonetheless possible that they missed or incorrectly interpreted evidence.
The direct evidence suggested that screening pregnant and postpartum women for depression may reduce depressive symptoms in women with depression and reduce the prevalence of depression in a given population, particularly in the presence of additional treatment supports (eg, treatment protocols, care management, and availability of specially trained depression care clinicians). The indirect evidence showed that screening instruments can identify pregnant and postpartum women who need further evaluation and may need treatment. The only identified harm of treatment was the use of antidepressants during pregnancy, although the absolute risk of harm appeared to be small and CBT appeared to be an effective alternative treatment approach.
Corresponding Author: Elizabeth O’Connor, PhD, Kaiser Permanente Research Affiliates Evidence-based Practice Center, Center for Health Research, Kaiser Permanente Northwest, 3800 N Interstate Ave, Portland, OR 97227 (email@example.com).
Author Contributions: Dr O’Connor had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: O’Connor, Rossom, Groom, Burda.
Acquisition, analysis, or interpretation of data: O’Connor, Rossom, Henninger, Groom, Burda.
Drafting of the manuscript: O’Connor, Rossom, Henninger, Burda.
Critical revision of the manuscript for important intellectual content: O’Connor, Rossom, Groom, Burda.
Statistical analysis: O’Connor.
Administrative, technical, or material support: Rossom, Henninger, Groom, Burda.
Study supervision: O’Connor.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.
Funding/Support: This research was funded by the Agency for Healthcare Research and Quality (AHRQ) under a contract to support the US Preventive Services Task Force (USPSTF).
Role of the Funder/Sponsor: Investigators worked with USPSTF members and AHRQ staff to develop the scope, analytic framework, and key questions for this review. AHRQ had no role in study selection, quality assessment, or synthesis. AHRQ staff provided project oversight, reviewed the report to ensure that the analysis met methodological standards, and distributed the draft for peer review. Otherwise, AHRQ had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: We gratefully acknowledge the following individuals for their contributions to this project: the AHRQ staff; the USPSTF; and Evidence-based Practice Center staff members, who were Jillian T. Henderson, PhD, Smyth Lai, MLS, Keshia Bigler, MPH, and Elizabeth Hess, ELS(D); Bradley N. Gaynes, MD, MPH; and Gregory E. Simon, MD, MPH, for expert input on the review scope and draft report. USPSTF members, expert consultants, peer reviewers, and federal partner reviewers did not receive financial compensation for their contributions.
Additional Information: A draft version of this evidence report underwent external peer review from 4 content experts (Gregory E. Simon, MD, MPH, Group Health Research Institute; Barbara Yawn, MD, Department of Research, Olmsted Medical Center; Marian McDonagh, PharmD, Oregon Health and Science University; Ramin Mojtabai, MD, PhD, MPH, Johns Hopkins Bloomberg School of Public Health) and 4 federal partners: Centers for Disease Control and Prevention (CDC), National Institute of Mental Health (NIMH), Substance Abuse and Mental Health Services Administration (SAMHSA), and the US Air Force. Comments were presented to the USPSTF during its deliberation of the evidence and were considered in preparing the final evidence review.
Editorial Disclaimer: This evidence report is presented as a document in support of the accompanying USPSTF Recommendation Statement. It did not undergo additional peer review after submission to JAMA.