Figure 1. Mean posttreatment Hamilton Depression Rating Scale differences (solid circles) and 95% confidence intervals for drug-placebo comparisons. SSRI indicates selective serotonin reuptake inhibitor; open circles, estimates excluded from the meta-analysis; diamonds, estimates from the meta-analysis.
Figure 2. Mean posttreatment Hamilton Depression Rating Scale differences (solid circles) and 95% confidence intervals for drug-drug comparisons. SSRI indicates selective serotonin reuptake inhibitor; diamonds, estimates from the meta-analysis.
Figure 3. Mean posttreatment Hamilton Depression Rating Scale differences (solid circles) and 95% confidence intervals for psychological treatment comparisons. Diamonds indicate estimates from the meta-analysis; open circles, estimates excluded from the meta-analysis.
McCusker J, Cole M, Keller E, Bellavance F, Berard A. Effectiveness of Treatments of Depression in Older Ambulatory Patients. Arch Intern Med. 1998;158(7):705–712. doi:10.1001/archinte.158.7.705
To determine the effectiveness of acute-phase pharmacological and psychological treatments of depression in older ambulatory patients by systematically reviewing original research relevant to this topic.
Searches in MEDLINE and PsycINFO and manual reviews of bibliographies located 233 articles. Of these, 40 (37 different studies) met our 8 inclusion criteria: original research, written in English or French, subjects 55 years and older, diagnosis of depression, outpatient or community setting, prospective controlled study design, acute-phase pharmacological or psychological treatment, and outcome measure of depression. Two independent reviewers assessed the methodological quality of each article using a standard form and a quality score was computed. Quantitative data on levels of depression at the end of treatment were abstracted. Results were grouped by specific treatment comparison (type of treatment and type of control group). For comparisons that used the Hamilton Depression Rating Scale, we computed mean posttreatment differences. Effect sizes were computed from the Hamilton Depression Rating Scale or an alternative scale.
In studies that compared active drugs with placebo, the heterocyclic drugs significantly reduced the posttreatment Hamilton Depression Rating Scale score (mean difference, −5.78; 95% confidence interval, −8.31 to −3.25); other drugs had smaller effects. In studies that compared active drugs, there were no significant differences overall between different classes of drugs; selective serotonin reuptake inhibitors appeared to be as effective as heterocyclic drugs. Rational psychological treatments performed significantly better than no treatment (mean posttreatment Hamilton Depression Rating Scale difference, −7.25; 95% confidence interval, −10.10 to −4.40) but not significantly better than control conditions that provided similar attention. Adjustment for the study quality score did not affect these results.
Based on comparisons with untreated controls, heterocyclic antidepressants and rational psychological therapies appear to be the most effective treatments for older ambulatory patients with mild to moderate depression. Based on drug-drug comparisons, selective serotonin reuptake inhibitors appear to be as effective as heterocyclic drugs. However, overall, the magnitude of the treatment effects is modest. Limitations in the quantity and quality of appropriate studies suggest a sober approach to treatment in this population.
Five published meta-analyses of treatments of depression in the elderly (3 on antidepressants,1-3 1 on psychosocial therapies,4 and 1 on both5) and a consensus conference6 have concluded that pharmacological and psychosocial treatments are effective. However, these reviews included many studies of psychiatric inpatients who usually have more severe depression than that seen in ambulatory settings and did not compare pharmacological and psychological approaches. Therefore, we performed this meta-analysis to determine the effectiveness of acute-phase pharmacological and psychological treatments of depression in the older ambulatory population. We included studies of ambulatory patients conducted in outpatient, community, and nursing home settings.
This research was composed of the following stages: identification of potentially relevant studies, application of inclusion and exclusion criteria, systematic review of studies, computation of a quality score for each study, and quantitative meta-analysis.
We identified potentially relevant studies using computerized and manual search strategies. The computerized search included 2 databases: MEDLINE (1981 to March 1995) and PsycINFO (1984 to March 1995). In MEDLINE, the subject headings used were "depressive disorder" or "depression" (as major concepts) combined with "aged" and "clinical trials" (exploded to pick up all types of clinical trials). In PsycINFO, we used the subject headings "major depression" and "aged" and "therapy or treatment." The PsycINFO search covered a shorter period because the search strategy was less efficient than the MEDLINE strategy. The manual search involved review of the bibliographies of articles and books known to us as well as those from the articles and books identified in the computer search.
Each article identified in the search was first screened by one of us (E.K.) to determine whether it met our first 2 inclusion criteria: an original study and published in English or French (we had insufficient resources to translate articles published in other languages). Each article meeting these criteria was reviewed by at least 2 of us (J.M., M.C., or E.K.) to determine whether it met the following 6 inclusion criteria.
Either all subjects were 55 years or older or at least 20% of the sample was 60 years and older and the results were reported separately for this subgroup. (These cutoff points were based on those used in the studies reviewed.)
Study subjects met criteria for depression using an accepted diagnostic system or a cutoff point on a depression symptom scale. Diagnostic systems included those based on 3 versions of the Diagnostic and Statistical Manual of Mental Disorders7-9 or the Research Diagnostic Criteria.10
Subjects were drawn from community, outpatient, or nursing home settings. Studies were excluded if the source of subjects was unspecified.
Study design consisted of a longitudinal comparison of at least 2 treatment groups (either active treatment vs control or comparison of 2 active treatments).
Acute-phase pharmacological or psychological treatment was provided.
Comparative effectiveness of the treatment(s) was reported using an accepted outcome measure of depression.
Studies meeting our inclusion criteria were reviewed by one of us (J.M., M.C., or E.K.) using a standard abstraction form.
Treatments were classified as pharmacological, psychological, and other. Pharmacological treatments were classified by one of us (M.C.) into the following categories: heterocyclic antidepressants, antianxiety agents, selective serotonin reuptake inhibitors (SSRIs), and other. When possible, the mean or range of daily doses was recorded. Psychological treatments were classified as rational or emotive. Rational treatments were those based primarily on learning new ways of thinking or behaving (eg, cognitive or behavioral therapy). Emotive treatments were those based primarily on expressing and understanding feelings in the context of a supportive relationship (eg, supportive or dynamic therapy).
For those drug studies that did not use an active treatment comparison group, the control group always received a placebo. For the psychological treatments, we classified the control group as follows: "attention" (controls received a nonspecific intervention that provided similar attention to that received by the treatment group) or "untreated" (eg, delayed treatment or usual care).
Source of subjects was classified as follows: primary care, general medical outpatient clinic, psychiatric outpatient clinic, other or unspecified outpatient clinic, community, home care, nursing home or residence for the elderly, and mixed sources.
Context of research variables included country where study was conducted and source of financial support.
We computed 2 quality scores: the first to assess methodological quality of the study and the second to assess quality of treatment. The methodological quality of each study was assessed using the following criteria adapted from those proposed by Chalmers and colleagues:11
Description of selection criteria: 3, adequate; 2, fair; and 1, not adequate.
Description of treatment(s): 3, adequate; 2, fair; and 1, not adequate.
Treatment assignment: 3, random; 2, partially random; and 1, nonrandom.
Blinding (masking). For pharmacological treatments: 3, explicit statement that drug(s) and/or placebo were identical in appearance; 2, placebo used but no explicit statement about identical appearance; and 1, neither. For other treatments: 3, treatment groups received equal attention and measurement of outcomes was blind to treatment assignment; 2, treatment groups received equal attention but blinding of measurement of outcomes was not done or not stated; and 1, neither.
Description of treatment groups at the beginning of study: 3, table provided showing distribution of important prognostic variables by treatment group; 2, partial information provided; and 1, neither.
Description of withdrawals: the score was obtained after discussion between the 2 raters based on their independent abstraction of data on number of subjects assigned to and withdrawn from each treatment group: 3, complete agreement; 2, agreement after discussion and rereading the article; and 1, lack of agreement even after further discussion.
Criteria 1, 2, 4, and 5 were rated independently by 2 raters (J.M., M.C., or E.K.) and an average score was calculated. The ratings on criteria 3 and 6 were given following discussion. We summed the ratings of the 6 criteria into a composite quality score with a possible range from 6 to 18, a higher score indicating better quality. For the first 9 articles, we discussed the ratings and developed consensus ratings; thereafter we did not discuss the ratings (except for criteria 3 and 6). Interrater agreement for the quality score was assessed with the concordance correlation coefficient.12
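The scoring and agreement steps described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the ratings are invented, and the agreement statistic is assumed to be the concordance correlation coefficient in the form usually attributed to Lin (population variances and covariance, with a penalty for a shift in means).

```python
from statistics import fmean

def composite_quality(independent, discussed):
    """Sum six criterion ratings into one score (possible range 6 to 18).

    independent: {criterion: (rater_a, rater_b)} for criteria 1, 2, 4, 5,
                 which are averaged across the two raters
    discussed:   {criterion: agreed_rating} for criteria 3 and 6
    """
    averaged = sum((a + b) / 2 for a, b in independent.values())
    return averaged + sum(discussed.values())

def concordance_ccc(x, y):
    """Concordance correlation coefficient for paired ratings (assumed form)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Perfect agreement gives 1; a constant offset between raters lowers it.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

# Hypothetical ratings, not taken from the reviewed studies.
score = composite_quality(
    {1: (3, 3), 2: (3, 2), 4: (2, 2), 5: (3, 3)},
    {3: 3, 6: 2},
)
```

Unlike a Pearson correlation, this coefficient drops when one rater scores systematically higher than the other, which is why it suits an interrater agreement check.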
The following were abstracted but not included in the quality score: blinding of allocation, prior sample size estimates, and multiple looks at the results because they were not reported explicitly in any of the studies; power analysis was not applicable to all studies.
The quality of pharmacological treatment was determined by summing two 3-point scales: treatment duration (3, ≥12 weeks; 2, 8-11 weeks; and 1, <8 weeks) and adequacy of dosage, determined by consensus among 2 geriatric psychiatrists (3, adequate; 2, probably adequate; and 1, probably inadequate). The score for the quality of the psychological treatment was based only on the treatment duration score.
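As a rough sketch, the treatment quality score could be computed as follows. The function names are hypothetical, and the handling of fractional durations (anything from 8 up to but not including 12 weeks scores 2) is an assumption, since the text gives whole-week bands only.

```python
def duration_score(weeks):
    """3 for 12 weeks or more, 2 for 8 to under 12 weeks, 1 for under 8 weeks."""
    if weeks >= 12:
        return 3
    if weeks >= 8:
        return 2
    return 1

def treatment_quality(weeks, dosage_rating=None):
    """Duration score plus dosage adequacy (3, 2, or 1) for drug treatments.

    For psychological treatments pass dosage_rating=None, so the score
    rests on duration alone.
    """
    score = duration_score(weeks)
    if dosage_rating is not None:
        if dosage_rating not in (1, 2, 3):
            raise ValueError("dosage_rating must be 1, 2, or 3")
        score += dosage_rating
    return score
```

Under this sketch a drug trial ranges from 2 to 6 and a psychological trial from 1 to 3, so the two scores are not directly comparable across treatment types.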
Because a single study might contain several comparisons between treatment and/or controls, our analysis was based on comparisons between either a treatment and control or between 2 treatments. These comparisons were grouped by type of treatment and type of control.
Two of us (A.B., F.B., or J.M.) abstracted the following data, if available: sample sizes; mean depression rating scale scores at the beginning and end of treatment for each study group; SDs of the posttreatment scores; and the P value for posttreatment comparison between study groups. Discrepancies were resolved by consensus. If a study used multiple depression scales, we selected the most frequently used scale, the Hamilton Depression Rating Scale (HDRS), if available.13 If the HDRS was not used we selected an alternative scale using the following hierarchy, based on the relative frequency with which the scales were used in the studies: Beck Depression Inventory, Zung Depression Rating Scale, Geriatric Depression Scale, and any other depression rating scale. Studies that computed a percentage change on the HDRS but did not include mean HDRS scores were included in the last category (any other depression rating scale).
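The scale hierarchy amounts to a first-match lookup. The sketch below uses hypothetical short labels for the scales (HDRS, BDI, Zung, GDS) and assumes that, when none of the four named scales is reported, the first other scale listed by the study is taken.

```python
# Preference order from the text; the short labels are illustrative.
SCALE_HIERARCHY = ["HDRS", "BDI", "Zung", "GDS"]

def select_scale(reported_scales):
    """Return the preferred outcome scale among those a study reported."""
    for scale in SCALE_HIERARCHY:
        if scale in reported_scales:
            return scale
    # Fallback: any other depression rating scale (first one listed).
    return reported_scales[0] if reported_scales else None
```

For example, a study reporting both the Geriatric Depression Scale and the Beck Depression Inventory would contribute its Beck scores.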
For each comparison we computed our primary measure of effect, the mean posttreatment difference in HDRS scores. To assess potential bias because of the exclusion of studies that did not use the HDRS, we also computed a mean effect size (difference between treatment and control group posttreatment mean depression scale scores divided by the pooled SD and adjusted by a multiplicative factor proposed by Hedges and Olkin14) for each comparison, based either on the HDRS or an alternative scale. If the SDs were not available but the P value of the statistical test comparison was reported, we used it to obtain the pooled SD.15 If the P value was reported as less than or equal to "P," we conservatively set it to P. If the result of the test was reported as not significant, we arbitrarily set P=.50. We considered the P values as 2-tailed to derive the pooled SD, unless it was clearly mentioned that a 1-tailed P value was reported.
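A minimal sketch of these calculations follows. Several parts are assumptions rather than the authors' exact procedure: the multiplicative factor is taken to be the common small-sample correction 1 − 3/(4(n1 + n2 − 2) − 1), the reported test is assumed to be a two-sample comparison of posttreatment means, and the P value is inverted with a normal approximation rather than the t distribution so that only the standard library is needed. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def pooled_sd(n1, s1, n2, s2):
    """Pooled SD of two groups from their sample sizes and SDs."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def hedges_g(mean_treat, mean_ctrl, sd_pooled, n1, n2):
    """Standardized mean difference with a small-sample correction (assumed form)."""
    d = (mean_treat - mean_ctrl) / sd_pooled
    correction = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return correction * d

def resolve_p(reported_p=None, not_significant=False):
    """Reporting rules from the text: 'not significant' -> .50; 'P <= x' -> x."""
    return 0.50 if not_significant else reported_p

def pooled_sd_from_p(mean_diff, n1, n2, p, two_tailed=True):
    """Back out the pooled SD from a reported P value (normal approximation)."""
    tail = p / 2 if two_tailed else p
    z = NormalDist().inv_cdf(1 - tail)  # assumption: z in place of t
    if z <= 0:
        raise ValueError("P value too large to recover an SD")
    return abs(mean_diff) / (z * math.sqrt(1 / n1 + 1 / n2))
```

Setting a "P ≤ x" report to x gives the smallest test statistic consistent with the report, hence the largest pooled SD and the smallest effect size, which is the sense in which the rule is conservative.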
Within each comparison group, we combined the posttreatment mean differences and effect sizes across studies using the random effects model proposed by DerSimonian and Laird,16 and 95% confidence intervals (CIs) were calculated. Within each group we computed the combined posttreatment mean differences and effect sizes first based on comparisons that used the HDRS and then we combined effect sizes based on all comparisons. In addition, within each comparison group a test of homogeneity of effects across studies was performed,15 and we considered the effects as heterogeneous when the P value of the test was less than .10. Quality-adjusted combined effect measures were also computed using the quality score as a weighting factor.15 Since the results were similar for the quality-adjusted and the unadjusted effect measures, we present only the unadjusted measures.
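The pooling step can be illustrated with the standard moment-based form of the DerSimonian and Laird estimator. This is a sketch under stated assumptions, not the original SAS code: each comparison is assumed to supply an effect estimate and its sampling variance, the homogeneity test is taken to be Cochran's Q referred to a chi-square distribution, and the chi-square tail probability is computed with a simple series that is adequate only for moderate values. The quality-weighted variant is not shown.

```python
import math

def chi2_sf(x, df):
    """Upper tail of the chi-square distribution via a gamma series."""
    if x <= 0:
        return 1.0
    a, half = df / 2, x / 2
    term = total = 1.0
    n = 1
    while term > 1e-15 * total:
        term *= half / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a + 1))
    return max(0.0, 1.0 - lower)

def dersimonian_laird(effects, variances, z=1.96):
    """Random-effects pooled estimate, 95% CI, and homogeneity test."""
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "pooled": pooled,
        "ci": (pooled - z * se, pooled + z * se),
        "tau2": tau2,
        "q": q,
        "p_homogeneity": chi2_sf(q, df),
    }

# Hypothetical inputs: two mean differences, each with sampling variance 1.
result = dersimonian_laird([-2.0, -4.0], [1.0, 1.0])
heterogeneous = result["p_homogeneity"] < 0.10
```

When the estimated between-study variance is zero the result coincides with the fixed-effect estimate; otherwise the weights become more equal across studies and the interval widens.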
If 2 or more comparisons within a group were derived from the same study we used only 1 comparison from the study for the quantitative meta-analysis, selecting the one with the largest effect size. (Comparisons excluded from the meta-analysis are shown as open circles in the figures.)
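One way to apply this rule is sketched below. The record layout and field names are hypothetical, and "largest" is assumed to mean largest in absolute value, which the text does not state explicitly.

```python
def pick_one_per_study(comparisons):
    """Keep, for each study, the comparison with the largest effect size.

    comparisons: list of dicts with 'study' and 'effect_size' keys, all
    belonging to the same comparison group.
    """
    best = {}
    for comp in comparisons:
        key = comp["study"]
        # Assumption: magnitude, not sign, decides which comparison is kept.
        if key not in best or abs(comp["effect_size"]) > abs(best[key]["effect_size"]):
            best[key] = comp
    return list(best.values())

# Hypothetical group: study "A" contributes two comparisons, study "B" one.
group = [
    {"study": "A", "effect_size": -0.40},
    {"study": "A", "effect_size": -0.90},
    {"study": "B", "effect_size": -0.20},
]
kept = pick_one_per_study(group)
```

Keeping the largest effect from each study avoids counting the same control group twice, but it tilts the pooled estimate in favor of treatment, which is worth bearing in mind when reading the summary estimates.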
All statistical analyses were conducted using SAS statistical software (version 6.10 for Windows).17
A total of 233 articles were retrieved that met the first 2 inclusion criteria: original studies published in English or French. Of these, 184 articles were excluded for the following reasons: 44 were not controlled trials, 84 did not meet our age criteria, 42 did not meet the research setting criteria, 11 did not meet the criteria for depression, and 3 did not include an outcome measure of depression. Of the 49 articles18-67 remaining, 5 were excluded (1 was a duplicate46 and 4 were follow-up reports37,40,41,61 of other studies, with no new relevant data). Finally, we excluded 3 nonacute-phase drug studies27,38,54 and 4 studies of management interventions.19,21,24,48 Of the remaining 37 studies, 3 studies20,47,62 reported on comparisons involving both drugs and psychological treatments. Because the criteria for rating drug studies, where it was possible to use a placebo, were different from those for studies of psychological treatments, we conducted separate quality ratings of the drug and psychological treatment components of these 3 studies. Thus, we carried out a total of 40 quality ratings on the 37 studies.
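The selection flow can be checked with simple arithmetic. The snippet below uses only the counts stated in the paragraph above; the variable names and dictionary labels are illustrative.

```python
retrieved = 233
excluded = {
    "not controlled trials": 44,
    "age criteria": 84,
    "research setting": 42,
    "depression criteria": 11,
    "no depression outcome measure": 3,
}

remaining_articles = retrieved - sum(excluded.values())   # 233 - 184 = 49
after_duplicates = remaining_articles - (1 + 4)           # duplicate + follow-ups
studies = after_duplicates - 3 - 4                        # nonacute, management
quality_ratings = studies + 3                             # 3 studies rated twice
```

The counts reconcile: 49 articles remain after the first exclusions, 37 studies enter the review, and the 3 studies with both drug and psychological components bring the number of quality ratings to 40.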
The mean quality score from the 40 ratings was 14.9 (SD, 2.2). In the 32 ratings conducted by 2 independent raters, the concordance correlation coefficient was 0.84 (95% CI, 0.55-0.95). Means (SDs) for each treatment group were 13.9 (2.4) for psychological treatment and 15.5 (1.9) for drug treatment. Means (SDs) were 14.1 (2.7) for studies published before 1985, 15.2 (1.5) for those published from 1985 through 1989, and 15.7 (1.7) for those published in 1990 or later.
There were 26 studies of pharmacological treatments20,22,23,25,26,28-32,39,42,43,45,47,49-53,56,58,62,65-67 and 14 studies of psychological treatments18,20,33-36,47,55,57,59,60,62-64 (Table 1). The majority of the studies were performed in the United States. Studies of drugs were more likely to have been published during the 1990s, while those of psychological treatments were mostly published before 1990. Sources of funding also differed among the groups: drug studies were more often funded by pharmaceutical companies or an unknown source, and psychological treatment studies by government sources. Subjects for the drug studies were recruited mainly from outpatient clinics whereas those for the psychological treatments were more likely to be recruited from the community, nursing homes, or mixed sources. Only 4 studies recruited patients from primary care settings.31,45,49,65 Finally, the drug studies used a formal diagnosis of depression as a selection criterion more often than the psychological studies.
A total of 21 drug-placebo comparisons are shown in Figure 1, grouped by drug type. Among 12 comparisons of a heterocyclic drug with placebo, 9 were statistically significant: 4 of imipramine, 2 of nortriptyline, 2 of nomifensine, and 1 of doxepin. There was significant heterogeneity in the study results (P=.03); this could not be explained either by variability in the treatment quality score or the mean HDRS score at baseline. Heterocyclic drug studies as a group (n=9 after exclusion of 1 comparison in 3 studies with 2 comparisons)26,47,52 had mean posttreatment differences in HDRS scores of −5.78 (95% CI, −8.31 to −3.25).
Regarding antianxiety drugs, buspirone was significantly more effective than placebo but alprazolam was not. Overall, antianxiety drugs were not significantly better than placebo and there was no significant heterogeneity (P=.22). Among other drug-placebo studies, significant benefits were found for fluoxetine,65 trazodone,42 and phenelzine39 (HDRS differences, −2.40, −7.50, and −0.75, respectively).
Seventeen drug-drug comparisons, mainly involving heterocyclic drugs, are shown in Figure 2. The mean differences were smaller than those in the placebo comparisons and none was statistically significant. There was significant heterogeneity among the 5 comparisons of SSRIs with heterocyclic drugs (P=.04).
Twelve comparisons between psychological treatments and controls are shown in Figure 3. None of 4 comparisons of emotive treatments with untreated controls was statistically significant, and there was no significant heterogeneity in the 3 comparisons included in the meta-analysis (P=.28); no comparisons of an emotive treatment with an attention control used the HDRS. Five of 6 comparisons of rational treatments with untreated controls were statistically significant. Overall, rational treatments performed significantly better than untreated controls; the mean HDRS difference was −7.25 (95% CI, −10.10 to −4.40) and the P value for homogeneity was .48. However, neither of 2 rational treatments performed significantly better than attention controls.
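As a reading aid, the reported intervals can be converted back to standard errors and test statistics. The sketch assumes the intervals are symmetric normal-theory intervals (estimate ± 1.96 × SE), which the text does not state; the helper name is hypothetical.

```python
def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)

# Pooled HDRS differences and intervals as reported in the text.
heterocyclic = {"estimate": -5.78, "ci": (-8.31, -3.25)}
rational = {"estimate": -7.25, "ci": (-10.10, -4.40)}

for summary in (heterocyclic, rational):
    summary["se"] = se_from_ci(*summary["ci"])
    summary["z"] = summary["estimate"] / summary["se"]
```

Both intervals are symmetric about their estimates (half-widths of 2.53 and 2.85 HDRS points), and both implied test statistics are well beyond −1.96, consistent with the reported significance against placebo and untreated controls.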
There were 6 comparisons between different psychological treatments that used the HDRS (Figure 3). Neither the individual comparisons nor the summary comparison indicated significant differences between rational and emotive treatments, and there was no significant heterogeneity in the 4 comparisons included in the summary estimate (P=.80).
There were 16 comparisons derived from 9 studies18,23,33-35,53,55,57,67 that used a depression outcome scale other than the HDRS. In comparison groups that included 2 or more treatment comparisons, at least 1 of which did not use the HDRS, 7 additional comparisons were found; none of these was statistically significant and their inclusion had little impact on the effect sizes or homogeneity statistics (Table 2). Among 9 other individual comparisons that did not use the HDRS, not included in these comparison groups, only those of Gerovital H3 (a solution containing 2% procaine hydrochloride with benzoic acid as a preservative and potassium metabisulfite as an antioxidant) vs either imipramine or placebo67 and reminiscence therapy vs attention controls33 showed a significant benefit of treatment.
This review demonstrates that, in comparison with placebo controls, only 2 types of drugs, heterocyclic drugs and SSRIs, are effective in older ambulatory patients in the short term. The evidence on SSRIs is less convincing than that for heterocyclic drugs because it is based on a single study, and the posttreatment mean difference in depression severity was small. However, because of the ethical problems in using placebo controls when effective treatment is available, most of the evidence on the effectiveness of SSRIs comes from comparisons with active treatment controls who received heterocyclic drugs. In these latter studies, the 2 classes of drugs appeared to be equally effective. It is therefore reasonable to conclude that both heterocyclic drugs and SSRIs are effective treatments for this population. However, this conclusion should be tempered by the rather modest mean posttreatment difference in the severity of depression (approximately 6 points on the HDRS). In many ambulatory settings, limited physician expertise, poor compliance with treatment,68,69 presence of chronic medical conditions, and concomitant treatment with other medications may reduce these treatment effects. Only 3 drug-placebo trials have been conducted in primary care settings31,49,65 and only 1 found the treatment (fluoxetine) to be marginally effective.65 Adverse effect profiles, however, which appear to be more benign for SSRIs than for heterocyclic drugs, may counterbalance these small treatment benefits.
As for psychological treatments, rational therapies (eg, cognitive or behavioral therapy) look more promising than emotive therapies. When rational therapies were compared with untreated controls, the mean difference in HDRS scores (7 points) was similar to that for heterocyclic antidepressant drugs, but when rational therapies were compared with attention controls, this difference disappeared, suggesting that attention alone may be an effective treatment for many older ambulatory patients with depression. The benefits of emotive therapies may be less than those of heterocyclic drugs or rational psychological therapies, but results of one trial33 of structured reminiscence therapy were positive and this finding should be replicated. Although many of these psychological approaches might be incorporated into clinical practice and could be an attractive alternative to drugs, it appears that much of the effect of psychological treatment can be attributed to the nonspecific effects of extra attention to the patient. Knowledge of this nonspecific attention effect may be used by the clinician or an adjunctive attention intervention might be implemented by nonprofessionals.
However, the evidence supporting drug and psychological approaches to the treatment of depression in older, ambulatory patients is far from conclusive, for the reasons outlined below.
Differences in sample selection may account for differences in the results of the various treatments that we reviewed. For example, the drug studies were conducted mainly in outpatient settings while the psychological studies more often used community samples; drug studies used formal diagnostic selection criteria more often than psychological studies. Regardless of the setting, studies recruited primarily convenience or volunteer samples; the results may not be generalizable to patients seen in clinical practice, particularly in primary care settings.70
Many of the treatments studied were short-term; approximately 1 in 4 lasted 4 weeks or less. Only 2 studies47,62 included direct comparisons of drugs with psychological treatment and none assessed multifaceted interventions, except for the management studies that were excluded from this review because of their small number and heterogeneity.
The most frequently used scale to measure depressive symptoms, the HDRS, was developed to rate symptoms in patients with serious affective disorders. This scale is heavily weighted for somatic (eg, anorexia, weight loss, or anergy) and melancholic (eg, feelings of guilt, hypochondriasis, or loss of insight) symptoms, and may not be an appropriate measure of the less severe symptoms of depression seen in ambulatory settings. Nevertheless, the effect sizes that included other depression rating scales were of a similar order of magnitude to those of the HDRS. The studies in this review rarely included either posttreatment follow-up measures or measures of physical or cognitive functioning.
Nonspecific effects of treatment may include the placebo effect (due to the expectation that the treatment will be effective) and the Hawthorne effect (due to knowledge that one is being studied). Because a true placebo is not possible for psychological treatments, it is often recommended that an effort be made to control for the increased attention involved in such treatments by using a control group that receives an equivalent amount of attention, but without the specific psychological component used in the experimental group.71 Our results indicate the importance of choice of control group; larger effects of treatment generally were found when an untreated control group was used.
Other frequent methodological problems in the studies reviewed include failure to describe blinding of randomization11 or to analyze the results by intention-to-treat.
Our meta-analysis is limited to those original studies published in English or French that were identified in our search strategy, and may not be representative of published studies in all languages. Since unpublished studies were excluded, publication bias is of concern. However, since publication bias might be expected to favor studies with positive results, inclusion of unpublished studies could be expected to further reduce the already modest treatment effects we observed. Our meta-analysis was further limited by the small numbers of published studies available; for some treatments only 1 study was available so that no pooling was possible. Finally, we did not assess adverse events, an important consideration in selecting among effective treatments.
In conclusion, we suggest that clinicians can expect only modest benefits from currently available treatments of depression in older ambulatory patients, taking into account their modest effectiveness and the limitations in the evidence highlighted in this meta-analysis. While awaiting more substantial evidence of effectiveness from more representative study populations, clinicians may wish to be cautious in their use of heterocyclic drugs and SSRIs in this population and consider using psychological or attention interventions, particularly for patients with milder depression.
Accepted for publication August 15, 1997.
Presented at the 48th Annual Meeting of the Canadian Psychiatric Association, Quebec City, Quebec, October 2, 1996.
Reprints: Jane McCusker, MD, DrPH, Department of Clinical Epidemiology and Community Studies, St Mary's Hospital, 3830 Lacombe Ave, Room 2508, Montreal, Quebec, Canada H3T 1M5.