Beekman ATF, Geerlings SW, Deeg DJH, Smit JH, Schoevers RS, de Beurs E, Braam AW, Penninx BWJH, van Tilburg W. The Natural History of Late-Life Depression: A 6-Year Prospective Study in the Community. Arch Gen Psychiatry. 2002;59(7):605-611. doi:10.1001/archpsyc.59.7.605
Copyright 2002 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Accurate assessment of the natural history of late-life depression requires frequent observation over time. In later life, depressive disorders fulfilling rigorous diagnostic criteria are relatively rare, while subthreshold disorders are common. The primary aim was to study the natural history of late-life depression, systematically comparing those who did with those who did not fulfill rigorous diagnostic criteria.
Within the Longitudinal Aging Study Amsterdam, a large cohort of depressed elderly persons (n = 277) was identified and followed up for 6 years, using 14 observations. Depression was measured using self-reports (the Center for Epidemiological Studies Depression Scale) and diagnostic interviews (the Diagnostic Interview Schedule). The natural history was assessed for symptom severity (Center for Epidemiological Studies Depression Scale score), symptom duration, clinical course type, and stability of diagnoses.
The average symptom severity remained above the 85th percentile of the population average for 6 years. Symptoms were short-lived in only 14%. There were remissions in 23%, an unfavorable but fluctuating course in 44%, and a severe chronic course in 32% (percentages do not total 100 because of rounding). Comparing the outcome, there was a clear gradient in which those with subthreshold disorders had the best outcome, followed by those with major depressive disorder, dysthymic disorder, and double depression. However, the prognosis of subthreshold disorders was unfavorable in most cases, while this group was at high risk of developing DSM affective disorders.
The natural history of late-life depression in the community is poor. DSM affective disorders are relatively rare among elderly persons, but do identify those with the worst prognosis. However, subthreshold depression is serious and chronic in many cases.
In later life, depression is a common disorder, with well-documented consequences for well-being, daily functioning, mortality, and service utilization.1-10 Although depression is generally regarded as highly treatable throughout the life cycle,11,12 most elderly persons with depression remain untreated.13,14 Because the primary aim of treatment is to change the prognosis of depression, detailed information on its natural history is of vital importance. Most previous studies have focused on younger adults recruited in treatment settings. The results suggest that the long-term outcome is heterogeneous, with many patients experiencing fluctuating symptom levels over time.15,16 The long-term risk of relapse may be as high as 80%.17
There is good reason to suspect that the prognosis of depression changes with age. The prevalence of known prognostic factors, such as physical illness, cognitive impairment, and lack of support, increases with age, suggesting that the prognosis may deteriorate in later life.18,19 Several community-based studies20-27 focusing on elderly persons have recently reported data. In a meta-analysis13 of available studies, a 50% rate of chronicity was found in those alive at follow-up. An important weakness of available studies is that most relied on only 2 measurements, lacking information about the intervening interval. More frequent observations are necessary to reliably assess the prognosis of late-life depression. A second problem is the definition and measurement of depression. It is often assumed that depression is a disorder occurring on a continuum of severity. Recent studies1,28,29 suggest that major depression is relatively rare, while subthreshold depressive disturbances are particularly common in later life. These subthreshold disorders seem to have similar consequences for the well-being and functioning of elderly persons when compared with major depressive disorder (MDD).2,4 Moreover, recent studies5,10 of mortality suggest that major depression has the stronger effect on mortality but that subthreshold depression (SUBD) also has a unique and significant effect on mortality in men.
These findings have led to a debate in the literature, with critics suggesting that current criteria for affective disorders are less appropriate for older than for younger adults.30,31 A test would involve systematically comparing the prognosis of depressive states that do and do not fulfill rigorous DSM diagnostic criteria in a prospective community-based study. The available follow-up studies have not been able to do this, so a systematic comparison of the prognosis of depression in later life, measured at different levels of caseness, has remained unstudied.
In the present community-based study, the natural history of depression was assessed during a long interval (6 years), using 14 observations and standardized screening and diagnostic instruments. At 3 points (baseline, 3 years, and the end point), diagnostic interviews were administered, allowing a detailed assessment of the prognosis of DSM disorders (MDD and dysthymic disorder [DYSTHD]) and less well-defined depressive states (SUBD). The primary aims of the study were to describe the natural history of late-life depression in the community, systematically comparing the prognosis in those who did and did not fulfill rigorous diagnostic criteria for affective disorders.
The Longitudinal Aging Study Amsterdam is a 10-year prospective study of the well-being and functioning of older people in the Netherlands.32 Sampling, response, and procedures have been detailed elsewhere.4,5,33-35 A large (n = 3056) representative sample of community elderly persons (aged 55-85 years) was interviewed at baseline in 1992 or 1993. The sample was stratified for age, sex, and level of urbanicity. At baseline, depression was assessed both in terms of symptoms and at the diagnostic level. The follow-up consisted of frequent (5-month) postal questionnaires and infrequent (3-year) home interviews (Table 1).35,36 This procedure resulted in a maximum of 14 observations, including 3 diagnostic assessments (at baseline, after 3 years, and after 6 years), covering 6 years. Informed consent was obtained before the study, in accordance with legal requirements in the Netherlands.
Eligible for inclusion in the depressed cohort were all subjects scoring above the cutoff on the screener (Center for Epidemiological Studies Depression Scale [CES-D] score of ≥16 points) at baseline (n = 448).33 However, inclusion was only deemed appropriate when a complete diagnostic interview was available at baseline and when a minimum of 2 follow-up observations was available. This reduced the sample to 277 depressed subjects (62% of the original 448 depressed subjects). The mean number of valid observations was 9.81 (SD, 3.92). Data on the response per observation are summarized in Table 1. Most of the analyses were limited to the depressed cohort. However, for some analyses, data from a similarly studied nondepressed cohort were used as the control.35 This cohort was defined as a random sample of those scoring below the threshold on the CES-D at baseline, which was similarly approached for diagnostic interviews and thereafter similarly followed up for 3 years.
Depression was measured using a self-report rating scale (CES-D) and a diagnostic instrument (Diagnostic Interview Schedule).37,38 The CES-D is a 20-item scale, developed to measure depressive symptoms in the community. It has been widely used in older community samples and has good psychometric properties in this age group.39,40 The Dutch translation had similar psychometric properties in 3 previously studied elderly samples.41 Because of the emphasis on affective items in the scale, the overlap with symptoms of physical illness is minimal.42,43 The total score on the CES-D ranges between 0 and 60. To identify those with clinically relevant levels of symptoms, the generally used cutoff score of 16 or more was used.41,42 Using this score, the criterion validity of the CES-D for MDD was excellent (sensitivity, 100%; and specificity, 88%).44 As described, the data were partly gathered in face-to-face interviews and partly using postal questionnaires. Previous analyses36 demonstrated that there was a mode effect: the scores derived from the postal questionnaires were systematically higher than those from the interviews. Because postal questionnaires and interviews alternated, it was possible to quantify the mode effect and make an appropriate adjustment, using a generalized T-score transformation.35,36,45,46 To diagnose MDD and DYSTHD, the Diagnostic Interview Schedule was used. Those with both MDD and DYSTHD were categorized separately as having double depression. The Diagnostic Interview Schedule was designed for epidemiological research and has been widely used among elderly persons. Interviewers were fully trained by certified staff, using the official Dutch translation of the Diagnostic Interview Schedule.47 For the present article, the prognosis of MDD and DYSTHD will be systematically compared with that of SUBD. 
Subthreshold depression was defined as a clinically significant level of depressive symptoms (CES-D score of ≥16), but the subject did not fulfill the diagnostic criteria for either MDD or DYSTHD.
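The grouping described above can be sketched as a small decision rule. This is an illustrative sketch only: the function and enum names are hypothetical, and the two Boolean inputs stand in for the outcome of the Diagnostic Interview Schedule, which is not reproduced here.

```python
from enum import Enum

CESD_CUTOFF = 16  # cutoff for clinically relevant symptoms, as stated in the text


class Group(Enum):
    NOT_DEPRESSED = "not depressed"
    SUBD = "subthreshold depression"
    MDD = "major depressive disorder"
    DYSTHD = "dysthymic disorder"
    DOUBLE = "double depression"


def baseline_group(cesd_score: int, has_mdd: bool, has_dysthd: bool) -> Group:
    """Assign a diagnostic group from a CES-D score and two interview outcomes.

    has_mdd and has_dysthd are assumed inputs representing the diagnostic
    interview result; how they are derived is outside this sketch.
    """
    if has_mdd and has_dysthd:
        return Group.DOUBLE  # both diagnoses are categorized separately
    if has_mdd:
        return Group.MDD
    if has_dysthd:
        return Group.DYSTHD
    if cesd_score >= CESD_CUTOFF:
        return Group.SUBD  # elevated symptoms without a DSM diagnosis
    return Group.NOT_DEPRESSED
```

For example, a subject with a CES-D score of 18 and no diagnosis falls in the SUBD group, while a subject meeting criteria for both MDD and DYSTHD is assigned to double depression regardless of the score.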
The course of depression was described by symptom severity, symptom duration, and clinical course type. Symptom severity was defined as the average CES-D score over all observations over time. At the start of the study, it was hoped that it would be possible to estimate the average duration of depressive episodes. However, clearly delineated episodes were rare, most exhibiting either a chronic or a fluctuating course of symptoms. Therefore, the percentage of observations in which the subjects reported elevated symptom levels (CES-D score of ≥16) was used to estimate the proportion of time they were depressed. This was used as the measure of symptom duration. Clinically meaningful interpretation of the data is enhanced when the observations within subjects are collapsed into clinical types of course of depression. The course types distinguished were remission, remission with recurrence, a chronic–intermittent course, and a chronic course. A remission was defined as a combination of (1) a relevant (described later) decline of symptoms and (2) the subject remaining nondepressed (CES-D score of <16 and no DSM affective disorder diagnosis) throughout the rest of the study. A remission with recurrence was defined as a remission in which the subject had a relevant increase of symptoms later on in the study. A chronic–intermittent course type was defined as more than 1 remission, followed by a recurrence of symptoms. A chronic course was defined as 80% or more depressed observations.
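The severity and duration measures, and the 80% criterion for a chronic course, can be illustrated with a short sketch. The function names and the example scores are hypothetical, and missing observations are assumed to have been dropped before the scores are passed in.

```python
from statistics import mean
from typing import Sequence

CESD_CUTOFF = 16      # elevated symptom level, as stated in the text
CHRONIC_PERCENT = 80  # a chronic course is 80% or more depressed observations


def symptom_severity(scores: Sequence[float]) -> float:
    """Average CES-D score over all available observations."""
    return mean(scores)


def depressed_count(scores: Sequence[float]) -> int:
    """Number of observations at or above the CES-D cutoff."""
    return sum(1 for score in scores if score >= CESD_CUTOFF)


def percent_time_depressed(scores: Sequence[float]) -> float:
    """Share of observations with elevated symptoms, as a percentage."""
    if not scores:
        raise ValueError("at least one observation is required")
    return 100.0 * depressed_count(scores) / len(scores)


def is_chronic(scores: Sequence[float]) -> bool:
    """True when 80% or more of the observations are depressed.

    Integer arithmetic avoids floating-point surprises at the boundary.
    """
    if not scores:
        raise ValueError("at least one observation is required")
    return depressed_count(scores) * 100 >= CHRONIC_PERCENT * len(scores)
```

With the made-up series `[22, 18, 15, 20, 17]`, four of five observations are at or above 16, so the subject is counted as depressed 80% of the time and meets the chronic criterion.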
For the classification of course types, criteria to define a relevant change had to be defined; these criteria had to be statistically sound and clinically relevant. To prevent random fluctuation from having undue influence on the results, a statistically relevant change was defined, taking into account the reliability and the average score and SD of the CES-D in this cohort.48 The criterion for a reliable change thus calculated was 3.4. To be clinically relevant, a change of 5 points would qualify as a middle to large effect size in the literature on power analysis.36,49 Therefore, a change of 5 CES-D points was chosen as the criterion for a relevant change. This had the added advantage that it is similar to what earlier studies21,22,50 using the CES-D have used as the criterion for a relevant change. For defining course types, a further criterion was that the cutoff of symptoms that is generally regarded to be clinically meaningful was crossed. Therefore, the criterion for a relevant change was that, between measurements, the change was 5 points or more, thereby crossing the cutoff of 16.
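The final criterion (a change of 5 points or more that also crosses the cutoff of 16) can be written as a check on two consecutive measurements. This is a sketch under stated assumptions: the function names are hypothetical, and only the CES-D part of the definition is covered, not the DSM diagnosis that also enters the definition of remission.

```python
from typing import List, Optional, Sequence

CESD_CUTOFF = 16     # clinically meaningful symptom level
RELEVANT_CHANGE = 5  # minimum change in CES-D points


def relevant_change(previous: float, current: float) -> Optional[str]:
    """Classify the step between two consecutive CES-D scores.

    Returns "decline" or "increase" when the change is at least 5 points
    and crosses the cutoff of 16; otherwise returns None.
    """
    crossed_down = previous >= CESD_CUTOFF and current < CESD_CUTOFF
    crossed_up = previous < CESD_CUTOFF and current >= CESD_CUTOFF
    if crossed_down and previous - current >= RELEVANT_CHANGE:
        return "decline"
    if crossed_up and current - previous >= RELEVANT_CHANGE:
        return "increase"
    return None


def change_events(scores: Sequence[float]) -> List[str]:
    """List the relevant declines and increases along a series of scores."""
    events = []
    for previous, current in zip(scores, scores[1:]):
        event = relevant_change(previous, current)
        if event is not None:
            events.append(event)
    return events
```

Under this rule a drop from 17 to 15 crosses the cutoff but is too small to count, and a drop from 25 to 18 is large enough but does not cross the cutoff, so neither is a relevant change.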
Variables used in subgroup analyses of baseline predictors of the course of depression were age, sex, chronic physical illness (0 vs ≥1),51 functional impairment (0 vs ≥1),52 cognitive impairment (Mini-Mental State Examination score of <24),53 and the size of the network.54
When comparing the average severity and duration of symptoms across diagnostic and course types, and in subgroup analyses of prognostic factors, an analysis of variance, χ² statistics, and Spearman rank correlations were calculated. In the analyses of factors predicting attrition of subjects, bivariate (χ² statistics, analysis of variance, and relative risk estimates) and multivariate (logistic regression) analyses were used. Effects of missing observations within participants were studied using correlations, analysis of variance, Friedman rank tests, and Cochran statistics. In all analyses, conventional criteria for statistical significance (α<.05) and 2-tailed tests were used.
Two types of attrition will be described: loss of subjects and loss of observations within participating subjects. In bivariate analyses, the characteristics of the 277 participants (Table 2) were compared with those of the 171 depressed subjects who dropped out. For baseline depression level, the participants' average CES-D score and the score of the dropouts did not differ significantly (22.6 vs 23.5) (F(1,446) = 1.67, P = .20). There were also no differences for sex (χ²(1) = 0.94, P = .33), marital status (χ²(1) = 1.98, P = .16), living in Amsterdam (χ²(1) = 0.47, P = .50), or chronic physical illness (χ²(1) = 3.21, P = .07). Attrition was predicted by age (F(1,446) = 9.08, P = .003), lower level of education (χ²(1) = 5.82, P = .02), living in an institution (χ²(2) = 15.63, P<.001), cognitive impairment (χ²(1) = 18.31, P<.001), and functional limitation (χ²(1) = 7.61, P = .006). In multivariate analyses (logistic regression), only living in an institution (β, .89; SE, .39; P = .01) and cognitive impairment (β, .77; SE, .27; P = .004) remained as unique predictors of attrition. Looking into mortality, the percentage deceased on January 1, 2000, was 28% among participants and 53% among those who dropped out (relative risk, 2.90; 95% confidence interval, 1.95-4.32). This illustrates that death and frailty were important reasons for not being able to contribute to the study.
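As an arithmetic aside on the mortality comparison, the two rounded percentages (53% and 28%) can be turned into a ratio of risks and a ratio of odds. The sketch below is not the authors' computation; it uses only the rounded percentages quoted in the text, and the function names are hypothetical. With these inputs the ratio of risks is about 1.89 and the ratio of odds is about 2.90, so the reported estimate of 2.90 appears to correspond to the odds-based measure. This reading is an inference from the rounded figures, not something the text states.

```python
def risk_ratio(p_group: float, p_reference: float) -> float:
    """Ratio of two proportions (risks)."""
    return p_group / p_reference


def odds(p: float) -> float:
    """Odds corresponding to a proportion p, with 0 <= p < 1."""
    return p / (1.0 - p)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Ratio of the odds in two groups."""
    return odds(p_group) / odds(p_reference)


# Rounded percentages deceased, as quoted in the text.
P_DROPOUTS = 0.53
P_PARTICIPANTS = 0.28

rr = risk_ratio(P_DROPOUTS, P_PARTICIPANTS)
or_ = odds_ratio(P_DROPOUTS, P_PARTICIPANTS)
print(f"ratio of risks: {rr:.2f}, ratio of odds: {or_:.2f}")
```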
Looking at loss of observations within participants, 3 analyses were performed. There was no correlation between the number of observations and the baseline depression level (r = −0.05, P = .49). For the average level of depressive symptoms over 6 years (r = −0.22, P<.001) and the time depressed (r = −0.26, P<.001), there was a clear association, indicating that those with more persistent or more severe depression were more likely to miss observations. For the course types, the average number of valid observations was similar for those who experienced remission (8.84 observations), remission plus recurrence (8.32 observations), and a chronic course (8.58 observations). Those with a chronic–intermittent course type had more valid observations (12.29 observations) (F(3,273) = 21.99, P<.001).
At baseline, the average age of the participants was 71.8 years (SD, 8.8 years). Table 1 shows that the average CES-D scores decreased considerably between the baseline interview and the first follow-up (mean difference, 5.30; t(204) = 7.87 [paired t test]; P<.001). After that, the average scores stabilized. Because this may be confounded by attrition during follow-up, several tests were performed. In those participating in all waves of the first 3 years of follow-up (n = 118), average scores were stable (Friedman rank test χ²(5) = 4.48, P = .48). In those participating in all 13 follow-up measures after baseline (n = 76), there was a small decline (Friedman rank test χ²(12) = 41.6, P<.001). Considering the diagnostic assessment in those with valid measures at all 3 measurements (n = 97), the percentages with MDD (Cochran Q(2) = 3.58, P = .17) and DYSTHD (Cochran Q(2) = 0.36, P = .84) were stable.
The average CES-D score of all respondents, averaged over all available assessments during the 6-year follow-up period, was 17.28 (SD, 6.61). Compared with the distribution in the whole Longitudinal Aging Study Amsterdam sample at baseline, a score of 17.28 ranks within the top 15%.33 For the duration of symptoms, only 14% were depressed less than 20% of the time, while 46% were depressed more than 60% of the time. For clinical course types, there were only 23% remissions; 12% had a remission with recurrence, 32% had a chronic–intermittent course, and 32% had a chronic course (percentages do not total 100 because of rounding). Statistical comparison of the 4 course types revealed highly significant differences for average symptom severity over time (F(3,273) = 153.0, P<.001) and symptom duration (F(3,273) = 244.0, P<.001). Regarding the potential biasing effects of loss of observations, analyses were performed again in those with complete observations during the first 3 years (n = 118) and in those with all 14 observations (n = 76). The percentages who experienced stable remission and chronicity declined somewhat when subjects with more available observations were selected, while the percentage with a chronic–intermittent course type increased. Within course types, the average level of symptoms and the average percentage time depressed were unaffected by the completeness of the data.
In subgroup analyses of potential predictors of symptom severity, symptom duration, and course types, there were no sex differences. Comparing 3 age groups, the older old (75-85 years at baseline) had a higher average symptom severity (F(2,274) = 3.20, P = .04) and duration of symptoms (F(2,274) = 6.23, P = .002), but there was no significant difference in course types (χ²(6) = 8.51, P = .20). Those with cognitive impairment did not have a higher average symptom severity (F(1,272) = 2.92, P = .09), but did have a longer duration of symptoms (F(1,272) = 4.68, P = .03) and were more likely to have a chronic course type (χ²(3) = 9.25, P = .03). Functional limitation was the strongest predictor for the severity (F(1,270) = 7.07, P = .008) and duration (F(1,270) = 9.19, P = .003) of symptoms and for a chronic course type (χ²(3) = 12.97, P = .005). In those without functional limitations, only 26% had a chronic course type, which compares with 74% in those with functional limitations. Chronic physical illness did not predict the outcome. Those with smaller networks had more severe (r = 0.20, P = .002) and persistent (r = 0.02, P = .02) symptoms and were more likely to have a chronic course type (F(3,252) = 2.65, P = .05).
Comparing the severity and duration of symptoms across diagnostic subgroups, there was a gradient, in which those with SUBD had the lowest severity and duration of symptoms, while those with double depression had the highest severity of symptoms and were the most likely to have a chronic course type (Table 3). Statistical testing confirmed this for the associations between diagnosis and symptom severity (F(3,273) = 19.40, P<.001) and duration (F(3,273) = 10.15, P<.001). The percentage of remissions was highest in those with SUBD and MDD, lower in those with DYSTHD, and even lower in those with double depression. The percentage with an unfavorable but fluctuating course (remission and recurrence plus chronic–intermittent course) was highest in those with SUBD (49%), decreasing from 44% in those with MDD to 36% in those with DYSTHD and to 19% in those with double depression. A severe chronic course was least prevalent among those with SUBD (25%), increasing in those with MDD (35%) and DYSTHD (52%), to 77% among those with double depression. Statistical testing confirmed the statistical significance of the association between diagnosis and course type (χ²(9) = 31.52, P<.001). Regarding potential bias due to loss of observations, all analyses were performed again in those with complete observations during the first 3 years (n = 118) and in those with all 14 observations (n = 76). Selecting subjects with more available observations had little effect (results not shown).
Considering the stability and change of the diagnoses across the 3 diagnostic assessments, those with SUBD at baseline were the most likely to be well at 3 and 6 years (46% and 48%, respectively). For those with SUBD at baseline, the risk of developing either one of the DSM diagnoses was 18% at 3 years and 16% at 6 years. This compares with 31% and 40% for those with MDD, 60% and 69% for those with DYSTHD, and 73% and 67% for those with double depression at baseline, respectively.
A final set of analyses was performed to compare the prognosis of those with SUBD with the prognosis of a random sample of those not depressed at baseline. During the first 3 years of follow-up, the average CES-D score in the cohort with SUBD was 17.39, while the score of the nondepressed cohort was 6.91 (F(1,577) = 546.4, P<.001). The percentage time depressed among those with SUBD was 61.8%, which compares with 8.1% in the nondepressed group (F(1,577) = 40.97, P<.001). Of those not depressed at baseline, 5% were diagnosed as having either MDD or DYSTHD at the 3-year follow-up, which increased to 12% at the 6-year follow-up. In those with SUBD, the corresponding figure was 27% at 3 and at 6 years. Therefore, those with SUBD were clearly at risk of developing DSM affective disorders.
The overall conclusion of the study must be that the prognosis of late-life depression in the community is poor. The average level of symptoms remained clearly elevated throughout the study. Almost half the sample was depressed more than 60% of the time. Considering clinical course types, 23% had true remissions, 12% had remissions with recurrence, 32% had a chronic–intermittent course, and 32% had chronic depression (percentages do not total 100 because of rounding). These findings suggest that the prognosis is even more serious than previous studies among elderly persons in the community have reported. The difference is caused by the fact that this is the first study, to our knowledge, combining frequently repeated measurements over a long period, within a rigorous epidemiological design. If only 2 interviews had been available, with a 3-year interval (wave 1 and wave 8 in Table 1), the result would have been that 51% of those depressed at baseline had remitted. This would have resulted in a far more optimistic outcome, similar to previous studies13 among older people with a 2-wave design. Compared with findings among younger adults, our conclusion is similar in that most subjects experienced prolonged fluctuating symptoms.15 However, in the National Institute of Mental Health–Collaborative Depression Studies,15 subjects with MDD were symptom free or had returned to their usual selves 41% of the time during 12 years of follow-up. Although the methods of the present study were different, it does seem that the prognosis was worse. Within the present study, modest effects were found for age and age-related prognostic factors, which supports the idea that the outcome is worse in later life.
Comparing the prognosis across levels of caseness, a clear gradient was found. Those with SUBD experienced the lowest level of symptoms, followed by those with MDD, DYSTHD, and, finally, double depression. Comparing average symptom levels cross-sectionally, MDD and DYSTHD were similar. Over time, the average level of (CES-D) symptoms was higher in those with DYSTHD than in those with MDD. This is due to the longer duration of symptoms in persons with DYSTHD. The category of double depression carried a grave prognosis in all analyses. There were few remissions, while the average level of symptoms was extremely high. Although all the indexes of prognosis used occurred on a continuum of severity, the DSM categories clearly predicted the severity and duration of symptoms. However, the evidence also supports critics who have argued that DSM identifies too narrow a range of clinically relevant depressive syndromes in older people.28-31 Those with SUBD at baseline were much closer to those with DSM disorders in their outcomes than to a similarly followed up group of nondepressed elderly persons. Moreover, as in younger adults, they were clearly at risk of developing DSM affective disorders.55 Therefore, the term minor depression, often used to denote this heterogeneous group of affective states, is probably a misnomer. This conclusion is supported by previous studies2,4,29 of the consequences of minor and major depression, suggesting that the consequences of so-called minor depression are serious and comparable to the consequences of MDD in many areas.
The strong points of the study are that it was a prospective follow-up of a large community-based sample of elderly depressed subjects, in whom depression was measured using symptom rating scales and established diagnostic instruments. Moreover, to our knowledge, in no previous community study have so many observations, covering so long a period, been made available. A limitation of the community setting is that treatment could not be controlled or monitored in any detail. At baseline, 19% of those with MDD and 3% of those with SUBD were using antidepressants; 10% (MDD) and 3% (SUBD) had been referred to community mental health centers, and 15% (MDD) and 3% (SUBD) had consulted a psychiatrist in the 6 months preceding the interview.4 These data were not used as predictors of outcome, because the level of treatment was low; no data on compliance, intensity, or duration of treatment were available; and interventions were not assigned in a way compatible with an adequate study of their effects. Moreover, assuming that treatment is not harmful in most cases, the effect of disregarding treatment would be that severity and chronicity are, if anything, underestimated. A second limitation of the study is that there was considerable attrition at all stages of the study. Because loss of depressed subjects and loss of observations within subjects may have influenced the findings, both were studied. For loss of subjects, the level of depressive symptoms at baseline did not predict attrition. However, those who were older and more functionally impaired were at a higher risk of attrition. This is similar to earlier analyses of nonresponse, in which invariably the oldest and most frail subjects were at highest risk of dropping out of the study.5,33-35 Looking at loss of observations within participants, there was no association with baseline level of depression. However, during the study, those with more severe and more persistent depression were more likely to miss observations.
Another way to examine potential effects of attrition is through subgroup analyses of subjects at risk for a poor prognosis. The data suggest that the most frail elderly persons (the older old, those with cognitive impairment and functional limitations, and those with the smallest contact networks) had the poorest outcome. Therefore, although loss of data limits somewhat the generalizability of the findings, the results of the analyses pertaining to attrition suggest that, if anything, the true prognosis of late-life depression is underestimated. A third limitation is that the criterion for remission may be too liberal, leaving some subjects so classified with residual symptoms. The advantage of the criteria used is that they enhance comparability with previous studies21,22,50 and that effects of random change are eliminated. The effect of misclassification would again be that the data underestimate the true severity of the prognosis of late-life depression.
The implications of the study are that the burden of depression for elderly persons in the community is even more severe than previously thought. DSM affective disorders are relatively rare among elderly persons, but do represent the group with the worst prognosis. Subthreshold depression, which is by far the larger group, is serious and chronic in many cases. The data clearly demonstrate the need for interventions that are helpful, acceptable, and economically feasible to be performed on a larger scale. Especially in the area of nonmajor depression, designing and testing such interventions should have a high priority.
Submitted for publication January 10, 2001; final revision received September 7, 2001; accepted October 1, 2001.
This study included data that were collected in the context of the Longitudinal Aging Study Amsterdam, which is financed primarily by the Netherlands Ministry of Welfare, Health, and Sports.
Corresponding author: Aartjan T. F. Beekman, MD, PhD, Department of Psychiatry, Vrije Universiteit, Valerius Clinic, Valeriusplein 9, 1075 BG, Amsterdam, the Netherlands (e-mail: email@example.com).