Agreement and disagreement between version 2 of the Stirling County Study's customary method, the DPAX (DP for depression and AX for anxiety), and the Diagnostic Interview Schedule (DIS) as measured by the κ statistic comparing current, past, and lifetime depression. The DPAX-2 and the DIS disagreed about the time when the depression occurred regarding 12 subjects (the DPAX-2 identified 11 as having current depression that the DIS identified as having had a major depressive episode in the past; the DPAX-2 identified 1 as a past depression that the DIS identified as current dysthymia). These 12 subjects are counted in the analysis of both current and past depression. CI indicates confidence interval; plus sign, positive for depression; and minus sign, negative for depression.
Agreement and disagreement about lifetime depression between the Structured Clinical Interview for DSM-III-R (SCID) and 2 methods of lay-administered interviews (version 2 of the Stirling County Study's customary method, the DPAX [DP for depression and AX for anxiety], and the Diagnostic Interview Schedule [DIS]) as measured by the κ statistic and by sensitivity and specificity, showing unweighted results as well as estimates weighted so that the subsample interviewed by SCID reflects the whole sample. CI indicates confidence interval; plus sign, positive for depression; and minus sign, negative for depression.
Murphy JM, Monson RR, Laird NM, Sobol AM, Leighton AH. A Comparison of Diagnostic Interviews for Depression in the Stirling County Study: Challenges for Psychiatric Epidemiology. Arch Gen Psychiatry. 2000;57(3):230-236. doi:10.1001/archpsyc.57.3.230
High prevalence rates in psychiatric epidemiologic studies raise questions about whether data-gathering procedures identify transient responses rather than clinical disorders. This issue is explored in relation to depression using data from the Stirling County Study.
The study's customary method, the DPAX (DP for depression and AX for anxiety), was compared with the Diagnostic Interview Schedule (DIS), both of which were administered to a sample of 1396 subjects selected in 1992. Reasons for discordance were analyzed, and demographic correlates of responses to questions about dysphoria were examined. These lay-administered interviews were then compared with clinician-administered interviews that used the Structured Clinical Interview for DSM-III-R (SCID) with 139 subjects. The κ statistic and logistic regression were used for statistical assessment.
For the level of agreement between the DPAX and the DIS for current and lifetime depression, κ = 0.40 and κ = 0.33, respectively. Subjects diagnosed only by the DPAX tended to have less education than those diagnosed only by the DIS. Some idioms for dysphoria seemed to work better than others. Using SCID interviews as a clinical standard, the DPAX had 15% sensitivity and 96% specificity and the DIS had 25% sensitivity and 98% specificity.
Comprehension of an interview can be improved by using multiple questions for dysphoria and a simpler mode of inquiry. Clinician-administered interviews tend to corroborate disorders identified in lay-administered interviews but suggest that survey methods underestimate prevalence. Further research is needed to evaluate the validity of both types of interviews, but evidence from a 16-year follow-up evaluation indicates that depression diagnosed by the DPAX is a serious disorder in terms of morbidity and mortality.
In comparing results from the Epidemiologic Catchment Area (ECA) program and the National Comorbidity Survey, Regier et al1-3 concluded that psychiatric epidemiology "faces new challenges." Because high rates were reported for the National Comorbidity Survey, it was suggested that survey methods may mistake "normal and transient responses" for "clinical disorders." We address some of the issues involved in this challenge using information from the Stirling County Study.4-7
In recent data gathering, we included the depression module of the Diagnostic Interview Schedule (DIS) of the ECA, along with the depression component of version 2 of our customary method, the DPAX-2 (DP for depression and AX for anxiety).8-10 Despite differences in the questions asked and the style of inquiry, prevalence rates for current depression were generally similar.11 Recognizing that similarity in rates does not necessarily mean that the same individuals are counted, we analyzed the agreement for both current and lifetime depression.
To compile information about the content of questions and ways of asking them that might lead to improving the comprehensibility and reliability of lay-administered interviews, we studied the reasons for disagreement between the DPAX-2 and the DIS. We also investigated responses to key questions to see if they seemed to be understood equally well by different segments of the population.
To evaluate whether disorders identified in community surveys are similar to those recognized by clinicians, we report results based on the Structured Clinical Interview for DSM-III-R (SCID).12 Because the clinical significance of a disorder is often assessed by course and outcome, we draw on our earlier work about the long-term course and outcome of depression as diagnosed by version 1 of our methods (DPAX-1).13-16
The Stirling County Study is a long-term investigation of psychiatric epidemiology in a general population located in Atlantic Canada. The design of the study is a combination of repeated cross-sectional sample surveys (1952, 1970, and 1992) and cohort follow-up investigations.17
This report deals mainly with data gathered in the 1990s, when both the DPAX-2 and the DIS were administered to the same subjects. The focus is on the 1396 subjects (86% completion rate) selected as a representative sample of adults in 1992.5,11,18 In terms of demographic characteristics, 54% of the sample members (n = 758) were women, 43% (n = 604) were younger than 45 years, and 53% (n = 746) had received less than an 11th grade education.
The DPAX-2 and DIS interviews were carried out in the homes of subjects by graduate students who received 10 days of training and practice. Instruction was provided by the staff of the Stirling County Study, one of whom had been trained at the DIS headquarters at Washington University, St Louis, Mo. Subjects gave informed consent for these interviews as well as for those using the SCID.
The goal in selecting subjects for SCID interviews was to create a matrix balanced for age and sex in which there would be a concentration of subjects who were positive for a condition according to the DPAX-2 and/or DIS and in which there would be representation of noncases as well.10 Diagnoses used in the selection included depression, anxiety disorders, and alcohol abuse. The DPAX-2 and DIS records were systematically searched until an appropriate person was found for a given category of age, sex, and diagnosis. By this means, 98 sample subjects and 41 follow-up subjects were chosen and interviewed (92% completion rate).
The SCID interviews were carried out by psychiatrists and clinical psychologists who were blind to the survey results but who had been trained in the administration of the nonpatient version.19 These interviews were conducted in the homes of the subjects on average 2 years after the DPAX-2 and DIS interviews. Five subjects were diagnosed by the clinicians as having had a SCID depression that occurred only after the lay interviewing had been concluded. For purposes of comparability between the clinician and lay interviews, these subjects were counted in the analysis as never having had a depression. Except for this adjustment, the SCID analysis deals with lifetime diagnoses of major depressive episode (MDE) and dysthymia.
The DPAX-1 method was designed for our 1952 study as a means of estimating "need for psychiatric attention."6,9 The DPAX-2 was designed to take account of additions that were made to the interview schedule for the 1970 study and thereafter to overcome the psychometric weakness of having only one question about dysphoria and only one about functional impairment.10 The DPAX-2 is used in this article except for a review of follow-up findings from the 1952 study. The DIS module for depression was administered after the portion of the interview dealing with the DPAX-2. The DIS was designed to implement the definitions given in DSM-III.20
Both the DPAX-2 and DIS follow a diagnostic algorithm that establishes the presence of the "essential features" of the depressive syndrome, addresses the completeness of the syndrome through requiring a certain number of "associated symptoms," and determines that criteria for a minimum duration have been met. However, there are important differences between the 2 methods in the questions asked for these diagnostic components, especially in the number and range of options offered.
The DIS uses a complex opening question for MDE that combines the essential features of dysphoria and anhedonia with minimum duration: "In your lifetime, have you ever had 2 weeks or more during which you felt sad, blue, depressed, or when you lost all interest and pleasure in things that you usually cared about or enjoyed?" This is followed by a similar but simpler question for dysthymia: "Have you had 2 years or more in your life when you felt depressed or sad most days, even if you felt OK sometimes?" While the DIS offers another chance for giving evidence about essential features, it was rarely used. Thus, on the whole, the DIS requires a positive response to one question about essential features for each diagnosis.
The DPAX-2, on the other hand, uses 3 separate questions for dysphoria as the essential feature: "Have there been times when you felt low and hopeless?" "Do you sometimes wonder if anything is worthwhile any more?" "Do you feel in good spirits?" A negative response to the last question is interpreted as positive evidence of "poor spirits." Three optional ways of describing dysphoria are offered, with a positive response required for one.
The DIS inquires about the 8 types of associated symptoms given in DSM-III and requires 4. For associated symptoms, the DPAX-2 asks about disturbances of appetite, sleep, and energy, and requires that each be present. For "duration," the DIS uses predetermined durations embedded in the questions about essential symptoms and later gathers information about time of occurrence. The DPAX-2 offers an opportunity for describing duration as it pertains to individual cases by using open-ended questions about when the essential symptom became noticeable, whether it was still bothersome at the time of interview, and, if not, when it ceased to be troubling. A minimum duration of 1 month is applied later by computer.
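The contrast between the 2 algorithms, as described above, can be sketched in code. What follows is a simplified illustration only, not the scoring program used in the study: all function and field names are hypothetical, the DIS is reduced to a count of associated symptom types, and the additional DIS criteria and the DPAX-2 criteria for impairment are omitted.

```python
from dataclasses import dataclass


@dataclass
class Dpax2Responses:
    """Hypothetical container for the DPAX-2 items described in the text."""
    low_and_hopeless: bool        # "felt low and hopeless"
    wonders_if_worthwhile: bool   # "wonder if anything is worthwhile"
    in_good_spirits: bool         # a "no" counts as evidence of poor spirits
    appetite_disturbance: bool
    sleep_disturbance: bool
    energy_disturbance: bool
    duration_months: float        # derived from the open-ended onset/offset questions


def dpax2_positive(r: Dpax2Responses, min_months: float = 1.0) -> bool:
    # Essential feature: any one of the 3 dysphoria idioms is enough.
    essential = (
        r.low_and_hopeless
        or r.wonders_if_worthwhile
        or not r.in_good_spirits
    )
    # Associated symptoms: each of the 3 disturbances must be present.
    associated = (
        r.appetite_disturbance
        and r.sleep_disturbance
        and r.energy_disturbance
    )
    # Minimum duration is applied afterward, as a separate step.
    return essential and associated and r.duration_months >= min_months


def dis_positive(essential_with_duration: bool,
                 symptom_types_present: int,
                 required: int = 4) -> bool:
    # The opening DIS question already embeds the minimum duration;
    # 4 of the 8 associated symptom types are then required.
    return essential_with_duration and symptom_types_present >= required
```

Under this sketch, a subject who endorses only "poor spirits" but reports all 3 disturbances for 2 months is positive by the DPAX-2 rule, while a subject with 5 of the 8 DIS symptom types who reports only appetite and sleep disturbance is positive by the DIS rule and negative by the DPAX-2 rule.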
For the DPAX-2, the term current depression refers to the episodes present at the time of interview. For the DIS, current depression is defined as MDE present at the time of interview or within the preceding month, together with dysthymia. The reason for combining the episodic and chronic types of DIS depression is to provide a parallel to the DPAX-2, which identifies episodes with varying durations and different levels of severity.
The DIS has additional criteria for the psychiatric relevance of the symptoms, temporal clustering of features and symptoms as in a spell, duration and severity of the spell, and whether the spell occurred only as part of bereavement. In keeping with DSM-III, DSM-III-R, and DSM-IV, the underlying concept in the DIS questions is that major depression is an episodic disorder that expresses itself in spells.20-22 The questions in the DPAX-2 do not assume that depression is episodic. They do, however, elicit more information about functional impairment, and, in this way, implement the criteria for disability introduced in DSM-IV.
The DIS questions are longer and involve more component parts than those in the DPAX-2. Fairly high levels of abstract generalization and acuity of recall are needed. Interviewer comments indicated that some subjects did not understand some DIS questions. Because a higher level of education might make it easier to grasp the meaning of the DIS, we included education in the analysis.
The κ statistic was employed for assessing the level of agreement between the results of the different schedules.23 Where the subsample interviewed by the SCID was concerned, estimates of κ, sensitivity, and specificity were computed after weighting for correspondence to the full sample. We follow the interpretation of Fleiss,24 who suggests that values above 0.40 indicate fair to good agreement and lower values indicate poor agreement. Relationships of sex, age, and education to discordant cases were tested by χ² analysis. Responses to individual questions regarding the same demographic factors were tested using logistic regression. Assessment of significance is indicated by 95% confidence intervals.
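For readers who wish to reproduce this kind of agreement measure, the unweighted κ for 2 dichotomous diagnoses can be computed from a 2 × 2 table as sketched below. This is a minimal sketch rather than the analysis code used in the study; the function name is hypothetical, and the counts in the example are invented and are not taken from Figure 1 or Figure 2.

```python
def kappa_2x2(both_pos: int, only_a: int, only_b: int, both_neg: int) -> float:
    """Unweighted kappa for two yes/no ratings of the same subjects.

    both_pos: positive by method A and method B
    only_a:   positive by method A only
    only_b:   positive by method B only
    both_neg: negative by both methods
    """
    n = both_pos + only_a + only_b + both_neg
    if n == 0:
        raise ValueError("table is empty")
    p_observed = (both_pos + both_neg) / n
    a_pos = (both_pos + only_a) / n
    b_pos = (both_pos + only_b) / n
    # Agreement expected by chance from the marginal rates alone.
    p_expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Invented counts: 20 concordant positives, 10 + 10 discordant, 60 concordant negatives.
example_kappa = kappa_2x2(20, 10, 10, 60)
```

With these invented counts the observed agreement is 0.80 and the chance agreement is 0.58, so κ is roughly 0.52, which would fall in the fair-to-good range under the interpretation cited above.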
We recently reported that the current prevalence of depression in Stirling County remained stable at slightly more than 5% across the samples of 1952, 1970, and 1992.11 In the first 2 samples, men and women had quite similar rates, and for both, prevalence increased with age. By 1992, a redistribution had occurred, which indicated higher prevalence among younger women. The DIS prevalence of current depression in 1992 was similarly close to 5%, and the DIS rates by sex and age showed considerable comparability with the DPAX-2, especially among women (Table 1). Lifetime rates were not as similar, since the DIS identified a larger proportion of past episodes than the DPAX-2. Despite similarity in current rates, case-by-case analysis indicated that the level of agreement as measured by the κ statistic was only fair (Figure 1). Where past depression was concerned, agreement was poor.
For each method, each component of the diagnostic algorithm used by the other method was associated with one or more disagreements regarding both current and past depression (Table 2). A larger proportion of disagreements regarding essential features pertained to the DIS, with its more complex approach, than to the DPAX-2, with its 3 optional questions. On the other hand, more disagreements regarding associated symptoms involved the DPAX-2, with its stringent requirement for each of 3 specific types of disturbance, in contrast to the larger number of options available in the DIS.
While the analysis presented does not distinguish between MDE and dysthymia, 24 (65%) of the 37 current MDE cases were also identified by the DPAX-2. Concordance was lower for the 45 cases of dysthymia, in that only 14 (31%) were also identified by DPAX-2. The subjects who did not fulfill the DPAX-2 requirements for impairment were concentrated among those diagnosed by the DIS as having dysthymia or as having had MDE in the past.
Analysis of discordant cases indicated that sex and age were not significantly related to whether a subject met the criteria of the DPAX-2 and not of the DIS or vice versa (Table 3). For example, discordant cases were predominantly women, irrespective of type of discordance. On the other hand, education was significantly (χ² = 5.4, df = 1; P = .02) related to disagreement regarding current depression. Three fourths of those who met the criteria for the DPAX-2 but not the DIS had a lower level of education, while those who met the criteria for the DIS but not the DPAX-2 were about equally distributed across the educational levels. Where past depression was concerned, the opposite association with education occurred but was not significant.
The 5 questions available for essential features elicited positive responses from different proportions of the sample (Table 4). The DIS question that specifies a 2-year duration of sadness and the DPAX-2 question that uses the idiom of being in poor spirits were answered positively by the fewest subjects. On the other hand, close to half of the sample responded positively to the question about feeling low and hopeless. Despite such differences, subjects who answered positively to one question were likely to respond positively to others as illustrated by the fact that 26% (n = 368) of the sample responded positively to at least 1 DPAX-2 question and 1 DIS question, and an additional 3% (n = 42) responded positively to all 5 questions.
In view of this, the odds ratios tended to show that the individual questions bore similar relationships to sex, age, and education. For example, positive responses to each question were more characteristic of women than men. There was, however, variability. The DIS question about 2 years of sadness was especially likely to show this sex difference. The DPAX-2 question about wondering if anything is worthwhile showed very little difference, suggesting that such a term may be especially meaningful for men.
The DPAX-2 and DIS information from the subjects selected for the SCID study exhibited approximately the same level of agreement as in the sample as a whole (κ = 0.34 for lifetime depression). Fifty-one subjects were diagnosed by one or the other of the lay-administered methods; of them, 17 were diagnosed by both.
The agreement between the SCID and each of the lay-administered interviews was poor, especially from the perspective of the weighted estimates (Figure 2). Both the DPAX-2 and the DIS had low sensitivity and high specificity. Using the SCID, the clinicians diagnosed about twice as many subjects as having had a depression, mainly MDE, as either of the 2 lay-administered methods.
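The effect of weighting on sensitivity and specificity can be illustrated with a short sketch. The weighting scheme is not spelled out here, so the code assumes, purely for illustration, that each SCID subject carries a weight equal to the number of full-sample subjects he or she represents; the function name and all records in the example are hypothetical.

```python
def weighted_sens_spec(records):
    """Sensitivity and specificity of a lay interview against the SCID.

    records: iterable of (lay_positive, scid_positive, weight) tuples.
    Pass weight = 1.0 for every record to obtain unweighted estimates.
    """
    tp = fn = fp = tn = 0.0
    for lay_pos, scid_pos, weight in records:
        if scid_pos and lay_pos:
            tp += weight
        elif scid_pos:
            fn += weight
        elif lay_pos:
            fp += weight
        else:
            tn += weight
    return tp / (tp + fn), tn / (tn + fp)


# Invented subsample: lay-positive subjects were oversampled (weight 2),
# lay-negative subjects were undersampled (weight 20).
subsample = (
    [(True, True, 2.0)] * 6
    + [(True, False, 2.0)] * 4
    + [(False, True, 20.0)] * 2
    + [(False, False, 20.0)] * 8
)

weighted = weighted_sens_spec(subsample)
unweighted = weighted_sens_spec([(lay, scid, 1.0) for lay, scid, _ in subsample])
```

In this invented example, unweighted sensitivity is 0.75, but weighted sensitivity falls to about 0.23 because each lay-negative subject with a SCID diagnosis stands for many more members of the full sample. The same mechanism would explain why agreement looks poorer from the perspective of weighted estimates when cases are deliberately concentrated in the subsample.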
A main purpose in our comparison of the 2 lay-administered interviews was to see if differences would suggest ways in which the content and structure of questions could better achieve the goal of allowing people in the general population to give accurate and reliable information about the manifestations of depression.
In our view, the questions about essential features work best if they identify all those persons who could conceivably meet other criteria required for a diagnosis. These questions should demonstrate extremely high sensitivity even though they may involve low specificity.25 The subsequent application of other criteria about duration, associated symptoms, and impairment can remove, step by step, those who do not qualify for a diagnosis, thereby improving specificity with minimal loss of sensitivity.
Survey methods and psychometric studies indicate that reliability is improved when questions are stated simply, when they involve only one idea, and when there are several on the same theme.26-30 This suggests that it is valuable to use multiple questions that employ different idioms and phrases for the essential features.
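The stepwise logic described above can be made concrete with a toy example. The sketch below is illustrative only: the 10 subjects are invented, the names are hypothetical, and the pattern shown (specificity rising while sensitivity is preserved) is built into the toy data rather than estimated from the study.

```python
# Invented subjects: (essential_feature, duration_ok, associated_ok, impaired, true_case)
SUBJECTS = (
    [(True, True, True, True, True)] * 2          # true cases pass every step
    + [(False, False, False, False, False)] * 3   # never pass the opening question
    + [(True, False, False, False, False)] * 2    # removed by the duration criterion
    + [(True, True, False, False, False)] * 2     # removed by associated symptoms
    + [(True, True, True, False, False)] * 1      # removed by the impairment criterion
)


def sens_spec(flags, truth):
    """Sensitivity and specificity of yes/no flags against the true status."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fn = sum(1 for f, t in zip(flags, truth) if not f and t)
    tn = sum(1 for f, t in zip(flags, truth) if not f and not t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    return tp / (tp + fn), tn / (tn + fp)


def cascade(subjects):
    """Apply the 4 criteria cumulatively; return (sensitivity, specificity) per step."""
    truth = [s[4] for s in subjects]
    results = []
    for step in range(1, 5):
        # A subject stays positive only if every criterion so far is met.
        flags = [all(s[:step]) for s in subjects]
        results.append(sens_spec(flags, truth))
    return results


steps = cascade(SUBJECTS)
```

In this toy data, specificity climbs from 0.375 after the essential-feature question alone to 1.0 after the impairment criterion, while sensitivity stays at 1.0 throughout. Real data would show some loss of sensitivity at each step; the point of the design is to keep that loss small.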
The DSM-III and DSM-IV indicate that there are numerous ways of describing depressed mood and anhedonia. Each version indicates that clinicians need to be alert to colloquial variants. In reference to anhedonia, for example, DSM-IV indicates that some individuals complain of feeling "blah."22 Through multiple questions using different idioms, it is possible to create a set, so that if one question is unfamiliar to one segment of the population, it will be compensated for by another question that is familiar to that segment.
Our emphasis on multiple questions about essential features stems partly from experience in the Stirling County Study. Our original interview schedule used only the question about poor spirits. At that time and continuing into 1970, this question was answered positively by nearly equal proportions of men and women, thus contributing to our evidence that the prevalence of depression did not seem to differ much by sex. Also, we have shown that if we had continued to use only this one question, it would have been necessary to conclude that the prevalence of depression had decreased.11 This would have been a mistake, however, because the other indicators of dysphoria increased over time, suggesting that it was response to the idiom of poor spirits that declined, not the prevalence of depression.10,11 Because one of the tasks of psychiatric epidemiology is to document time trends, multiple questions provide a safeguard against the effect of changes in the meaning of specific idioms.
Two different ways of applying criteria for duration were illustrated. The terms 2 weeks and 2 years were embedded in the DIS questions about essential features, while separate and open-ended questions were used by the DPAX-2. There is growing evidence that depression is often a chronic disorder with a fluctuating course rather than a disorder that expresses itself mainly in discrete episodes, as assumed in DSM-III and DSM-IV.31,32 If future research comes to reflect this view, interviewing methods that assess duration separately would be better able to adjust than those that build duration into the key questions.
The methodologic differences in assessing associated symptoms indicate that the DPAX-2 uses a narrower and more homogeneous definition of depression than the DIS. This difference contributes substantially to disagreement between the methods but does not lead to suggestions about interview improvement, since the DIS approach is more in line with the guidelines in current manuals.
While the DIS deals more adequately with associated symptoms, the DPAX-2 provides more information about impairment. Subjects diagnosed as dysthymic were particularly likely to have not met the DPAX-2 criteria for impairment, suggesting that the present definition of dysthymia may not include sufficient evidence of disturbed functioning to warrant being counted as a disorder.
Distinctive modes of inquiry were employed by the 2 methods. We believe that many of the DIS questions ask subjects to recall the past and to think in complex ways that might be favored by those who have more education. This view is supported by evidence that people of less education, while meeting the requirements of the simpler DPAX-2 procedures, tended not to fulfill the criteria of the DIS. This suggests the need for finding ways to achieve precision by simpler mechanisms.
While several suggestions about ways in which interview schedules might be improved emerged from this analysis, experience should not be abandoned in favor of entirely new directions. One reason for satisfaction with progress stems from evidence about the reliability of responses. Even across the differences in words and sentence structure illustrated by the DPAX-2 and DIS, many people indicated through the consistency of their responses that they were listening carefully and understanding the questions. Another reason for methodologic optimism is that DSM-III gave an unprecedented platform for schedule construction by describing the structure of a diagnosis as composed of essential features, duration, and associated symptoms, to which DSM-IV has added the components of distress and disability. The DIS implemented this conception, and its heir, the Composite International Diagnostic Interview, maintains it while at the same time introducing methodologic advances.33 Some of the modifications now available in the Composite International Diagnostic Interview, such as breaking the coverage of essential features into 2 questions, are congruent with the ideas that grew out of this study.34
Since the earliest phases of psychiatric epidemiology, the finding that psychiatric disorders are common in the general population has aroused mistrust regarding the results of community surveys.35 However, interviews carried out by clinicians using the SCID indicated that most of the cases diagnosed by either one of the lay-administered methods were corroborated as depression. It was not a matter of the SCID negating the lay interviews but rather of the clinicians finding depression to be still more common than did the epidemiologic methods.
However, whether the SCID assessments are more valid than the others remains a question, since more evidence of the SCID's reliability than validity has thus far been accumulated.36 It will be useful if the results of SCID can be evaluated using the "LEAD standard" (L indicates longitudinal; E, experts; and AD, all data), which draws on longitudinal information evaluated by clinical experts who base their judgments on all data accumulated from various sources.37 The LEAD standard emphasizes that longitudinal information offers an important means of assessing the degree to which disorders in communities are clinically significant. In this regard, a 16-year follow-up study of 36 women and 24 men diagnosed as depressed by the DPAX-1 in our first study indicated that course and outcome were marked by continuing morbidity and premature mortality.
Depressed men, none of whom suffered from alcohol abuse, exhibited twice the number of expected deaths.13 While deaths among depressed women were not appreciably above expectation, one depressed woman did commit suicide. Of the 24 women and 8 men who survived and were reinterviewed, 78% (n = 25) remained chronically or recurrently ill with either a depression or anxiety disorder, with more women than men crossing over to anxiety.14 The severity of depression among men was emphasized by the fact that, at the end of 16 years, we found that 83% (n = 20) of them either were dead or had been chronically or recurrently depressed with persistent impairment throughout the period, while the same outcome applied to only 42% (n = 15) of depressed women.16
Our methods of diagnosis were intended to identify people who had a need for psychiatric attention. Information about the degree to which these depressed people did in fact seek attention was limited to those who were alive at the end of the follow-up interval. This perspective indicated that 83% (n = 20) of the depressed women had sought attention for their emotional problems from a physician, usually a general physician.15 Although the outcome was poorer for depressed men, only 37% (n = 3) of them had sought help through such sources.
Such long-term information indicates that the depressions identified by the lay-administered DPAX-1 method were not "normal and transient responses." Rather, the evidence betokens a pathologic process.38 Virchow described the pathological process as bearing a "character of danger" for survival and well-being, a character that has been borne out in the evidence about outcome for depression in the Stirling County Study.6,39
Accepted for publication August 19, 1999.
This work was supported by grant MH39576 from the National Institute of Mental Health, Rockville, Md (Dr Murphy).
We thank James Leavitt, MD, Martha Shenton, PhD, Carolyn Turvey, PhD, David Vogel, PhD, and John Worthington, MD, for administering the SCID interviews; Constantine Daskalakis, DSc, for statistical help; Patricia Merritt, Barbara Burns, and Ellen Krystofik for field management and data processing; and Barbara Burns and Ellen Krystofik for help with manuscript preparation.
Reprints: Jane M. Murphy, PhD, Department of Psychiatry, Massachusetts General Hospital, Room 9155, 149 13th St, Charlestown, MA 02129-2000.