Lenzenweger MF. Stability and Change in Personality Disorder Features: The Longitudinal Study of Personality Disorders. Arch Gen Psychiatry. 1999;56(11):1009–1015. doi:10.1001/archpsyc.56.11.1009
There exists no empirical literature documenting the long-term longitudinal stability of personality pathology comparable to that available for normal personality. A number of test-retest studies have usefully established the short-term reliability of Axis II measures. However, the test-retest design is methodologically inadequate for resolving issues related to the long-term stability of personality disorder (PD). This prospective longitudinal study evaluated the stability of PD features from a multiwave perspective.
Subjects (N=250) drawn from a nonclinical university population were examined for PD features at 3 different time points using the International Personality Disorder Examination (IPDE) and the Millon Clinical Multiaxial Inventory II (MCMI-II) during a study period of 4 years.
Features of PD displayed considerable evidence of stability for individual differences and group means, at the dimensional level of analysis, on both the IPDE and the MCMI-II. Both measures revealed modest declines in PD features over time; however, the observed changes were associated with relatively small effect sizes.
Features of PD, viewed from a dimensional perspective, seem to be relatively stable in terms of individual differences and group means based on both clinical interview and self-administered PD assessments.
There exists no empirical literature documenting the long-term longitudinal stability of the personality disorders (PD) comparable to that available for normal personality.1,2 Although the DSM3-5 asserts that PDs are enduring and stable over time and recognizable by adolescence or early adult life, supporting empirical data are virtually nonexistent. Test-retest studies6,7 have been conducted on PDs, typically borderline personality disorder; however, they reveal noteworthy methodological shortcomings that limit their utility for resolving issues related to long-term stability of the PDs. The greatest limitation of all such studies lies in the research design itself and the fundamental inability of test-retest observations to adequately address stability, an established fact in lifespan research methodology.8-12 Test-retest studies have usefully established the test-retest reliability of the primary Axis II assessment devices.13-15 The methodological superiority, however, of the prospective multiwave longitudinal design for studying continuity and change in personality is well known.2,8,9,11,16
The Longitudinal Study of Personality Disorders (LSPD) was begun in 1990 as a National Institute of Mental Health–sponsored prospective multiwave longitudinal study of personality pathology, normal personality, and temperament. A major goal of the project is the lifespan study of the stability of PD symptomatology. By using a nonclinical population, the LSPD avoids the complications attending the study of hospital/clinic PD cases (eg, treatment confounds, Berkson's bias). Finally, the LSPD began with subjects who were young enough to allow for an evaluation of the DSM assumption regarding age at onset for PD.
Stability of PD features can be assessed from 4 vantage points: rank order of individual differences, mean level (or group) stability, structural (factorial) stability, and ipsative (or intrapersonal) stability.17- 19 Additional approaches to stability evaluation involve structural equation modeling8,11,20 and latent growth curve procedures.10,21 This report from the LSPD focuses only on the stability of individual differences and mean levels for PD features.
The 258 subjects in the LSPD were drawn from an initial sample that consisted of 2000 first-year undergraduate students at Cornell University, Ithaca, NY.22 Subjects were assigned to either a possible personality disorder (PPD) or no personality disorder (NPD) group as determined by the International Personality Disorder Examination (IPDE) Screen (IPDE-S) (completed screens=1684, response rate=84.2%). The PPD subjects had to meet the diagnostic threshold for at least 1 specific DSM-III-R PD, whereas NPD subjects did not meet the DSM-III-R–defined threshold for diagnosis and had fewer than 10 PD features across all disorders. Extensive details concerning subject selection procedure and sampling are given elsewhere.22 The 258 subjects consisted of 121 men (47%) and 137 women (53%); 134 (66 women) in the PPD group and 124 (71 women) in the NPD group. All subjects gave voluntary written informed consent and received an honorarium of $50 at each wave. Of the initial 258 subjects, 250 completed all 3 assessment waves. Of the 8 subjects who did not complete all waves, 5 were in the PPD and 3 were in the NPD group; the proportions did not differ (goodness of fit χ²₁=0.50, P>.47). Of these 8 subjects, 6 transferred to other colleges and 2 died in motor vehicle crashes. This report concerns the 250 subjects completing all 3 assessment waves.
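The attrition comparison above can be checked with a short calculation. The following is a minimal sketch, not the original analysis: it assumes equal expected counts for the two groups (the text does not state the null proportions), and the function names are illustrative.

```python
import math

def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p_df1(stat):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Noncompleters by group, as reported: 5 PPD and 3 NPD.
observed = [5, 3]
# Assumption: equal expected counts (4 and 4) under the null hypothesis.
expected = [sum(observed) / 2.0] * 2

stat = chi_square_gof(observed, expected)
p_value = chi_square_p_df1(stat)
print(f"chi-square(1) = {stat:.2f}, P = {p_value:.3f}")
```

Under the equal-split assumption the statistic is 0.50 with P of about .48, which agrees with the values reported for the noncompleters.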
Compared with the US population, the sample overrepresents Asian/Pacific Islanders and underrepresents those of Hispanic, African American, and working-class origins (Table 1).
The LSPD has a prospective multiwave panel design8 in which subjects are evaluated at 3 points in time (ie, freshman, sophomore, and senior years in college). All interview assessments were conducted by experienced clinical psychologists (PhD level) or psychiatric social workers. The master's-level social workers had an average of 16 years of clinical experience. Subjects also completed a large battery of self-administered assessments of normal personality, temperament, and other psychological factors. Finally, as the LSPD is a naturalistic prospective study, subjects were free to seek psychological treatment of their own accord.
The IPDE-S is a 250-item self-administered true-false PD screening inventory developed by Armand W. Loranger, PhD. The diagnostic efficiency and psychometric properties of the IPDE-S in a 2-stage screen application were described in a prior report from the LSPD.22
The IPDE is the well-known semistructured interview designed for use by experienced clinicians for the assessment of both DSM and International Classification of Diseases, 10th Revision (ICD-10) PD features.13,23,24 The IPDE13 allows for both dimensional and categorical scoring of the DSM PDs. The DSM-III-R criteria were assessed in this study. Interviewers received training in IPDE administration and scoring by Loranger and were supervised throughout the project by the author. Supervision by the author was done blind to the subjects' identity, putative PD status, and all prior assessment information. The interrater reliability for IPDE assessments was generally excellent at all 3 waves (intraclass correlations range, 0.84-0.92). Interviewers were blind to the screening results and putative PD status of the subjects and the same interviewer never saw the same subject more than once during the 3 study waves.
A well-known semistructured DSM-III-R Axis I clinical interview for use with nonpatients25 was administered prior to the IPDE.
The Millon Clinical Multiaxial Inventory II (MCMI-II)26,27 is the well-known 175-item, true-false, self-administered inventory designed to coordinate with the DSM-III-R Axis II PDs. The MCMI-II possesses excellent psychometric properties26,27 and yields separate dimensional scores for each of the 11 PDs.
The primary analyses focused on the individual difference and mean level (or group) stability of PD features. These 2 forms of stability are conceptually independent and have different interpretations. Individual difference (ie, "rank order"17 or "normative"18) stability concerns the extent to which individuals maintain their relative position within a group ranking over time. Level, or mean level, stability concerns the extent to which group means on a variable (or disorder) of interest remain invariant over time. All data analyses were conducted at the level of dimensional scores on the IPDE and the MCMI-II.
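The independence of these 2 forms of stability can be illustrated with a small sketch. The scores below are hypothetical and are not LSPD data; the helper names are likewise illustrative. In the first case every subject declines by the same amount (the mean changes, rank order is perfectly preserved); in the second the same values are reassigned to different subjects (the mean is unchanged, rank order is not preserved).

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient computed on raw scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical PD feature scores for 5 subjects at the first wave.
wave1 = [2, 5, 9, 14, 20]

# Case 1: uniform decline of 2 points; the mean shifts, the ordering does not.
wave2_shifted = [score - 2 for score in wave1]

# Case 2: same set of values reassigned across subjects; the mean is identical.
wave2_shuffled = [14, 2, 20, 5, 9]

r_shifted = pearson_r(wave1, wave2_shifted)
r_shuffled = pearson_r(wave1, wave2_shuffled)
mean_change_shifted = mean(wave2_shifted) - mean(wave1)
mean_change_shuffled = mean(wave2_shuffled) - mean(wave1)

print(f"shifted:  r = {r_shifted:.2f}, mean change = {mean_change_shifted:+.1f}")
print(f"shuffled: r = {r_shuffled:.2f}, mean change = {mean_change_shuffled:+.1f}")
```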
Individual difference stability is assessed using the Pearson correlation coefficient (r) applied to actual symptom scores (ie, not ranks). The dimensional, cluster, and total scores for the PDs (IPDE and MCMI-II) were examined for rank order for wave 1 to wave 2, wave 2 to wave 3, and wave 1 to wave 3 comparisons. The differences between wave 1 to wave 2 vs wave 1 to wave 3 stabilities were tested to determine if the longer interval for the latter resulted in significantly lower stabilities; these correlated correlation coefficients were tested as per Meng et al.28
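A comparison of 2 correlated correlation coefficients of the kind described above might be sketched as follows. This follows the commonly cited form of the Meng, Rosenthal, and Rubin z test as I understand it and should be checked against reference 28 before use. The wave 1/wave 2 and wave 1/wave 3 coefficients are the IPDE total values quoted in this report; the wave 2/wave 3 coefficient is a hypothetical placeholder, since it is not given in the text.

```python
import math

def meng_z(r1, r2, r_x, n):
    """z statistic for comparing two correlations that share one variable.

    r1, r2: correlations of the shared variable with each of the other two
    r_x:    correlation between the two non-shared variables
    n:      number of subjects
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher z transforms
    r_sq_bar = (r1 ** 2 + r2 ** 2) / 2.0         # mean squared correlation
    f = min((1.0 - r_x) / (2.0 * (1.0 - r_sq_bar)), 1.0)
    h = (1.0 - f * r_sq_bar) / (1.0 - r_sq_bar)
    return (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - r_x) * h))

def two_tailed_p(z):
    """Two-tailed P value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

r_w1_w2 = 0.70   # reported IPDE total, wave 1/wave 2
r_w1_w3 = 0.61   # reported IPDE total, wave 1/wave 3
r_w2_w3 = 0.65   # ASSUMPTION: placeholder, not reported in the text
n_subjects = 250

z = meng_z(r_w1_w2, r_w1_w3, r_w2_w3, n_subjects)
p = two_tailed_p(z)
print(f"z = {z:.2f}, two-tailed P = {p:.3f}")
```

The shared variable here is the wave 1 score, so the third correlation entering the test is the one between wave 2 and wave 3.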
The stability of mean levels of the 11 Axis II disorders as well as the total number of PD features was evaluated for both measures using a repeated-measures analysis of variance (ANOVA) as implemented by MANOVA.29 The ANOVAs were all of a group × sex × time design, with 2 between-subjects factors (group, sex) and 1 within-subjects factor (time). All F tests involving the within-subjects factor (time) were corrected using the procedure of Greenhouse and Geisser.30 Two sets of 12 ANOVAs were conducted, one each for the IPDE and MCMI-II, for a total of 24 tests. A Bonferroni correction was used to maintain a nominal significance level of .05 for each effect repeatedly tested within each set of ANOVAs and only ANOVA results with P<.004 (ie, .05/12) are discussed as significant. Effect size estimates (partial η²) were calculated for significant group, time, and group × time effects. The χ² test was used to contrast proportions and the kappa (κ) coefficient was used to evaluate the stability of categorical PD diagnoses over time. All tests were 2-tailed.
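The Bonferroni threshold and the effect size estimate described above reduce to simple arithmetic, sketched here. The partial η² formula is the standard conversion from an F ratio and its degrees of freedom; the F value and degrees of freedom in the example are hypothetical and are not taken from the LSPD tables.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests

def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)

# 12 ANOVAs per instrument, nominal alpha of .05 within each set.
threshold = bonferroni_alpha(0.05, 12)

# ASSUMPTION: hypothetical time effect, F(2, 492) = 20.0, for illustration only.
eta_sq = partial_eta_squared(20.0, 2, 492)

print(f"per-test threshold = {threshold:.4f}")
print(f"partial eta squared = {eta_sq:.3f}")
```

The threshold works out to roughly .0042, which the text rounds to .004.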
The lifetime DSM-III-R Axis I diagnoses (Table 2) of the study subjects are for definite and probable cases; a disorder was considered present if a subject met the criteria for the disorder at wave 1, wave 2, or wave 3 assessments. Eighty-one (62.8%) of the PPD subjects received an Axis I diagnosis compared with 32 (26.4%) NPD subjects (χ²₁=33.30, P<.001). Forty-one (31.8%) PPD subjects vs 21 (17.4%) NPD subjects reported a prior history of treatment by wave 3 (χ²₁=6.97, P<.008).
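The 2 group comparisons above can be recomputed from the reported counts. This is a sketch using the uncorrected Pearson χ² for a 2 × 2 table; the group sizes among completers (129 PPD, 121 NPD) are inferred from the attrition figures and the reported percentages rather than quoted directly.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# ASSUMPTION: completer group sizes inferred as 134 - 5 = 129 and 124 - 3 = 121.
N_PPD, N_NPD = 129, 121

# Rows are groups (PPD, NPD); columns are present vs absent.
axis1 = chi_square_2x2(81, N_PPD - 81, 32, N_NPD - 32)
treatment = chi_square_2x2(41, N_PPD - 41, 21, N_NPD - 21)

print(f"Axis I diagnosis:  chi-square(1) = {axis1:.2f}")
print(f"Treatment history: chi-square(1) = {treatment:.2f}")
```

With those inferred group sizes the statistics come out at about 33.30 and 6.97, in line with the reported values.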
Stability coefficients (r) for the total number of PD features present on the IPDE range from 0.61 (wave 1/wave 3) to 0.70 (wave 1/wave 2) (Table 3). For the Axis II "clusters," stability coefficient ranges were: cluster A (0.48-0.61), cluster B (0.60-0.78), and cluster C (0.52-0.67). Differences in the stabilities for the wave 1/wave 3 interval vs the wave 1/wave 2 interval were tested to determine if lower stabilities were observed for the longer time span. Wave 1/wave 2 stability coefficients were significantly higher for schizoid, antisocial, histrionic, and narcissistic PDs, clusters A and B, and total PD features (all P<.05). Overall, schizoid and antisocial PD features showed the highest level of individual differences stability on the IPDE.
For the overall mean MCMI-II PD dimensional score, stability coefficients range from 0.70 (wave 1/wave 3) to 0.77 (wave 1/wave 2) (Table 4). For clusters, stability coefficient ranges were: cluster A (0.64-0.73), cluster B (0.75-0.81), and cluster C (0.65-0.72). Differences in the MCMI-II stability coefficients were tested for the wave 1/wave 3 vs wave 1/wave 2 intervals. Wave 1/wave 2 stability coefficients were higher for paranoid, borderline, histrionic, narcissistic, and avoidant PDs; clusters A, B, and C; and total PD features (all P<.05). Overall, antisocial and passive-aggressive PD dimensions showed the highest level of individual differences stability on the MCMI-II.
The repeated-measures ANOVA results for IPDE are summarized with the F statistics, P values, and effect size estimates for the group, time, and group × time interactions (Table 5). The results for the sex factor will not be discussed owing to a relative absence of significant effects for the factor. Only effects associated with P≤.004 are interpreted. For the IPDE, the PPD group consistently displayed greater PD symptomatology relative to the NPD group (main effect). The total PD features on the IPDE showed a main effect for time, indicating a decline over time. A significant group × time interaction superseded the main effects of group and time, indicating a more rapid decline in PD features among the PPD (vs NPD) subjects over time. The mean (± SD) total PD scores for the PPD group over time were: wave 1=16.72 (±15.21), wave 2=8.91 (±10.95), and wave 3=8.68 (±10.58); and for the NPD group were: wave 1=4.90 (±5.91), wave 2=3.24 (±5.73), and wave 3=3.87 (±5.52).
On the IPDE, paranoid, borderline, narcissistic, histrionic, dependent, obsessive-compulsive, and avoidant PD features followed a longitudinal pattern similar to that seen for the total scores (Table 5). Generally, most symptom change (decline) occurred from wave 1 to wave 2 assessments, whereas symptom levels remained relatively constant from wave 2 to wave 3 in both groups. An effect for time (as either a main effect or in interaction with group) was detected for 9 of the 11 PDs assessed by the IPDE. The mean effect sizes, for the significant findings, were as follows: group (.08), time (.08), and group × time interaction (.04).
On the MCMI-II overall average base rate–adjusted dimensional score, the PPD group consistently displayed higher levels of PD symptomatology (main effect) (Table 6). A group × time interaction indicated a more rapid decline in PD features among the PPD (vs NPD) subjects over time. The mean (± SD) for the average base rate–adjusted PD dimensional score on the MCMI-II were: for the PPD subjects, wave 1=53.42 (±11.08), wave 2=48.49 (±12.24), and wave 3=46.30 (±12.03); for the NPD subjects: wave 1=37.82 (±7.44), wave 2=35.87 (±7.33), and wave 3=34.95 (±8.44).
The results for the MCMI-II (Table 6) for specific PDs were somewhat less uniform than those observed for the IPDE. Borderline, avoidant, and passive-aggressive MCMI-II PD dimensions followed a longitudinal pattern similar to that seen for the overall average MCMI-II dimensional score (ie, PPD subjects displaying a more rapid decline over time) indicated by a significant group × time interaction. Subjects in the PPD group had higher scores on the schizotypal, paranoid, antisocial, narcissistic, and histrionic PD dimensions (main effects). Time main effects were observed for the schizotypal, paranoid, antisocial, and dependent PD dimensions, indicating a decline for both subject groups over time, with the largest changes typically occurring from wave 1 to wave 2. An effect for time (as a main effect or in interaction with group) was detected for 7 of the 11 PD dimensions assessed by the MCMI-II. The mean effect sizes, for the significant findings, were as follows: group (.22), time (.06), and group × time interaction (.04).
These analyses were judged to be the most sensitive for evaluating stability because they were based on a dimensional approach to the PDs. In clinical practice, however, categorical methods of classification remain prominent and therefore the stability of the classification of "any definite PD" in these data was examined. The proportion of subjects meeting the criteria for a definite PD of any type (including not otherwise specified) on the IPDE at wave 1 was 7.6%, wave 2 was 5.6%, and wave 3 was 2.8%. The κ for wave 1/wave 2=0.45 (P<.001). The κ coefficient could not be computed for other comparisons owing to the prevalences falling below 5%,31 and such categorical analysis, therefore, remains inconclusive. The MCMI-II does not generate categorical diagnoses. Furthermore, the presence of an Axis I disorder of any type at wave 1 did not significantly moderate PD feature changes over time for either the IPDE or MCMI-II.
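The κ coefficient for diagnostic agreement across 2 waves can be sketched as below. The cell counts of the wave 1 by wave 2 table are not reported in the text, so the table here is an illustrative reconstruction: the marginal counts (19 and 14 of 250) follow from the reported prevalences of 7.6% and 5.6%, while the overlap of 8 subjects is an assumption chosen for the example.

```python
def cohen_kappa_2x2(both_yes, yes_no, no_yes, both_no):
    """Cohen kappa for a 2x2 agreement table (occasion 1 by occasion 2)."""
    n = both_yes + yes_no + no_yes + both_no
    p_observed = (both_yes + both_no) / n
    # Chance agreement from the marginal proportions of each occasion.
    p_yes = ((both_yes + yes_no) / n) * ((both_yes + no_yes) / n)
    p_no = ((no_yes + both_no) / n) * ((yes_no + both_no) / n)
    p_expected = p_yes + p_no
    return (p_observed - p_expected) / (1.0 - p_expected)

# Marginals implied by the reported prevalences: 19 cases at wave 1, 14 at wave 2.
# ASSUMPTION: 8 subjects diagnosed at both waves (not reported in the text).
both_yes = 8
wave1_only = 19 - both_yes
wave2_only = 14 - both_yes
neither = 250 - both_yes - wave1_only - wave2_only

kappa = cohen_kappa_2x2(both_yes, wave1_only, wave2_only, neither)
print(f"kappa = {kappa:.2f}")
```

With so few positive cases, moving a single subject between cells shifts κ noticeably, which is consistent with the caution expressed above about categorical analysis at low prevalence.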
These initial data from the LSPD suggest that PD features display relatively high levels of individual difference stability and appreciable mean level stability, with some change occurring over time. Categorical diagnoses could not be reliably evaluated for stability owing to a low prevalence, perhaps reflective of the conservative diagnostic thresholds embodied in the IPDE. To date, there are no other published reports or large-scale studies underway that address the full range of Axis II disorders from a prospective multiwave longitudinal perspective. The wave 1 to wave 2 findings of the LSPD are broadly consistent with the prior 2-occasion studies mentioned earlier6,7; however, the absence of other multiwave data sets necessarily precludes comparison beyond wave 2 assessments.
The stability of individual differences is relatively high for total PD features assessed by both IPDE and MCMI-II. Differences in the rank order stability of the total PD feature scores across the 2 methods were expected given that the format of the MCMI-II remains constant over time, whereas clinical interviewers on the IPDE change. All 11 disorders revealed substantial stability of individual differences across the 4-year time span. The rank order stability of PD features is comparable to that found for major normal personality trait dimensions, where median stability coefficients range from 0.46 to 0.75.1(p87) Finally, the rank order stability coefficients for these PD data would be higher were one to correct for unreliability of measurement, which serves to underestimate true stability.
For mean level (ie, group) stability, the pattern of results reveals evidence of statistically significant changes in the mean levels of PD features, particularly within the PPD group, over time. However, despite being statistically significant, the changes were relatively small as revealed through the effect size estimates. Overall, the data suggest a relatively high level of stability for PD group means over time for both measures (IPDE, MCMI-II). The greatest changes in the mean levels observed for both measures tended to occur between the wave 1 and wave 2 assessments, a pattern of findings not unusual for prospective multiwave longitudinal studies.12 Change in mean levels from wave 2 to wave 3 was slight and longitudinal methodologists12 argue that this portion of the study period is the more accurate index of any real change. This is because changes from wave 1 to wave 2 might reflect some regression to the mean effect, even if relatively small, whereas change during the post–wave 2 assessments is more suggestive of genuine change free of regression to the mean effects.12
The overall theoretical implication of these mean level data is continuity consistent with the DSM definition of PD. Clinically, these data suggest also that change is possible in personality pathology, which may conceivably be catalyzed through therapeutic efforts. One must remember that these are changes at the group level and some persons within the group may have changed rather notably. What, however, is causing the mean level declines over time, small though they are? The observed change is not likely to be explained entirely by a regression to the mean effect because (1) highly reliable measures were used, (2) subjects were assessed on 2 clinical instruments (IPDE, MCMI-II) that were different from the selection instrument (IPDE-S), (3) the variances within each of the 2 groups were broadly comparable over time, and (4) the groups were not at the ceiling or floor on either measure. Moreover, regression to the mean as an effect per se loses much of its significance in a multiwave approach.12,32 The first-year college experience might have heightened PD symptomatology at wave 1, especially in those subjects who were already somewhat affected. However, broadly comparable patterns of results (ie, small-magnitude declines in group mean levels over time) have been observed for normal personality and such change is viewed as substantively trivial.1 Continued lifespan follow-up study of these individuals may help to shed further light on this issue of mean level stability and change.
Several caveats should be mentioned in considering these LSPD data. Compared with the US population at large, the LSPD sample is more homogeneous in age, educational achievement, and social class, which may differentially affect the study results. Also, the LSPD was begun when the subjects were first-year university students and some of the most severely affected individuals with PD might have never successfully enrolled in college and therefore would not be included in the study. However, one must recall the Axis I diagnostic data for the LSPD subjects (Table 2), diagnosed according to rigorous clinical thresholds, before ascribing undue levels of mental health to these subjects. Multiwave longitudinal data from clinically based PD samples, mindful of their limitations for generalization, are needed for contrast with the LSPD results. Finally, one could argue that the stability observed for PD features in this study might be artifactual (ie, subjects portray themselves as more consistent than they really are); however, a large methodological literature in the personality assessment domain has rendered such an assertion untenable.1 The present data will be further dissected using additional approaches to the analysis of stability, including analysis of latent growth curves and factorial invariance, as well as structural equation–based approaches to the analysis of stability and change.8,10,11,21 William James33(p121) claimed that "by the age thirty, the character has set like plaster, and will never soften again," and that indeed seems to be true of normal personality.1 However, clinical experience suggests that some PD features may diminish with age. Continued lifespan study of the LSPD subjects will offer the opportunity to determine if James' view also holds for PD. Finally, this study is best viewed as heuristic and hypothesis-generating, undertaken within a context of exploration rather than a context of justification or confirmation.
Accepted for publication July 28, 1999.
This research was supported in part by grant MH45448 from the National Institute of Mental Health, Washington, DC (Dr Lenzenweger).
Preliminary results of this study were presented at the Fifth Meeting of the International Congress on the Disorders of Personality, International Society for the Study of Personality Disorders, University of British Columbia, Vancouver, June 25, 1997; and at the 13th Annual Meeting of the Society for Research in Psychopathology, Harvard University, Cambridge, Mass, November 14, 1998.
I thank Margaret Dyer, MSW; Linda Foltz, PhD; Mickey Goldstein, MSW; Joan Lovejoy, MSW; Eileen Maxwell, MSW; Cynthia Neff, PhD; Judy Raabe, PhD; Carol Skinner, MSW; and Ba Stopha, MSW for their able assistance in conducting clinical assessments. I thank Stephen Cornelius, PhD, and Robert Rosenthal, PhD, for useful statistical consultations and Jerome Kagan, PhD, for discussions regarding stability concepts. I also thank Lauren Korfine, PhD; Armand W. Loranger, PhD; Theodore Millon, PhD; and Paul T. Costa, PhD, for their comments on an earlier version of this article.
Reprints: Mark F. Lenzenweger, PhD, Department of Psychology, Harvard University, 33 Kirkland St, Cambridge, MA 02138.