Results of cluster analysis in which 4 variables from the magnetic resonance imaging were used to define 5 clusters. The values for these magnetic resonance imaging variables were converted to z scores before performing the cluster analyses to allow easier comparison across variables.
Longstreth WT, Diehr P, Manolio TA, Beauchamp NJ, Jungreis CA, Lefkowitz D, for the Cardiovascular Health Study Collaborative Research Group. Cluster Analysis and Patterns of Findings on Cranial Magnetic Resonance Imaging of the Elderly: The Cardiovascular Health Study. Arch Neurol. 2001;58(4):635-640. doi:10.1001/archneur.58.4.635
To characterize patterns of findings on cranial magnetic resonance imaging (MRI) of the elderly using a statistical technique called cluster analysis.
Subjects and Methods
The Cardiovascular Health Study is a population-based, longitudinal study of 5888 people 65 years and older. Of these, 3230 participants were analyzed: they had undergone cranial MRI, had no prior transient ischemic attack or stroke, and had complete MRI data. Scans were coded for presence of infarcts and for grades of white matter, ventricles, and sulci. Cluster analysis separated participants into 5 clusters based solely on patterns of MRI findings. Participants in each cluster were contrasted with respect to cardiovascular risk factors and clinical manifestations.
One cluster was low on all the MRI findings (normal) and another was high on all of them (complex infarcts). Another cluster had evidence for infarcts alone (simple infarcts), whereas the last 2 clusters lacked infarcts, one having enlarged ventricles and sulci (atrophy) and the other having prominent white matter changes and enlarged ventricles (leukoaraiosis). Factors that distinguished these clusters in a discriminant analysis were age, sex, several measures of hypertension, internal carotid artery wall thickness, smoking, and prevalent claudication before the MRI. The atrophy group had the highest percentage of men and the normal group had the lowest. Cognitive and motor performance also differed across clusters, with the atrophy cluster performing better than may have been expected.
These MRI patterns identified participants with different vascular disease risk factors and clinical manifestations. Results of these exploratory analyses warrant consideration in other populations of elderly people. Such patterns may provide clues about the pathophysiology of structural brain changes in the elderly.
CRANIAL magnetic resonance imaging (MRI) of the elderly commonly reveals abnormalities in the brain. Many studies have attempted to understand the clinical importance of these abnormalities by concentrating on specific findings, most commonly presence of MRI-defined infarcts, changes in white matter, size of ventricles, or prominence of sulci.1,2 Each finding is typically considered alone, yet the MRI findings often coexist. Patterns of MRI findings may be more important than any particular finding alone. What the patterns are and whether they are clinically important remain undetermined.
The Cardiovascular Health Study (CHS) is a population-based, longitudinal study of coronary heart disease and stroke in 5888 participants 65 years or older.3,4 As part of their comprehensive evaluation, more than 3000 participants have undergone cranial MRI. Much work has been done in the CHS to characterize risk factors and clinical manifestations of specific MRI findings.1,2,5-15 In these analyses, the MRI findings have sometimes been used as dependent or outcome variables and sometimes as independent or predictor variables. For instance, as the dependent variable, presence of small subcortical infarcts was associated in the CHS with age, male sex, diastolic blood pressure, creatinine levels, maximum internal carotid artery stenosis, pack-years of smoking, and diabetes at baseline.10 White matter grade was associated with age, systolic blood pressure, forced expiratory volume in 1 second, and income.6 Ventricular grade and sulcal grade were associated most strongly with age.15 In these examples, correlations with the other MRI findings were also present even though analyses focused on each MRI finding separately.6,10,15 In addition, when used as independent variables, the MRI findings compete among themselves. The fact that only one of several MRI findings may enter a multivariable model does not address the possibility of correlations among the MRI findings.
In this article, rather than beginning with a particular MRI finding and seeking its clinical correlates, we asked whether certain combinations of MRI findings could be identified. Instead of hypothesizing what combinations would be important, we sought a method that would place participants into groups based solely on distinctive patterns of MRI findings. In these exploratory analyses, we used a statistical technique called k-means cluster analysis.16,17 We then turned to the question of the clinical importance of these data-derived clusters whose members all shared a similar pattern of MRI findings. Participants included in one cluster were contrasted with those in others with respect to the cardiovascular risk factors present before the scan and clinical manifestations around the time of the scan.
Members of the CHS cohort were recruited from a random sample of the Health Care Financing Administration Medicare eligibility lists in 4 US communities: Forsyth County, North Carolina; Sacramento County, California; Washington County, Maryland; and Pittsburgh (Allegheny County), Pennsylvania. Participants had to be 65 years or older, able to give informed consent, and able to respond to questions without the aid of a surrogate respondent. They could not be institutionalized, wheelchair-bound in the home, or receiving treatment for cancer. To enhance the minority representation in the original cohort, 687 African Americans were recruited from the centers in North Carolina, California, and Pennsylvania, bringing the total size of the cohort to 5888 people. More details about the study design and characteristics of the 5888 participants are published elsewhere.3,4,6
Eligible and consenting participants underwent an extensive baseline evaluation, including standard questionnaires, physical examination, and laboratory testing, as detailed previously.3,4,6 Subjects' cognitive functions were evaluated using a modified Mini-Mental State Examination18,19 and the Digit-Symbol Substitution test.20 They also completed a standard measure of depression21 and answered a single question about overall health. Subjects' upper extremity function was assessed with the number of finger taps in 15 seconds, and lower extremity function with the number of seconds to walk 4.6 m (15 ft) at a usual pace. Parts of the baseline evaluation have been repeated annually. Electrocardiogram, carotid ultrasound, echocardiogram, and extensive blood testing were done at baseline and repeated about 3 years later.
Cranial MRI scans were performed in a standard fashion.5,22 Magnetic resonance imaging was performed on 1.5-T scanners (General Electric Medical Systems, Milwaukee, Wis, or Picker, Cleveland, Ohio) at 3 field centers and on a 0.35-T Toshiba instrument (American Medical Systems, Tustin, Calif) at the fourth. The scanning protocol included standard sagittal T1-weighted images and axial T1, spin density, and T2-weighted images, all with 5-mm thickness and no interslice gaps. Imaging data were sent to a single reading center for standard interpretation without knowledge of any clinical information. Neuroradiologists at the reading center estimated the white matter, ventricular, and sulcal grades using a 10-point system with 0 representing no abnormality and 9 representing the most abnormal, as described previously.1,6,15 Brain infarct was defined as an area of abnormal signal intensity in a vascular distribution, 3 mm or greater, that lacked mass effect.2,5,8,10,22 Scans were coded as either having 1 or more infarcts or no infarcts.
To identify patterns of MRI findings, we used a statistical technique called "cluster analysis."16,17,23 We decided to use standardized values of 4 key MRI variables: presence of infarct and grades for white matter, ventricles, and sulci. Values for these 4 variables were converted to z scores by subtracting the sample mean and dividing by the SD, so that each finding had an SD of 1, to give each variable approximately equal influence over the cluster formation. The cluster analysis provides solutions by which participants without missing values on these MRI variables are separated into clusters that differ as much as possible on the 4 MRI variables.
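The standardization step can be sketched as follows. This is a minimal illustration and makes three assumptions. Each participant is a list of the 4 MRI variables: infarct coded 0 or 1, then the white matter, ventricular, and sulcal grades. The sample SD (n - 1 denominator) is used, although the text does not say which SD was applied. The function names are hypothetical.

```python
import statistics


def z_scores(values):
    """Convert raw values to z scores: (x - sample mean) / sample SD.

    Assumes the sample SD (n - 1 denominator); a constant column has SD 0
    and would raise ZeroDivisionError.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


def standardize_columns(rows):
    """Standardize each variable (column) across participants (rows)."""
    columns = [list(col) for col in zip(*rows)]
    z_columns = [z_scores(col) for col in columns]
    # Transpose back so each row is again one participant's 4 z scores
    return [list(row) for row in zip(*z_columns)]


# Hypothetical participants: [infarct (0/1), white matter, ventricles, sulci]
rows = [[0, 1, 2, 3], [1, 3, 4, 5], [0, 2, 3, 2], [1, 4, 6, 4]]
print(standardize_columns(rows))
```

Every variable then has mean 0 and SD 1 in the sample, so a 1-grade difference on one scale no longer outweighs an equal-sized shift on another. The binary infarct indicator is standardized like the graded variables, as the text describes.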
The clustering method we chose was k-means cluster analysis, with dissimilarity between a person and a cluster measured by euclidean distance.16,17,23 In these analyses, the user specifies k, the number of clusters desired. The program identifies k different people as the initial members of the k clusters. The program then compares the first person in the data set with each cluster and assigns the person to the closest cluster based on euclidean distance. Distance is computed by subtracting the person's values on the 4 key MRI variables from the average cluster values for these variables; the differences are then squared and added. After a person is assigned to a cluster, the cluster's means on the 4 variables are recomputed to include this person's values. These calculations are repeated for all persons in the data set. The process is repeated until the cluster means change by less than a prescribed amount. A person's original cluster assignment may change in later iterations. Also, the clusters are not hierarchical. For example, 2 participants who are in different clusters in a 2-cluster solution may be in the same cluster in the 3-cluster solution.
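The sequential procedure described above can be sketched in a few lines. This is a minimal illustration, not the SPSS routine used in the study. The seeding rule (k people drawn at random), the tie-breaking, the rule that a cluster is never emptied, and the parameters `tol`, `max_passes`, and `seed` are all illustrative assumptions. Cluster means are kept as running sums and counts, so each assignment updates the means immediately.

```python
import random


def sq_distance(a, b):
    """Subtract one profile from the other, square the differences, and add them."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sequential_kmeans(data, k, tol=1e-6, max_passes=100, seed=0):
    """Sequential (running-mean) k-means; returns (labels, centers).

    Seeding, tie-breaking, and the stopping rule are illustrative choices.
    """
    n, dim = len(data), len(data[0])
    rng = random.Random(seed)
    seeds = rng.sample(range(n), k)          # k different people start the k clusters
    sums = [list(data[i]) for i in seeds]    # running per-cluster sums of each variable
    counts = [1] * k                         # running per-cluster member counts
    labels = [None] * n
    for c, i in enumerate(seeds):
        labels[i] = c

    def center(c):
        return [s / counts[c] for s in sums[c]]

    for _ in range(max_passes):
        before = [center(c) for c in range(k)]
        for i, x in enumerate(data):
            # Closest cluster under the current, continuously updated means
            dists = [sq_distance(x, center(c)) for c in range(k)]
            best = dists.index(min(dists))
            old = labels[i]
            if best == old:
                continue
            if old is not None:
                if counts[old] == 1:
                    continue                 # keep every cluster non-empty
                counts[old] -= 1             # remove the person from the old cluster
                for d in range(dim):
                    sums[old][d] -= x[d]
            counts[best] += 1                # add the person to the new cluster
            for d in range(dim):
                sums[best][d] += x[d]
            labels[i] = best
        after = [center(c) for c in range(k)]
        # Stop once no cluster mean moved by more than tol during a full pass
        if max(sq_distance(a, b) ** 0.5 for a, b in zip(before, after)) < tol:
            break
    return labels, [center(c) for c in range(k)]
```

For the study, `data` would hold each participant's 4 z scores and `k` would range from 2 to 8. Because people can move between clusters on later passes, and because each k is run separately, the resulting solutions are not nested. This matches the non-hierarchical behavior described above.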
We performed these analyses specifying 2 through 8 clusters. The best solution depends partly on a quantitative assessment of how much information is gained by creating additional clusters17 and partly on a qualitative assessment of the resulting models. Once we decided on the best solution, we characterized the participants within each cluster on the basis of risk factors present before the scan and clinical manifestations around the time of the scan. Cardiovascular risk factors present before the scan included those described previously in articles from the CHS, including demographics; prevalent cardiovascular diseases, hypertension, atrial fibrillation, and diabetes; measures of subclinical disease such as carotid artery intima and media thickness; and lifestyle influences such as cigarette smoking and alcohol consumption.6,10,12,15,24
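The text does not state which quantitative criterion from reference 17 was used to judge the information gained by adding clusters. One common measure, assumed here purely for illustration, is the share of the total sum of squares explained by a solution. Under that assumption, the gain from each additional cluster can be tabulated as below. All function names are hypothetical.

```python
def total_ss(rows):
    """Sum of squared deviations of every row from the column means."""
    dim = len(rows[0])
    means = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    return sum((r[d] - means[d]) ** 2 for r in rows for d in range(dim))


def within_ss(rows, labels):
    """Sum of squares computed separately inside each cluster, then added."""
    groups = {}
    for r, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(r)
    return sum(total_ss(g) for g in groups.values())


def variance_explained(rows, labels):
    """Share of the total sum of squares accounted for by the clustering (0 to 1)."""
    return 1 - within_ss(rows, labels) / total_ss(rows)


def gains(r2_by_k):
    """Increase in variance explained when moving from each k to the next k."""
    ks = sorted(r2_by_k)
    return {k2: r2_by_k[k2] - r2_by_k[k1] for k1, k2 in zip(ks, ks[1:])}
```

Diminishing gains past some k would favor the smaller solutions. As the text notes, however, the final choice also rests on a qualitative judgment of how interpretable the clusters are.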
The statistical significance of a variable's values across the groups defined by the cluster analysis was assessed using analysis of variance for continuous variables and χ2 for discrete variables. Discriminant analysis was used to identify which among the numerous potential risk factors were independently and significantly different across the clusters. A stepwise model was used with a P-to-enter of .05. Finally, results were similar regardless of whether the variables for MRI findings were adjusted for age before the cluster analysis, whether the variables used to characterize the clusters were adjusted for age, or whether both adjustments were performed. For simplicity, we present only the results without any adjustments for age.
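The two test statistics named above can be sketched without any libraries. The analyses themselves were run in SPSS, so this is only an illustration of the formulas. In practice, `scipy.stats.f_oneway` and `scipy.stats.chi2_contingency` return the same statistics together with P values. Note that `chi2_contingency` applies a continuity correction to 2 × 2 tables by default, whereas the sketch does not. The stepwise discriminant analysis is not shown.

```python
def anova_f(groups):
    """One-way ANOVA F statistic: continuous values grouped by cluster."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts (no correction)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat
```

For a continuous risk factor, `groups` would hold one list of values per cluster. For a discrete one, `table` would have one row per cluster and one column per category, such as men and women.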
All members of the CHS cohort were invited to undergo MRI scanning, and 3660 (62%) agreed and were scanned. They were younger and healthier than those who did not undergo MRI.6,14 Of the 3660 participants who had an MRI, 377 (10.3%) had experienced a transient ischemic attack or stroke before MRI was performed. For ease of interpretation, these participants were excluded from analyses. After also excluding participants with missing values for 1 or more of the MRI findings, 3230 remained for these analyses. SPSS for Windows 6.0 statistical software23 was used for these analyses, which were based on the updated CHS database, incorporating minor corrections through December 1998.
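The participant flow in the preceding paragraph can be checked with simple arithmetic. The sketch uses only counts stated in the text. The number excluded for missing MRI values is not reported directly; it is derived here by subtraction.

```python
cohort = 5888                 # full CHS cohort
scanned = 3660                # agreed to and underwent MRI
prior_tia_or_stroke = 377     # excluded: TIA or stroke before MRI
analyzed = 3230               # included in the cluster analyses

pct_scanned = 100 * scanned / cohort                # about 62.2%
pct_prior = 100 * prior_tia_or_stroke / scanned     # about 10.3%
eligible = scanned - prior_tia_or_stroke            # scanned, free of TIA or stroke
missing_mri = eligible - analyzed                   # derived, not reported in the text

print(f"{pct_scanned:.1f}% scanned, {pct_prior:.1f}% with prior TIA or stroke")
print(f"{eligible} eligible, {missing_mri} excluded for missing MRI values")
```

The derived count implies that 53 otherwise eligible participants lacked 1 or more of the 4 MRI variables.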
In the 2-cluster solution, 3230 participants without transient ischemic attack or stroke were divided largely on the basis of whether their MRI showed 1 or more infarcts. More than 80% of the MRI-defined infarcts were subcortical and less than 20 mm in their largest diameter. In the 3-cluster solution, the MRI of everyone in the first cluster showed 1 or more infarcts. The other 2 clusters had few participants with MRI-defined infarcts, but 1 cluster had high values for ventricular and sulcal grades and the other cluster did not. In the 4-cluster solution, clusters were similar to those in the 3-cluster solution except that the fourth cluster had high white matter and ventricular grades. The 5-cluster solution was similar to the 4-cluster solution except that now 2 clusters had high white matter grades, 1 with and 1 without MRI-defined infarcts. Solutions with 6, 7, and 8 clusters were also examined and became increasingly complex and difficult to summarize.
Examining these solutions for information gained by adding clusters suggested that the solutions with 4 to 6 clusters were best. We chose to explore the solution with 5 clusters in greater detail. The z scores for the 4 MRI variables are displayed for the 5-cluster solution in Figure 1. The greater the z score for a particular MRI feature, the more prominent that feature is in the cluster. Of the 3230 participants analyzed, the first cluster (30.4%) was low on all z scores (normal); the second (27.6%) was high for ventricular and sulcal grades (atrophy); the third (14.2%) was high for white matter grade and somewhat high for ventricular grade (leukoaraiosis); the fourth (16.4%) was high for brain infarct only (simple infarct); and the fifth (11.4%) was high on all 4 variables (complex infarct). Table 1 lists for the 5 clusters the percentage with MRI-defined infarcts and the mean grades for white matter, ventricles, and sulci. The MRI variables differ significantly across clusters, as expected, because the clusters were created to maximize the differences among them. Only a single member of the normal, atrophy, and leukoaraiosis clusters had MRI-defined infarcts, whereas all but 2 members of the simple infarct and complex infarct clusters had MRI-defined infarcts.
Most of the potential risk factors, including vascular diseases prevalent before the MRI, were significantly different across the 5 clusters (Table 2); only current cigarette smoking, history of diabetes, albumin levels, and atrial fibrillation were not. When stepwise discriminant analysis with the clusters as the dependent variable was performed to identify which among all of the potential risk factors listed were most important, 8 variables explained all of the variability among the 5 clusters. Rows for these 8 variables are marked with a footnote symbol to the far right in Table 2 and included the following, in the order in which they entered the stepwise model: age at MRI, sex, ankle-to-arm ratio, internal carotid artery wall thickness, systolic blood pressure, pack-years smoked, history of hypertension, and claudication diagnosed prior to MRI.
The clusters that were most easily distinguished were the normal cluster, with the most favorable risk factor profile, and the complex infarct cluster, with the least favorable. Exceptions were cholesterol and low-density lipoprotein cholesterol levels, which were lowest in the complex infarct cluster. The atrophy cluster was characterized by older men who had smoked cigarettes but who had seemingly avoided many of the cardiovascular complications expected in such a group. The atrophy cluster had the highest percentage of men, whereas the normal cluster had the lowest. The leukoaraiosis cluster was characterized by older subjects with measured hypertension, regardless of whether they reported having been diagnosed as having hypertension. The simple infarct cluster was characterized by younger subjects whose measured blood pressure was lower than in some clusters despite their more often reporting a history of hypertension. The leukoaraiosis cluster had slightly higher systolic blood pressure than the simple infarct cluster, with the reverse being true for diastolic blood pressure.
A similar pattern of performance on cognitive and motor tasks was evident (Table 3). The best performance was in the normal cluster, and the worst was in the complex infarct cluster. Despite the mean age being greater in the atrophy cluster than in the normal cluster, the performance in the atrophy cluster was almost as good on most measures as in the normal cluster. For the depression score and the 5-point self-assessment of health, the atrophy cluster had the best scores. The order of the remaining clusters was always the same, with worsening performance from the simple infarct cluster to the leukoaraiosis cluster and finally to the complex infarct cluster. The performance in the simple infarct cluster and the leukoaraiosis cluster was quite similar—the greatest difference being in the Digit-Symbol Substitution test, with members of the simple infarct cluster performing better than those of the leukoaraiosis cluster.
In this elderly population, cluster analysis defined 5 distinct groups based on 4 MRI findings: presence of MRI-defined infarcts and grades on white matter, ventricles, and sulci. We assigned descriptive names to each cluster: normal, atrophy, leukoaraiosis, simple infarct, and complex infarct. The 2 most common patterns were the normal cluster, containing 30.4% of participants, and the atrophy cluster, containing 27.6%. In general, the normal cluster had the best risk factor profile and the complex infarct cluster had the worst. The potential risk factors that best distinguished these clusters in a discriminant analysis were age, sex, several measures of hypertension, internal carotid artery wall thickness, smoking, and prevalent claudication before the MRI. Cognitive and motor performance also differed across clusters, with the atrophy cluster performing better than may have been expected.
These results suggest that among these elderly people free of transient ischemic attacks or stroke, MRI findings define 2 main groups: 1 with and 1 without subclinical cerebrovascular disease. The group without subclinical cerebrovascular disease consists of 2 subgroups: 1 whose imaging looks normal and 1 whose imaging shows atrophy but no other changes. These 2 groups have the most benign risk factor profile and similar results on performance measures. The atrophy cluster had the highest percentage of men, while the normal cluster had the lowest. Members of the atrophy cluster were also older, had more education, had more pack-years of cigarette smoking, and had thicker internal carotid artery walls than members of the normal cluster.
The leukoaraiosis, simple infarct, and complex infarct clusters were defined by MRI findings likely reflecting cerebrovascular disease: white matter changes and MRI-defined infarcts. More than 80% of the MRI-defined infarcts were subcortical and small. Previous studies6,10 and the risk factor profiles found in this study support the hypothesis that both white matter changes and lacunar infarcts are related to vascular disease, especially affecting the small arteries of the brain. Although none of the participants had been recognized as having symptoms related to cerebrovascular disease, the results from previous work6,10 and from performance measures in this study suggest that white matter changes and MRI-defined infarcts are associated with dysfunction. More difficult to determine is the clinical importance of the differences detected in the performance measures.
These analyses do not indicate why some participants develop white matter changes, some develop infarcts, and some both. Perhaps the pattern is determined by the severity or mixture of the risk factors or perhaps the patterns evolve from one to another. For instance, we cannot address with these cross-sectional analyses whether over time members of the leukoaraiosis cluster and simple infarct cluster become contaminated with other findings, such as in the complex infarct cluster, or remain pure.
The CHS has many strengths, including having characterized a large group of elderly people with respect to cardiovascular and cerebrovascular risk factors and outcomes. However, participants in the CHS, especially those who underwent MRI, are not representative of all older people. In general, they are healthier than the general population of elderly people, and the effect of such a bias on these analyses is unknown.6,14 Possibly these analyses simply emphasize some quirk of the data set, and these types of analyses should be considered in other populations of elderly people. In addition, the results could be affected by several arbitrary decisions that we made, such as which MRI variables to include in the analyses, which solution to examine in detail, and which participants to exclude, namely, those with a history of transient ischemic attack or stroke.
Members of these 5 clusters differ by potential risk factors and clinical manifestations. How well these results can be generalized to other populations of elderly people awaits further study. We would encourage investigators addressing these issues to broaden their analytic approach and move from examining single MRI findings to examining patterns of MRI findings. We believe that with such an approach the full potential of MRI in the elderly will be realized by providing clues about the pathophysiology of structural brain changes. With respect to clinical correlates, white matter changes, MRI-defined infarcts, and especially their combination define ominous patterns. Enlarged ventricles and prominent sulci alone define a benign pattern with clinical correlates similar to those without any of these MRI findings.
Accepted for publication August 18, 2000.
This work was supported by contracts N01-HC-85079, N01-HC-85086, and N01-HC-95100 from the National Heart, Lung, and Blood Institute, Bethesda, Md.
Participating Institutions and Principal Staff
Forsyth County, North Carolina—Wake Forest University School of Medicine: Gregory L. Burke, John Chen, Alan Elster, Walter H. Ettinger, Curt D. Furberg, Gerardo Heiss, Sharon Jackson, Dalane Kitzman, Margie Lamb, David S. Lefkowitz, Mary F. Lyles, Cathy Nunn, Ward Riley, Beverly Tucker; Forsyth County, North Carolina—Wake Forest University School of Medicine, Electrocardiography Reading Center: Farida Rautaharju, Pentti Rautaharju; Sacramento County, California—University of California, Davis: William Bommer, Charles Bernick, Andrew Duxbury, Mary Haan, Calvin Hirsch, Lawrence Laslett, Marshall Lee, John Robbins, Richard White; Washington County, Maryland—The Johns Hopkins University: M. Jan Busby-Whitehead, Joyce Chabot, George W. Comstock, Adrian Dobs, Linda P. Fried, Joel G. Hill, Steven J. Kittner, Shiriki Kumanyika, David Levine, Joao A. Lima, Neil R. Powe, Thomas R. Price, Jeff Williamson, Moyses Szklo, Melvyn Tockman; Magnetic Resonance Imaging Reading Center—Washington County, Maryland—The Johns Hopkins University: R. Nick Bryan, Norman J. Beauchamp, Carolyn C. Meltzer, Douglas Fellows, Melanie Hawkins, Patrice Holtz, Naiyer Iman, Michael Kraut, Grace Lee, Cynthia Quinn, Larry Schertz, Earl P. Steinberg, Scott Wells, Linda Wilkins, Nancy C. Yue; Allegheny County, Pennsylvania—University of Pittsburgh: Diane G. Ives, Charles A. Jungreis, Laurie Knepper, Lewis H. Kuller, Elaine Meilahn, Peg Meyer, Roberta Moyer, Anne Newman, Richard Schultz, Vivienne E. Smith, Sidney K. Wolfson; Echocardiography Reading Center (Baseline)—University of California, Irvine: Hoda Anton-Culver, Julius M. Gardin, Margaret Knoll, Tom Kurosaki, Nathan Wong; Echocardiography Reading Center (Follow-up)—Georgetown Medical Center, Washington, DC: John Gottdiener, Eva Hausner, Stephen Kraus, Judy Gay, Sue Livengood, Mary Ann Yohe, Retha Webb; Ultrasound Reading Center—New England Medical Center, Boston, Mass: Daniel H. O'Leary, Joseph F. Polak, Laurie Funk; Central Blood Analysis Laboratory—University of Vermont, Colchester: Edwin Bovill, Elaine Cornell, Mary Cushman, Russell P. Tracy; Respiratory Sciences, University of Arizona, Tucson: Paul Enright; Coordinating Center, University of Washington, Seattle: Alice Arnold, Annette L. Fitzpatrick, Bonnie K. Lind, Richard A. Kronmal, Bruce M. Psaty, David S. Siscovick, Lynn Shemanski, Will Longstreth, Patricia W. Wahl, David Yanez, Paula Diehr, Maryann McBurnie, Chuck Spieker, Scott Emerson, Cathy Tangen, Priscilla Velentgas; and National Heart, Lung, and Blood Institute Project Office, Bethesda, Md: Robin Boineau, Teri A. Manolio, Peter J. Savage, Patricia Smith.
Corresponding author: W. T. Longstreth, Jr, MD, MPH, Department of Neurology, Box 359775, Harborview Medical Center, 325 Ninth Ave, Seattle, WA 98104-2499.