Okereke OI, Copeland M, Hyman BT, Wanggaard T, Albert MS, Blacker D. The Structured Interview & Scoring Tool–Massachusetts Alzheimer's Disease Research Center (SIST-M): Development, Reliability, and Cross-Sectional Validation of a Brief Structured Clinical Dementia Rating Interview. Arch Neurol. 2011;68(3):343-350. doi:10.1001/archneurol.2010.375
Background: The Clinical Dementia Rating (CDR) and CDR Sum-of-Boxes can be used to grade mild but clinically important cognitive symptoms of Alzheimer disease. However, sensitive clinical interview formats are lengthy.
Objective: To develop a brief instrument for obtaining CDR scores and to assess its reliability and cross-sectional validity.
Design: Using legacy data from expanded interviews conducted among 347 community-dwelling older adults in a longitudinal study, we identified 60 questions (from a possible 131) about cognitive functioning in daily life using clinical judgment, inter-item correlations, and principal components analysis. Items were selected in 1 cohort (n = 147), and a computer algorithm for generating CDR scores was developed in this same cohort and re-run in a replication cohort (n = 200) to evaluate how well the 60 items retained information from the original 131 items. Short interviews based on the 60 items were then administered to 50 consecutively recruited older individuals, with no symptoms or mild cognitive symptoms, at an Alzheimer's Disease Research Center. Clinical Dementia Rating scores based on short interviews were compared with those from independent long interviews.
Results: In the replication cohort, agreement between short and long CDR interviews ranged from κ = 0.65 to 0.79, with κ = 0.76 for Memory, κ = 0.77 for global CDR, and intraclass correlation coefficient for CDR Sum-of-Boxes = 0.89. In the cross-sectional validation, short interview scores were slightly lower than those from long interviews, but good agreement was observed for global CDR and Memory (κ ≥ 0.70) as well as for CDR Sum-of-Boxes (intraclass correlation coefficient = 0.73).
Conclusions: The Structured Interview & Scoring Tool–Massachusetts Alzheimer's Disease Research Center is a brief, reliable, and sensitive instrument for obtaining CDR scores in persons with symptoms along the spectrum of mild cognitive change.
As potential disease-modifying therapies for Alzheimer disease enter clinical trials, identifying illness at a prodromal phase takes on growing importance: early cognitive decline may be most amenable to interventions that could slow progression of neuropathologic changes and symptoms.1-3 Standardized tools, such as the Clinical Dementia Rating (CDR),4,5 are effective at distinguishing normal aging from mild cognitive impairment and dementia. The CDR features a global rating of impairment as well as a CDR Sum-of-Boxes (CDR-SB) that totals the ratings from each of 6 cognitive and functional domains (Memory, Orientation, Judgment and Problem-Solving, Community Affairs, Home and Hobbies, and Personal Care); the CDR-SB can be used to quantify impairment within the range of mild symptoms. The CDR is a mandatory element of the National Institute on Aging–funded Alzheimer's Disease Centers Uniform Data Set6 and the Alzheimer Disease Neuroimaging Initiative and is increasingly used in multicenter trials.7 There is an in-depth formal interview protocol8; however, many clinicians score the CDR based on their usual clinical interview. An expanded structured interview is available,9 and with trained, clinically skilled interviewers, it can achieve high reliability and discriminative ability among persons with very mild cognitive change10—a population of increasing interest in prevention and early intervention trials. However, this expanded interview takes approximately 90 minutes to complete, limiting its efficiency in larger-scale research settings.
Thus, there is a need for shorter measures to quantify clinically important changes along the spectrum from normal aging to mild cognitive impairment.10,11 In this study, our objectives were to (1) develop an instrument to administer a shortened CDR interview (about 25 minutes), (2) verify its reliability, and (3) conduct a cross-sectional validation by testing concordance of CDR scores from the shorter interview with those obtained using the expanded interview.
Participants were part of the Massachusetts General Hospital Memory and Aging Study (MAS), a longitudinal study aimed at discriminating prodromal Alzheimer disease from conditions with less severe memory impairments.9,10,12,13 Older adults with and without memory complaints were recruited in 3 cohorts from the community through advertisements: cohort 1 (n = 165), from 1992 to 1993; cohort 2 (n = 120), from 1997 to 1998; and cohort 3 (n = 95), from 2002 to 2006. To be included in the study, participants needed to be 65 years or older (with the exception of 7 individuals aged 57-64 years); without dementia; free of significant medical, neurologic, or psychiatric illness; rated as having a global CDR of 0.5 or less5; and willing to participate in study procedures. Each participant was recruited with a knowledgeable informant, usually an immediate family member (spouse, adult child, or sibling) or close friend.
Participants were part of the Massachusetts Alzheimer's Disease Research Center (MADRC) longitudinal research cohort, developed in recent years in response to changes in the Alzheimer's Disease Center program requiring the collection of a Uniform Data Set on a cohort with normal cognition, mild cognitive impairment, or Alzheimer disease/other dementias. The MADRC participants are recruited through community and clinic populations and are evaluated annually. In 2007, we began recruiting MAS participants into the MADRC, and 177 such individuals (the great majority of MAS members are still able to attend visits) have joined the MADRC. The combined cohort now totals 756: 58.2% female; 84.0% white, 11.2% of African descent, and 4.8% other races; mean (SD) age is 74.9 (9.6) years (range, 46-97 years).
Memory and Aging Study cohort members were administered the expanded CDR interview9; they also had medical evaluations (ie, history and physical examination, electrocardiography, and standard laboratory tests), structural and functional neuroimaging tests (magnetic resonance imaging and single photon emission computed tomography), comprehensive neuropsychologic testing, and blood collection for biomarker and genetic analyses. Memory and Aging Study participants were evaluated annually with the CDR and brief neuropsychologic testing; for those who developed significant decline, a consensus conference was held to determine dementia using standard diagnostic criteria.14 The MADRC cohort participants were evaluated each year according to Uniform Data Set protocol,6 which includes CDR ratings, a medical history, vital signs, neurologic examination, and a standard battery of cognitive tests.15 The present study was approved by the Institutional Review Board and Human Research Committee of the Massachusetts General Hospital, Boston.
We developed the shortened CDR interview using legacy data from baseline visits from MAS cohort 1. This development cohort consisted of 147 participants (18 participants had missing data on the expanded interview items). Expanded interview items had an unequal number of responses, and several items had missing values for many participants; consequently, an automated item selection procedure (eg, R2 method in stepwise linear regression) could not be applied to these data because such procedures exclude any participant with a missing value for even a single item within a domain. Thus, a multistep, semiquantitative procedure was used to identify the smallest set of items that could provide information adequate to score the CDR while maintaining sensitivity to the spectrum of mild cognitive change.
The expanded interview consists of 131 items covering the 6 CDR domains. Each item was graded by the original interviewer using CDR categories of 0, 0.5, 1, or 2. In the first step of item selection, item correlations were assessed by domain; exclusions were made if an item (1) had no variance; (2) had insufficient data to determine correlations with other items; (3) had weak correlations (≤0.2) with all or most of the other items as well as the domain rating; or (4) was redundant, because it tended to be scored identically with a few items in the same cognitive or functional topic area but was weakly correlated or not correlated with the CDR domain rating itself and with many other items, including core items of the domain (eg, the core item “overall more forgetful of recent events” in the Memory domain).
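The correlation-screening step described above can be sketched in code. The sketch below is an illustrative reconstruction, not the authors' procedure: it uses ordinary Pearson correlations on the ordinal item grades (the later principal components step in this study used polychoric correlations), and the toy data and function name are invented for the example. Only the 0.2 threshold comes from the text.

```python
import numpy as np

def flag_weakly_correlated_items(grades, threshold=0.2):
    """Flag items whose strongest absolute correlation with any other
    item in the domain is at or below `threshold` -- candidates for
    exclusion under criterion 3 of the item-selection procedure."""
    r = np.corrcoef(grades, rowvar=False)  # items are columns
    np.fill_diagonal(r, 0.0)               # ignore self-correlation
    max_abs = np.abs(r).max(axis=0)        # each item's best correlate
    return [i for i, m in enumerate(max_abs) if m <= threshold]

# Toy domain: items 0 and 1 track each other exactly; item 2 is
# unrelated to both and would be flagged for possible exclusion.
grades = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
])
print(flag_weakly_correlated_items(grades))  # [2]
```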
In a second step, some initially excluded items were “forced” back in because they were considered highly clinically relevant by experienced clinicians (eg, a Judgment and Problem-Solving item on whether driving difficulty due to poor cognition had resulted in motor vehicle crashes) or were helpful in completing other Uniform Data Set forms (eg, the Functional Assessment Questionnaire)16 and thus added efficiency (the Structured Interview & Scoring Tool-MADRC [SIST-M] covers all 10 Functional Assessment Questionnaire topic areas). A final set of 60 items included the following: 14 items in Memory, 8 in Orientation, 14 in Judgment and Problem-Solving, 6 in Community Affairs, 15 in Home and Hobbies, and 3 in Personal Care.
A SAS (SAS Institute, Cary, North Carolina) computer algorithm was written by one of us (O.I.O.) to simulate, in effect, how participants would have been scored on the CDR if the development cohort interviews had been conducted using only the 60 items. The algorithm used a complex hierarchical design combining the grade of each item (eg, 0, 0.5, and 1), the frequency with which different grades of items were observed within a CDR domain, and the relative clinical importance (ie, weight) of each item. We further refined this hierarchical algorithm by addressing whether CDR domains were unitary constructs or composed of key subdomains using principal components analysis. Because our raw variables were ordinally ranked, we first calculated polychoric correlations and then applied principal components analysis to the polychoric correlation matrix17 with orthogonal rotation (varimax method). We used the %POLYCHOR macro18 and FACTOR procedure in SAS. The weighting structure was slightly refined after key subdomains were identified in 2 domains: Orientation (time and space) and Judgment and Problem-Solving (complex decision-making; finance management; multitasking activities, including driving; and working memory operations).
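The article describes the algorithm's ingredients (item grades, grade frequencies within a domain, and clinical weights) but does not publish its logic. The sketch below is therefore a hypothetical simplification for illustration only: the function name, the equal weights, and the tie-breaking rule are all assumptions, not the SIST-M algorithm.

```python
from collections import defaultdict

def score_domain(grades, weights):
    """Toy domain scorer (NOT the SIST-M algorithm): each item casts a
    vote for its grade (0, 0.5, or 1) scaled by a clinical-importance
    weight; the grade with the largest weighted total wins, with ties
    broken toward greater impairment. Items rated not applicable/
    unknown (None) are skipped."""
    totals = defaultdict(float)
    for g, w in zip(grades, weights):
        if g is not None:
            totals[g] += w
    # Sort key (total, grade): on equal totals, the higher grade wins.
    return max(totals, key=lambda g: (totals[g], g))

# Four Memory-style items with equal hypothetical weights: two rated
# 0.5, one 0, one 1 -- the domain rating is 0.5.
print(score_domain([0.5, 0.5, 0, 1], [1, 1, 1, 1]))  # 0.5

# The CDR-SB would then be the sum of the 6 domain ratings.
```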
The final instrument (SIST-M) provides interview prompts representing each of the 60 items and a scoring grid (values of 0, 0.5, 1, or not applicable/unknown for each item). In addition to the SIST-M symptom interview, our clinicians conduct a standard 5-minute objective examination that includes orientation, 3-item registration and delayed recall, abstraction, calculation, and serial subtraction. Finally, a separate form, the SIST-M-Informant Report (SIST-M-IR), was created to obtain reports from a knowledgeable informant. The SIST-M-IR consists of the same 60 items but frames them such that the informant can rate how much the participant has changed, if at all, from 5 to 10 years earlier. Each item is represented by an introductory question and item-specific response anchors, which can be circled directly on the form. The SIST-M-IR features simple instructions and language, large type fonts, and an alternating item-shading sequence to enhance readability; pilot work demonstrated that this form is easy for older people to complete in no more than 5 to 10 minutes. Administration of the SIST-M takes approximately 25 minutes and involves performing the structured interview and objective examination with the participant and separately reviewing the SIST-M-IR with the informant. The SIST-M and SIST-M-IR forms are available online (http://madrc.mgh.harvard.edu/structured-interview-scoring-tool-massachusetts-adrc-sist-m).
The SIST-M scoring algorithm was assessed using the legacy data replication cohort, which consisted of 200 participants (15 participants from MAS cohorts 2 and 3 had missing data on expanded interview items). We cross-sectionally validated the SIST-M in live interviews among MADRC participants between February 1 and September 4, 2008. Fifty consecutively recruited participants and their informants were interviewed 1 to 2 weeks apart (mean [SD], 9.7 [11.6] days) using the SIST-M and the long (expanded) interview. The SIST-M interviews were completed at the MADRC by neurologists and psychiatrists; all had completed online CDR training,19 and 26 of the SIST-M interviews were performed by raters with prior experience with the long interview. Long interviews were conducted via telephone by 3 experienced MAS raters who were masked to the SIST-M ratings and algorithm design. Raters for both the SIST-M and expanded interview were unaware of participants' neuropsychologic test results. Interviews were assigned such that nearly half the participants were former MAS members (n = 24) and half were members of the MADRC cohort only (n = 26). Furthermore, approximately half of the participants were administered the SIST-M first (n = 27), and half received it second (n = 23). Before administration of the SIST-M, all informants completed the SIST-M-IR independently.
Internal consistency of the SIST-M was initially assessed by calculating Cronbach α and item-total correlations for each domain in the legacy data replication cohort. To address reliability further, original and algorithm-based CDR ratings for participants were compared using simple or weighted κ20,21; CDR-SB agreement was evaluated using intraclass correlation coefficients (ICCs).22,23
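The two reliability statistics used above are standard and can be computed directly from their definitions. The sketch below is illustrative (the data are invented, not the study's), showing Cronbach α and a linearly weighted Cohen κ implemented from scratch:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects x k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def weighted_kappa(a, b, categories):
    """Linearly weighted Cohen's kappa for two raters' ordinal ratings."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    obs = np.zeros((k, k))
    for x, y in zip(a, b):
        obs[idx[x], idx[y]] += 1
    obs /= obs.sum()                                   # observed proportions
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # chance expectation
    # Linear disagreement weights: 0 on the diagonal, growing with distance.
    w = np.abs(np.subtract.outer(np.arange(k), np.arange(k))) / (k - 1)
    return 1 - (w * obs).sum() / (w * exp).sum()

# Perfectly parallel items yield alpha = 1.
scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(round(cronbach_alpha(scores), 3))  # 1.0

# Identical CDR ratings (categories 0, 0.5, 1) from two interviews
# yield kappa = 1.
short = [0, 0.5, 0.5, 1, 0]
long_ = [0, 0.5, 0.5, 1, 0]
print(round(weighted_kappa(short, long_, [0, 0.5, 1]), 3))  # 1.0
```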
Internal consistency of the SIST-M was also assessed among the 50 participants of the cross-sectional validation sample, and agreement of the short and long interviews was evaluated using weighted κ for CDR ratings and ICC for CDR-SB. We also assessed whether scores were systematically higher or lower in short vs long interviews using the Wilcoxon signed rank test for paired observations. Agreement was further scrutinized using the Bowker test of symmetry24 to identify patterns among mismatched cells. In addition, differences by cohort type (MAS or MADRC), sex, and interview order were assessed using χ2 and Fisher exact tests, as appropriate. Finally, we used κ and ICC to assess agreement between algorithm-based CDR ratings determined using only unguided informant reports on the SIST-M-IR vs those from the short interview (in which the physician interviewed both the participant and informant).
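The Bowker test of symmetry used above reduces to a chi-square statistic over the off-diagonal cell pairs of the short-vs-long rating table; it asks whether mismatches above and below the diagonal are equally likely. A minimal sketch with made-up counts (not the study's data):

```python
def bowker_symmetry(table):
    """Bowker's test of symmetry for a square rating-agreement table.
    Sums (n_ij - n_ji)^2 / (n_ij + n_ji) over cell pairs above the
    diagonal; under H0 (symmetric mismatches) the statistic is
    approximately chi-square with one degree of freedom per pair
    that contains any mismatches."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:
                stat += (table[i][j] - table[j][i]) ** 2 / pair_total
                df += 1
    return stat, df

# Hypothetical 3x3 table of global CDR ratings (rows: short interview;
# columns: long interview; categories 0, 0.5, 1). Counts below the
# diagonal dominate, ie, the short interview tends to rate lower.
table = [[10, 2, 0],
         [5, 8, 1],
         [1, 3, 6]]
stat, df = bowker_symmetry(table)
# Compare stat against a chi-square distribution with df degrees of
# freedom (eg, scipy.stats.chi2.sf(stat, df)) to obtain the P value.
print(round(stat, 2), df)  # 3.29 3
```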
Table 1 illustrates demographic and clinical characteristics of the SIST-M development and replication cohorts. The cohorts were generally similar with the exception of race/ethnicity, reflecting assertive recruitment efforts by the MAS to increase minority representation in the later cohorts, as well as greater mean years of education in the replication cohort.
Characteristics of the cross-sectional validation sample are detailed in Table 2. Both subcohorts were well educated, with mean years of education at the baccalaureate level. Notable characteristics among participants recruited directly into the MADRC include younger age, greater proportion with global CDR of 0, higher participation by individuals of minority race/ethnicity, and higher prevalence of hypertension and diabetes mellitus. Neuropsychologic test performance was generally comparable between the cohorts.
There was high internal consistency of SIST-M items for each domain (Table 3) except for Home and Hobbies. The relatively low Cronbach α values for the 15 Home and Hobbies items (1 item had a negative item-total correlation) were explained by the fact that this domain, by definition, comprises 2 categories (home and hobbies); items within each category were better correlated. Original and algorithm-based scores were nearly identical in the replication cohort: agreement ranged from κ = 0.66 to 0.79 for individual domains, and the ICC for the CDR-SB was 0.9 (Table 4).
Measures of internal consistency were good to superior (Table 5). Item-total correlations were generally good, but poor correlations were observed for 2 items. Since 1 of these items was a core aspect of Community Affairs (decreased participation in social activities) and the other was a Home and Hobbies item on the Functional Assessment Questionnaire (difficulty playing a game of skill, such as bridge or chess), we did not consider removing these items from the SIST-M.
Agreement of short and long interviews was generally good, with κ ≥ 0.70 for key ratings of Memory and global CDR (κ = 0.55-0.75 is considered good, and κ ≥ 0.75 is considered excellent25); the ICC of 0.73 for the CDR-SB was also good (Table 6). However, comparison of mean ratings from the short and long interviews showed that the short interviews generated lower scores (P < .05 on Wilcoxon signed rank tests for all ratings except the global CDR rating; data not shown in table). Further scrutiny using the Bowker test revealed that a disproportionate number of mismatches occurred in which the short interview rating was lower than that of the long interview. This was especially true for Home and Hobbies (P = .001); results of the Bowker test were also statistically significant for Judgment and Problem-Solving (P = .02) and borderline significant for Orientation (P = .06). Mismatches did not vary significantly by cohort type, sex, or order of interview.
Finally, we compared CDR scores obtained by applying the algorithm only to responses on the SIST-M-IR to actual scores from the SIST-M interviews (ie, combined information from both participants and informants). Similarly, we generated CDR scores by applying the algorithm to the long interview and SIST-M items and compared these with the actual scores. Results demonstrated that, whether applying the algorithm to the long or short interview, algorithm-based scores agreed strongly with actual scores (Table 7). However, when the algorithm was applied to the informant-only responses, agreement with the SIST-M was substantially lower. For example, the ICC for the CDR-SB was only 0.57 (95% confidence interval, 0.38-0.74); it was even lower when comparing these informant-only ratings with the long interview (ICC, 0.39; 95% confidence interval, 0.19-0.64) (data not shown in table).
The SIST-M is an efficient structured interview that can be used to generate CDR scores that are reliable and discriminate along the spectrum of mild cognitive deficits (ie, CDR-SB, 0.0-4.0).10 The SIST-M also provides a scoring grid for each component item such that a validated algorithm can be applied for generating CDR ratings, which is a useful application for training purposes. Finally, the 60 items of the SIST-M were adapted to create a convenient informant-report form, the SIST-M-IR. Our results show that the SIST-M produces ratings consistent with those from an expanded CDR interview.9 We observed strong concordance of CDR scores, whether we applied an algorithm based on the SIST-M to legacy data or compared SIST-M scores with those from long interviews among participants in a cross-sectional validation.
Although there are briefer (5-10 minutes) measures of cognition (eg, Mini-Mental State Examination,26 Montreal Cognitive Assessment,27 and Mini-Cog28), most are based solely on objective performance and cannot be used to address subtle changes and symptoms. A brief informant interview based on the CDR has been developed (the AD829); it takes approximately 3 minutes to complete and correlates strongly with the CDR.30 However, the AD8 was designed to achieve rapid yet reliable classification of normal cognition (CDR, 0) vs dementia, including mild dementia (CDR, ≥0.5); it cannot be used alone to obtain the 6 CDR domain ratings and the CDR-SB. By contrast, the SIST-M is an interview method for determining ratings in all CDR categories as well as the graded outcome of the CDR-SB. Thus, the SIST-M makes a unique contribution to the existing repertoire of measures: it is a relatively short interview at approximately 25 minutes, is easy to administer, and yields both the quantitative and qualitative information of the CDR with sensitivity to very mild symptoms.
Another valuable aspect of this study was the development of the SIST-M-IR. Although other informant-based assessments of cognitive symptoms31 and dementia32 are available, these were not designed to map directly to CDR domains. By contrast, the SIST-M-IR yields information necessary to rate each CDR domain. However, we identified important caveats for its use. Informants tended to indicate fewer symptoms when filling out questionnaires on their own than were identified within the context of clinician-guided interviews covering identical items; furthermore, in early stages of cognitive change, informants may be unaware of subtle symptoms or of compensatory measures that the individual being examined has adopted in response to his or her cognitive challenges. When the SIST-M scoring algorithm was applied to unguided informant reports, there was fair or poor agreement with interviewers' CDR scores from both short and long interviews. By contrast, when the algorithm was applied to item ratings from the clinician interviews, the ICCs comparing algorithm-based and clinician-rated CDR-SB remained larger than 0.9. This suggests that the algorithm itself was not the primary factor with regard to lack of agreement but rather the loss of information that occurs when considering only reports from informants. Nevertheless, history on the participants is often obtained only from informants in many clinical research settings for a variety of practical reasons. Our results show that such an approach is likely to systematically underestimate levels of impairment. Obtaining joint information from the person being evaluated and the informant, during a clinician-guided interview, provides the optimal method for detecting early cognitive changes.
Limitations of this study must be recognized. First, our results were likely influenced by differences across CDR interviewers. Although all CDR raters had completed training and certification,19 those who conducted the long interviews had generally been evaluators in the MAS for longer than those who completed the SIST-M; there may have been some “drift” down in CDR ratings by the newer interviewers. This possibility was suggested by the significant differences on tests of mean differences in scores and concordance asymmetry. Consequently, the overall strong agreement (eg, κ ≥ 0.70 for global CDR and Memory) between the SIST-M and long interview was likely an underestimate of true agreement. Notably, κ statistics were lowest for Orientation (0.51) and Home and Hobbies (0.46); however, this is not surprising because prior work33 indicated that these 2 domains are the most difficult to rate and have the lowest agreement with a criterion standard rater, even among experienced evaluators. A second limitation is that responses on the SIST-M-IR may have been affected by response biases (eg, global denial or “naysaying”34); thus, future enhancements, such as intermittent reverse-coding of items, will be considered.35,36 Finally, the SIST-M and SIST-M-IR were developed in a cohort of well-educated older adults; thus, generalizability to populations with lower educational levels has not been established. However, the degree of education of our cohort is consistent with educational levels observed nationally in other Alzheimer's Disease Centers/Alzheimer's Disease Research Centers, and it is likely that the instrument calibrated to grading subtle changes in our cohort would perform equally well in other sites.
In summary, the SIST-M is an efficient, easily administered, reliable tool for obtaining CDR scores and provides particular value in clinical and research settings focused on persons with mild cognitive symptoms. Furthermore, we developed a SIST-M algorithm—a tool that could supplement CDR interview training and/or assist with interrater score calibration. Finally, we created the SIST-M-IR for rapid attainment of informant input on symptoms. While not sufficient for independent scoring of the CDR, the SIST-M-IR may prove useful for memory and general cognitive screening in large-scale research or primary care clinical settings. Thus, further work in this regard is warranted.
Correspondence: Olivia I. Okereke, MD, SM, Gerontology Research Unit, Massachusetts General Hospital, 149 13th St, Ste 2691, Charlestown, MA 02129 (firstname.lastname@example.org).
Accepted for Publication: May 4, 2010.
Author Contributions: Study concept and design: Okereke, Copeland, and Blacker. Acquisition of data: Okereke, Copeland, Hyman, Wanggaard, Albert, and Blacker. Analysis and interpretation of data: Okereke, Hyman, and Blacker. Drafting of the manuscript: Okereke. Critical revision of the manuscript for important intellectual content: Okereke, Copeland, Hyman, Wanggaard, Albert, and Blacker. Statistical analysis: Okereke and Blacker. Obtained funding: Okereke, Hyman, Albert, and Blacker. Administrative, technical, and material support: Okereke, Hyman, Wanggaard, Albert, and Blacker. Study supervision: Copeland, Hyman, and Blacker.
Financial Disclosure: None reported.
Funding/Support: This study was supported by the Harvard NeuroDiscovery Center and grants P01-AG004953 and P50-AG005134 (the Massachusetts Alzheimer's Disease Research Center) from the National Institutes of Health.
Additional Contributions: We thank Jeanette Gunther, MS, and Kelly A. Hennigan for assistance with participant recruitment and visit coordination; Mark Albers, MD, PhD, Virginia Hines, MA, Gad Marshall, MD, Carol Mastromauro, MSW, Nikolaus McFarland, MD, PhD, and Anand Viswanathan, MD, for assistance with clinical evaluations; Laura E. Carroll, BA, Sheela Chandrashekar, BA, and Michelle Schamberg, BA, for assistance with data collection, entry, and quality checking; and Mary Hyde, PhD, for assistance with data management and statistical program review. We express special appreciation to all our study participants.