Chui HC, Mack W, Jackson JE, Mungas D, Reed BR, Tinklenberg J, Chang F, Skinner K, Tasaki C, Jagust WJ. Clinical Criteria for the Diagnosis of Vascular Dementia: A Multicenter Study of Comparability and Interrater Reliability. Arch Neurol. 2000;57(2):191-196. doi:10.1001/archneur.57.2.191
Several clinical criteria have been developed to standardize the diagnosis of vascular dementia (VaD). Significant differences in patient classification have been reported, depending on the criteria used. Few studies have examined interrater reliability.
To assess the concordance in classification and interrater reliability for the following 4 clinical definitions of VaD: the Hachinski Ischemic Score (HIS), the Alzheimer Disease Diagnostic and Treatment Centers (ADDTC), National Institute of Neurological Disorders and Stroke–Association Internationale pour la Recherche et l'Enseignement en Neurosciences (NINDS-AIREN), and Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV).
Structured diagnostic checklists were developed for 4 criteria for VaD, 2 criteria for Alzheimer disease (AD), and 4 criteria for dementia. Twenty-five case vignettes, representing a spectrum of cognitive impairment and subtypes of dementia, were prepared in a standardized clinical format. Concordance in case classification using different criteria and interrater reliability among 7 ADDTCs given a specific set of criteria was assessed using the κ statistic.
The frequency of a diagnosis of VaD was highest using the modified HIS or DSM-IV criteria, intermediate using the original HIS and ADDTC criteria, and lowest using the NINDS-AIREN criteria. Scores for interrater reliability ranged from κ = 0.30 (ADDTC) to κ = 0.61 (original HIS).
Clinical criteria for VaD are not interchangeable. Depending on the criteria selected, the reported prevalence of VaD will vary significantly. The traditional HIS has higher interrater reliability than the newer criteria for VaD. Prospective longitudinal studies with clinical-pathological correlation are needed to compare validity.
The criteria chosen to diagnose vascular dementia (VaD) will influence estimates of its incidence and prevalence, as well as its recognition and treatment.1-4 Since 1975, the Hachinski Ischemic Score (HIS) in its original5 or modified form6-9 has provided a principal method for the diagnosis of multi-infarct dementia (MID). The original HIS assigns 1 or 2 points to each of 13 clinical features thought to be associated with MID; the modified HIS constitutes a subset of the original 13 items. The HIS has proven to be fairly sensitive and specific in differentiating pure Alzheimer disease (AD) and MID (approximately 70%-80%), but relatively insensitive to the presence of mixed diseases (17%-50%).10 A more recent meta-analysis by Moroney et al11 of 312 pathologically verified cases (191 AD, 80 MID, and 41 mixed AD-MID [MIX]) showed significant group differences in mean HIS, but very few items distinguished MIX from MID or AD. A HIS of 5 or 6 was 93% sensitive, but only 17% specific, in distinguishing MID from MIX, and 84% sensitive, but only 29% specific, in distinguishing AD from MIX.
In the last decade, several new diagnostic criteria have been developed to broaden the concept of VaD, to incorporate neuroimaging findings, and to offer guidelines for determining whether the vascular brain injury is causally related to the dementia. Examples of these criteria include the State of California Alzheimer Disease Diagnostic and Treatment Centers (ADDTC) criteria for probable and possible ischemic vascular dementia (IVD),12 the International Classification of Diseases, Tenth Edition (ICD-10) criteria for VaD,13,14 the National Institute of Neurological Disorders and Stroke–Association Internationale pour la Recherche et l'Enseignement en Neurosciences (NINDS-AIREN) criteria for probable and possible VaD,15 and the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV), criteria for dementia of the vascular type.16 Few studies have examined the comparability of these diagnostic criteria. In 1994, the Consortium of Canadian Centres for Clinical Cognitive Research concluded that "no one set of criteria is demonstrably superior to another."17
Only 1 study, to our knowledge, has compared the sensitivity and specificity of the newer criteria using neuropathological findings as the criterion standard.18 In the absence of neuroimaging data, however, that study was limited to the clinical criteria for possible (ie, not probable) VaD.18 Two studies have compared the agreement in patient classification resulting from various criteria for VaD.19,20 Verhey et al19 compared the classification of 124 patients resulting from the application of 7 different criteria for VaD and 4 sets of criteria for AD. Agreement was lower among the diagnostic criteria for VaD (κ = 0.52) than for AD (κ = 0.75). The number of patients with a diagnosis of VaD ranged from a high of 29% using the modification of HIS by Rosen et al6 to a low of 6% using NINDS-AIREN criteria.15 Similarly, Wetterling et al20 compared the ADDTC, DSM-IV, ICD-10, and NINDS-AIREN criteria in 167 elderly patients with dementia. In descending order of frequency, a diagnosis of VaD was made in 52.3% of cases using DSM-IV criteria, 32.9% using ICD-10 criteria, 27.1% using ADDTC criteria, and 14.1% using NINDS-AIREN criteria. Both studies highlight major differences among the criteria for VaD. A 4- to 5-fold difference in estimated incidence or prevalence might be anticipated, simply based on the choice of diagnostic criteria.
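As a rough check on the fold differences implied by the two studies just cited, the ratio of the highest to the lowest reported frequency can be computed directly. The sketch below uses only the percentages quoted above; the function name is ours and purely illustrative.

```python
def fold_difference(highest_pct, lowest_pct):
    """Ratio of the highest to the lowest reported diagnostic frequency."""
    return highest_pct / lowest_pct

# Percentages as quoted in the text for each study.
verhey = fold_difference(29.0, 6.0)        # modified HIS vs NINDS-AIREN
wetterling = fold_difference(52.3, 14.1)   # DSM-IV vs NINDS-AIREN

print(f"Verhey et al: {verhey:.1f}-fold")
print(f"Wetterling et al: {wetterling:.1f}-fold")
```

With these figures the ratios come to roughly 4.8 and 3.7, broadly in line with the approximate range stated above.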
To our knowledge, there have been few studies of the interrater reliability associated with various criteria for VaD. Lopez et al21 examined the interrater reliability of the NINDS-AIREN criteria; Larson et al22 examined DSM-IV criteria. In our study, we not only compared patient classification among several sets of criteria for VaD but also examined interrater reliability among 7 independent ADDTCs given a specific set of criteria. To place our findings in a broader context, diagnostic criteria for dementia and AD were also examined.
Several criteria for VaD, AD, and dementia were selected for study. Twenty-five cases were chosen to represent a spectrum of severity of cognitive impairment and subtypes of dementia. The cases were prepared in a uniform format, then rated at 7 independent ADDTCs using a diagnostic checklist. A computerized algorithm was used to generate the final clinical diagnoses. Comparisons were made between classifications resulting from the application of various diagnostic criteria and between raters (centers) given specific diagnostic criteria.
We studied 4 clinical criteria for VaD, 2 clinical criteria for AD, and 4 definitions of dementia. The clinical criteria were formatted as itemized checklists to eliminate unfamiliarity as a source of misclassification, as follows:
NINCDS-ADRDA criteria for probable and possible AD.23
DSM-IV criteria for dementia of the Alzheimer type.16
HIS (original5 score, ≥7; or modified6 score, ≥4).
ADDTC criteria for probable and possible IVD.12
NINDS-AIREN criteria for probable and possible VaD.15
DSM-IV criteria for dementia of the vascular type.16
The 4 definitions for VaD differ significantly in their inclusion criteria, particularly regarding their requirement for focal neurologic signs, neuroimaging evidence of stroke or cerebrovascular disease, or causal relation between stroke and dementia (Table 1). Focal neurologic signs receive 2 points according to the HIS, but are not necessarily required for diagnosis of MID. Focal neurologic signs or evidence of significant cerebrovascular disease (by history, examination, or neuroimaging evidence) suffice for a DSM-IV diagnosis of dementia of the vascular type. Evidence of infarction on neuroimaging study is required for a diagnosis of probable (but not possible) IVD using ADDTC criteria. Focal neurologic signs, evidence of cerebrovascular disease on neuroimaging studies, and a causal relation between stroke and dementia are required for a diagnosis of NINDS-AIREN probable (but not possible) VaD.
Definitions of VaD also differ in their exclusion criteria. The HIS does not use exclusion criteria. Delirium is excluded by the ADDTC, NINDS-AIREN, and DSM-IV guidelines. In addition, severe aphasia, AD, or another brain disorder sufficient to cause dementia precludes an NINDS-AIREN diagnosis of VaD. Thus, the DSM-IV criteria and the HIS are the most lenient, the ADDTC criteria are intermediate, and the NINDS-AIREN criteria are the most stringent with regard to inclusion and exclusion criteria.
The sample consisted of 25 case vignettes selected at the coordinating ADDTC, located at Rancho Los Amigos Medical Center–University of Southern California, Los Angeles. Subjects underwent a comprehensive interdisciplinary evaluation for memory loss. There were 13 men and 12 women (mean ± SD age, 67.8 ± 9.1 years). Of the 25 subjects, 4 were cognitively healthy; 4, cognitively impaired but not demented; 14, mildly demented; and 3, moderately demented. Six subjects had primary degenerative dementia; 6, strokes; 3, parkinsonian features; 3, depressed mood; and 7, atypical presentations (including prominent personality change [n = 3], language disturbance [n = 1], apraxia [n = 1], visual disorientation [n = 1], and history of encephalitis [n = 1]). The initial mean (±SD) Folstein Mini-Mental State Examination (MMSE)24 score was 22.0 ± 6.6.
The vignettes included a narrative of the chief complaint, history of present illness, medical history, family history of dementia, results of physical and neurologic examination, MMSE,24 Blessed Memory Information Concentration Test,25 and laboratory test results (including complete blood cell count; sequential multiple analyzer for levels of sodium, potassium, carbon dioxide, chloride, glucose, serum urea nitrogen, and creatinine; thyroid functions; levels of vitamin B12; and testing for microhemagglutination–Treponema pallidum). The results of neuropsychological testing were included for 21 subjects. The results of neuroimaging studies (computed tomography or magnetic resonance imaging) were summarized for 24 of the 25 subjects (not available for 1 subject who was cognitively healthy but depressed). In 20 subjects, a composite of representative or key neuroimages (eg, showing an infarct or asymmetric pattern of atrophy) was also included.
The participating ADDTCs are university affiliated and highly experienced in the diagnosis of dementia and its subtypes. The center responsible for preparing the case vignettes did not participate in rating them. At each of the remaining 7 centers, the diagnostic ratings were completed during a consensus team conference, which included a geriatric neurologist, psychiatrist, or internist. For the HIS, the teams were instructed to check a box if any of the 13 items was present. For the other criteria, the raters were asked to check yes, no, or don't know regarding the presence or absence of each key element. A response of don't know occurred infrequently (106/4725 [2.2%]) and usually indicated that despite the best available information, a determination could not be made (eg, unable to determine whether there was a temporal relation between the stroke and the onset of dementia). The coding sheets (7 centers × 25 vignettes = 175 rating sheets) were sent to the coordinating ADDTC for data entry and statistical analysis.
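A minimal sketch of how checklist responses of this kind might be reduced to a yes/no classification is given below. It is not the study's algorithm: the item names, weights, and cutoff are hypothetical placeholders, the "all required items present" rule is a simplification of how real criteria combine items, and a response of don't know is assumed here to count as absent.

```python
def item_present(response):
    """Map a checklist response to present/absent; only an explicit 'yes' counts."""
    return response == "yes"

def meets_required_items(responses, required_items):
    """True when every required item is rated 'yes'.

    Items rated 'no', rated "don't know", or left unrated count as absent.
    """
    return all(item_present(responses.get(item, "don't know"))
               for item in required_items)

def weighted_score(checked_items, weights):
    """Sum the point weights of the checked items (HIS-style scoring)."""
    return sum(weights[item] for item in checked_items)

# Hypothetical item names and weights, for illustration only; the actual
# scale items and their 1- or 2-point weights are defined in the cited scale.
example_weights = {"feature_a": 2, "feature_b": 1, "feature_c": 2, "feature_d": 2}
example_cutoff = 7  # illustrative threshold supplied by the caller

total = weighted_score(["feature_a", "feature_b", "feature_c", "feature_d"],
                       example_weights)
his_positive = total >= example_cutoff

# Hypothetical yes / no / don't know ratings for one vignette.
ratings = {"dementia": "yes", "focal_signs": "don't know", "imaging_evidence": "yes"}
all_met = meets_required_items(ratings, ["dementia", "focal_signs", "imaging_evidence"])

print(total, his_positive, all_met)
```

Passing the weights and cutoff as arguments keeps the same scoring function usable for both an original and a modified scale.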
Responses for each item were scored using computer algorithms to arrive at a diagnosis (yes or no) of dementia, AD, or VaD. In the algorithms, items rated as don't know were considered to be absent (no). Agreement in diagnostic classification was assessed across 7 centers given the same set of criteria, using the κ statistic for 4 sets of criteria for dementia, 2 for AD, and 4 for VaD. The κ statistic is the rate of observed agreement between a pair of ratings (or all possible pairs of ratings when multiple raters are involved26) adjusted for the proportion of the agreement that can be expected to occur by chance. The following guidelines have been suggested for interpreting κ scores: 0.00 to 0.20 indicates slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, almost perfect agreement.27 To determine whether any center was an outlier in the comparison of interrater reliability, the analyses were repeated after removing each center.
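The description of κ above (observed agreement over all possible pairs of raters, adjusted for chance) corresponds to the multirater form commonly attributed to Fleiss. The sketch below is one way to compute it and to apply the interpretation bands just quoted; it is an illustration under that assumption, not the software used in the study, and the function names and example ratings are ours.

```python
def fleiss_kappa(ratings):
    """Multirater kappa for a list of subjects, each a list of category labels.

    Every subject must be rated by the same number of raters (at least 2).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    # counts[i][j]: number of raters assigning subject i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement: mean fraction of rater pairs that agree per subject
    pair_total = n_raters * (n_raters - 1)
    p_obs = sum((sum(n * n for n in row) - n_raters) / pair_total
                for row in counts) / n_subjects
    # Chance agreement from the pooled proportion of ratings in each category
    proportions = [sum(row[j] for row in counts) / (n_subjects * n_raters)
                   for j in range(len(categories))]
    p_exp = sum(p * p for p in proportions)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_obs - p_exp) / (1.0 - p_exp)

def interpret_kappa(kappa):
    """Label a kappa value using the bands quoted in the text."""
    if kappa < 0.0:
        return "below chance (outside the quoted bands)"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

# Hypothetical ratings: 4 vignettes, each rated yes/no by 7 centers.
example = [["yes"] * 7, ["no"] * 7,
           ["yes"] * 5 + ["no"] * 2, ["no"] * 6 + ["yes"]]
example_kappa = fleiss_kappa(example)
print(round(example_kappa, 3), interpret_kappa(example_kappa))
```

Values that fall between two quoted bands (for example, 0.205) are assigned to the higher band in this sketch.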
For the diagnosis of dementia, nearly perfect agreement was observed across the definitions embodied in the DSM-IV, National Institute of Neurological and Communicative Disorders and Stroke–Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA), ADDTC, and NINDS-AIREN criteria. The κ scores ranged from 0.87 to 0.96 (Table 2). Substantial agreement was found between NINCDS-ADRDA criteria for probable AD and DSM-IV criteria for dementia of the Alzheimer type (κ = 0.79). Lesser (ie, fair to moderate) agreement was observed among the criteria for VaD. The κ scores ranged from 0.24 (between the ADDTC probable IVD and a modified HIS) to 0.60 (between the combined ADDTC probable and possible IVD and DSM-IV dementia of the vascular type).
Among the 4 criteria for VaD, significant differences occurred in diagnostic classifications (Table 3). The DSM-IV and the modified HIS were the most liberal, each yielding a diagnosis of VaD in 25% of the ratings. The NINDS-AIREN criteria were the most conservative, leading to a diagnosis of probable or possible VaD in only 6% of the ratings (ie, 1 case). Intermediate frequencies were noted for a diagnosis of VaD by ADDTC criteria and for an original HIS (20.6% and 13.7%, respectively).
Substantial interrater agreement was noted among the 7 centers for the diagnosis of dementia. The κ values for each of the 4 definitions of dementia ranged from 0.62 to 0.74. Moderate interrater reliability was found for the diagnosis of AD, using DSM-IV or NINCDS-ADRDA criteria (κ = 0.56 and κ = 0.46, respectively). A DSM-IV diagnosis of dementia of the Alzheimer type and an NINCDS-ADRDA diagnosis of probable AD were assigned at similar rates (20.6% of all ratings in both cases). Possible AD by NINCDS-ADRDA criteria was diagnosed in 53.7% of ratings, with moderate interrater reliability (κ = 0.47).
Variable interrater reliability was encountered for the diagnosis of VaD. Substantial agreement was realized for the original or modified HIS (κ = 0.65 and κ = 0.59, respectively). Moderate interrater agreement was achieved for an ADDTC diagnosis of probable IVD (κ = 0.44), as well as the NINDS-AIREN criteria for probable or possible VaD (κ = 0.42) and the DSM-IV diagnosis of dementia of the vascular type (κ = 0.59). Lowest interrater reliability was found for the ADDTC diagnosis of possible IVD (κ = 0.15). One-by-one removal of each participating center, followed by a subsequent interrater reliability analysis, did not identify any particular center as an outlier.
Unlike criteria for dementia and AD, clinical criteria for a diagnosis of VaD are not interchangeable. In our study, excellent agreement was found among 4 diagnostic criteria for dementia (κ range, 0.87-0.96), despite differences in the number and type of cognitive impairments required. Similarly, the DSM-IV criteria for dementia of the Alzheimer type and the NINCDS-ADRDA criteria for probable AD exhibited high intercriterion agreement (κ = 0.81) and moderate interrater reliability (κ = 0.56 and κ = 0.49). These findings are comparable to those of previous reports in the literature for the diagnosis of dementia20,22 and AD.28-30
On the other hand, the 4 clinical criteria for VaD studied herein produced significant differences in case classification and interrater reliability. The frequency of a diagnosis of VaD was highest using the modified HIS or DSM-IV criteria and the lowest using the NINDS-AIREN criteria. The classification rate for VaD, using the original HIS and the ADDTC criteria, fell in an intermediate range.
With a few exceptions, the rank order and relative frequencies noted in our study (ie, in descending order, DSM-IV, modified HIS, ADDTC, original HIS, and NINDS-AIREN) recapitulate the findings of previous investigators. Verhey et al19 noted the following classification frequencies in descending order: original and modified HIS, Erkinjuntti et al,31 ADDTC, Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III),32 and NINDS-AIREN. Wetterling et al20 reported the following frequencies in descending order: DSM-IV, ICD-10, ADDTC, and NINDS-AIREN. Thus, 2 of these 3 studies report the DSM-IV as the most liberal. The DSM-III32 rather than the DSM-IV criteria16 were examined in the third study19 and yielded more conservative results. The DSM-III criteria are more taxing, requiring abrupt onset, stepwise progression, and etiologic relation between cerebrovascular disease and dementia. The DSM-IV criteria allow focal neurologic signs or presence of significant cerebrovascular disease. Neither the ICD-10 criteria14 nor those of Erkinjuntti et al,31 which were reported by other investigators, were examined herein. In all 3 studies, the NINDS-AIREN criteria proved to be the most conservative, which is not surprising given the stringency of their inclusion and exclusion criteria.
Among the 4 criteria for VaD, the HIS showed the best interrater reliability (κ = 0.65 for the original HIS; κ = 0.61 for the modified HIS). This advantage may be related to the scale's item-by-item checkoff format. For the HIS, congruence in patient classification was greatest with the DSM-IV criteria (κ = 0.61).
The DSM-IV criteria also showed the best overall agreement with other criteria for VaD (κ range, 0.32-0.60), yielding the greatest overlap with the combined ADDTC for probable and possible IVD. The DSM-IV criteria showed moderate interrater reliability (84% agreement; κ = 0.59) in our study. This is somewhat higher than the 46% to 60% agreement for the Diagnostic and Statistical Manual of Mental Disorders, Revised Third Edition33 diagnosis of MID noted among 4 sites in the Ni-Hon-Sea study.22
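Percent agreement and κ can diverge because κ discounts the agreement expected by chance. Rearranging the standard relation κ = (Po − Pe)/(1 − Pe) gives the chance agreement Pe implied by a reported pair of values. The sketch below applies this to the DSM-IV figures quoted above, on the assumption (ours, not confirmed by the text) that the 84% figure is the overall observed agreement Po entering the same κ calculation; the function names are illustrative.

```python
def implied_chance_agreement(p_obs, kappa):
    """Solve kappa = (p_obs - p_exp) / (1 - p_exp) for p_exp."""
    return (p_obs - kappa) / (1.0 - kappa)

def kappa_from_agreement(p_obs, p_exp):
    """Chance-corrected agreement from observed and expected agreement."""
    return (p_obs - p_exp) / (1.0 - p_exp)

# Figures quoted in the text for the DSM-IV criteria.
p_exp = implied_chance_agreement(0.84, 0.59)
print(f"implied chance agreement: {p_exp:.2f}")
```

With these inputs the implied chance agreement is about 0.61, which illustrates how a high raw agreement can correspond to only moderate κ when most ratings fall in a single category.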
Although moderate interrater reliability was found for ADDTC criteria for probable IVD (κ = 0.44), the level of agreement for possible IVD was extremely poor (κ = 0.15). A diagnosis of possible IVD can be made in the presence of dementia and one of the following: a single stroke without a clearly documented temporal relationship to the onset of dementia, or Binswanger disease (ie, early-onset urinary incontinence or gait disturbance, vascular risk factors, and extensive white matter changes on neuroimaging findings). Neither of these conditions was rated reliably among the different centers. We have not determined whether further training of raters might improve reliability of diagnosis of Binswanger disease using ADDTC criteria. Favorable effects of training on the reliability of the diagnosis of dementia and dementia subtypes were achieved in the Ni-Hon-Sea study.22
Although conservative, the NINDS-AIREN criteria showed moderate interrater reliability (κ = 0.42). This is somewhat lower than the agreement reported by Lopez et al,21 where κ scores ranged from 0.46 to 0.72. However, in the latter study, case information was coded from the clinical record onto a standardized form, rather than reported as vignettes; coding might be expected to improve interrater reliability. A post-hoc item analysis was conducted for 12 cases that were classified by a participating center as VaD using DSM-IV but not NINDS-AIREN criteria (Table 4). The 3 most common items leading to discrepant diagnoses were exclusion of other systemic or brain diseases (including AD) that might cause the dementia (failed in 29/34 [85%] of ratings leading to discrepant diagnoses); a causal relationship between cerebrovascular brain injury and dementia (failed in 25/34 [74%]); and the presence of focal neurologic signs (failed in 15/34 [44%]). Among the criteria studied herein, the exclusion of AD and the requirement for focal neurologic signs were unique to the NINDS-AIREN criteria.
Given their lack of comparability, the choice of diagnostic criteria for VaD remains a critical but elusive methodological issue for clinical and epidemiological studies. Estimates of the incidence and prevalence of VaD will vary severalfold, depending on the criteria selected. Full evaluation of the relative merits of various diagnostic methods must include data from clinical-pathological correlation. The limitations of the HIS for the diagnosis of pathologically confirmed MIX have been recognized previously.10,11 More recently, Gold et al18 reported the pathological findings for cases retrospectively diagnosed using ADDTC and NINDS-AIREN criteria for VaD. Unfortunately, neuroimaging studies were not available in 80% of cases, thereby limiting their report to the clinical diagnosis of possible VaD. Also, neither subcortical ischemic vascular dementia nor Binswanger disease was included in their pathological definition of VaD. Prospective longitudinal studies including clinical-pathological correlation are greatly needed. Meanwhile, the diagnostic approach that will enable accurate prediction of significant vascular brain injury, as well as the presence or absence of AD lesions, remains an enigmatic challenge.
Accepted for publication February 17, 1999.
This work was supported by grants 1P01 AG12435, 1P50 AG05142, and 1P30 AG10129 from the National Institute on Aging, Bethesda, Md, and the State of California Department of Health Services, Sacramento.
Reprints: Helena C. Chui, MD, Geriatric Neurobehavior and Alzheimer Center, Rancho Los Amigos National Rehabilitation Center, 800 Annex W, 7601 E Imperial Hwy, Downey, CA 90242 (e-mail: firstname.lastname@example.org).