3MS indicates modified Mini-Mental State Examination (MMSE); ACE-R, Addenbrooke’s Cognitive Examination–Revised; AMT, Abbreviated Mental Test; CDT, Clock Drawing Test; GPCOG, General Practitioner Assessment of Cognition; IQCODE, Informant Questionnaire on Cognitive Decline in Elderly; MIS, Memory Impairment Screen; and MoCA, Montreal Cognitive Assessment.
Data are provided for the Mini-Cog test,10 Addenbrooke’s Cognitive Examination–Revised (ACE-R),9 and Montreal Cognitive Assessment (MoCA).31 MCI indicates mild cognitive impairment.
eTable 1. Number of Cohorts Included in Each Study
eTable 2. Original Data of Individual Studies in Terms of the True-Positive, False-Positive, False-Negative, and True-Negative Values
eTable 3. Subgroup Analyses for Diagnostic Accuracy of MMSE on Dementia
eFigure 1. HSROC Curve Demonstrated the Summary Point for Sensitivity and Specificity of MMSE for the Detection of Dementia
eFigure 2. Confidence Regions and Summary Points of the Sensitivity vs Specificity of the ACE-R, Mini-Cog Test, and MMSE for the Detection of Dementia
eFigure 3. Confidence Region and Summary Point of the Sensitivity vs Specificity of MMSE and MoCA for the Detection of MCI
Tsoi KKF, Chan JYC, Hirai HW, Wong SYS, Kwok TCY. Cognitive Tests to Detect Dementia: A Systematic Review and Meta-analysis. JAMA Intern Med. 2015;175(9):1450-1458. doi:10.1001/jamainternmed.2015.2152
Dementia is a global public health problem. The Mini-Mental State Examination (MMSE) is a proprietary instrument for detecting dementia, but many other tests are also available.
To evaluate the diagnostic performance of all cognitive tests for the detection of dementia.
Literature searches were performed on the list of dementia screening tests in MEDLINE, EMBASE, and PsycINFO from the earliest available dates stated in the individual databases until September 1, 2014. Because Google Scholar searches literature with a combined ranking algorithm on citation counts and keywords in each article, our literature search was extended to Google Scholar with individual test names and dementia screening as a supplementary search.
Studies were eligible if participants were interviewed face to face with respective screening tests, and findings were compared with criterion standard diagnostic criteria for dementia. Bivariate random-effects models were used, and the area under the summary receiver-operating characteristic curve was used to present the overall performance.
Main Outcomes and Measures
Sensitivity, specificity, and positive and negative likelihood ratios were the main outcomes.
Eleven screening tests were identified among 149 studies with more than 49 000 participants. Most studies used the MMSE (n = 102) and included 10 263 patients with dementia. The combined sensitivity and specificity for detection of dementia were 0.81 (95% CI, 0.78-0.84) and 0.89 (95% CI, 0.87-0.91), respectively. Among the other 10 tests, the Mini-Cog test and Addenbrooke’s Cognitive Examination–Revised (ACE-R) had the best diagnostic performances, which were comparable to that of the MMSE (Mini-Cog, 0.91 sensitivity and 0.86 specificity; ACE-R, 0.92 sensitivity and 0.89 specificity). Subgroup analysis revealed that only the Montreal Cognitive Assessment had comparable performance to the MMSE on detection of mild cognitive impairment with 0.89 sensitivity and 0.75 specificity.
Conclusions and Relevance
Besides the MMSE, there are many other tests with comparable diagnostic performance for detecting dementia. The Mini-Cog test and the ACE-R are the best alternative screening tests for dementia, and the Montreal Cognitive Assessment is the best alternative for mild cognitive impairment.
Early diagnosis of dementia can identify people at risk for complications.1 Previous studies2,3 have found that health care professionals commonly miss the diagnosis of cognitive impairment or dementia; the prevalence of missed diagnosis ranges from 25% to 90%. Primary care physicians may not recognize cognitive impairment3 until the moderate to severe stage.4-6 Screening tests are quick and useful tools to assess the cognitive condition of patients.1
The Mini-Mental State Examination (MMSE)7 is the most widely applied test for dementia screening. Since the intellectual property rights of the MMSE were transferred to Psychological Assessment Resources in 2001, it has become less accessible and useful.7,8 However, there are more than 40 other tests available for dementia screening in health care settings, many of which are freely available, such as Addenbrooke’s Cognitive Examination–Revised (ACE-R),9 the Mini-Cog test,10 the General Practitioner Assessment of Cognition (GPCOG),11 and the Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE).12,13 The diagnostic performances of these tests have not been systematically evaluated and synthesized for relative comparison, which is particularly salient because the MMSE, as a proprietary instrument, incurs a cost, whereas others do not. Thus, it is worth identifying the best alternative among the long list of screening tests. Therefore, this systematic review aimed to quantitatively analyze the diagnostic accuracy of various dementia screening tests and compare their performance to that of the MMSE.
This systematic review followed standard guidelines for conducting and reporting systematic reviews of diagnostic studies, including Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA)14 and guidelines from the Cochrane Diagnostic Test Accuracy Working Group.15,16
A list of screening tests was identified in previous systematic reviews.3,11,17,18 Literature searches were performed on the list of dementia screening tests in MEDLINE, EMBASE, and PsycINFO from the earliest available dates stated in the individual databases until September 1, 2014. Each screening test was separately searched with general keywords of dementia, including Alzheimer, Parkinson, vascular, stroke, cognitive impairment, and dementia. Diagnostic studies comparing accuracy of screening tests for detection of dementia were manually identified from the title or abstract preview of all search records. The selection was limited to peer-reviewed articles with abstracts published in English. Because Google Scholar searches literature with a combined ranking algorithm on citation counts and keywords in each article, our literature search was extended to Google Scholar with individual test names and dementia screening as a supplementary search. The first 10 pages of all search records were scanned. Manual searches were extended to the bibliographies of review articles and included research studies. Screening tests were classified into different categories according to the administration time: 5 minutes or less, 10 minutes or less, and 20 minutes or less.
Cross-sectional studies were included if they met the following inclusion criteria: (1) involved participants studied for the detection of dementia associated with Alzheimer disease, vascular dementia, or Parkinson disease in any clinical or community setting; (2) screened patients or caregivers with a face-to-face interview; (3) used standard diagnostic criteria as the criterion standard for defining dementia, including the international diagnostic guidelines (eg, Diagnostic and Statistical Manual of Mental Disorders, International Classification of Diseases, National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer Disease and Related Disorders Association, National Institute of Neurological and Communicative Disorders and Stroke and the Association Internationale pour la Recherche et L’Enseignement en Neuroscience criteria, or clinical judgment after a full assessment series); (4) reported the number of participants with dementia and evaluated the accuracy of the screening tests, including sensitivity, specificity, or data that could be used to derive those values. Studies were excluded if they were not written in English or only included a screening test that (1) requires administration time longer than 20 minutes, (2) was identified in fewer than 4 studies in the literature search, or (3) was administered to participants with visual impairment.
Two investigators (J.Y.C.C., H.W.H.) independently assessed the relevancy of search results and abstracted the data into a data extraction form. This form was used to record the demographic details of individual articles, including year of publication, study location, number of participants included, mean age of participants, percentage of male participants, type of dementia, recruitment site, number of participants with dementia or mild cognitive impairment (MCI), diagnostic criteria, cutoff values, sensitivity, specificity, and true-positive, false-positive, true-negative, and false-negative values. When a study reported results of sensitivity and specificity across different cutoff values of a screening test, only the results from the cutoff recommended by the authors of the article were selected. If the study did not have this recommendation, the cutoff used to summarize sensitivity and specificity in the abstract was chosen. When discrepancies were found regarding inclusion of studies or data extraction, a third investigator (K.K.F.T.) made the definitive decision on study eligibility and data extraction.
Potential risks of bias in each screening test were evaluated by the Quality Assessment of Diagnostic Accuracy Studies 2 instrument,19 which evaluated patient selection, execution of the index test and the reference standard, and flow of patients. Ratings of high risk of bias were counted in an Excel worksheet (Microsoft Inc) and presented as a percentage for each screening test. The quality of each study was also assessed according to the methods section of the Standards for Reporting of Diagnostic Accuracy statement.20 An 8-point scale was designed for the evaluation of study quality, including description of the following: (1) study population, (2) participant recruitment, (3) sampling of participant selection, (4) data collection plan, (5) reference standard and its rationale, (6) technical specifications, (7) rationale for units and cutoffs, and (8) methods for calculating diagnostic accuracy with CIs. This quality score was presented as median and range across the screening tests.
Statistical analyses were performed with the Metandi and Midas procedures in STATA statistical software, version 11 (StataCorp). The overall sensitivity and specificity of each diagnostic test were pooled using a bivariate random-effects model.21 Forest plots were used to graphically present the combined sensitivity and specificity. An assessment of the accuracy of a screening test has to allow for the trade-off between sensitivity and specificity that occurs when different threshold values are used to define positive and negative test results. Therefore, a diagnostic odds ratio was used as a single indicator of test performance.22 In addition, a hierarchical summary receiver-operating characteristic (HSROC) curve was generated to present the summary estimates of sensitivities and specificities along with 95% CIs and prediction region.23 The area under the curve (AUC) for the HSROC was also calculated, and an area between 0.9 and 1.0 indicated that the diagnostic accuracy was good.24 When the Hessian matrix of the bivariate approach was unstable or asymmetric, a random-effects model following the approach of DerSimonian and Laird was applied to estimate the pooled sensitivity and specificity, and a summary receiver-operating characteristic (SROC) curve was generated to present the summary estimates of sensitivities and specificities, with the AUC for the SROC presented as a summary statistic.25,26 Statistical heterogeneity among the trials was assessed by I², which describes the percentage of total variation across studies that is due to heterogeneity rather than chance alone. P < .10 was considered to indicate statistically significant heterogeneity. Because we used random-effects models to combine the results, the heterogeneity among the studies was taken into account.
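The per-study quantities that feed these models can be illustrated with a minimal Python sketch. It is not the Metandi or Midas procedure and does not fit a bivariate model; it only shows how sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio follow from one 2 × 2 table, and how I² follows from Cochran's Q. The function names and the counts in the usage lines are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures for a single 2 x 2 table.

    Assumes every cell is nonzero; with zero cells a continuity
    correction is commonly applied first (not shown here).
    """
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1.0 - specificity),
        "lr_negative": (1.0 - sensitivity) / specificity,
        # Diagnostic odds ratio: odds of a positive test in people with
        # the disease divided by the odds in people without it.
        "dor": (tp * tn) / (fp * fn),
    }


def i_squared(cochran_q, n_studies):
    """I² (%) from Cochran's Q: max(0, (Q - df) / Q) * 100, df = k - 1."""
    df = n_studies - 1
    if cochran_q <= 0:
        return 0.0
    return max(0.0, 100.0 * (cochran_q - df) / cochran_q)


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=81, fp=11, fn=19, tn=89)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"I² = {i_squared(cochran_q=100.0, n_studies=11):.1f}%")
```

A single diagnostic odds ratio summarizes discrimination but hides the balance between sensitivity and specificity, which is why the summary curves are reported alongside it.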
Subgroup analysis was conducted across the studies by geographic regions, recruitment settings, and patients with MCI. Geographic regions were classified as Americas, Asia, and Europe. Recruitment could be performed in the community, memory clinics, cognitive function clinics, or hospitals. The definitions of participants with MCI were according to the cutoff values suggested in the individual studies.
A total of 26 165 abstracts were identified from the databases, and 215 potential studies were further extracted from the bibliographies. All titles or abstracts were screened, and 346 articles were relevant to screening tools for dementia. One hundred ninety-seven were excluded for the following reasons: studies were systematic reviews (n = 30), studies did not fulfill inclusion criteria (n = 121), studies lacked data details for meta-analysis (n = 39), and studies reported results of screening tests without comparing to a criterion standard (n = 7) (Figure 1). The definitive analysis in this systematic review included 149 studies published from 1989 until September 1, 2014, for patients with dementia from the United States, the United Kingdom, Canada, and 30 other countries.
A total of 11 screening tests9,10,12,13,27-33 were identified in the 149 eligible studies, including 102 studies (eTable 1 in the Supplement) for the MMSE,7 12 studies for the ACE-R,9 13 studies for the Abbreviated Mental Test,27 9 studies for Sunderland’s version of the Clock Drawing Test,28 9 studies for Shulman’s version of the Clock Drawing Test,29 5 studies for the GPCOG,11 15 studies for the long-form IQCODE,12 7 studies for the short-form IQCODE,13 9 studies for the Mini-Cog test,10 6 studies for the Memory Impairment Screen,30 20 studies for the Montreal Cognitive Assessment (MoCA),31 6 studies for the modified MMSE,32 and 7 studies for the verbal fluency tests.33 Some other screening tests were excluded because of the limited number of studies reported, for example, the Free and Cued Selective Reminding Test, the Mental Scale Questionnaire, the Cognitive Assessment Screening Instrument, the Self-administered Gerocognitive Examination, and the Short Blessed Test. The components of each screening test, the administration time required, and the range of total score are presented in Table 1. High scores represented good cognitive function in most screening tests, except the IQCODE.
A total of 149 studies with more than 49 000 patients across the 11 screening tests were included (Table 2). One hundred ten eligible studies (73.8%) reported the diagnostic performances of at least 2 screening tests, including those compared with the MMSE. Approximately 12 000 participants were confirmed as having dementia (Table 2). Most studies (68.5%) used the MMSE as the screening test for dementia in 29 regions. The next most common screening test studied was the MoCA, which was used in 20 studies (13.4%) from 9 countries. Patients were mainly recruited from community or clinic settings (80.3%). One hundred ten (73.8%) of 149 studies had good study quality with quality scores of 7 to 8. The quality scores were comparable across the 11 screening tests with median scores of approximately 7 (range, 3-8). The original data of each study on the true-positive, false-positive, false-negative, and true-negative values are presented in eTable 2 in the Supplement. Furthermore, risks of bias were generally low among these studies; only the studies for the GPCOG, MoCA, and modified MMSE revealed approximately 20% to 30% high risks of bias on execution of the index test and the reference standard (Table 2).
There were 10 263 cases of dementia identified from 36 080 participants in 108 cohorts studying the MMSE. The most common cutoff values to define participants with dementia were 23 and 24, used in 48 cohorts (44.4%). With different cutoff threshold values, we found considerable variation in the sensitivity and specificity estimates reported by individual studies. The sensitivities ranged from 0.25 to 1.00, and the specificities ranged from 0.54 to 1.00. The heterogeneity among studies was large, with I² statistics for sensitivity and specificity of 92% and 94%, respectively. The diagnostic accuracy is summarized by meta-analysis (Table 3). The combined data in the bivariate random-effects model gave a summary point with 0.81 sensitivity (95% CI, 0.78-0.84) and 0.89 specificity (95% CI, 0.87-0.91). The HSROC curve was plotted with a diagnostic odds ratio of 35.4, and the AUC was 92% (95% CI, 90%-94%) (eFigure 1 in the Supplement).
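To show what a summary point of this kind can mean for an individual screening result, the following Python sketch converts the reported summary sensitivity (0.81) and specificity (0.89) into likelihood ratios and then into post-test probabilities. The 10% pretest probability is an assumption chosen only for illustration and is not a prevalence reported in this review; the function names are likewise hypothetical. Because the inputs are rounded summary estimates, the derived values are approximate.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' rule on the odds scale: post-test odds = pretest odds x LR."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Summary estimates for the MMSE reported in the text above.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.81, specificity=0.89)

# Assumed pretest probability of dementia (illustrative, not from the review).
assumed_pretest = 0.10

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Probability after a positive screen: "
      f"{post_test_probability(assumed_pretest, lr_pos):.2f}")
print(f"Probability after a negative screen: "
      f"{post_test_probability(assumed_pretest, lr_neg):.3f}")
```

Under the assumed pretest probability, a positive screen leaves the diagnosis far from certain, whereas a negative screen makes dementia unlikely; the balance shifts as the pretest probability changes.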
The performances of the other 10 screening tests were summarized by random-effects models (Table 3). All tests presented with AUCs of at least 85%, and most of the tests had comparable performance to that of the MMSE. The Mini-Cog test and the ACE-R were the best alternative tests. Among the studies with the Mini-Cog test,10,34-41 the pooled sensitivity was 0.91 (95% CI, 0.80-0.96), and the pooled specificity was 0.86 (95% CI, 0.74-0.93) (Figure 2A). The heterogeneity among studies was large, with I² statistics for sensitivity and specificity of 89% and 97%, respectively. Among studies that used the ACE-R,9,42-52 the pooled sensitivity was 0.92 (95% CI, 0.90-0.94) and the pooled specificity was 0.89 (95% CI, 0.84-0.93) (Figure 2B). The confidence regions of the HSROC curves for sensitivity and specificity of the Mini-Cog test and the ACE-R were plotted with reference to the HSROC curve of the MMSE (eFigure 2 in the Supplement).
Only the MMSE had a sufficient number of studies to perform subgroup analysis. For the geographic regions, studies were conducted in Europe (44.4%), Americas (31.5%), and Asia (23.1%). The diagnostic performances of the MMSE were comparable across these regions with similar AUCs (eTable 3 in the Supplement). For the recruitment settings, participants were recruited in hospital (9.3%), clinic (32.4%), primary care (12.0%), community (38.9%), and other settings (7.4%). The diagnostic performances were comparable across different recruitment settings (P > .05 for all) (eTable 3 in the Supplement).
Only 21 of 108 cohorts reported diagnostic performance of the MMSE for the detection of MCI. The combined data gave a summary point of 0.62 sensitivity (95% CI, 0.52-0.71) and 0.87 specificity (95% CI, 0.80-0.92). Nine of 20 studies reported diagnostic performance of the MoCA for the detection of MCI.31,53-60 The combined data gave a summary point of 0.89 sensitivity (95% CI, 0.84-0.92) and 0.75 specificity (95% CI, 0.62-0.85) (Figure 2C). The confidence regions of the HSROC curve for sensitivity and specificity of the MoCA were plotted with reference to the HSROC curve of the MMSE (eFigure 3 in the Supplement).
This systematic review and meta-analysis included 149 studies that assessed the accuracy of the MMSE and 10 other screening tests for the detection of dementia. Compared with other screening tests, the Mini-Cog test and the ACE-R had better diagnostic performance for dementia, and the MoCA had better diagnostic performance for MCI. The Mini-Cog test is relatively simple and short compared with the MMSE.
In a previous meta-analysis, Mitchell61 combined 34 diagnostic studies to evaluate the accuracy of the MMSE, but he only combined the sensitivity and specificity without mentioning the methodologic details. Mitchell and Malladi62,63 also published 2 meta-analyses that included 45 studies to compare diagnostic performance of single-domain and multidomain tests. They found that 15 brief single-domain tests were less accurate than the MMSE in detecting dementia in community and primary care settings. These studies used an uncommon approach of Bayesian curve modeling,64 instead of using the ROC curve to evaluate the diagnostic performance of the tests. A systematic review3 reported a combined diagnostic accuracy of the MMSE and summarized the sensitivity and specificity ranges of 10 other screening tests, but the literature search was limited to studies from systematic reviews conducted in primary care settings. In some other studies, dementia screening was performed in secondary or tertiary care settings. Therefore, the review combined only 14 studies with 10 185 participants using the MMSE as the screening test. The lack of a precise estimate of sensitivity has resulted in confusion among health care professionals about applying the MMSE for dementia screening. In our meta-analysis, we tried to make our findings more comprehensive, using publications from all possible sources, and included 102 studies with 36 080 participants to evaluate the diagnostic performance of the MMSE. The results showed a sensitivity of 0.81 and a specificity of 0.89 for the MMSE. The diagnostic performance of the MMSE is good because the AUC was 92%.
Diagnostic sensitivity improves with higher cutoff values but with a corresponding decrease in specificity. High sensitivity corresponds to high negative predictive value and is ideal for ruling out dementia. We found considerable variation in the definitions of cutoff thresholds among the individual studies. According to our selection criteria, the most common cutoff scores for the MMSE for dementia were 23 and 24 (44.4% of study cohorts), and approximately 20% of eligible cohorts used cutoff scores of 25 to 26 (range, 17-28). The range of scores for the Mini-Cog test is only 0 to 5, and 7 cohorts (77.8%) used a score of less than 3 as the cutoff for dementia; the optimal cutoff score was therefore not uniformly agreed on across screening tests. The users of screening tests should strike a balance between sensitivity and specificity to rule in or out the participants with dementia according to the available resources.
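The dependence of sensitivity and specificity on the cutoff can be made concrete with a small Python sketch. The scores below are invented for illustration and do not come from any included study. The sketch also assumes that higher scores indicate better cognition and that a score at or below the cutoff counts as a positive screen, a convention that individual studies may define differently.

```python
def sens_spec_at_cutoff(scores_with_dementia, scores_without_dementia, cutoff):
    """Sensitivity and specificity when a score <= cutoff is a positive screen."""
    true_positives = sum(1 for s in scores_with_dementia if s <= cutoff)
    true_negatives = sum(1 for s in scores_without_dementia if s > cutoff)
    sensitivity = true_positives / len(scores_with_dementia)
    specificity = true_negatives / len(scores_without_dementia)
    return sensitivity, specificity


# Hypothetical test scores on a 0-30 scale (illustrative only).
with_dementia = [15, 18, 20, 22, 23, 24, 25, 27]
without_dementia = [22, 24, 26, 27, 28, 28, 29, 30]

for cutoff in (21, 24, 26):
    sens, spec = sens_spec_at_cutoff(with_dementia, without_dementia, cutoff)
    print(f"cutoff <= {cutoff}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

In this toy data set, each increase in the cutoff classifies more participants as positive, so sensitivity rises while specificity falls, which is the trade-off a user must weigh against available resources.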
This study has several limitations. First, the screening tests were not directly compared in the same populations. Each study used different populations, and the inclusion criteria and prevalence of dementia varied. It would be preferable to directly compare screening tests using the same group of participants with similar educational levels. Second, only a few studies were included that showed head-to-head comparison between the screening tests, so the test performance could not be directly compared. Third, the screening tests were translated into different languages, which may have unknown effects on the results. We assumed that the tests were all validated in various languages in the individual studies although this was not guaranteed, and unidentified cultural effects on the use of screening tests may still exist. Fourth, we only included studies that reported the diagnostic performance of screening tests for dementia. Although we used MCI as a secondary outcome, the definitions of MCI were heterogeneous across studies. Studies that only reported the results of MCI or cognitive impairment but not dementia (cognitive impairment no dementia) were not included in this meta-analysis. Finally, some unpublished studies may not have been identified through the literature search in OVID databases, and publication bias may exist.
This systematic review and meta-analysis found that the MMSE is the most frequently studied test for dementia screening. However, many other screening tests have comparable diagnostic performance. The Mini-Cog test and the ACE-R had better performance than the other dementia screening tests. The MoCA had better performance than the other MCI screening tests. Although the MMSE is a proprietary instrument for dementia screening, the other screening tests are comparably effective but easier to perform and freely available.
Accepted for Publication: April 8, 2015.
Corresponding Author: Timothy C. Y. Kwok, MD, PhD, Department of Medicine and Therapeutics, The Chinese University of Hong Kong, 9/F, Clinical Sciences Building, Prince of Wales Hospital, Ngan Shing Street, Shatin, Hong Kong (email@example.com).
Published Online: June 8, 2015. doi:10.1001/jamainternmed.2015.2152.
Author Contributions: Dr Tsoi had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Tsoi, Kwok.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Tsoi.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Tsoi, Hirai.
Administrative, technical, or material support: Chan, Hirai.
Study supervision: Tsoi, Wong, Kwok.
Conflict of Interest Disclosures: Dr Kwok reported lecturing in workshops sponsored by Lundbeck and Novartis and receiving donations from Novartis for a website for family caregivers of dementia. No other disclosures were reported.