Corresponding scores and percentile ranks for the Sweet 16 and the Mini-Mental State Examination (MMSE). The log-linear method was used for smoothing raw scores obtained by equipercentile equating.
Receiver operating characteristic curve for the Sweet 16 (S16) and the Mini-Mental State Examination (MMSE) scores compared with an Informant Questionnaire on Cognitive Decline in the Elderly score higher than 3.5. Area under the curve (AUC) for S16, 0.84; AUC for MMSE, 0.81. See text for details.
Receiver operating characteristic curve for the Sweet 16 (S16) and the Mini-Mental State Examination (MMSE) scores compared with clinical diagnoses. Area under the curve (AUC) for S16, 0.97; AUC for MMSE, 0.95. See text for details.
Fong TG, Jones RN, Rudolph JL, Yang FM, Tommet D, Habtemariam D, Marcantonio ER, Langa KM, Inouye SK. Development and Validation of a Brief Cognitive Assessment Tool: The Sweet 16. Arch Intern Med. 2011;171(5):432-437. doi:10.1001/archinternmed.2010.423
For many older adults, cognitive impairment contributes to loss of independence, decreased quality of life, and increased health care costs.1 Currently, an estimated 3.4 million individuals older than 71 years in the United States suffer from dementia, and an additional 5.4 million have milder forms of cognitive impairment.2 In a recent study, more than 30% of patients older than 75 years admitted to the hospital were found to have some degree of cognitive impairment.3 While the public health impact of cognitive impairment is clear, this condition is often underrecognized.3 A simple, rapid general cognitive assessment instrument is therefore a valuable tool for use in both clinical and research settings.
Routine measurement of cognitive function can facilitate the early detection of dementia, provide a measure of disease severity, or identify individuals who are at risk for conditions such as delirium4 and functional impairment (eg, driving).5 Also, cognitive assessment is often used in research studies for determination of eligibility, risk stratification, and measurement of outcome. The most widely known and used measure of global cognitive impairment is the Mini-Mental State Examination (MMSE).6 The MMSE is limited by a ceiling effect in individuals with high premorbid intelligence and high educational attainment, whereas low educational attainment can result in falsely reduced scores.7 Although some authors argue that there is a need to adjust for education,8 not all studies have shown that MMSE scores are influenced by education.9 The specificity of the MMSE is also limited in individuals with low IQ, frontal and subcortical-type impairment, or delirium.10 A more recent issue is that a number of cognitive assessment instruments, including the MMSE, are copyrighted and now have restrictions or fees associated with their use.11 In clinical practice, cognitive assessment instruments are used inconsistently. For these reasons, the ideal cognitive assessment instrument would be an instrument that can be administered easily, quickly, accurately, and reliably by busy providers and researchers. The instrument should accurately characterize mental status across sociodemographic subgroups. Finally, an instrument that does not rely on props such as pen and paper or diagrams would be more useful across many settings, including hospitals and institutions where patients may be limited by the presence of intravenous lines or immobilized in bed or where props may otherwise not be reliably available.
We therefore developed an instrument to address these issues and, importantly, to maximize sensitivity in order to be a useful screening instrument for cognitive impairment. In this study, the goals were (1) to develop the Sweet 16 instrument; (2) to identify cut points that correlate with well-accepted cut points on the MMSE; (3) to compare the performance of the Sweet 16 with the MMSE; and (4) to validate the Sweet 16 in an independent sample, against a reference standard for cognitive impairment.
The Sweet 16 instrument is scored from 0 to 16 (with 16 representing the best score) and includes 8 orientation items, 3 registration items, 4 digit spans, and 3 recall items. Each item receives 1 point, except the first 2 digit spans, which are provided as a learning (practice) task and are not scored. The items were selected based on widely used cognitive assessment measures that tested those domains most likely to be impaired by any type of cognitive impairment, such as memory (a cardinal feature in both mild cognitive impairment and Alzheimer disease) and executive dysfunction, an important secondary cause of memory loss. The items also do not require pen and paper or other props to complete and can be readily administered by clinical or lay staff in different settings. While there is overlap with items used by the MMSE, it is important to note that all of the cognitive subtests used in the Sweet 16 are widely applied independently of the MMSE, alone and in other batteries. The orientation items assess both temporal and spatial orientation.12 Registration and recall of 3 items is a widely used short-term recall task13 that is sensitive to the effects of aging and dementia. The digit span test14 assesses sustained attention, concentration, and mental control aspects of working memory and was selected because it has minimal educational and cultural bias.15
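The scoring rule described above can be illustrated with a minimal sketch. It assumes only what the text states (1 point per scored item, with the 2 practice digit spans excluded); the class and field names are hypothetical and are not part of the published instrument.

```python
from dataclasses import dataclass


@dataclass
class Sweet16Responses:
    """Number of correct responses per section (field names are hypothetical)."""
    orientation_correct: int         # 0-8 orientation items
    registration_correct: int        # 0-3 registration items
    scored_digit_spans_correct: int  # 0-2; the 2 practice spans are not scored
    recall_correct: int              # 0-3 recall items


# Maximum points per section: 8 + 3 + 2 + 3 = 16.
SECTION_LIMITS = {
    "orientation_correct": 8,
    "registration_correct": 3,
    "scored_digit_spans_correct": 2,
    "recall_correct": 3,
}


def sweet16_total(responses: Sweet16Responses) -> int:
    """Return the total score (0-16), giving 1 point per correct scored item."""
    total = 0
    for name, upper in SECTION_LIMITS.items():
        value = getattr(responses, name)
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}, got {value}")
        total += value
    return total


if __name__ == "__main__":
    # A participant who misses all 3 recall items and 1 scored digit span.
    print(sweet16_total(Sweet16Responses(8, 3, 1, 0)))
```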
The Sweet 16 instrument was developed and tested in a previously assembled cohort of 457 eligible participants drawn from a randomized trial of a delirium abatement program16 plus 317 participants screened for the clinical trial who did not meet delirium criteria but were included as part of a nested cohort study,16 as described in detail previously. This sample, the post–acute hospitalization data set, was used as the development cohort for the present study and included participants 65 years or older who were admitted to a skilled nursing facility directly from an acute medical or surgical hospitalization over a 3-year period. On enrollment, the participants underwent a structured cognitive assessment, including the MMSE6 and digit span tests,14 along with baseline demographics, including age, sex, ethnicity, and Charlson score.17 Hospital discharge summaries were reviewed for medical diagnosis (International Classification of Diseases, Ninth Revision [ICD-9]), including dementia. After instrument development, the Sweet 16 scores were calculated in the development cohort using data from baseline cognitive testing. The administration time of the Sweet 16 was determined in a separate pilot study consisting of 20 volunteers, including elderly persons with vision, hearing, and cognitive impairment.
To maximize utility, Sweet 16 scores were directly correlated with widely used cut points on the MMSE. Equipercentile equating, a procedure that creates the same percentile distributions across both scales (see the “Statistical Analysis” section for further details), was used18 to create a crosswalk between Sweet 16 and MMSE scores. Using this approach, corresponding score levels were identified on the Sweet 16 and the MMSE. Furthermore, agreement and correlation of the Sweet 16 with both the MMSE and the modified Blessed Dementia Rating Scale (mBDRS) were also examined.
For the validation cohort, data from the Aging, Demographics, and Memory Study (ADAMS), a substudy of the Health and Retirement Study (HRS), were used. The HRS, which is described in detail elsewhere,19 is a nationally representative panel survey of more than 20 000 persons that was designed to examine health and economic well-being before and after retirement. The ADAMS substudy was designed to examine the risks and outcomes of dementia and cognitive impairment in a nationally representative sample.20 A stratified random sample of HRS participants was selected to participate in ADAMS to ensure adequate representation across a broad range of cognitive functioning as well as across age and sex groups. Of the 856 persons included in ADAMS, 45 were excluded from this study because of missing items that were needed to score the Sweet 16, and 102 were excluded because of missing Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE) scores.21 Therefore, the final cohort used in this analysis included 709 ADAMS participants. Complex sampling and poststratification weights provided by the HRS were used in all relevant analyses to permit population-level inferences and appropriate standard error estimation.
The ADAMS baseline interview included a 3- to 4-hour assessment by a nurse and a neuropsychological technician. The assessment included general demographic information (age, sex, ethnicity, marital status, living situation, and educational level), functional status, and health information (medical diagnoses, medications, history of memory impairment, and neurologic examination results). The neuropsychological examination included assessment of multiple domains, including verbal learning and memory, visual memory, object memory, language, drawing, praxis, attention, premorbid achievement, and general cognitive status.20 Specifically, the assessment included the MMSE6 and the digit span test.14 All of the cognitive items used to create the Sweet 16 were administered and scored in a similar fashion to the methods used in the post–acute hospitalization data set, except for the digit span test (see details in the “Results” section). The baseline assessment included an interview with a key informant to complete the IQCODE21 and the mBDRS.12
An expert panel of clinicians assigned consensus diagnoses broadly grouped as normal cognitive function, cognitive impairment without dementia, and dementia.20,22 Consensus diagnoses were derived from a review of all information collected during the in-home assessment. The results of all of the neuropsychological tests, digit span tests, MMSE, and mBDRS were presented to the expert panel. The results of the IQCODE were not presented. Clinical dementia diagnosis was based on guidelines from the Diagnostic and Statistical Manual of Mental Disorders (Third Edition Revised), the Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition), and the clinical judgment of the panel. Cognitive impairment without dementia was defined as functional impairment reported by the patient or informant that did not meet criteria for dementia or as performance on neuropsychological measures greater than or equal to 1.5 SDs below the published norms on any test within a cognitive domain.20
The agreement and correlation of the Sweet 16 with the MMSE and the mBDRS were examined first in the validation cohort. Subsequently, the IQCODE was used as the reference standard for cognitive impairment. Because this score was not presented to the expert panel, it was considered an independent reference standard for cognitive impairment. Based on previous studies,21 a cut point of 3.5 was used to indicate cognitive impairment.
Equipercentile equating, a statistical process to determine comparable scores based on percentile equivalents, was used for direct comparison of Sweet 16 and MMSE scores. This method has been described in detail elsewhere.18 Equipercentile equating can result in irregular score distributions when actual values are graphed. A log-linear method23 was used to smooth the raw scores that were presented graphically. Comparison of the Sweet 16 and the MMSE across both samples was conducted by examining overall agreement, weighted κ as an indicator of chance-corrected agreement, and Spearman correlation coefficient.
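The idea behind equipercentile equating can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it assumes integer scores, uses mid-percentile ranks with linear interpolation, and omits the log-linear smoothing step. The function names and the toy data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def percentile_ranks(scores, max_score):
    """Mid-percentile rank (0-100) of every integer score from 0 to max_score."""
    n = len(scores)
    counts = Counter(scores)
    ranks, below = [], 0
    for score in range(max_score + 1):
        at_score = counts.get(score, 0)
        # Everyone below the score plus half of those exactly at the score.
        ranks.append(100.0 * (below + at_score / 2.0) / n)
        below += at_score
    return ranks


def interpolate(p, ranks, grid):
    """Linearly interpolate grid over non-decreasing ranks, clamped at the ends."""
    if p <= ranks[0]:
        return float(grid[0])
    if p >= ranks[-1]:
        return float(grid[-1])
    i = bisect_right(ranks, p)  # ranks[i - 1] <= p < ranks[i]
    lo, hi = ranks[i - 1], ranks[i]
    return grid[i - 1] + (p - lo) / (hi - lo) * (grid[i] - grid[i - 1])


def equate(x_scores, y_scores, x_max, y_max):
    """For each X score 0..x_max, return the Y score with the same percentile rank."""
    ranks_x = percentile_ranks(x_scores, x_max)
    ranks_y = percentile_ranks(y_scores, y_max)
    y_grid = list(range(y_max + 1))
    return [interpolate(p, ranks_y, y_grid) for p in ranks_x]


if __name__ == "__main__":
    # Toy data: a 0-3 scale equated to a 0-6 scale.
    print(equate([0, 1, 2, 3], [0, 2, 4, 6], x_max=3, y_max=6))
```

Because the equated value is interpolated, it is generally not an integer, which is why a crosswalk of this kind can yield values that fall between 2 adjacent scores.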
For the validation study, performance characteristics of the Sweet 16 compared with the reference standard (IQCODE) were calculated, including sensitivity, specificity, positive predictive value, and negative predictive value. For comparison purposes, the performance characteristics of the MMSE were also determined. Receiver operating characteristic curve analyses against the IQCODE were performed, and the area under the curve (AUC) was calculated, for both the Sweet 16 and the MMSE. To test whether the AUCs for the Sweet 16 and the MMSE were significantly different, the AUCs were compared using bootstrap methods with 1000 replications. In brief, participants were drawn with replacement from the ADAMS sample. The population weights were rescaled so that the population number remained constant. The AUCs were calculated for the Sweet 16 and the MMSE, and their differences were calculated. The means of the differences were compared statistically using a Z-test. Analysis of the AUCs by education group was similarly performed.
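The bootstrap comparison of AUCs described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's analysis code: it is unweighted (the ADAMS sampling weights and their rescaling are omitted), it assumes lower scores indicate worse cognition, and it forms the Z statistic as the mean bootstrap difference divided by the standard deviation of the bootstrap differences. All function and parameter names are hypothetical.

```python
import random
from statistics import mean, stdev


def auc(scores, impaired):
    """Probability that an impaired person scores lower than an unimpaired one.

    Ties count one half. Lower scores are assumed to indicate worse cognition.
    """
    pos = [s for s, d in zip(scores, impaired) if d]
    neg = [s for s, d in zip(scores, impaired) if not d]
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(test_a, test_b, impaired, replications=1000, seed=0):
    """Return (mean AUC difference, Z statistic) over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(impaired)
    diffs = []
    while len(diffs) < replications:
        # Draw participants with replacement, keeping both tests paired.
        idx = [rng.randrange(n) for _ in range(n)]
        status = [impaired[i] for i in idx]
        if all(status) or not any(status):
            continue  # an AUC needs both impaired and unimpaired participants
        auc_a = auc([test_a[i] for i in idx], status)
        auc_b = auc([test_b[i] for i in idx], status)
        diffs.append(auc_a - auc_b)
    spread = stdev(diffs)
    z = mean(diffs) / spread if spread > 0 else float("nan")
    return mean(diffs), z
```

Resampling participants rather than scores keeps each person's 2 test results together, which preserves the correlation between the tests within every replicate.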
Characteristics of the participants in the 2 cohorts are shown in Table 1. At baseline, the participants in both samples represented vulnerable, older populations with a substantial degree of functional and cognitive impairment. Diverse populations were chosen to demonstrate a range of applicability for the Sweet 16 instrument. In the development cohort, which included patients with delirium, the mean (SD) MMSE score was 15.1 (7.4), and 59% of the patients had an MMSE score of less than 18, indicating severe cognitive impairment. Eleven percent of the patients had a diagnosis of dementia based on the ICD-9 code, and the mean (SD) Charlson Comorbidity Index was 1.0 (1.0). All members of the development cohort were residents in a skilled nursing facility by design. In the validation cohort, the mean (SD) MMSE score was 25.8 (5.1). One percent of the patients had a diagnosis of dementia based on the ICD-9 code, and the mean (SD) Charlson Comorbidity Index was 0.5 (0.8). Eight percent had an MMSE score of less than 18, and 6% resided in a nursing home. All prevalence estimates were weighted according to the complex sampling design in ADAMS.
The Sweet 16 required no pencil, paper, or other props and was easy to administer with a minimum of training (Table 2). In the pilot group, completion time for the instrument ranged from 1.4 to 2.9 minutes, with a mean (SD) of 2.0 (0.4) minutes and median of 1.9 minutes. In the development cohort, the mean (SD) Sweet 16 score was 8.2 (4.4), with a median of 8 (observed range, 0-16 points) and an interquartile range of 5 to 12 points. In this cohort, 2% of the patients scored at the highest value of 16, compared with less than 1% at the highest value of 30 on the MMSE. Figure 1 indicates corresponding scores and percentile ranks for the Sweet 16 and the MMSE. To provide correlation with commonly used cut points on the MMSE, the following values were calculated: an MMSE score of 24 correlates with a Sweet 16 score of 13.1; an MMSE score of 20 correlates with a Sweet 16 score of 10.9; an MMSE score of 18 correlates with a Sweet 16 score of 9.7; and an MMSE score of 10 correlates with a Sweet 16 score of 4.6. (A complete table of corresponding values with percentiles for the Sweet 16 and the MMSE is available on request.)
In the development cohort, the Sweet 16 was highly correlated with the MMSE, with a Spearman r of 0.94 (P < .001). The overall agreement between the Sweet 16 and the MMSE at clinically relevant thresholds (<14 for Sweet 16 and <24 for MMSE) was a weighted κ of 0.60 (P < .001).
In the validation cohort, the Sweet 16 correlated with the mBDRS (Pearson r, −0.60; P < .001). The IQCODE was used as the primary independent reference standard, with an IQCODE score higher than 3.5 and an MMSE score less than 24 or a Sweet 16 score less than 14 indicating cognitive impairment. Compared with the IQCODE, the AUC was 0.84 for the Sweet 16 and 0.81 for the MMSE (P = .06). Table 3 and Figure 2 indicate the performance characteristics of the Sweet 16 compared with the IQCODE. The widely used cut point of an MMSE score less than 24 to indicate cognitive impairment maps to a Sweet 16 score of 13.1. However, since this value falls between a Sweet 16 score of 13 and 14, we chose the cut point of the Sweet 16 to be less than 14 for cognitive impairment, as this threshold maximizes sensitivity (80% vs 70%), which was an important goal for our screening instrument. At this cut point, the Sweet 16 had a sensitivity of 80% and a specificity of 70% for identifying cognitive impairment, and the MMSE had a sensitivity of 64% and a specificity of 86%. Note that the sensitivity and specificity for the MMSE in the validation cohort fall within the range of previously reported values.7 Furthermore, the Sweet 16 performs very well at excluding disease, as the likelihood ratio for a negative test result was 0.29 for the Sweet 16 compared with 0.42 for the MMSE. In analyses by education level, the AUC among participants with higher education (≥12 years) was 0.90 for the Sweet 16 and 0.84 for the MMSE (P = .03). Although not statistically significant by AUC analysis, the sensitivity was higher for the Sweet 16 compared with the MMSE across all quartiles of education (≤8 years, 9-12 years, 13-15 years, and ≥16 years).
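The likelihood ratios for a negative test result follow directly from the reported sensitivity and specificity, using the standard definition (1 − sensitivity) / specificity. The short sketch below reproduces the arithmetic from the rounded percentages given in the text; the function name is hypothetical.

```python
def negative_likelihood_ratio(sensitivity, specificity):
    """Likelihood ratio for a negative test result: (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


if __name__ == "__main__":
    # Sweet 16 at a cut point of <14: sensitivity 80%, specificity 70%.
    print(round(negative_likelihood_ratio(0.80, 0.70), 2))  # 0.29
    # MMSE at a cut point of <24: sensitivity 64%, specificity 86%.
    print(round(negative_likelihood_ratio(0.64, 0.86), 2))  # 0.42
```

A smaller value means that a negative result lowers the odds of cognitive impairment more strongly, which is the sense in which the text describes the Sweet 16 as performing well at excluding disease.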
The Sweet 16 was also tested against the clinicians' diagnosis of dementia or no dementia (including diagnosis of either cognitive impairment without dementia or normal). Because the clinicians had access to the MMSE and digit span test scores, the clinician diagnoses could not be considered a completely independent reference standard for validation. For Sweet 16 scores of less than 14, comparison against clinicians' diagnoses yielded an AUC of 0.97 for the Sweet 16 and 0.95 for the MMSE (P = .04) (Figure 3). The performance characteristics of the Sweet 16 based on the clinician's diagnoses were as follows: sensitivity, 99%; specificity, 72%; positive predictive value, 33%; negative predictive value, 100%; and likelihood ratio, negative test result, 0.01.
Of note, there was a difference in the administration of the digit span test in the development cohort (1 trial) and the validation cohort (2 trials). The effect of scoring the digit span as correct only if both spans were correct was evaluated. In a sensitivity analysis, requiring both digit spans to be correct made the test slightly more difficult, but the AUC was identical for both scoring methods, and the Sweet 16 had a higher sensitivity (more true positives) and a lower specificity (more false positives). Because of the identical AUCs and the additional subject burden and time for completing 2 spans for each test, the original scoring method of accepting either digit span trial as correct was kept in the validation cohort.
In summary, we have successfully developed and validated a cognitive assessment instrument that is efficient and simple; requires no pen, paper, or other props to administer; and will be available open access. To enhance its utility, scores on the Sweet 16 that correlate directly with widely used cut points on the MMSE have been identified. The performance characteristics of the Sweet 16, in particular sensitivity, are equivalent or superior to those of the MMSE. Most notably, the Sweet 16 was also validated in a separate, large, population-based sample against the IQCODE as an independent reference standard for cognitive impairment. Therefore, the Sweet 16 could be used in place of other screening measures, such as the MMSE, to rapidly identify cognitive impairments in general clinical practice as well as in research settings. In particular, the Sweet 16 may be preferred over the MMSE in frail older or medically ill hospitalized or institutionalized patients in whom the ability to write and manipulate props may be limited for reasons other than cognitive impairment (eg, intravenous tubing, positioning in bed) and in situations such as a busy clinical practice or a large study cohort in which the ability to quickly complete a cognitive assessment is essential. In contrast to the MMSE, which takes 10 to 15 minutes to administer (http://www4.parinc.com/Products/Product.aspx?ProductID=MMSE), the Sweet 16 can be completed in just 2 to 3 minutes. The Sweet 16 has important advantages over other brief instruments, including, eg, superior validation over the brief MMSE24 or the Abbreviated Mental Test.25 There is no need for props, such as pen, paper, or special forms, as are needed for the Mini-Cog, Montreal Cognitive Assessment, and the MMSE.6 Finally, the Sweet 16 is open access, whereas the MMSE and the MMSE-2 are restricted by copyright. 
One important caveat is that the Sweet 16 should be used only as a brief screening instrument and is not intended to replace more sensitive and comprehensive tests of cognitive function. Therefore, any identification of cognitive impairment by the Sweet 16 should prompt more comprehensive assessment as well as evaluation for reversible causes (eg, medications, metabolic derangements, depression).
We found that the Sweet 16 had equivalent or superior performance compared with the MMSE across all levels of education. There are a number of reasons that may account for this. First, the tasks in the MMSE oversample from domains such as language and attention that are not as relevant to informant and clinician ratings of dementia. Second, performance on some of the tasks on the MMSE, such as the 3-step command or copy of the cube and intersecting pentagons, may be affected by vision impairment, physical impairment, or coordination problems, all of which are common in an older population owing to aging and age-related conditions such as stroke or Parkinson disease but which may not reflect cognitive impairment. Third, the Sweet 16 does not include those items from the MMSE that are most biased against persons with low education,9 including writing a sentence, naming the season, copying intersecting pentagons, spelling “world” in reverse, repeating “no ifs ands or buts,” or performing serial 7s.
Several caveats deserve further comment. First, there were slight differences in the scoring of digit spans between the data sets used in the development and validation cohorts. However, additional sensitivity analysis resulted in no difference in the AUC with alternate scoring of the digit span test in the validation cohort. Second, the cognitive domains screened include only focused and sustained attention, concentration, mental control, orientation, and immediate recall, while other domains, such as long-term memory and judgment, were not fully tested. Third, as with the MMSE, there are ceiling effects, which limit the usefulness of this instrument for individuals with only slight cognitive impairments or for those who are highly educated. The instrument is also limited in usefulness for tracking subtle cognitive changes over time in high-functioning individuals. Finally, the data presented herein were calculated Sweet 16 scores, which may differ from scores obtained from direct administration of the instrument; also, given the fixed ordering of the items in our preexisting database, it is not possible to evaluate the effects of changes in the ordering of items on test performance.
Further studies, including prospective studies to establish the predictive validity of the Sweet 16, to assess test-retest reliability, and to compare performance with other brief cognitive measures, are greatly needed. Ultimately, it is hoped that this test will help improve assessment of cognitive function across many settings.
Correspondence: Tamara G. Fong, MD, PhD, Aging Brain Center, Institute for Aging Research, Hebrew SeniorLife, 1200 Centre St, Boston, MA 02131 (firstname.lastname@example.org).
Accepted for Publication: September 13, 2010.
Published Online: November 8, 2010. doi:10.1001/archinternmed.2010.423
Author Contributions: Study concept and design: Fong, Jones, Rudolph, Yang, Marcantonio, and Inouye. Acquisition of data: Tommet, Marcantonio, Langa, and Inouye. Analysis and interpretation of data: Fong, Jones, Rudolph, Yang, Tommet, Habtemariam, Marcantonio, Langa, and Inouye. Drafting of the manuscript: Fong, Jones, Yang, Tommet, and Inouye. Critical revision of the manuscript for important intellectual content: Fong, Jones, Rudolph, Yang, Tommet, Habtemariam, Marcantonio, Langa, and Inouye. Statistical analysis: Jones, Yang, Tommet, and Inouye. Obtained funding: Inouye. Administrative, technical, and material support: Yang, Habtemariam, and Inouye. Study supervision: Fong, Marcantonio, and Inouye.
Financial Disclosure: None reported.
Additional Contributions: Nina O’Brien provided invaluable assistance in coordination and manuscript preparation.
Additional Information: This work is dedicated to Benjamin and Jordan Helfand and to the memory of Joshua Helfand.