Robert S. Wilson, Sue E. Leurgans, Tatiana M. Foroud, Robert A. Sweet, Neill Graff-Radford, Richard Mayeux, David A. Bennett. Telephone Assessment of Cognitive Function in the Late-Onset Alzheimer's Disease Family Study. Arch Neurol. 2010;67(7):855–861. doi:10.1001/archneurol.2010.129
Administration of cognitive test batteries by telephone has been shown to be a valid and cost-effective means of assessing cognition, but it remains relatively uncommon in epidemiological research.
To develop composite cognitive measures and assess how much of the variability in their scores is associated with mode of test administration (ie, in person or by telephone).
Cross-sectional cohort study.
Late-Onset Alzheimer's Disease Family Study conducted at 18 centers across the United States.
A total of 1584 persons, 368 with dementia, from 646 families.
Main Outcome Measures
Scores on composite measures of memory and cognitive function derived from a battery of 7 performance tests administered in person (69%) or by telephone (31%) by examiners who underwent a structured performance-based training program with annual recertification.
Based in part on the results of a factor analysis of the 7 tests, we developed summary measures of working memory, declarative memory, episodic memory, semantic memory, and global cognition. In linear regression analyses, mode of test administration accounted for less than 2% of the variance in the measures. In mixed-effects models, variability in cognitive scores due to center was small relative to variability due to differences between individuals and families.
In epidemiologic research on aging and Alzheimer disease, assessment of cognition by telephone has little effect on performance and provides operational flexibility and a means of reducing both costs and missing data.
Alzheimer disease (AD) is a progressive illness that devastates the lives of millions of older people. Although some genetic and experiential risk factors have been identified, the pathophysiology of AD is not securely understood. The National Institute on Aging Genetics Initiative for Late-Onset Alzheimer's Disease was designed to provide resources to help identify additional genes contributing to late-onset AD. One part of that initiative is the Late-Onset Alzheimer's Disease Family Study, which has been recruiting and clinically characterizing persons across the United States from families with multiple affected members and unrelated control subjects without dementia. Collecting uniform cognitive data is a substantial challenge in a study of this nature given multiple examiners from multiple centers. Further, the dispersion of family members across the United States presents logistic challenges that are most economically addressed by testing many affected and unaffected persons by telephone.
In this article, we evaluate the extent to which differences between modes of test administration and between centers affect cognitive performance. After undergoing a structured performance-based program of training and certification, research assistants administered a battery of 7 cognitive tests to more than 1500 older persons, nearly one-third of whom were tested by telephone. We first developed summary measures of different forms of memory and global cognition. We then fit a series of linear and mixed-effects regression models to determine how much of the variability in performance between persons was attributable to test administration mode or to center.
Subjects were recruited through 18 participating AD centers. As previously described,1 many index cases were recruited through one of the federally funded AD research centers. Media and other recruitment efforts directed other interested families to a toll-free number at the National Cell Repository for Alzheimer's Disease (http://ncrad.iu.edu), which assigned them to the nearest participating center. Each center also recruited unrelated control subjects, up to half of whom could be spouses of participating family members. An eligible family was required to have at least 3 biologically related members willing to provide clinical data and a biological sample for DNA extraction. Each family included a proband diagnosed with AD after age 60 years and a full sibling of the proband diagnosed with AD after age 60 years. A third family member could be a full or half sibling, parent, offspring, aunt, uncle, niece, nephew, or first cousin of the proband and had to have AD (diagnosed after age 50 years) or mild cognitive impairment or be unaffected after age 60 years as determined by cognitive testing and clinical evaluation. The determination of the eligibility of each family group was made by the coordinating center at Columbia University. Once criteria were met, other family members were eligible to participate. Informed consent was obtained from the participant or from a proxy if the participant lacked the capacity to consent. The study was approved by the institutional review board of each participating center.
At the time of these analyses, 1584 people had agreed to participate and completed the initial evaluation, including the cognitive testing. They had a mean (SD) age of 71.1 (11.2) years (range, 28-99 years) and had completed a mean (SD) of 14.2 (3.0) years of schooling (range, 0-29 years); 61% were women. They represent 646 families, with 360 contributing a single family member, 197 with 2 to 4 participating members, 51 with 5 to 7 participating members, and 38 with 8 or more participating members.
Data on demographic variables, diagnosis of dementia and AD, and medical history were obtained from each participant or an informant. Clinical classification of dementia and AD followed the guidelines of the joint working group of the National Institute of Neurological and Communicative Disorders and Stroke/Alzheimer's Disease and Related Disorders Association. These require a history of cognitive decline and evidence of impairment in at least 2 cognitive domains, one of which must be memory to meet AD criteria.2 In a subset of persons who could not be directly examined, clinical classification was based on a detailed review of medical records.
Genotyping of apolipoprotein E (APOE) polymorphisms (based on single-nucleotide polymorphisms rs7412 and rs429358) was performed at PreventionGenetics (http://www.preventiongenetics.com). Genotyping was carried out in array tape using allele-specific polymerase chain reaction with universal molecular beacons. Sequencing of positive control DNA samples was completed to assure correct assignment of alleles.
Cognition was measured with a battery of 7 brief tests.3 Working memory was assessed with digit span forward,4 digit span backward,4 and digit ordering.5 Two measures of episodic memory were included: immediate and delayed recall of story A from the Wechsler Memory Scale–Revised.4 Semantic memory was assessed by asking persons to name members of 2 semantic categories (animals, vegetables) in separate 1-minute trials.3,5,6 In previous research, these tests have been shown to have adequate reliability,4,7,8 change in performance on them has been associated with a genetic risk factor for AD,3 and level of performance proximate to death has been associated with level of AD pathological findings on postmortem examination.3 Administration of the test battery requires 10 to 15 minutes and can be done in person or by telephone.
After data collection began, we changed the category for the second fluency task from fruits and vegetables to vegetables to match the procedure used in the Uniform Data Set by the National Institute on Aging Alzheimer's Disease Centers.9 In preliminary analyses, raw scores for vegetables were slightly lower than scores for fruits and vegetables, but each had comparable associations with animal fluency score, suggesting that they were measuring the same underlying ability. In computing cognitive scores, therefore, we treated scores for vegetables and for fruits and vegetables as equivalent after converting them to a common scale.
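The article does not say how the two fluency variants were placed on a common scale. As a minimal sketch of one possible approach, the example below standardizes each score within its own task version, so that a score is expressed relative to others who received the same category. The function name `to_common_scale`, the version labels, and the scores are hypothetical, and the use of the sample SD is an assumption.

```python
from statistics import mean, stdev


def to_common_scale(scores, versions):
    """Convert raw fluency scores to z scores within each task version.

    Assumption: within-version standardization; the article does not
    specify the conversion that was actually used.
    """
    by_version = {}
    for score, version in zip(scores, versions):
        by_version.setdefault(version, []).append(score)
    # Mean and sample SD for each version of the task
    params = {v: (mean(x), stdev(x)) for v, x in by_version.items()}
    return [(s - params[v][0]) / params[v][1] for s, v in zip(scores, versions)]


# Hypothetical raw scores: three persons per task version
raw = [10, 12, 14, 14, 16, 18]
version = ["vegetables"] * 3 + ["fruits_and_vegetables"] * 3
scaled = to_common_scale(raw, version)
print(scaled)
```

After this conversion, the slightly lower raw level of the vegetables-only version no longer shifts the resulting scores.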
The test battery was administered by multiple research assistants at the 18 participating centers. To maximize uniformity of test administration and scoring, each research assistant underwent a structured 4-step program of training and certification coordinated by Rush University Medical Center personnel. The first step was to carefully read the cognitive assessment manual. Next, research assistants had to complete a minimum of 2 practice administrations of the battery at their site under the supervision of an individual with testing experience if not previous certification. Third, research assistants were required to score a set of 8 samples of story A with at least 95% accuracy. The Rush University Medical Center coordinators provided story samples and checked scoring accuracy. The procedure was repeated with new story samples until 95% accuracy was achieved. The final step involved giving the battery twice in succession by telephone to a coordinator at Rush University Medical Center without major errors of administration, data entry, or scoring, repeating the process (and providing feedback when needed) until the criterion was reached. Prescripted test responses were used to ensure exposure to a range of testing situations. We recertified test administrators at 12-month intervals, again requiring 95% accuracy in story scoring and 2 successive error-free administrations of the test battery by telephone to the Rush University Medical Center coordinator.
To minimize random variability, we developed composite measures of cognition in a 3-step process.5,10,11 We began by hypothesizing 2 ways in which the tests could be grouped into functional domains. Next, we empirically grouped the tests by performing a factor analysis with varimax rotation and clustering tests with rotated loadings of 0.50 or higher on the same factor. Finally, we used the Rand statistic12 to test the concordance of the hypothesized groupings with the empirically based groupings obtained in this cohort and in an independent group of subjects from the Rush Memory and Aging Project.13 We formed composite measures of the hypothesized domains by converting raw scores on each component test to z scores, using the mean and SD of all participants, and then averaging the z scores to yield the composite measure. We also formed a composite measure of global cognition based on all 7 tests. To assess the effects of APOE genotype on performance, we formed a reference group of no ε4 alleles (ie, ε2/ε2, ε2/ε3, ε3/ε3) and contrasted it with groups of 1 ε4 allele (ie, ε2/ε4, ε3/ε4) and 2 ε4 alleles (ie, ε4/ε4), with separate analyses for those with and without dementia.
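The APOE grouping described above amounts to counting ε4 alleles in each genotype. A minimal sketch follows; the function name `e4_count` and the string encoding of genotypes (for example `"e3/e4"`) are hypothetical choices for illustration, not the study's data format.

```python
def e4_count(genotype: str) -> int:
    """Return the number of ε4 alleles (0, 1, or 2) in an APOE genotype.

    Accepts strings such as 'e3/e4' or 'ε3/ε4' (hypothetical encoding).
    """
    alleles = genotype.lower().replace("ε", "e").split("/")
    if len(alleles) != 2 or any(a not in {"e2", "e3", "e4"} for a in alleles):
        raise ValueError(f"unrecognized APOE genotype: {genotype!r}")
    return alleles.count("e4")


# 0 = reference group (no ε4 allele), 1 = one ε4 allele, 2 = two ε4 alleles
for g in ["e2/e2", "e2/e3", "e3/e3", "e2/e4", "e3/e4", "e4/e4"]:
    print(g, e4_count(g))
```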
To assess the effect of mode of test administration on performance, we conducted a series of linear regression analyses of each composite measure, with separate models in each diagnostic subgroup. A first analysis included terms for age, sex, and education. The analysis was then repeated with an indicator for whether testing was done by telephone. We controlled for age, sex, and education in these and subsequent analyses because of their associations with cognitive performance.
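The variance attributable to mode of administration can be read as the increase in R² when the telephone indicator is added to the model with age, sex, and education. The sketch below illustrates that comparison with ordinary least squares implemented in plain Python; the study itself used SAS, and the helper names `solve` and `r_squared` as well as the eight data records are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(rows, y):
    """R^2 of a least-squares regression of y on rows plus an intercept."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, xi)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical records: (age, sex, education), telephone indicator, composite score
covariates = [(65, 0, 12), (70, 1, 16), (75, 0, 14), (80, 1, 10),
              (68, 1, 18), (72, 0, 12), (85, 1, 14), (78, 0, 16)]
telephone = [1, 0, 0, 1, 1, 0, 0, 1]
score = [0.4, 0.5, 0.1, -0.6, 0.9, 0.0, -0.5, 0.2]

r2_base = r_squared(covariates, score)
r2_full = r_squared([c + (t,) for c, t in zip(covariates, telephone)], score)
print(f"variance added by telephone indicator: {100 * (r2_full - r2_base):.1f}%")
```

Running the two models separately within each diagnostic subgroup, as described above, would give one such increment per composite measure and subgroup.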
To examine other sources of variability in performance, we constructed a series of mixed-effects regression models.14 Each model had fixed effects for age, sex, education, and mode of test administration and random effects for center, family membership, and subjects within center.
Models were graphically and analytically validated. Programming was done in SAS version 8 statistical software (SAS Institute, Inc, Cary, North Carolina).
The test battery was administered to 1584 individuals, 368 with dementia and 1216 without it. The dementia subgroup as compared with the no-dementia subgroup was older (mean age, 79.2 vs 68.6 years, respectively; t(902) = 21.3; P < .001) and less educated (mean education, 13.2 vs 14.5 years, respectively; t(1504) = 7.0; P < .001), and the distribution of sex was similar (61% vs 62% female, respectively; χ²(1) = 0.1; P = .71). Table 1 provides psychometric information on the test scores within each diagnostic subgroup. In those without dementia, the distribution of scores on each test was approximately normal. The level of test performance in the dementia group was lower than in the group without dementia, as expected. The distributions of scores in the dementia group were also approximately normal except for positively skewed memory performances, which reflect the ubiquity of memory impairment in dementia.
We hypothesized 2 ways in which the individual tests could be grouped into functional domains (groupings 1 and 2 in Table 2). In one grouping, we specified 2 domains: working memory and declarative memory. In the second grouping, we specified 3 domains with declarative memory subdivided into episodic memory and semantic memory.
We next empirically grouped the tests in a factor analysis with varimax rotation. Because dementia severity can influence correlations among cognitive tests, we restricted the factor analysis to those without dementia. As shown by the first set of factor loadings in Table 2, this analysis identified 2 factors. We used the Rand statistic, rescaled to range from −1 (complete disagreement) to 1 (complete agreement), to assess the concordance of the empirical grouping with the hypothesized groupings. The factor analytic results showed good agreement with the hypothesized 2-domain (Rand statistic = 1.00; P = .03) and 3-domain (Rand statistic = 0.62; P = .03) groupings. To test the generalizability of these results, we conducted an identical factor analysis of these same 7 tests in a different cohort: 1099 older persons without dementia from the Rush Memory and Aging Project.13 As shown in the 2 right-hand columns of Table 2, these factor loadings were quite similar to those obtained in the Late-Onset Alzheimer's Disease Family Study cohort (Rand statistic = 1.00; P = .03).
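The article does not give the formula used to rescale the Rand statistic. A minimal sketch, assuming the linear rescaling 2R − 1 of the ordinary Rand index R (the proportion of test pairs that the two groupings treat alike), is shown below. The empirical grouping is taken to be identical to the hypothesized 2-domain grouping, which is what the reported value of 1.00 implies; the function name `rescaled_rand` and the domain labels are hypothetical.

```python
from itertools import combinations


def rescaled_rand(labels_a, labels_b):
    """Rand index rescaled to [-1, 1] as 2R - 1 (assumed rescaling)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees if both groupings put it together or both keep it apart
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return 2.0 * agree / len(pairs) - 1.0


# Order of tests: digit span forward, digit span backward, digit ordering,
# story A immediate recall, story A delayed recall, animals, vegetables
two_domain = ["working"] * 3 + ["declarative"] * 4
three_domain = ["working"] * 3 + ["episodic"] * 2 + ["semantic"] * 2
empirical = two_domain  # assumption implied by the reported Rand statistic of 1.00

print(round(rescaled_rand(empirical, two_domain), 2))    # 1.0
print(round(rescaled_rand(empirical, three_domain), 2))  # 0.62
```

Under these assumptions, 17 of the 21 test pairs agree for the 3-domain grouping, giving 2(17/21) − 1 ≈ 0.62, which matches the value reported above.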
Given the empirical support for the hypothesized groupings, we formed composite measures of each hypothetical cognitive domain. We also created a composite measure of global cognition based on all 7 tests in view of the positive correlations among all measures. To construct each composite measure, we converted raw scores on each test to z scores and then averaged the z scores of component tests to yield the composite score as previously done for other composite cognitive measures.5,10,11 The composite measure was treated as missing if more than half of the component test scores were missing. As shown in Table 1, these composite cognitive measures had relatively normal distributions in each diagnostic subgroup except for the skewed episodic memory distribution in those with dementia.
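The composite construction described above, including the missing-data rule, can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function name `composite_scores`, the test names, and the scores are hypothetical, and the sample SD is assumed because the article does not specify which SD estimator was used.

```python
from statistics import mean, stdev


def composite_scores(raw, tests):
    """Average z scores across component tests for each person.

    raw: list of dicts mapping test name to a score, or None if missing.
    Returns None when more than half of the component scores are missing.
    """
    # Mean and SD of each test over all participants with an observed score
    stats = {}
    for t in tests:
        observed = [p[t] for p in raw if p.get(t) is not None]
        stats[t] = (mean(observed), stdev(observed))

    composites = []
    for p in raw:
        z = [(p[t] - stats[t][0]) / stats[t][1]
             for t in tests if p.get(t) is not None]
        n_missing = len(tests) - len(z)
        composites.append(mean(z) if n_missing <= len(tests) / 2 else None)
    return composites


# Hypothetical episodic memory data for four persons
episodic_tests = ["story_immediate", "story_delayed"]
people = [
    {"story_immediate": 10, "story_delayed": 8},
    {"story_immediate": 14, "story_delayed": 12},
    {"story_immediate": 12, "story_delayed": None},    # one of two missing: kept
    {"story_immediate": None, "story_delayed": None},  # all missing: None
]
result = composite_scores(people, episodic_tests)
print(result)
```

With two component tests, a person missing one score still receives a composite, because exactly half (not more than half) of the components are missing.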
To assess the validity of the composite measures, we examined their association with APOE genotype in a series of linear regression models that controlled for age, sex, and education. In both the no-dementia and dementia subgroups, inheritance of 1 or 2 ε4 alleles was associated with lower scores on all cognitive measures (data not shown).
To enhance participation in the Late-Onset Alzheimer's Disease Family Study and to reduce its operational costs, we selected cognitive tests that could be administered either in person or by telephone. To date, 495 participants (31%) have been tested by telephone. They were younger than participants tested face-to-face (mean age, 69.2 vs 71.9 years, respectively; t(1050) = 4.7; P < .001), more educated (mean education, 14.4 vs 14.1 years, respectively; t(1504) = 2.0; P = .048), and less apt to have dementia (13% vs 28%, respectively; χ²(1) = 41.2; P < .001). To assess the association of mode of test administration with cognitive performance, we constructed a series of linear regression models. Each model had an indicator for telephone vs in-person testing and terms to control for the potentially confounding effects of age, sex, and education, with separate analyses for those with dementia and those without it. As shown in Table 3, administering the tests by telephone was associated with a slightly higher global cognitive score in those without dementia, but the effect accounted for less than 1% of the variability in global cognition. Among the memory systems measures, only working memory showed this effect, with no association between administration mode and performance in the remaining measures. Among those with dementia, telephone administration was not associated with cognitive test performance. Overall, these data suggest that mode of test administration is not strongly related to performance on the composite cognitive measures.
The Late-Onset Alzheimer's Disease Family Study includes 18 centers contributing cognitive data on 3 to 377 individuals (median, 65 individuals) representing 646 families, with 1 to 25 participating members. To examine center and familial effects, we constructed a series of mixed-effects regression models with separate analyses for each composite cognitive outcome in each diagnostic subgroup. Each model had terms to account for the fixed effects of age, sex, education, and mode of test administration. We also included random effects for center, family, and subjects within center. As shown in Table 4, the amount of variability in the composite cognitive measures attributable to center was low both in an absolute sense (ie, 0%-2% in all instances) and in comparison with the variability attributable to familial aggregation and individual differences between persons within centers. The amount of variability attributable to family was somewhat larger, ranging from 3% to 11%, but still substantially less than person-specific variability, which ranged from 15% to 69%.
As part of the Late-Onset Alzheimer's Disease Family Study, examiners from multiple centers administered a battery of 7 cognitive tests either in person or by telephone to more than 1500 older persons with and without dementia, more than 75% of whom represented persons evaluated as parts of families. Composite measures of global cognition and specific memory systems were derived from the individual tests and showed the expected associations with an external validity criterion. Relatively little of the variability in the composite measures was related to mode or site of test administration. Slightly more variability was related to family membership and most reflected person-specific factors. The results suggest that the battery provides a psychometrically sound, operationally flexible, and cost-effective means of assessing multiple memory systems in older persons.
A substantial body of research has examined cognitive testing of older people by telephone. Much of the research has focused on the level of agreement between telephone and in-person testing. Studies with a repeated-measurement design, including one using a cognitive test battery similar to the present one,15 have shown that testing the same individuals by telephone and face-to-face yields equivalent results.16-20 This work has also shown that multiple domains of memory and cognition can be assessed3,15,20,21; that persons with neurologic conditions, including mild cognitive impairment,22-26 dementia,27,28 and stroke,29 can be tested by telephone; and that conditions like diabetes that have been linked to cognitive impairment and decline based on in-person testing also show these associations when testing is done by telephone.15,30 The present study represents an attempt to apply these findings in a multicenter epidemiologic study. The cross-sectional finding that mode of test administration contributes little to between-person differences in cognitive performance regardless of domain tested or dementia status is consistent with prior work and extends it to a multicenter context. That neither the mode of administration nor the effect of multiple testing sites contributed materially to the variance of cognition is likely due in part to the efforts expended in developing uniform test procedures and certification processes. However, it is also likely due in part to the large person-specific differences in cognitive performance among older persons with and without dementia.
Key operational aims of epidemiologic research on cognitive function are to minimize missing data and to maintain uniformity in test administration and scoring. The option of administering tests by telephone is likely to increase participation, particularly when subjects are geographically dispersed as in the present study. Use of tests amenable to telephone administration also has the advantage of allowing training and certification of examiners to be done by telephone. This increases efficiency, especially in a study with multiple examiners from multiple sites, by facilitating centralization of training and making it easier to recertify examiners at regular intervals. It also reduces travel costs and requires fewer trainers because the process of training, certification, and recertification can be spread out over time.
Because this study included family members, we were able to estimate the amount of variability due to family effects. Overall, only about 5% of the variability in cognitive performance was due to family effects in comparison with approximately 40% due to between-person effects. The relative size of between-person and familial effects appeared to vary across cognitive domains. For example, among persons with dementia, family effects accounted for more than 10% of the variability in semantic memory, with between-person effects accounting for less than 30%. By contrast, between-person effects accounted for nearly 10 times more variability in working memory than did family membership. These data support the notion that genetic influences on cognition remain strong even during old age.31 The use of families along with measures of different types of cognitive abilities provides an opportunity to identify these genetic variants.
In summary, the Late-Onset Alzheimer's Disease Family Study cognitive test battery provides psychometrically sound measures of global cognition and different forms of memory affected by advancing age and AD. The results provide further evidence of the utility of cognitive tests that are amenable to telephone administration.3
Correspondence: Robert S. Wilson, PhD, Rush Alzheimer's Disease Center, Rush University Medical Center, 600 S Paulina Ave, Ste 1038, Chicago, IL 60612 (firstname.lastname@example.org).
Accepted for Publication: October 5, 2009.
Author Contributions: Study concept and design: Wilson, Sweet, Mayeux, and Bennett. Acquisition of data: Wilson, Sweet, Graff-Radford, Mayeux, and Bennett. Analysis and interpretation of data: Wilson, Leurgans, Foroud, Sweet, and Bennett. Drafting of the manuscript: Wilson and Mayeux. Critical revision of the manuscript for important intellectual content: Wilson, Leurgans, Foroud, Sweet, Graff-Radford, Mayeux, and Bennett. Statistical analysis: Leurgans. Obtained funding: Foroud, Sweet, Graff-Radford, and Mayeux. Administrative, technical, and material support: Wilson, Sweet, Mayeux, and Bennett.
Financial Disclosure: None reported.
Funding/Support: This work was supported by grants U24AG26396 (Late-Onset Alzheimer's Disease Genetic Initiative), U24AG021886 (National Cell Repository for Alzheimer's Disease), R01AG17917 (Rush Memory and Aging Project), P50AG8702 (Columbia University, New York, New York), P30AG10161 (Rush University Medical Center, Chicago, Illinois), P30AG013846 (Boston University, Boston, Massachusetts), P30AG028377 (Duke University, Durham, North Carolina), P30AG010133 (Indiana University, Indianapolis), P50AG05134 (Massachusetts General Hospital, Boston), P50AG016574 (Mayo Clinic, Rochester, Minnesota, and Jacksonville, Florida), P50AG05138 (Mount Sinai School of Medicine, New York), P30AG013854 (Northwestern University Medical School, Chicago), P30AG08017 (Oregon Health and Science University, Portland), P50AG016582 (University of Alabama at Birmingham), P50AG016570 (David Geffen School of Medicine, University of California, Los Angeles), P30AG028383 (University of Kentucky, Lexington), P30AG010124 (University of Pennsylvania, Philadelphia), P50AG005133 (University of Pittsburgh, Pittsburgh, Pennsylvania), P50AG05142 (University of Southern California, Los Angeles), P30AG012300 (University of Texas Southwestern Medical Center at Dallas), P50AG05136 (University of Washington, Seattle), and P50AG05681 (Washington University School of Medicine, St Louis, Missouri) from the National Institute on Aging, National Institutes of Health.
Role of the Sponsors: The sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
Additional Contributions: Creighton H. Phelps, PhD, Marcelle Morrison-Bogorad, PhD, and Marilyn Miller, PhD, National Institute on Aging, provided support and guidance. Jennifer Williamson, MS, MPH, Susan LaRusse Eckert, MS, and Stephanie Doan, MPH, Columbia University, and Michele Goodman, JD, and Kelley Faber, MS, Indiana University, made efforts in coordinating the project across the United States. Tracy Faulkner, MA, Holli Jacobs, BA, and Jenny Haddow, BS, Rush Alzheimer's Disease Center, provided training and certification of cognitive examiners.