Sex Differences in Cognitive Decline Among US Adults

Key Points
Question: Does the risk of cognitive decline among US adults vary by sex?
Findings: In this cohort study using pooled data from 26 088 participants, women, compared with men, had higher baseline performance in global cognition, executive function, and memory. Women, compared with men, had significantly faster declines in global cognition and executive function, but not memory.
Meaning: These findings suggest that women may have greater cognitive reserve but faster cognitive decline than men.


Cohort Studies
The Coronary Artery Risk Development in Young Adults study (CARDIA) (http://www.cardia.dopm.uab.edu/) is a study of the development of and risk factors for cardiovascular disease (CVD) in young adults. In 1985-1986, the study recruited 5,115 adults aged 18-30 (50% black, 50% white) from 4 US cities (Birmingham, AL; Chicago, IL; Minneapolis, MN; Oakland, CA). Within sites, the sample aimed to comprise approximately equal numbers of participants by sex, self-defined race (black or white), age (18 to 24 or 25 to 30 years), and education (≤high school or >high school). Follow-up exams have occurred every 2-5 years for 30 years. Exams at years 25 and 30 measured cognition.
The Northern Manhattan Study (NOMAS) (http://columbianomas.org/index.html) is a study on determinants of stroke and CVDs in middle-aged and older adults in 3 race-ethnic groups. From 1993-2001, the study recruited 3,298 stroke-free adults aged 40 or older (20% black, 25% white, 55% Hispanic) from Northern Manhattan. NOMAS has a longitudinal dementia sub-study (n=1,290) that expanded the neuropsychology battery; these participants have had follow-up exams every 5 years for 15 years (3rd exam approximately 2016-2020) and cognition measured at each exam. We included the expanded neuropsychology battery and the dementia sub-study in the analysis.
The Atherosclerosis Risk in Communities Study (ARIC) (https://www2.cscc.unc.edu/aric/desc) is a study of the causes of atherosclerosis and its clinical outcomes, and variation in CVD risk factors, medical care, and disease by race, gender, location, and date. In 1987-1989, the study recruited 15,792 adults aged 45-64 (28% black, 72% white) from 4 US communities (Forsyth County, NC; Jackson, MS; Minneapolis, MN; Washington County, MD). Participants have had follow-up exams every 2-13 years for 25+ years. Starting at the 2nd exam, every exam measured cognition (n=14,348) using a short battery. We did not use data from the 3rd exam, when only a very small proportion of the cohort had cognition assessed. ARIC has a neurocognitive sub-study to perform CID surveillance. ARIC expanded the neuropsychology battery in exams 5-7. We included the neurocognitive sub-study and expanded neuropsychology battery in the analysis.
The Cardiovascular Health Study (CHS) (http://www.chs-nhlbi.org/) is a study of risk factors for CVD in older adults. In 1989-1990, the study recruited 5,201 adults aged ≥65 (16% black, 84% white), and in 1992-1993, added a supplementary cohort of 687 blacks, from 4 US communities (Sacramento County, CA; Washington County, MD; Forsyth County, NC; Pittsburgh, PA). We used in-person cognitive test data from study years 3-11 and TICS data from years 8-11 and 20-25. Data on test timing (days from baseline) were not available after year 25. CHS also had a dementia sub-study that added a neuropsychology battery and neurology exam, which we included in the analysis.
Follow-up exams have occurred every 4-8 years for 45 years. All exams measured cognition, using the MMSE at every exam and an expanded neuropsychology battery from exam 7 onward. In 1999-2000, FOS began a cognitive and MRI sub-study that added an expanded neuropsychology battery. We removed 9 non-Hispanic blacks to avoid working with a qualitative variable containing a category with a small number of subjects. We included the expanded neuropsychology battery and the cognitive and MRI sub-study in the analysis. We included FOS because it contributes individual participant data to estimate cognitive decline in whites (the reference group in the pooled cohort analysis, to which blacks are compared) as well as the effect of time-dependent cumulative mean BP on cognitive decline.

Harmonization of Cognitive Function Assessments
In a pre-statistical harmonization phase, we identified 126 test items from 32 cognitive instruments across the cohorts and determined shared items between cohorts. [1] Expert neuropsychologists (EMB, BJG) assigned each test item to a cognitive domain. This phase entailed extensive review of study-specific scoring procedures and test administration, and documentation of the theoretical and empirical score ranges of each cognitive test score in each study. The goal was to identify comparable and noncomparable tests across studies. Neuropsychologists paid attention to versions of tests (e.g., WAIS vs WMS). We used equipercentile equating to adjust the distributions of like test items across cohorts (e.g., Auditory Verbal Learning Test and the California Verbal Learning Test) to be on a common scale prior to IRT co-calibration. [2] In equipercentile equating, each percentile rank of a score in one study is equated to the score with the same percentile rank in another study; the expert clinicians determined which tests were truly comparable in quality. Decisions regarding how specific cognitive test variables were treated are available from the corresponding author.
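The equipercentile step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mid-percentile rank definition, and the linear interpolation between order statistics are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_scores, value):
    """Mid-percentile rank (0-100) of `value` within an ascending list of scores."""
    below = bisect_left(sorted_scores, value)
    at_or_below = bisect_right(sorted_scores, value)
    return 100.0 * (below + at_or_below) / (2.0 * len(sorted_scores))


def score_at_percentile(sorted_scores, pct):
    """Score at percentile `pct` (0-100), interpolating between order statistics."""
    position = (pct / 100.0) * (len(sorted_scores) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_scores) - 1)
    fraction = position - lower
    return sorted_scores[lower] + fraction * (sorted_scores[upper] - sorted_scores[lower])


def equipercentile_equate(scores_x, scores_y, value_x):
    """Map a score on test X to the test Y score that has the same percentile rank."""
    sorted_x = sorted(scores_x)
    sorted_y = sorted(scores_y)
    return score_at_percentile(sorted_y, percentile_rank(sorted_x, value_x))
```

In this sketch, a score at the median of one study's distribution maps to roughly the median score of the other study's distribution, whatever the raw scales of the two tests.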
For statistical harmonization we used item response theory (IRT). IRT describes a family of statistical methods that are used to relate discrete observations (e.g., cognitive tests) to a latent trait that underlies the observations [3]. In an IRT model, each test item is empirically weighted based on its correlation with other items and assigned a relative location along the latent trait (e.g., global cognition) corresponding to its estimated difficulty. This weighting and calculation of relative location is encapsulated in item discrimination (or loading) and location (or threshold) parameters, which are estimated simultaneously alongside a person's ability level and expressed on a unified metric via maximum likelihood during the model estimation process. We computed factor scores from IRT-graded response models for each domain using the regression-based method in Mplus version 8.1. [4][5][6][7] Using the regression method, also known as the empirical Bayes predictor, [8] the factor score is computed on the basis of all available information in a given observation's response patterns and thus can be calculated in the presence of missing data. This IRT approach produces cognitive scores that are more precise than those derived from a z-score approach that standardizes and averages test scores. [9] Details are available in a forthcoming publication. The measurement model for each cognitive factor was constructed such that cognitive tests provided no information about cognitive levels at study visits in which the tests were not administered. Although the cognitive factors were constructed in this way, there remains an autocorrelation across time within people that is modeled using random effects models described in the main analysis section.
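To make the roles of the discrimination and threshold parameters concrete, the following sketch computes response-category probabilities for a single item under a graded response model. The logistic parameterization, the function name, and the example parameter values are assumptions for illustration; they are not taken from the Mplus models used in the analysis.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under a graded response model.

    theta: latent trait level (e.g., global cognition).
    discrimination: item discrimination (loading), assumed positive.
    thresholds: ascending threshold (location) parameters, one per category boundary.
    """
    # Cumulative probabilities P(Y >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # P(Y = k) is the difference between adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 3-category item: a higher theta shifts probability toward the top category.
low = grm_category_probs(theta=-1.0, discrimination=1.5, thresholds=[-0.5, 0.5])
high = grm_category_probs(theta=1.0, discrimination=1.5, thresholds=[-0.5, 0.5])
```

A larger discrimination makes the category probabilities change more sharply around each threshold, which is why highly discriminating items carry more weight in the estimated factor score.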

Covariates
We harmonized covariates across cohorts by choosing common response categories for categorical variables and converting measurements to common units for continuous variables. The cohorts used experts to adjudicate incident clinical events during follow-up (e.g., stroke, myocardial infarction, atrial fibrillation). Years of school was categorized as eighth grade or less, grades 9-11, completed high school, some college but no degree, and college graduate or more. Alcohol use was categorized as drinks per week (none, 1-6, 7-13, 14 or more). Cigarette smoking was defined as current or not. Waist circumference was measured in centimeters. Assessment of physical activity varied across the cohorts. CHS and NOMAS asked participants about activity over the previous two weeks. CARDIA and ARIC created summary scores from questions about frequency of specific activities over the previous year. FOS asked participants to provide the number of hours per day of moderate or heavy exercise. A harmonized binary variable was created by assigning a "yes" value to any participant who reported performing any activity over the respective time period. Low-density lipoprotein (LDL) cholesterol and fasting glucose were measured in mg/dL. Fasting glucose levels were provided by ARIC, CARDIA, and FOS. For CHS and NOMAS, glucose levels were provided along with separate fasting indicators. In CHS data, the number of hours since eating was provided, and we defined fasting as having gone 8 or more hours since eating. In NOMAS data, the fasting indicator was a binary variable defined as fasting vs. non-fasting, so we could not consider the number of hours since eating. In both CHS and NOMAS, glucose levels were treated as missing if not fasting.
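The physical activity and fasting glucose rules above can be sketched as follows. The function and argument names are hypothetical, cohort labels are passed as plain strings for illustration, and treating a missing hours-since-eating value as non-fasting is an assumption that the text does not address.

```python
def any_activity(amount):
    """Harmonized binary activity: True if any activity was reported, None if missing."""
    if amount is None:
        return None
    return amount > 0


def fasting_glucose(cohort, glucose_mg_dl, hours_since_eating=None, fasting_flag=None):
    """Return glucose (mg/dL) if the sample counts as fasting, otherwise None (missing)."""
    if glucose_mg_dl is None:
        return None
    if cohort in ("ARIC", "CARDIA", "FOS"):
        # These cohorts supplied fasting glucose values directly.
        return glucose_mg_dl
    if cohort == "CHS":
        # Fasting defined as 8 or more hours since eating.
        # Assumption: a missing hours value is treated as non-fasting.
        is_fasting = hours_since_eating is not None and hours_since_eating >= 8
    elif cohort == "NOMAS":
        # Only a binary fasting indicator is available.
        is_fasting = fasting_flag is True
    else:
        raise ValueError(f"unknown cohort: {cohort}")
    return glucose_mg_dl if is_fasting else None
```

For example, a CHS glucose value recorded 3 hours after eating would be set to missing, whereas the same value recorded 9 hours after eating would be retained.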
History of atrial fibrillation at or before the first cognitive assessment was defined as history of atrial fibrillation based on self-report and/or physician adjudication at baseline or cohort-adjudicated incident atrial fibrillation before the first cognitive assessment.
Cohorts measured current hypertension medication use by evidence of medication bottles and self-report. History of myocardial infarction at or before the first cognitive assessment was defined as history of myocardial infarction based on self-report and/or physician adjudication at baseline or cohort-adjudicated incident myocardial infarction before the first cognitive assessment. Kidney function (glomerular filtration rate) was measured using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. History of stroke at or before the first cognitive assessment was defined as history of stroke based on self-report and/or physician adjudication at baseline or cohort-adjudicated incident stroke before the first cognitive assessment. We summarized systolic blood pressure (BP) as the time-dependent cumulative mean of all BPs before each cognitive measurement. It is a "running average" of the systolic BP measurements before each cognitive measurement.
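As an illustration of the running average, the sketch below computes the cumulative mean systolic BP available at each cognitive measurement. The function name and data layout are hypothetical, and whether a BP reading taken at the same visit as the cognitive test is counted is an assumption, exposed here as the `inclusive` argument.

```python
def cumulative_mean_sbp(bp_readings, cognition_times, inclusive=True):
    """Running mean of systolic BP before each cognitive measurement.

    bp_readings: list of (time, sbp) pairs, with time in years since baseline.
    cognition_times: times of the cognitive measurements, on the same time scale.
    inclusive: if True, readings taken at the cognition time itself are counted
               (an assumption; the text says only "before").
    Returns one mean per cognition time, or None when no reading qualifies.
    """
    means = []
    for t_cog in cognition_times:
        prior = [
            sbp
            for t_bp, sbp in bp_readings
            if t_bp < t_cog or (inclusive and t_bp == t_cog)
        ]
        means.append(sum(prior) / len(prior) if prior else None)
    return means


# Hypothetical participant: BP at years 0, 2, and 5; cognition at years 3 and 6.
example = cumulative_mean_sbp([(0, 120), (2, 130), (5, 140)], [3, 6])
```

In this example the first cognitive measurement is paired with the mean of the first two readings, and the second with the mean of all three.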
Missing data on the longitudinally measured cognitive outcomes (the dependent variable) exhibit non-monotone patterns, which makes likelihood-based methods challenging to implement if the data are missing not at random (MNAR). For this reason, we elected to proceed with the analysis using linear mixed-effects models, which allow for valid inference assuming that data on cognition are missing at random (MAR).
Interpretative Key: Global cognition measures global cognitive performance. All cognitive measures are set to a T-score metric (mean 50, SD 10); a 1-point difference represents a 0.1 SD difference in the distribution of cognition across the 5 cohorts. Higher cognitive scores indicate better performance. SBP = systolic blood pressure. NA = not applicable. y = year.
Linear mixed-effects models included time since first cognitive assessment and baseline values (measured before or at time of first cognitive assessment) of sex, race, age, cohort study, years of school, alcohol use, cigarette smoking, body mass index, waist circumference, physical activity, time-varying cumulative mean systolic blood pressure (SBP), hypertension treatment, fasting glucose, low density lipoprotein (LDL) cholesterol, history of atrial fibrillation, age*follow-up time, sex*follow-up time, race*follow-up time, time-varying cumulative mean SBP*follow-up time, and hypertension treatment*follow-up time. To take into account correlation between longitudinal cognitive measures, we included random intercept and slope effects associated with subjects. All continuous variables were centered at the overall median, except cumulative mean SBP, which was centered at 120 mmHg. Glucose, LDL cholesterol, and SBP values were divided by 10 so that the parameter estimates refer to a 10-unit change in the variables. SBP was the time-dependent mean of all SBPs before the measurement of cognition.
To estimate sex differences in cognitive decline, models included a sex*follow-up time interaction term.
Linear mixed-effects models that also adjusted for the number of APOE E4 alleles included time since first cognitive assessment and baseline values (measured before or at time of first cognitive assessment) of sex, race, age, cohort study, years of school, alcohol use, cigarette smoking, body mass index, waist circumference, physical activity, number of APOE E4 alleles, time-varying cumulative mean systolic blood pressure (SBP), hypertension treatment, fasting glucose, low density lipoprotein (LDL) cholesterol, history of atrial fibrillation, age*follow-up time, sex*follow-up time, race*follow-up time, number of APOE E4 alleles*follow-up time, time-varying cumulative mean SBP*follow-up time, and hypertension treatment*follow-up time. To take into account correlation between longitudinal cognitive measures, we included random intercept and slope effects associated with subjects. All continuous variables were centered at the overall median, except cumulative mean SBP, which was centered at 120 mmHg. Glucose, LDL cholesterol, and SBP values were divided by 10 so that the parameter estimates refer to a 10-unit change in the variables. SBP was the time-dependent mean of all SBPs before the measurement of cognition. To estimate sex differences in cognitive decline, models included a sex*follow-up time interaction term. SBP = systolic blood pressure. NA = not applicable. No. = number. y = year.
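The linear mixed-effects specification described above can be written compactly as follows. The notation is introduced here for illustration and does not appear in the source: $Y_{ij}$ is the cognitive score of participant $i$ at measurement $j$, $t_{ij}$ is time since the first cognitive assessment, $\mathbf{x}_{ij}$ collects the covariates, and $\mathbf{z}_{ij}$ collects the covariates other than sex that interact with follow-up time. Coding sex as an indicator for women and assuming normally distributed random effects and residuals are conventional choices, not statements from the text.

```latex
\begin{align*}
Y_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2\,\mathrm{Female}_i
        + \beta_3\,(\mathrm{Female}_i \times t_{ij})
        + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij}
        + \boldsymbol{\delta}^{\top}(\mathbf{z}_{ij} \times t_{ij})
        + b_{0i} + b_{1i}\,t_{ij} + \varepsilon_{ij}, \\
(b_{0i},\, b_{1i})^{\top} &\sim N(\mathbf{0}, \mathbf{G}), \qquad
\varepsilon_{ij} \sim N(0, \sigma^2).
\end{align*}
```

Under this coding, $\beta_2$ is the sex difference in cognition at the first assessment and $\beta_3$ is the sex difference in the annual rate of change, which is the quantity estimated by the sex*follow-up time interaction term. The random intercept $b_{0i}$ and random slope $b_{1i}$ account for the correlation among repeated measures within a participant.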

eTable 4: Sensitivity Analysis of Association of Cognitive Decline with Sex Including Kidney Function and History of Myocardial Infarction as Covariates
As a sensitivity analysis, we repeated analyses within the individual cohorts to assess heterogeneity in the associations between sex and cognitive decline. Linear mixed-effects models included time since first cognitive assessment and baseline values (measured before or at time of first cognitive assessment) of sex, race, age, years of school, alcohol use, cigarette smoking, body mass index, waist circumference, physical activity, time-varying cumulative mean systolic blood pressure (SBP), hypertension treatment, fasting glucose, low density lipoprotein (LDL) cholesterol, history of atrial fibrillation, age*follow-up time, sex*follow-up time, race*follow-up time, time-varying cumulative mean SBP*follow-up time, and hypertension treatment*follow-up time. To take into account correlation between longitudinal cognitive measures, we included random intercept and slope effects associated with subjects. All continuous variables were centered at the overall median, except cumulative mean SBP, which was centered at 120 mmHg. Glucose, LDL cholesterol, and SBP values were divided by 10 so that the parameter estimates refer to a 10-unit change in the variables. SBP was the time-dependent mean of all SBPs before the measurement of cognition. To estimate sex differences in cognitive decline, models included a sex*follow-up time interaction term.
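The centering and rescaling of continuous covariates described above can be sketched as follows. The helper name is hypothetical, and applying the centering before the division by 10 is an assumption about the order of operations.

```python
from statistics import median


def center_and_scale(values, center=None, divisor=1.0):
    """Center values at `center` (default: their median), then divide by `divisor`."""
    values = list(values)
    if center is None:
        center = median(values)
    return [(v - center) / divisor for v in values]


# Cumulative mean SBP: centered at 120 mmHg and expressed per 10 mmHg.
sbp_scaled = center_and_scale([110, 120, 140], center=120, divisor=10)

# Fasting glucose: centered at the overall median and expressed per 10 mg/dL.
glucose_scaled = center_and_scale([90, 100, 130], divisor=10)
```

With this scaling, a coefficient on cumulative mean SBP refers to a 10-mmHg difference, and the model intercept corresponds to a participant with an SBP of 120 mmHg and median values of the other continuous covariates.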
The linear mixed-effects model for the CARDIA (Coronary Artery Risk Development in Young Adults) study with random intercept and random slope was non-identifiable because participants had at most two cognition visits.
Clinically meaningful cognitive decline has been defined as a decline in cognitive function of 0.5 or more standard deviations (SD) from baseline cognitive scores [10]. Declines of 0.5 SD or greater from baseline cognitive scores correlate with a clinically meaningful decline in adults with normal cognition and dementia [11][12]. An informal approach is to compare the time to reach the threshold of a 0.5-SD decrease in cognitive function from baseline between women and men. eTable 7 shows the time to reach the threshold of a 0.5-SD decrease in cognitive function from baseline by sex. Women will reach the thresholds of a 0.5-SD decrease from baseline cognitive scores faster than men for global cognition, executive function, and memory.
For example, the mean global cognition slope is -0.21 points per year in men and -0.28 points per year in women. It takes men, on average, 18.88 years and women 14.16 years for global cognition scores to decline 0.5 SD from the baseline score. Women will reach the threshold of a 0.5-SD decrease from the baseline score for global cognition 4.72 years faster than men. Similarly, women will have decreases of 0.5 SD from baseline scores 1.97 years faster for executive function and 0.24 years faster for memory. Based on this approach, the sex differences in cognitive declines are clinically meaningful.
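The arithmetic behind this comparison can be sketched as the threshold divided by the absolute annual slope. The function name is hypothetical. The threshold of 3.9648 points used below is not stated in the text; it is back-calculated from the reported figures (18.88 years × 0.21 points per year) and should be read as an inferred value.

```python
def years_to_threshold(slope_per_year, threshold_points):
    """Years for a linearly declining score to drop by `threshold_points`."""
    if slope_per_year >= 0:
        raise ValueError("slope must be negative for a decline to reach the threshold")
    return threshold_points / abs(slope_per_year)


# Inferred 0.5-SD threshold for global cognition (back-calculated, not from the source).
THRESHOLD_POINTS = 3.9648

years_men = years_to_threshold(-0.21, THRESHOLD_POINTS)
years_women = years_to_threshold(-0.28, THRESHOLD_POINTS)
difference = years_men - years_women
```

Rounded to two decimals, these values reproduce the reported 18.88 years for men, 14.16 years for women, and the 4.72-year difference. Because the calculation assumes a constant linear slope, it is an informal summary rather than a formal time-to-event estimate.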

eTable 8: Association between Sex and Exclusion Because of History of Stroke or Dementia at Baseline or Incident Stroke or Incident Dementia Before First Cognitive Assessment
Excluded because of history of stroke or dementia at baseline or incident stroke or incident dementia before first cognitive assessment