Associations Between Longitudinal Trajectories of Cognitive and Social Activities and Brain Health in Old Age

This cohort study examines trajectories of cognitive and social activities from midlife to late life and evaluates whether these trajectories are associated with brain structure, functional connectivity, and cognition.


eMethods. Supplementary Methods
Cognitive and social activity: As activities vary in their cognitive and social requirements, 1 we weighted each item by its relative demand. Item weightings were independently assigned by three authors (MA, CES, KPE), with disagreements resolved through discussion (for agreed ratings, see eTable 1). The weights assigned were 'low' (= 0), 'medium' (= 1) or 'high' (= 2) cognitive/social demand.
Therefore, an item with a higher social/cognitive demand weighting provided a greater contribution towards the overall social/cognitive engagement score, compared to an activity rated as less demanding. A weighted mean 2 was then calculated as follows:

$$\text{weighted mean} = \frac{(\text{item}_1 \times \text{weight}_1) + (\text{item}_2 \times \text{weight}_2) + \dots + (\text{item}_n \times \text{weight}_n)}{\text{weight}_1 + \text{weight}_2 + \dots + \text{weight}_n}$$

This produced summary scores corresponding to engagement in socially/cognitively demanding activities.
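As an illustration of the weighted mean above, the following is a minimal sketch in Python. The function name, the example responses and the demand ratings are hypothetical and are not taken from eTable 1; only the weight values (0, 1, 2) come from the text.

```python
from typing import Sequence

# Demand weights as described in the text: low = 0, medium = 1, high = 2.
DEMAND_WEIGHTS = {"low": 0, "medium": 1, "high": 2}


def weighted_mean(item_scores: Sequence[float], demands: Sequence[str]) -> float:
    """Return the weighted mean of item scores, weighting each item by its demand rating."""
    if len(item_scores) != len(demands):
        raise ValueError("item_scores and demands must have the same length")
    weights = [DEMAND_WEIGHTS[d] for d in demands]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one item must have a non-zero weight")
    weighted_sum = sum(score * w for score, w in zip(item_scores, weights))
    return weighted_sum / total_weight


# Hypothetical responses for three activities rated high, medium and low demand.
example_score = weighted_mean([3, 1, 2], ["high", "medium", "low"])
print(round(example_score, 3))  # (3*2 + 1*1 + 2*0) / (2 + 1 + 0)
```

One consequence of coding 'low' as 0 is that low-demand items drop out of both the numerator and the denominator, so the summary score is driven by medium- and high-demand activities only.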
Cognitive function: Executive function assessments included sub-scores of the digit span test (forward, backward and sequence), 3 language fluency (category and verbal) and part B of the Trail-making test (TMT). 4 Measures of memory included the Rey-Osterrieth Complex Figure (RCF; immediate recall, delayed recall and recognition) 5 and Hopkins Verbal Learning Test Revised (HVLT-R; total recall, delayed recall and recognition). 6 Processing speed was assessed by part A of the TMT, 4 digit coding 3

A-levels, college certificate or professional qualification (at 18+ years), (4) degree (BSc, BA), (5) higher degree (MA, MSc, PhD). As a new scanner was introduced mid-way through data collection, the scanner model was included as a further covariate. To account for motion-related variance, we included an index of mean head motion during the acquisition of functional images, an approach similar to that of other neuroimaging studies. 8,9 In our study, relative mean displacement obtained

MRI data pre-processing: FSL-VBM 10 was used to examine the associations between activities and voxel-wise measures of grey matter. The raw T1-weighted images were first reoriented to a standard MNI template, bias field corrected, and registered to the MNI template using linear 11 and non-linear registration. 12 Brain tissue was then segmented into grey matter (GM), white matter (WM) and cerebrospinal fluid (CSF) using FMRIB's Automated Segmentation Tool (FAST), 13 and global volumetric measures of these tissues were extracted. Global GM and WM volumes were adjusted for total intracranial volume.
T1-weighted images were brain extracted, grey matter segmented and then registered to the MNI 152 standard space with non-linear registration. 12 These images were averaged and flipped along the x-axis to produce a symmetrical, study-specific grey matter template. All native grey matter images were non-linearly registered to the grey matter template and modulated to correct for local expansions and contractions. The resulting images were smoothed with an isotropic Gaussian kernel with a sigma of 3 mm.
The Tract-Based Spatial Statistics (TBSS) 14 pipeline was used in the analysis of white matter microstructure. For the diffusion-weighted images, FSL's topup was first applied to estimate the susceptibility-induced off-resonance field using the b0 scans. 15 Eddy was then used to correct for distortions attributed to motion and eddy currents. 16 Slices that were >3 standard deviations from the Gaussian process predicted slice were labelled as outliers and replaced. Volumes with >10 'outlier' slices were excluded. Participants with more than 5 volumes missing from their scans were excluded from the analysis. Diffusion-weighted scans were subsequently submitted to DTIFIT, which uses a diffusion tensor model to derive spatial maps of fractional anisotropy (FA), axial diffusivity (AD), mean diffusivity (MD) and radial diffusivity (RD) for each individual. The resulting images were brain extracted with FSL's Brain Extraction Tool. 17 Each individual's FA, AD, RD and MD images were then non-linearly registered into standard MNI space using FMRIB58_FA as the target image. Subsequently, FA, AD, RD and MD values were projected onto a study-specific mean FA tract skeleton, to derive skeletons for every participant. The averaged skeletons were intensity thresholded (= 0.2), in order to represent shared tracts across the entire sample. Mean FA, AD, RD and MD were also calculated for each participant by averaging these values across the entire white matter skeleton. We extracted white matter lesions (WML) using the Brain Intensity AbNormality Classification Algorithm (BIANCA). 18 All WML segmentations were visually inspected, and those identified as inaccurate were excluded.
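For readers less familiar with the four diffusion tensor measures named above, the sketch below shows how FA, AD, MD and RD follow from the three eigenvalues of a diffusion tensor, using the standard textbook definitions. It is not the DTIFIT implementation; the function name and the example eigenvalues are hypothetical.

```python
import math
from typing import Dict, Sequence


def dti_scalars(eigenvalues: Sequence[float]) -> Dict[str, float]:
    """Compute FA, AD, MD and RD from the three eigenvalues of a diffusion tensor."""
    # Sort so that l1 is the principal (largest) eigenvalue.
    l1, l2, l3 = sorted(eigenvalues, reverse=True)
    md = (l1 + l2 + l3) / 3.0   # mean diffusivity: average of all eigenvalues
    ad = l1                     # axial diffusivity: principal eigenvalue
    rd = (l2 + l3) / 2.0        # radial diffusivity: mean of the two minor eigenvalues
    numerator = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy ranges from 0 (isotropic) to 1 (diffusion along one axis only).
    fa = math.sqrt(1.5 * numerator / denominator) if denominator > 0 else 0.0
    return {"FA": fa, "AD": ad, "MD": md, "RD": rd}


# Hypothetical eigenvalues (mm^2/s) for a single white matter voxel.
voxel = dti_scalars([1.7e-3, 0.3e-3, 0.2e-3])
print({name: round(value, 6) for name, value in voxel.items()})
```

Because all four measures derive from the same three eigenvalues, they are not independent: a rise in RD with AD held constant, for example, necessarily lowers FA and raises MD.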
Resting-state functional MRI (fMRI) images underwent the following pre-processing steps: motion correction, brain extraction, high-pass temporal filtering (cut-off = 100 sec) and field map correction, all performed using FSL Multivariate Exploratory Linear Optimized Decomposition into Independent Components (MELODIC). 19 Artefactual components attributed to non-neuronal fluctuations were removed with single-subject ICA and FMRIB's ICA-based X-noiseifier (FIX). 20,21 The training data for FIX were from the WhII_MB6.RData trained-weights file (available at http://www.fmrib.ox.ac.uk/datasets/FIX-training/), consisting of manually labelled data from 25 participants. After pre-processing and cleaning, all resting-state images were registered to the individual's structural scan and standard space images using FNIRT. The images were then spatially smoothed using an isotropic Gaussian kernel of 6 mm full width at half maximum (FWHM).
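Note that the smoothing kernels in this supplement are reported in two different units: a sigma of 3 mm for the grey matter images and an FWHM of 6 mm for the resting-state images. The two are related by FWHM = 2·sqrt(2·ln 2)·sigma for a Gaussian kernel. The short sketch below converts between them so the kernels can be compared; the function names are illustrative only.

```python
import math

# For a Gaussian kernel, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def sigma_to_fwhm(sigma_mm: float) -> float:
    """Convert a Gaussian kernel's sigma (mm) to its full width at half maximum (mm)."""
    return sigma_mm * FWHM_PER_SIGMA


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert a Gaussian kernel's full width at half maximum (mm) to its sigma (mm)."""
    return fwhm_mm / FWHM_PER_SIGMA


print(f"VBM kernel: sigma 3 mm = FWHM {sigma_to_fwhm(3.0):.2f} mm")
print(f"fMRI kernel: FWHM 6 mm = sigma {fwhm_to_sigma(6.0):.2f} mm")
```

Expressed in the same unit, the grey matter kernel (roughly 7 mm FWHM) is slightly wider than the resting-state kernel (6 mm FWHM).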
To create group-level spatial maps, MELODIC group-ICA was performed with 25 components. These spatial maps were created from all Whitehall II imaging sub-study participants with usable resting-state images and without any neurological diseases or structural abnormalities (n = 678). MA and SS categorised the derived components as signal or noise. Dual regression was then used to extract subject-specific maps for each of the signal components. For the present analyses, only components representing the DMN, ECN and FPN are considered (n = 6, Figure S3).
Missing data: Instead of excluding respondents who had omitted a single item on the activity questionnaire, we used the weighted mean score from all items available at each time point. Weighted means for each time point were used to reduce the bias introduced by the one missing item on the overall summary score.
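The handling of a missing questionnaire item can be sketched as follows: the weighted mean is taken over the items that were answered, with the denominator reduced accordingly. This is a minimal illustration under that reading of the text; the function name, the use of `None` to mark a missing item and the example values are assumptions.

```python
from typing import Optional, Sequence


def weighted_mean_available(
    scores: Sequence[Optional[float]], weights: Sequence[float]
) -> Optional[float]:
    """Weighted mean over answered items only; None marks a missing item."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    # Keep only the (score, weight) pairs for items that were answered.
    answered = [(s, w) for s, w in zip(scores, weights) if s is not None]
    total_weight = sum(w for _, w in answered)
    if total_weight == 0:
        return None  # no weighted item was answered, so no score can be formed
    return sum(s * w for s, w in answered) / total_weight


# Hypothetical respondent who skipped the second of four items.
partial_score = weighted_mean_available([3, None, 2, 1], [2, 1, 1, 2])
print(partial_score)  # (3*2 + 2*1 + 1*2) / (2 + 1 + 2)
```

Dropping the missing item from the denominator as well as the numerator keeps the score on the same scale as that of a respondent who answered every item, which a simple weighted sum would not.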
Full Information Maximum Likelihood was employed to address situations where an entire questionnaire was missing from a participant at a particular phase (e.g. due to non-attendance on the assessment day). This method uses all available information to estimate population parameters, which produces less biased estimates relative to common deletion (i.e. pair-wise or list-wise) and mean imputation approaches to addressing missing data. 22-24

Trajectory analyses: We assessed the fit of the LGCMs based on the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Sample-Size Adjusted BIC (SSA-BIC), Tucker-Lewis Index (TLI), Comparative Fit Index (CFI) 25 and Root Mean Square Error of Approximation (RMSEA). 26 Following Grimm et al., 27 we considered adequate fit as a combination of the following: TLI ≥ 0.95, CFI ≥ 0.95 28 and an RMSEA ≤ 0.10. 29 AIC, BIC, and SSA-BIC were also used to compare models, with lower values indicating better fit. 27 As large sample sizes can bias the likelihood ratio chi-square towards rejecting even well-fitting models, 30 this statistic was not considered in our comparisons. Among the LCGMs, the model with best fit was identified as having the lowest AIC, BIC and SSA-BIC values, entropy values ≥ 0.8 and a p-value < 0.05 for the Lo-Mendell-Rubin likelihood ratio test and Bootstrap likelihood ratio test. A minimum of 5% of participants within each class was also considered essential for model selection. 31 For both the LGCM and LCGM analyses, time scores were entered as the mean years since the baseline assessment (i.e. 0, 6, 9, 11 and 15 years since Phase 5), divided by either 10 for LGCMs 32 or 100 for LCGMs 33 to aid model convergence. Further, variances of the observed variables (i.e. activity measures) were constrained to be equal over time.
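The model comparison and time-score rescaling described above can be illustrated with a short sketch. The information criteria use their standard definitions (with the sample-size adjustment replacing n by (n + 2)/24), and the selection thresholds are those stated in the text. The function names and every numeric input in the example (log-likelihood, parameter count, sample size, entropy, class proportions, p-values) are hypothetical and are not results from this study.

```python
import math
from typing import Dict, List, Sequence

# Mean years since the baseline assessment (Phase 5), as given in the text.
YEARS_SINCE_BASELINE = [0, 6, 9, 11, 15]


def time_scores(divisor: float) -> List[float]:
    """Rescale the time scores (divisor 10 for LGCMs, 100 for LCGMs)."""
    return [years / divisor for years in YEARS_SINCE_BASELINE]


def information_criteria(log_likelihood: float, n_params: int, n_obs: int) -> Dict[str, float]:
    """Standard AIC, BIC and sample-size adjusted BIC; lower values indicate better fit."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "SSA-BIC": deviance + n_params * math.log((n_obs + 2) / 24.0),
    }


def meets_class_criteria(
    entropy: float, class_proportions: Sequence[float], lmr_p: float, blrt_p: float
) -> bool:
    """Check the LCGM selection rules stated in the text (besides lowest AIC/BIC/SSA-BIC)."""
    return (
        entropy >= 0.8
        and min(class_proportions) >= 0.05  # at least 5% of participants in each class
        and lmr_p < 0.05                    # Lo-Mendell-Rubin likelihood ratio test
        and blrt_p < 0.05                   # bootstrap likelihood ratio test
    )


print(time_scores(10))   # time scores on the LGCM scale
print(time_scores(100))  # time scores on the LCGM scale
print(information_criteria(log_likelihood=-1000.0, n_params=8, n_obs=574))
print(meets_class_criteria(0.85, [0.62, 0.31, 0.07], lmr_p=0.01, blrt_p=0.001))
```

Dividing the time scores changes only the scale of the slope and quadratic estimates (for example, change per decade rather than per year when dividing by 10); it does not alter model fit.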
All LCGMs were run with at least 100 random sets of starting values, 10 optimizations and 10 iterations. If convergence was not achieved, the number of random starts, optimizations and iterations was increased, as done in previous publications. 34 Persistent issues in model convergence or estimation are reported as experiencing "convergence problems".

The quadratic coefficients, on the other hand, reflected change in activities across time, otherwise interpreted as the acceleration or deceleration of change in activity levels. 27 As discussed in the main text, multicollinearity was detected between intercepts and quadratic coefficients for the analyses examining cognitive activity trajectories.
As including both variables in a model may lead to biased results, their relationships with cognitive and MRI outcomes were examined separately, while adjusting for linear coefficients (in addition to age and other covariates described in main text: Statistical analyses). The rationale here was that linear coefficients were one of two estimates of change in activities over time, with change representing an important covariate when considering the relationship between activity level (i.e. intercepts) and brain/cognitive markers. For example, individuals who decline in activities at a faster rate over the study period may report lower activity levels at a given time point. Given that the quadratic and linear coefficients are intricately

eResults. Supplementary Results
Comparisons of included and excluded participants: On average, participants included in the analyses were younger (p = 0.018), more educated (p = 0.031) and achieved higher MoCA scores (p < 0.001) relative to excluded participants. There were no differences in the proportion of females between these groups (for results, see eTable 6).

The MoCA is a 10-minute cognitive screening test that assesses multiple domains, including visuospatial abilities, executive function and language. This test integrates a range of sub-tests, such as the naming of low-familiarity animals, a short-term memory recall task, a clock-drawing task, a three-dimensional cube copy task, alphanumeric trail making, a phonemic fluency task, a verbal abstraction task, digit forward and backward, and orientation. MoCA scores range from 0 to 30, with higher scores reflecting better overall cognition. An additional one point is given to an individual with less than 12 years of education. In a clinical setting, scores below 26 may indicate cognitive impairment (Nasreddine et al., 2005).
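The MoCA scoring rule described above can be written out as a small sketch. The education adjustment and the cut-off of 26 follow the text; capping the adjusted total at the 30-point maximum is an assumption made here to keep scores within the stated 0 to 30 range, and the function names are hypothetical.

```python
MOCA_MAX = 30      # maximum total score stated in the text
MOCA_CUTOFF = 26   # scores below this may indicate cognitive impairment


def moca_total(raw_score: int, years_of_education: float) -> int:
    """Apply the one-point education adjustment to a raw MoCA score."""
    if not 0 <= raw_score <= MOCA_MAX:
        raise ValueError("raw MoCA score must be between 0 and 30")
    bonus = 1 if years_of_education < 12 else 0
    # Assumption: the adjusted total is capped at the maximum score of 30.
    return min(raw_score + bonus, MOCA_MAX)


def below_cutoff(total_score: int) -> bool:
    """Return True when the total falls below the clinical cut-off of 26."""
    return total_score < MOCA_CUTOFF


# Hypothetical participant with a raw score of 25 and 10 years of education.
total = moca_total(25, years_of_education=10)
print(total, below_cutoff(total))
```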
Digit span (Wechsler, 2008): In this task, a trained psychology graduate read out a series of numbers. Participants were required to recall the numbers in the same order (digit forward), in reverse order (digit backward) or from smallest to largest number (digit sequence). The outcome was the maximum number of digits correctly recalled under each condition.
Digit coding (Wechsler, 2008): Participants were presented with a key that contained a series of numbers, with a unique symbol associated with each number. In a grid containing just numbers, the main task was to draw the correct symbol paired with each number (as stated in the key) within a 2-minute period. The outcome was the total number of correct digit-symbol matches.
Language fluency (adapted from ACE-III; Hsieh et al., 2013): This task required individuals to list as many words as possible starting with the letter 'S' (verbal fluency) or name as many animals as possible (category fluency) within a 60-second time frame. The outcome was the number of words recalled for each type of language fluency.

TMT A and B (Reitan, 1958): For the trail making tasks, participants were instructed to connect a series of distributed circles on a page consisting of 25 numbers (TMT A) or numbers and letters (TMT B) as quickly and as accurately as possible. The outcome was the time taken to correctly complete the trail (seconds).
RCF (Osterrieth, 1944): Participants were presented with a complex geometric diagram that they were initially asked to copy. The image was then removed, and individuals were immediately instructed to redraw the diagram, this time from memory (outcome 1: immediate recall score). After a delay, participants were required to draw the image from memory once more (outcome 2: delayed recall score). In the final section of the RCF, participants were presented with several geometric shapes and asked whether they formed part of the original complex diagram (outcome 3: recognition score).
HVLT-R (Brandt, 1991): This test required individuals to learn a list of 12 words (drawn from three semantic categories, such as precious gems or vegetables) through three learning trials. A delayed recall task was then administered after a delay of 20-25 minutes, followed by a recognition task. For the recognition task, individuals were presented with 24 words and required to identify whether a given word had been in the original list of words to learn (12 were correct). This task therefore provided a measure of delayed recall, recognition and total recall.