Association of Prenatal Exposure to Population-Wide Folic Acid Fortification With Altered Cerebral Cortex Maturation in Youths

Importance Presently, 81 countries mandate the fortification of grain products with folic acid to lessen the risk of neural tube defects in the developing fetus. Epidemiologic data on severe mental illness suggest potentially broader effects of prenatal folate exposure on postnatal brain development, but this link remains unsubstantiated by biological evidence. Objective To evaluate associations among fetal folic acid exposure, cortical maturation, and psychiatric risk in youths. Design, Setting, and Participants A retrospective, observational clinical cohort study was conducted at Massachusetts General Hospital (MGH) among 292 youths 8 to 18 years of age born between January 1993 and December 2001 (inclusive of folic acid fortification rollout ±3.5 years) with normative results of clinical magnetic resonance imaging, divided into 3 age-matched groups based on birthdate and related level of prenatal folic acid fortification exposure (none, partial, or full). Magnetic resonance imaging was performed between January 2005 and March 2015. Two independent, observational, community-based cohorts (Philadelphia Neurodevelopmental Cohort [PNC] and National Institutes of Health Magnetic Resonance Imaging Study of Normal Brain Development [NIH]) comprising 1078 youths 8 to 18 years of age born throughout (PNC, 1992-2003) or before (NIH, 1983-1995) the rollout of folic acid fortification were studied for replication, clinical extension, and specificity. Statistical analysis was conducted from 2015 to 2018. Exposures United States–mandated grain product fortification with folic acid, introduced in late 1996 and fully in effect by mid-1997. Main Outcomes and Measures Differences in cortical thickness among nonexposed, partially exposed, and fully exposed youths (MGH) and underlying associations between age and cortical thickness (all cohorts). Analysis of the PNC cohort also examined the association of age–cortical thickness slopes with the odds of psychotic symptoms. Results The MGH cohort (139 girls and 153 boys; mean [SD] age, 13.3 [2.3] years) demonstrated exposure-associated cortical thickness increases in bilateral frontal and temporal regions (9.9% to 11.6%; corrected P < .001 to P = .03) and emergence of quadratic (delayed) age-associated thinning in temporal and parietal regions (β = –11.1 to –13.9; corrected P = .002). The contemporaneous PNC cohort (417 girls and 444 boys; mean [SD] age, 13.5 [2.7] years) also exhibited exposure-associated delays of cortical thinning (β = –1.59 to –1.73; corrected P < .001 to P = .02), located in similar regions and with similar durations of delay as in the MGH cohort. Flatter thinning profiles in frontal, temporal, and parietal regions were associated with lower odds of psychosis spectrum symptoms in the PNC cohort (odds ratio, 0.37-0.59; corrected P < .05). All identified regions displayed earlier thinning in the nonexposed NIH cohort (118 girls and 99 boys; mean [SD] age, 13.3 [2.6] years). Conclusions and Relevance The results of this study suggest an association between gestational exposure to fortification of grain products with folic acid and altered cortical development and, in turn, with reduction in the risk of psychosis in youths.

MRI scans from a third cohort (NIH), obtained from the NIH MRI Study of Normal Brain Development 4 , were used to study thinning trajectories in a sample of deidentified youth who gestated prior to the folate rollout. IRB approval and informed consent have been described elsewhere 5 . This project recruited a total of 431 healthy subjects aging between 4.6 and 18 years (at the time of first enrollment) in 6 pediatric study centers in the US between November 2001 and September 2003. Participants underwent MRI scans in 1.5T GE and Siemens scanners. Data collection for follow up scans continued until November 2007. For our study, we selected a subsample from this cohort that was previously identified as passing stringent quality control measures 6 , and further narrowed it to our age interval of interest (8.0 to 18.0). In order to ensure that none of the subjects in our subsample was exposed to folic acid fortification we only included subjects that were 87 months and older at the time of first enrollment. The final sample included 217 individual subjects and 383 scans.
MRI quality control and final scan selection in the MGH cohort. MRI data for the remaining reports (n=1,111) were then visually inspected for image quality by a trained investigator using ChRIS software 7 . Image inspection was conducted blind to group assignment. 796 scans were excluded due to the failure to meet stringent quality control (presence of any visible motion, susceptibility, or other artifact; poor contrast; ghosting; ringing; aliasing; any blurring of gray-white matter demarcation), by the absence of high-resolution T1-weighted structural scans conducted at MGH, or by the presence of more than one scan per patient (in which case only the first scan was used).
The remaining 315 scans were pre-processed using FreeSurfer v5.0, which provides an automated segmentation of cortical gray matter. Each of the pre-processed scans were then subjected to manual editing of the pial and graywhite boundaries 8,9 by a single technician who was blind to all subject-level information (including exposure group). An additional 23 scans were excluded during editing due to skull-wrap deformities or other non-editable abnormalities. The remaining 292 scans were included in the analysis. These scans were obtained at various imaging sites at MGH using one of the five scanner types: 1.5T GE, 1.5T Siemens Avanto, 1.5T Siemens Aera, 3T Siemens Trio, and 3T Siemens Skyra. Subjects were sorted into pre-rollout (nonexposed, n=97), rollout (partially exposed, n=96) and post-rollout (fully exposed, n=99) groups based on their date of birth (eFigure 2b). This sample size exceeded that which was pre-determined to be sufficiently powered for group cortical thickness analyses, as above.
Geospatial analysis of socioeconomic variables and vitamin intake. Addresses from all 292 MGH patients were geocoded with the Esri Business Analyst 2015 address locator. All addresses were located to the address level of accuracy. The address locations were then overlaid with the Business Analyst 2010 block group dataset, and corresponding economic, education, and 6-month consumer vitamin use and spending variables were extracted from the block group each address fell into. Block groups are subdivisions of census tracks and provide neighborhoodlevel analysis of approximately 700 residents. Five addresses could not be located within block groups; for these subjects, zip code-level data were used. Vitamin use data is reported as an index value, which may be compared to nationwide average value of 100.
Image processing. All scans in this study were analyzed using FreeSurfer v5.0 (http://surfer.nmr.mgh.harvard.edu/). The automated image processing pipeline of FreeSurfer performs full characterization of the anatomy including the cortex, white matter/gray matter boundaries, folding patterns, cortical and subcortical regions of interest. Specifically, the typical processing of a structural MR image includes removal of non-brain tissue, automated Talairach transformation, segmentation of the subcortical white matter and deep gray matter structures, intensity normalization, tessellation of the gray/white matter boundary, automated topology correction, and surface deformation following intensity gradients to optimally position the gray/white and gray/cerebrospinal fluid (CSF) borders 8,10 . Once the cortical model is complete, the cerebral cortex is parcellated into units based on the gyral and sulcal structure 11 . This method uses both intensity and continuity information from the volumetric image to produce representations of cortical thickness, calculated as the closest distance from the gray/white boundary to the gray/CSF boundary at each vertex on the tessellated surface 10 . Methods for cortical thickness measurements have been validated against histological analyses 12 and manual measurements 13 . Importantly, FreeSurfer structural image processing has been demonstrated to show high test-retest reliability across scanner manufacturers and across field strengths 14,15 . In all imaging analyses, individual thickness maps were smoothed with a 22-mm full width half maximum (FWHM) kernel in order to remove noise-related variations in the images.
Cortical thickness analyses in MGH cohort. To contrast thickness in nonexposed and fully exposed groups, cortical surface clusters were identified based on significant group differences, defined by a cluster extent threshold p<.05 and cluster-wise correction for multiple comparisons across the entire cortical surface using 10,000 Monte Carlo simulations (cluster-wise p<.05). Z-transformed age, age 2 , sex, total brain volume (TBV), and scanner field strength, which have been shown to influence cortical thickness 6,16 were entered as nuisance covariates. Of note, both scanner field strength and manufacturer have been associated with effects on cortical thickness measurements (albeit in different directions depending on cortical location) 14,16 . Within the MGH sample, all GE scans were conducted at 1.5T, while Siemens scans were conducted at 1.5 and 3T. Given that scanner field strength and manufacturer were highly collinear, and that field strength has been associated with stronger effects on cortical thickness than manufacturer 14 , we chose (a priori) field strength rather than manufacturer as a covariate. While each of the five scanner platforms used slightly different T1 acquisition sequences, given the uneven distribution of scanners across the sample ( Table 1), it would have been impractical to include specific scanner platform as a covariate, over and above scanner field strength.
After significant clusters were defined in the group analysis, individual-level thickness values (indicating mean thickness across all vertices within the group-average cluster) were extracted for those clusters in each subject in the nonexposed, partially exposed, and fully exposed groups. These values were used to generate subject-level plots (e.g., Figure 1b, 2b) and to further quantify group differences in terms of percent difference (eTable 3). Additional sensitivity analyses (MANCOVA) examined effects of both scanner field strength and scanner manufacturer on cortical thickness in regions demonstrating significant between-group differences.
To understand exposure-related effects within the context of brain development, additional surface-wide analyses probed group differences in age-related cortical thinning contours (i.e., intercept, slope, and non-linear thinning). First, difference maps for nonexposed versus fully exposed groups were re-centered at all ages from 8.0 to 18.0 by shifting subject ages in 0.1 y increments (raw values, not z-transformed) accordingly and introducing an age x group covariate into the general linear model. At each age (e.g., 8.0), Monte Carlo analysis was repeated to identify significant between-group differences in cortical thickness at that age point (e.g., Figure 1c, Video). Second, to determine whether and where differences in age-thickness slope contributed to overall group thickness differences, the age x group interaction term was selected as the covariate of interest (i.e., with z-transformed age, group, sex, scanner strength, and TBV as nuisance covariates). Finally, to determine if groups differed in non-linear thinning contours, age 2 x group was entered as the covariate of interest, with nuisance covariates as above.
Cortical thickness analyses in PNC cohort. Surface-wide analysis of non-linear thinning was assessed across the entire PNC sample. The general linear model included age 2 as the covariate of interest, and z-transformed age, sex, and TBV as nuisance covariates. Scanner strength was not entered into the model as all participants were scanned on the same 3T magnet. As above, clusters were defined as significant based on a cluster extent threshold p<.05 and surface-wide correction for multiple comparisons using 10,000 Monte Carlo simulations (cluster-wise p<.05). Subject-level data on cortical thickness within significant clusters were extracted as above.

Cortical thickness analyses in the NIH cohort.
To determine whether non-linear thinning occurred prior to the fortification rollout, clusters identified as having quadratic age-thinning contours in the MGH or PNC cohorts were assessed for non-linear thinning in the NIH cohort. As the NIH sample comprised subjects with between one and three scans, linear mixed models were employed (SPSS v25), using a diagonal covariance structure. Individuallevel thickness values were extracted from each cluster (as above) and were treated as the dependent variable. Subject ID and visit number were entered as repeated measures; sex and scanner site were entered as categorical covariates; and z-transformed TBV, age, and age 2 were entered as continuous covariates. Clusters where age 2 predicted cortical thickness at p<.05 (uncorrected) were considered to show significant quadratic thinning.

Break point analyses for regions demonstrating quadratic thinning.
For regions demonstrating quadratic (delayed) age-related thinning in the MGH, PNC, or NIH cohort, we estimated the break point (age in years) where the age-thickness relationship changed from flat to sloped using least squares analysis (e.g., for eFigure 6). Subjectlevel cortical thickness values were extracted from the cluster (as above); sex, TBV, and other cohort-specific nuisance covariates (scanner strength for MGH, site for NIH) were regressed out, and z-transformed residual cortical thickness values were used for subsequent analysis (SPSS v25). Using Matlab R2015b (https://www.mathworks.com), break points were tested serially, starting at age 9.0 and then incrementally increasing by 0.1 year at a time up to 17.0. Break point range was determined as 9.0-17.0 to ensure sufficient data for the computation of goodness-of-fit at both ends. At a given break point, age-related thinning changes were modeled by a "flat-slope" function as follows: (1) the average of all thickness values from age 8.0 up to and including the break point became the y-value for the "flat" part of the thickness vs. age function, and (2) the best fit line of all remaining thickness values occurring after the break point, constrained by having left edge of the new sloped line contiguous with the flat line established in step 1, became the "slope" part of the thickness vs. age function. At each 0.1-year break point interval, we determined and squared the vertical distance between each cortical thickness data point and the "flat-slope" function derived for that interval; the average of these values became the mean of squares for that particular break point, reflecting the goodness-of-fit of that break point model with respect to the actual data from participants aged 8 to 18. We then plotted the average of squares as a function of break point age, generating an inverted-U shaped curve, where the minimum value of the curve identified the break point with optimal fit.
In the case of the NIH cohort cluster showing quadratic thinning (i.e., where the cluster was defined based on quadratic thinning in the MGH or PNC cohorts, and then found to also be quadratic in the NIH cohort), we determined whether the break point defined through the analysis of NIH data differed significantly from the break point defined through the referring data set (MGH or PNC) using chi-square. For example, if the break point for a given cluster occurred at age 10.0 in the PNC data and at 12.0 in the NIH data, chi-squared was conducted using the NIH data, and determined whether the proportion of subjects age 10 and under differed significantly from the proportion of subjects age 12 and under.
Clinical phenotyping of PNC participants. PNC participants received standardized clinical evaluations using a structured computerized screening instrument (GOASSESS) as previously described [17][18][19] . Psychiatric history was obtained from a caregiver (participants age 8 to 10) or caregiver and proband (participants age 11 to 18). Psychosis Spectrum youth (PS) were identified based on (1) an age-deviant score of ≥2 SD above age-matched peers on the Prevention through Risk Identification, Management, and Education (PRIME) Screen-Revised (PS-R) 20 , or the presence of ≥1 PRIME item rated 6 or ≥3 items rated at least 5; (2) endorsed definite or possible hallucinations on the Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS) 21 psychosis screen; or (3) had an agedeviant total negative/disorganized Scale of Prodromal Symptoms (SOPS) 22 score ≥2 SD above age-matched peers. Psychosis Low youth (PL) included subjects who endorsed more subpsychotic (PS-R) or subthreshold negative or disorganized (SOPS) symptoms than their age-matched peers (≥1 SD) but who did not meet full criteria for PS. Typically Developing (TD) youth lacked any significant psychopathology, history of psychotropic medication use, or history of inpatient psychiatric hospitalization. Remaining youth were classified as Other Psychopathology (OP), indicating significant non-psychotic symptoms (mood, anxiety, attention-deficit, disruptive behavior, or eating disorders) and/or psychotropic medication use. Of the 881 included PNC participants, 209 were classified as TD, 216 as PS, 111 as PL, and the remaining 345 as OP. A total of 30 participants in the OP group, 43 in the PS group, and 19 in the PL group had a history of exposure to psychotropic medications. The prevalence of psychosis spectrum symptoms observed in the PNC sample 19 is consistent with previous studies investigating the similar age range. A meta-analysis of population-based studies in children and adolescents showed higher rates of psychotic-like experiences in youth than in adults (range 5-35%), with meta-analytically derived medians of 17% in children (9-12 years old), and 7.5% in adolescents (13-18 years old) 23 .

Derivation of local cortical thinning slopes in PNC participants.
To test the hypothesis that cortical thinning delays related to fortification exposure conferred protection against psychosis risk, we first needed to derive a subject-level index for whether individuals were situated in relatively flat or steep parts of the age-related thinning curve. As such, best-fit local thinning slopes for cortical thickness versus age were calculated for each subject. Subject-level cortical thickness values were extracted from the cluster as above; sex and TBV were regressed out, and z-transformed residual cortical thickness values were used for subsequent analysis (SPSS v25). Before testing the effects of local best-fit slopes on clinical phenotypes, the method for obtaining these slopes was optimized by varying the temporal window size, i.e., the number of months' worth of data on either side of a given subject to include for that subject's local slope measurement. Data on the extreme ends were excluded (e.g., for a temporal window of six months, only subjects between 8.5 and 17.5 were included). In selecting the optimal window size, normality (Kolmogorov-Smirnov test) of local slope distributions was the primary metric of interest, and slope and kurtosis were secondary metrics of interest. For regions identified as showing significant quadratic thinning in the PNC cohorts, empirical tests of these parameters for various window sizes are summarized below. An inclusion window of six months on either side of the subject was found to be optimal, and therefore this window size was used in the computation of local slopes.

Relation of local cortical thinning slopes to clinical phenotypes in PNC participants.
Multinomial logistic regression models (SPSS v25) were used to determine whether local thinning slopes in clusters with significant quadratic (delayed) thinning predicted diagnosis, with one model per region (4 total analyses). PS, PL, and OP were each compared to TD within the same multinomial model. A priori covariates included age and method of diagnosis ascertainment [binary, based on history from caregiver only (age 8-10) versus caregiver plus proband (age [11][12][13][14][15][16][17][18]], as well as sex and TBV, given established relationships between these factors and psychosis risk. Resulting adjusted odds ratios reflect the relationship between subjects' local thinning slopes and odds of diagnosis (PS, PL, or OP  The search generated a total of 3,311 radiology reports, which were manually screened for exclusion criteria. The remaining 1,111 scans were imported from clinical image repositories and inspected for artifact using ChRIS software. Exclusion of 796 scans based on visual inspection yielded 315 potentially usable scans from unique individuals. These scans were manually edited by a technician who was blind to demographic information or group membership, resulting in exclusion of 23 scans for skull wrap or other non-editable artifact. This resulted in a final tally of 292 usable scans.

eFigure 4. Additional Clusters Demonstrating Exposure-Related Effects on Cortical Thickness in the MGH Cohort
Dot plots showing cortical thickness in the right frontal and bilateral inferior temporal gyrus (ITG) as a function of exposure group (Non, nonexposed; Part, partially exposed; Full, fully exposed), suggesting intermediate effects in the partially exposed group. Bars indicate median values and shaded boxes indicate interquartile ranges. Cortical thickness values are z-transformed residuals after controlling for nuisance covariates.

eFigure 6. Estimate of Thinning Delay in MGH and PNC Clusters Showing Quadratic Age-Related Thinning
Estimates for the break point age, representing the age when cortical thinning begins, were developed for each MGH (A) and PNC (B) cluster showing significant quadratic thinning. "Flat-slope" models were derived for potential break points starting at age 9.0 and continuing upward by increments of 0.1 years to 17.0. For example, the function for the break point occurring at age 11.5 would consist of (1) a straight line from age=8.0 to 11.5, comprising the average thickness value for that interval, and (2) a sloped line from age 11.5 to 18.0, based on the best linear fit of all data points across that interval, further constrained so that the new sloped line was contiguous with the flat line from part (1). Mean of squares were determined for each incremental break point model, representing the goodness-of-fit of that model to the actual thickness data from participants between 8.0 and 18.0 years old. The mean of squares for each break point model were plotted as a function of break point age; the nadir of the resulting curve reflected the break point age that optimally fit the data. Below, the original cortical thickness versus age data are plotted, indicating at what age the optimal break point occurred. Red data points comprise the "flat" part, and blue data points comprise the "slope" part. Accordingly, the data suggest that cortical thinning among fortification-exposed individuals began between ages 13.0 and 14.3, depending on region. (C) Among the six regions showing quadratic thinning in the MGH and PNC cohorts, five showed no evidence of quadratic thinning in the nonexposed NIH cohort. As such, thinning had already begun by age 8 among NIH participants. However, one region, the left frontal cortex cluster derived in the PNC cohort, did exhibit significant quadratic thinning within the NIH cohort -suggesting that for that region, quadratic thinning was present before the onset of fortification. However, break point analysis for the NIH data in this cluster indicated an earlier onset of thinning (age 12.1) than that observed within the same cluster in the PNC data (age 13.0). Chi-square analysis of the NIH data comparing distributions of data before versus after the two break points indicated a significant difference in thinning onset. 13 (A) Surface-wide maps of age-squared effects on cortical thickness in the PNC cohort (N=861) indicate quadratic age-related thinning, driven largely by individuals who were fully exposed to fortification during gestation. Consistent with the MGH cohort, delayed thinning occurred in left frontal, bilateral inferior parietal lobule (IPL, extending to supramarginal gyrus on the right side), and right inferior temporal gyrus (ITG). Unlike the MGH cohort, accelerated thinning was seen in bilateral lingual gyrus. (B) Agethickness scatterplots depicting subject-level data from panel A. (C) Within in all but one of these clusters, only linear thinning was seen in the nonexposed NIH cohort (N=383 scans), the exception being the left frontal cluster (see also eFigure 6). Images in panel A are masked to show only clusters that survive correction for multiple comparisons (p<.05, clusterwise). Cortical thickness values in scatterplots represent z-transformed residuals after controlling for nuisance covariates.