Association of a Schizophrenia-Risk Nonsynonymous Variant With Putamen Volume in Adolescents

Key Points Question Is there any genetic variant associated with adolescent brain development that can inform psychopathology of schizophrenia? Findings In this imaging genetics study of brain structure, a significant association between a missense mutation in SLC39A8 (a gene previously associated with schizophrenia) and gray matter volume in putamen was discovered and replicated using 10 411 healthy participants from 5 independent studies. Compared with healthy control individuals, such association was significantly weakened in both patients with schizophrenia and unaffected siblings. Meaning Common genetic variant indicates an involvement of neuronal ion transport in both pathophysiology of schizophrenia and structural development of putamen.


eMethod 4. Lieber Institute for Brain Development (LIBD) Study
This study was conducted by the Lieber Institute for Brain Development (LIBD), US (http://www.libd.org/). All subjects are Caucasian. After quality control and preprocessing, the LIBD sample includes 272 healthy participants (mean age 32 years), 157 chronic treated patients with schizophrenia (mean age 35 years) and 149 unaffected siblings (mean age 37 years), who were recruited as part of the Clinical Brain Disorders Branch Sibling Study (National Institutes of Health Study ID NCT00001486). All participants were right handed. For detailed demographic information and inclusion/exclusion criterion of this clinical study we refer to our previous publication 6 .
Genetic Data For LIBD samples, genotyping was done using Illumina BeadChips (510/610/660/2.5). Quality control was performed using PLINK (version 1.07; http://pngu.mgh.harvard.edu/purcell/plink/). Participants with missing rate higher than 2% and extreme heterozygosity values (±3 SD) were removed. SNPs with missing rate higher than 2% and difference in missingness between cases and controls > 0.02 were removed. SNPs were also excluded if they failed Hardy-Weinberg equilibrium test (P < 10 −6 in controls or P < 10 −10 in cases) and if they have minor allele frequency less than 1%. Pre-phasing was done before imputation using SHAPEIT, and imputation was done using IMPUTE2 with 1000 genome as reference panel.
Structural MRI Data Three-dimensional structural MRI scans were acquired on a 1.5-T GE scanner (GE Medical Systems, Milwaukee, Wisconsin) using a T1-weighted spoiled gradient recalled sequence (repetition time, 24 milliseconds; echo time, 5 milliseconds; number of excitations, 1; flip angle, 45°; matrix size, 256x256; field of view, 24x24 cm), with 124 sagittal slices (0.94x0.94x1.5mm resolution). Images were processed by VBM8 in SPM8 following the same procedure as described above for the IMAGEN sample. The mask defined by the association result in the IMAGEN sample of those four significant clusters in the standard MNI space was then applied to the normalized brain images to extract the corresponding grey-matter volumes.

Association analysis
The association of the SNP and the grey-matter volumes were analyzed in R, and Sex, Age, Age×Age, IQ, eTIV and eWBV were used as covariates. The 95% confidence interval of the statistics were given by 3000 bootstraps, and the boxplots for comparison of brain volumes between genotypes were presented in eFigures 8-10.

eMethod 5. Three-city (3C) study
This study aims to evaluate the risk of dementia and cognitive impairment attributable to vascular factors. It is conducted in three cities of France: Bordeaux, Dijon and Montpellier (http://www.three-city-study.com/). In this analysis, we used the healthy control individuals in Bordeaux, France as both genetic data and neuroimaging data were available at this site.
Genetic data Successful genome-wide genotyping was performed on the Illumina Human610-Quad Beadchip. The quality control was performed by the Pharnext, France. After a data set pruning, the filtering of samples and exclusion of subjects was based on the following criteria: individual call rate> 5%, inbreeding coefficient >0.2, sex mismatch, identical by descent >0.1875 and detection of outliers using principal component analysis (PCA) by EIGENSOFT. There were 6467 subjects remained. Then, the filtering of SNPs was performed based on: SNP call rate>5 %, heterozygous number <40, minor allele frequency <0.0017 and Hardy-Weinberg disequilibrium p<8.6 -08 in controls. After filtering, 542 058 SNPs (93%) were left for association analyses.
Structural MRI Data Structural MRI was performed on 1.5-Tesla scanner with fast 3-dimensioal spoiled gradient-echo T1-weighted axial acquisitions with 2 excitations. Data preprocessing was performed by Laboratoire de recherche en neuroimagerie, CHUV Lausanne. All data used here have gone through visual quality check. The grey matter was subsequently normalized to MNI space using the Dartel method. The smoothing kernel has FWHM 8mm. The resulting voxel size is 1.5mm 3 . Images were processed by VBM8 in SPM8 following the same procedure as described above for the IMAGEN sample. The mask defined by the association result in the IMAGEN sample of those four significant clusters in the standard MNI space was then applied to the normalized brain images to extract the corresponding grey-matter volumes.
Association analysis Among all 634 subjects after the quality control, 515 subjects had genetic data available. Therefore, 515 subjects were used for the replication of the association between SNP and GMV. The association of the SNP and the grey-matter volumes were analyzed by Plink2 using linear regression. Sex, age, age×age, education, eTIV and eWBV were used as covariates in association analysis. The 95% confidence interval of the statistics were given by 3000 bootstraps, and the boxplots for comparison of brain volumes between genotypes were presented in eFigure 12.

Structural MRI Data
In the Februray 2017 release, a single scanner dedicated to UKB imaging in Cheadle Manchester. The scanner is a standard Siemens Skyra 3T running VD13A SP4 (as of October 2015), with a standard Siemens 32-channel RF receive head coil. The imaging protocol is as follows: Resolution: 1×1×1 mm Field-of-view: 208×256×256 matrix Duration: 5 minutes 3D MPRAGE, sagittal, in-plane acceleration iPAT=2, prescan-normalise For more details we refer to the official document for neuroimaging of the UKB (http://biobank.ctsu.ox.ac.uk /crystal/docs/brain_mri.pdf). To be consistent, we followed exactly the same pre-processing procedure as the IMAGEN study using the VBM8. In total, we were able to successfully process T1 images of 9 880 subjects, and extracted the grey matter volumes of the predefined clusters by the discovery sample.
Genetic data Genotype data are available for all 500,000 participants in the UKB cohort. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array on an initial 50,000 participants; the remaining 450,000 participants were genotyped using the Affymetrix UK Biobank Axiom® array. The two arrays are extremely similar (with over 95% common content). Among those participants with neuroimaging data been successfully pre-processed using the VBM8, 6 932 subjects had also the genotype information at both SNPs [rs13107325 (genotyped, MAF = 0.07) and rs7182018 (imputed by IMPUTE2, MAF = 0.09, INFO score = 0.9961)] after imputation from the March 2018 release. All subjects in this subset were estimated to have recent British ancestry and have no more than 10 putative third-degree relatives in the kinship table using the sample quality control information provided centrally by UKB (using the variables of white.British.ancestry.subset and excess.relatives in the file ukb_sqc_v2.txt). For more details we refer to the official document for genetic data of the UKBk (http://www.ukbiobank.ac.uk/scientists-3/genetic-data/).

Association analysis
In total, 6932 subjects were used for the replication of the association between SNP and GMV. The association of the SNP and the grey-matter volumes were analyzed by linear regression model. Sex, age (we used the age when the participants were actually scanned, which was the third instance of the date when the participants attended assessment centre recorded in Data-Field 21003 minus the date of birth recorded in Data-Field 33), age×age, eTIV were used as covariates in association analysis. We also replicated the findings by including either BMI (t 6870 = 14.87/16.42, p = 1.59×10 -49 /8.61×10 -60 for the clusters in the left/right central sulcus; t 6870 = 4.85/6.49, p = 6.44×10 -7 /4.69×10 -11 for the clusters in the left/right putamen) or eWBV (t 6885 = 15.05/16.52, p = 1.03×10 -50 /6.89×10 -51 for the clusters in the left/right central sulcus; t 6885 = 3.74/5.58, p = 9.34×10 -5 /1.24×10 -8 for the clusters in the left/right putamen) as additional covariates. The 95% confidence intervals of the statistics were given by 3000 bootstraps, and the boxplots for comparison of brain volumes between genotypes were presented in eFigure 11.

eMethod 7. Brain eQTL database
A database for eQTL (expression Quantitative Trait Loci) in the brain was made publicly available by the UK Brain Expression Consortium (UKBEC 8 ). The central nervous system tissues originating from 134 neuropathologically normal individuals was collected by the Medical Research Council (MRC) Sudden Death Brain and Tissue Bank, Edinburgh, UK, and the Sun Health Research Institute (SHRI) an affiliate of Sun Health Corporation, USA. From each individual, up to 10 brain regions were analyzed, including cerebellar cortex (CRBL), frontal cortex (FCTX), hippocampus (HIPP), medulla (specifically inferior olivary nucleus, MEDU), occipital cortex (specifically primary visual cortex, OCTX), putamen (PUTM), substantia nigra (SNIG), thalamus (THAL), temporal cortex (TCTX) and intralobular white matter (WHMT). The Affymetrix Exon 1.0ST Arrays were used to measure the RNA transcripts, and all arrays were pre-processed and log2 transformed following the standard protocol. Genomic DNA was extracted from sub-dissected samples of human post-mortem brain tissue, and all samples were genotyped on the Illumina Infinium Omni1-Quad BeadChip with a custom genotyping array. The expression QTL for each expression profile against every genetic marker in MatrixEQTL, and subsequent analyses were conducted in R. All gene expression, genotype, and eQTL results were free to access through a website http://www.braineac.org/.
First, as the SNP rs13107325 is a missense variant change Als391 to Thr391 in the ZIP8 protein, and this substitution is likely to cause structural change in a α-helical transmembrane domain of the ZIP8 protein. Therefore, we first tested whether this SNP was associated with expression of the gene SLC39A8 in the putamen. We identified a significant association between the SNP rs13107325 and gene expression of SLC39A8, particularly measured by the exon-specific probe Affymetrix ID 2779840, which was a probe of a transcript cluster (ID 2779823) of SLC39A8.
Second, to test whether such association was tissue specific to putamen among other brain regions and whether this SNP also has the cis effects on nearby genes, we second tested such association against 10 brain tissues available in the UKBEC database and 6 genes within ±1Mb of the location of rs13107325. For this extended search, we adopted a threshold of 0.0008 (= 0.05/10/6, corrected for 10 types of brain tissues and 6 genes).

eMethod 8. Voxel-wise and Genome-wide Association Study (vGWAS)
Association analysis Firstly, we did the genome-wide association study for the whole brain by using the grey-matter volumes of each voxel as a continuous trait. The AAL (automatically anatomic labeling 9 ) template was used to restrict our analysis in the grey matter of the brain. Therefore, the whole brain was divided into 438 145 voxels. A significant association was identified if a cluster with more than 217 (≈ 4/3 × π × (3.3970 × 1.645) 3 / 1.5 3 falling into the 90% confidence interval of the smoothing kernel) voxels (~0.7ml) had the p values of the associations all survived a Bonferroni correction at the brain-wide and genome-wide significant level (p < 2.4483×10 -13 = 0.05 / 438 145 [n of voxels] / 466 114 [n of SNPs]). As the statistical threshold of significance can be given by various criterions, we also tried one more stringent threshold (p < 1.1412×10 -13 = 5×10 -8 [genome-wide significance level ] / / 438 145 [n of voxels] ) and one less stringent threshold (  p  1 (1 510 )  1.9417 10 -11 ), where M  2575 was the effective number of independent tests (https://neurogenetics.qimrberghofer.edu.au /SNPSpDlite/ 10 ). The results were listed in eTable 3. As expected, the higher threshold gave smaller clusters while the lower threshold gave larger clusters. Both of them were at the same locations as the clusters defined by the Bonferroni threshold. The GWAS results at the cluster level were listed in eTable 7 for IMAGEN baseline sample and eTable 8 for UK Biobank sample.
Secondly, the clusters identified by the brain-wide and genome-wide association analysis defined regions of interest. To control the possible confounding factors, we used 14 covariates during the association study, including sex, handedness, site (7 dummy variables for 8 sites), eTIV, and first four principal components from multidimensional scaling analysis 11 . Excluding the subjects with any missing values in the genetic data, volume, or covariates, we had 1 721 subjects left for the GWAS. Since rs13107325 had been associated with BMI 12 , we considered BMI as an additional covariate to see if the observed association remains. To further conditioning out the potential confounding effects, we also tried to include the age, perceptual reasoning IQ, verbal comprehension IQ, and puberty as further covariates to see if the identified association holds (eTable 4). A mask of these significant clusters were generated, and applied to other independent samples to verify our findings. Followed the literature 13 , the percentage variance explained by each genome-wide significant SNP was estimated after accounting for covariate effects.
Two clusters in bilateral putamen were reported in the main text, and the other two clusters in the central sulcus were as follows: the MNI coordinate of the peak voxel at the left was (-49.5, -7.5, 30) and the association was estimated as b = 1.78×10 -9 and p=2.75×10 -22 ; at the right, the association of the peak voxel (54, -4.5, 19.5) was b = 1.50×10 -9 and p=6.27×10 -26 .
To facilitate the replications and the behavioural analysis, we used the volumes of the identified clusters instead of the volume of each voxel in the following analyses. The grey matter volume of a cluster was calculated by summing over all voxels within that cluster 14 . As expected, the association between the SNPs and the volumes of these identified clusters were still significant (b>0, p<10 -17 , eFigure 1 and Table 1; eTable 5). We first made a mask for those four significant clusters identified using the IMAGEN sample (i.e., two clusters in the bilateral putamen and two clusters in the bilateral central sulcus). Second, by applying this mask labelling 4 regions-of-interest (ROI) to structural MRI data from each replicating cohort, we estimated the grey matter volume (GMV) for each ROI. Third, we calculated the associations between SNP rs13107325 and GMV of each of these two putamen ROI's, and between SNP rs7182018 and GMV of each of these two central sulcus ROI's.
As IMAGEN is a longitudinal cohort, we also tested whether our findings in the IMAGEN sample at age 14 years remained significant 5 years later when these participants turned to age 19 years. Among 869 19year-old subjects in the follow-up sample of the IMAGEN study, we confirmed that SNP rs13107325 remained significantly associated with both the clusters (t 854 = 5.86/6.26, 95% CI = 3.89-7.74 / 4.33-8.57, p = 3.33×10 -9 /3.08×10 -10 , variance explained = 3.86%

eMethod 9. Statistical meta-analyses
Putting together all 5 samples, including the IMAGEN sample at the baseline, the SYS sample, the LIBD healthy control sample, the UKB sample, and 3C sample, we conducted a meta-analysis of the identified associations with a fixed effect model. The analyses were implemented by the R package metafor http://www.metafor-project.org/ and the forest plots were also generated by this package.

eMethod 10. Two-sample Summary-data-based Mendelian Randomization
To test whether the pleiotropic associations of SNP rs13107325 with both grey matter volume of the putamen cluster and the risk for schizophrenia, we conducted a 2-sample summary-data-based Mendelian randomization (SMA) analysis as proposed in the literature 15 . In this analysis, we considered SNP rs13107325 as the instrument variable, GMV of the right putamen volume as the exposure variable (the current vGWAS in IMAGEN), and schizophrenia as the outcome variable (GWAS by PGC2 16 ). This analysis was conducted by a web-based version of the MR-Base (http://www.mrbase.org), which is a platform integrating a curated database of complete GWAS results 17 . A significant result given by the SMR suggests a significant pleiotropic associations of both exposure variable and outcome variable with the instrument variable (SNP), and furthermore there might also be a causal effect of the exposure on the outcome, i.e., a vertical pleiotropic association, but inferring causality using observational data must be interpreted with great caution 15 .
As vGWAS signal might indicate a real effect from the genetic variants in a LD block of the indicator SNP, we submitted the SMR results of all SNPs within ±1M bp of the indicating SNP to Locus Zoom (http://locuszoom.org/) to plot the MR results together with recombination rate. With these figures, we could check whether there was any significant effect of the nearby SNPs that are less likely to be separated from the indicating SNP due to recombination. It is also recommended that using multiple independent instrumental variables may provide further support on the causal relationship between exposure and outcome 17 . Using the standard genome-wide significance level (p<5e-8), we identified more SNPs associated with either putamen clusters or central sulcus clusters that could be used as instrumental variables (eFigure 17). However, we found that these SNPs were in the same LD block with either SNPs rs13107325 or rs7182018 (eTable 14). Therefore, we reported the SMR results for the SNPs with the strongest association only.

eMethod 11. Power analysis
In the LIBD study, we had 272 healthy controls. We estimated the partial correlation between genotype of rs13107325 and grey matter volume of the right putamen cluster was 0.3117 using age, age^2, IQ, sex, TIV and WBV as covariates. Setting Test family as t tests, Statistical test as Linear multiple regression: Fixed model, single regression coefficient, Type of power analysis: A priori: Compute required sample sizegiven α, power, and effect size in G*Power (http://www.gpower.hhu.de/en.html), we estimated the required sample size as 102 for 95% power assuming a 5% significance level and a one-side test. Therefore, the LIBD study provided us adequate sample sizes (157 patients and 149 unaffected siblings)

. Clusters reached a brain-wide and genome-wide significance level.
Significant clusters defined by the association study after Bonferroni correction (p<2.45×10 -13 ) and cluster size correction (>217 voxels falling into the 90% confidence interval of the smoothing kernel, ~0.7ml). The betas and p-values of these associations were listed for the peak voxels. MAF is the frequency rate of the minor allele A2. 'A1 A1' coded as 0, 'A1 A2' coded as 1, and 'A2 A2' coded as 2. The 'L.PUT' is short for the left putamen, 'R.PUT' for the right putamen, 'L.CEN' for the left central sulcus, and the 'R.CEN' for the right central sulcus.

. Significant associations defined by different thresholds.
Significant clusters defined by the association study according to two alternative criterions for multiple comparisons and cluster size correction (>217 voxels falling into the 90% confidence interval of the smoothing kernel, ~0.7ml). The betas and p-values of these associations were listed for the peak voxels. MAF is the frequency rate of the minor allele A2. 'A1 A1' coded as 0, 'A1 A2' coded as 1, and 'A2 A2' coded as 2. The 'L.PUT' is short for the left putamen, 'R.PUT' for the right putamen, 'L.CEN' for the left central sulcus, and the 'R.CEN' for the right central sulcus. "F" for female, "M" for male. P values were given by testing if beta is greater than 0. The mean volumes of each cluster had been compared between female and male by one-way analysis of variance, and the significantly larger volume was marked by "*" for p < 0.01, "**" for p < 0.001, and "***" for p < 0.0001. The associations between brain clusters and SNPs were estimated by linear regression model as described in the eMethods of vGWAS. Both estimates of the beta coefficient and its standard error were listed. Left PUT stands for the association between SNP rs13107325 and grey matter volume of the cluster in the left putamen. Right PUT stands for the association between rs13107325 and the right putamen cluster; Left CEN is for the association between rs7182018 and the cluster in the left central sulcus; Right CEN is for the association between rs7182018 and the cluster in the right central sulcus Meta-analyses were conducted using the R package metafor with the inputs listed in eTable 9. The discovery sample (IMAGEN baseline) and the replication samples (SYS, LIBD healthy control, UKB, and 3C) were included in this meta-analysis. The meta estimate of the beta, its standard error (SE), and its 95% confidence interval (ci.lb is the lower bound of the CI, while ci.ub is its upper bound) were listed. The corresponding Z statistic and the p-value were also reported. Given the large sample size of the UK Biobank cohort, we further replicated our findings by the following two whole-brain approaches: 1) testing if the peak voxels within the significant clusters could reach the significance level of 1.9417e-11; 2) testing if the pre-defined clusters were significant by randomized permutation test at a cluster level. We first calculated the mean T (T 0 ) statistic of the associations between the SNP and all voxels within the mask of the significant cluster. Second, we randomly permuted the genotype data 10,000 times, calculated its association with each voxel in the brain, identified the peak voxel with the strongest association, selected the top N voxels (N was the number of voxels in the corresponding cluster defined by the mask of the discovery results listed in eTable 2) within its neighbourhood with the strongest associations, and then estimated the mean T statistic of the associations (T perm ). Finally, we added one to the count (S) if T perm > T 0 at each permutation, and the significance level of this randomized permutation test was given by p perm = S/10,000.   Significance level (p-value) given by Wald ratio test in SMR analysis (putamen volume as an exposure, schizophrenia as an output, and each SNP as an instrument; http://www.mrbase.org) were plotted with recombination rate for ±1M region of the location of rs13107325 on the 4 th chromosome. The figure was generated by LocusZoom (http://locuszoom.org). A) for the cluster in the left putamen; B) for the cluster in the right putamen.

Marker
A B eFigure 4. The association between central sulcus volume and schizophrenia free of non-genetic confounders given by SMR analysis using each SNP as an instrument.
Significance level (p-value) given by Wald ratio test in SMR analysis (central sulcus volume as an exposure, schizophrenia as an output, and each SNP as an instrument; http://www.mrbase.org) were plotted with recombination rate for ±1M region of the location of rs7182018 on the 4 th chromosome. The forest plot for the meta-analysis of the association between rs7182018 and the grey matter volume of the cluster in the right central sulcus. This figure was generated by the R package metafor.

eFigure 8. Comparison of volumes and histogram of t-statistics in the LIBD healthy controls.
Boxplots of grey matter volumes grouped by genotype and histogram of t-statistic by 3000 bootstraps in the LIBD healthy controls.

eFigure 9. Comparison of volumes and histogram of t-statistics in the LIBD patients.
Boxplots of grey matter volumes grouped by genotype and histogram of t-statistic by 3000 bootstraps for the schizophrenic patients in the LIBD study.

eFigure 10. Comparison of volumes and histogram of t-statistics for unaffected siblings of patients in the LIBD study.
Boxplots of grey matter volumes grouped by genotype and histogram of t-statistic by 3000 bootstraps for the unaffected siblings of schizophrenic patients in the LIBD study.

eFigure 15. Forest plot of the meta-analysis in the left putamen.
The forest plot for the meta-analysis of the association between rs13107325 and the grey matter volume of the cluster in the left putamen. This figure was generated by the R package metafor. eFigure 16. Forest plot of the meta-analysis in the right putamen. The forest plot for the meta-analysis of the association between rs13107325 and the grey matter volume of the cluster in the right putamen. This figure was generated by the R package metafor. eFigure 17. GWAS signals for GMV using the genome-wide significance level (p < 5e-8).
A. left putamen cluster; B. right putamen cluster; C. left central sulcus cluster; D. right central sulcus cluster; E. left putamen brain area; F. right putamen brain area.