The 22 autosomes are displayed along the x-axis, with the negative logarithm of the association P value for each block displayed on the y-axis. All P values above the upper (red) line have q values of less than 0.01, and those above the lower (blue) line have q values of less than 0.1.
eFigure 1. Severity of block QC versus number of sites remaining in the analysis
eFigure 2. Scree plot from principal component analysis
eFigure 3. QQ plot
eFigure 4. Relation between severity of block QC and lambda
eFigure 5. Permutation test results for remaining hypoxia genes
eFigure 6. Permutation test results for remaining immune system genes
eTable 1. Descriptive statistics for sequencing parameters
eTable 2. Design features of pyrosequencing assays
eTable 3. Full replication results
eTable 4. Smoking and hypoxia: bivariate correlations and regression analyses
eTable 5. Alcohol: bivariate correlations and regression analyses
eTable 6. Narcotics: bivariate correlations and regression analyses
Customize your JAMA Network experience by selecting one or more topics from the list below.
Aberg KA, McClay JL, Nerella S, et al. Methylome-Wide Association Study of Schizophrenia: Identifying Blood Biomarker Signatures of Environmental Insults. JAMA Psychiatry. 2014;71(3):255–264. doi:10.1001/jamapsychiatry.2013.3730
Epigenetic studies present unique opportunities to advance schizophrenia research because they can potentially account for many of its clinical features and suggest novel strategies to improve disease management.
To identify schizophrenia DNA methylation biomarkers in blood.
Design, Setting, and Participants
The sample consisted of 759 schizophrenia cases and 738 controls (N = 1497) collected in Sweden. We used methyl-CpG–binding domain protein-enriched genome sequencing of the methylated genomic fraction, followed by next-generation DNA sequencing. We obtained a mean (SD) number of 68 (26.8) million reads per sample. This massive data set was processed using a specifically designed data analysis pipeline. Critical top findings from our methylome-wide association study (MWAS) were replicated in independent case-control participants using targeted pyrosequencing of bisulfite-converted DNA.
Main Outcomes and Measures
Status of schizophrenia cases and controls.
Our MWAS suggested a considerable number of effects, with 25 sites passing the highly conservative Bonferroni correction and 139 sites significant at a false discovery rate of 0.01. Our top MWAS finding, which was located in FAM63B, replicated with P = 2.3 × 10−10. It was part of the networks regulated by microRNA that can be linked to neuronal differentiation and dopaminergic gene expression. Many other top MWAS results could be linked to hypoxia and, to a lesser extent, infection, suggesting that a record of pathogenic events may be preserved in the methylome. Our findings also implicated a site in RELN, one of the most frequently studied candidates in methylation studies of schizophrenia.
Conclusions and Relevance
To our knowledge, the present study is one of the first MWASs of disease with a large sample size using a technology that provides good coverage of methylation sites across the genome. Our results demonstrated one of the unique features of methylation studies that can capture signatures of environmental insults in peripheral tissues. Our MWAS suggested testable hypotheses about disease mechanisms and yielded biomarkers that can potentially be used to improve disease management.
The methylation of DNA cytosine residues at the carbon 5 position is a common epigenetic modification that is often found in the sequence context CpG. Investigations of these markings provide a promising complement to schizophrenia studies of DNA sequence variation. First, methylation can directly affect gene expression, so it may capture additional variation in disease susceptibility. Indeed, specific epimutations have already been associated with human diseases, including psychiatric disorders.1 Second, methylation studies may advance our understanding of schizophrenia. For example, they can potentially account for a variety of features, such as its episodic nature.2,3 Third, the translational potential is considerable. For example, epigenetic markings are modifiable by pharmaceutical interventions, making them possible new drug targets.4
The pathogenic processes for psychiatric disorders likely involve the brain. However, brain tissue is not readily accessible in living patients, so blood is typically used in biomarker studies. There are 2 models explaining how methylation studies in blood can advance schizophrenia research.5 Neither model assumes that methylation in blood directly affects disease susceptibility, although this is possible, in principle, because blood provides a biological environment for other tissues, including the brain. In the “signature” model, associations occur between schizophrenia and methylation markings because the factors that increase disease susceptibility leave a biomarker signature in blood. Thus, the methylation markings in blood implicate a cause of the disease, which may affect schizophrenia through processes that are unrelated to methylation in the brain. In contrast, the “functional mirror site” model assumes a causal role of methylation sites in the brain. When the methylation status of these sites in the brain is mirrored by the corresponding sites in the blood, we will observe associations between schizophrenia and methylation markings at the same loci in blood. Compared with tissue-specific differentially methylated regions,6 correlated methylation profiles across tissues are common. Mirror sites occur because peripheral tissues may reveal methylation markings predating or resulting from the epigenetic reprogramming events affecting the germ line and embryogenesis,7 and environmental factors and genetic polymorphisms can affect methylation levels in multiple tissues.8,9 To study the 2 models,5 we administered haloperidol decanoate to inbred mice and then performed whole-methylome profiling in the blood, cortex, and hippocampus. More than 65% of the sites showed correlated changes where the concordance rates were similar between blood and brain vs between the 2 brain tissues. This showed that factors affecting brain processes (eg, haloperidol) can leave biomarker signatures in blood and that the methylation status of many sites in the brain is mirrored in the blood.
Current knowledge about the role of DNA methylation in schizophrenia is mainly acquired from relatively small studies of peripheral blood10-17 and postmortem brain tissue.18-26 Most studies focused on specific genes, such as RELN,19,22HTR2A,20COMT,13,18SOX10,23 and FOXP2.26 Two studies1,12 investigated a broader set of sites. One investigated approximately 12 000 regulatory regions in postmortem brain tissue samples from 35 patients with schizophrenia and 35 controls.1 It reported differences in the vicinity of loci that can be functionally linked to disease etiology. The second study12 investigated approximately 27 000 CpG sites in peripheral blood from 11 pairs of monozygotic twins discordant for schizophrenia. Dempster et al12 observed significant epigenetic disruptions in biological networks relevant to psychiatric disease and neurodevelopment.
The goal of the present study is to identify schizophrenia methylation biomarkers in blood through a methylome-wide association study (MWAS). The most comprehensive method involves the use of next-generation sequencing after bisulfite conversion of unmethylated cytosines. Currently, however, this is not economically feasible considering the sample sizes required for an MWAS.27 As a cost-effective alternative, we first captured the methylated DNA fragments and then sequenced this methylation-enriched portion of the genome28 (see Aberg et al29 for a discussion of the merits of methyl-CpG binding domain [MBD] protein-enriched genome sequencing [MBD-seq]). Our “discovery” MWAS sample consisted of almost 1500 schizophrenia cases and controls. Critical findings were replicated in an independent group of participants using targeted bisulfite pyrosequencing.
Detailed descriptions of the method can be found elsewhere.29-31 Our study was approved by the institutional review board at Karolinska Institutet, Stockholm, Sweden, and written informed consent was obtained from all participants.
Table 1 describes the “discovery” MWAS and replication samples. All participants were selected from national population registers in Sweden and are part of a larger study.32 Because 3 participants withdrew their consent during the study, we report results for 1497 participants. Key findings were replicated in an independent group of 1144 participants and at other sites in an independent group of 360 participants. For all participants, DNA was extracted from the buffy coat of whole blood.
We used MethylMiner (Invitrogen), which employs MBD protein-based enrichment of the methylated DNA fraction, followed by single-end sequencing (50 base-pair reads) on the SOLiD platform (Life Technologies). We eluted the captured methylated fraction with 0.5M sodium chloride to increase the relative number of fragments from CpG-poor regions,29 which otherwise would not be as well covered.33 To avoid batch effects, samples were processed in random order.
eTable 1 in the Supplement gives descriptive statistics for a variety of sequencing parameters. In summary, after deleting reads with more than 2 missing calls, we obtained a mean (SD) number of 68 (26.8) million reads per sample. Reads were aligned (build hg19/GRCh37) using BioScope 1.2 (Life Technologies). We deleted all samples with less than 40% alignment. For the remaining samples, the mean (SD) percentage of mapped reads was 69.2% (6.2%). We eliminated 32.1% of the mapped reads because they were low-quality multireads (reads aligning to multiple locations) or duplicate reads (reads with identical start positions). We excluded 38 participants because less than 15 million reads remained after quality control. This left 1459 participants with a mean (SD) number of 32.4 (13.7) million quality-control reads. Using data from 73 technical replicates, we observed a mean/median correlation of 0.90/0.92 between the methylation profiles from the replicates.29 This supported the reproducibility of our assay.
The MBD protein only binds to methylated CpG sites, so we only consider the 26 752 702 autosomal CpG sites in the reference genome for our analysis. The 10.5 million CpG sites (36%) located in regions showing alignment problems were eliminated.29 Most (71.8%) of these were in regions flagged as repetitive elements by RepeatMasker (http://www.repeatmasker.org/). Methylation measurements were obtained by estimating how many fragments covered each CpG site.31 Highly intercorrelated coverage estimates at adjacent CpG sites were combined to obtain more reliable measurements.34 Rather than using a sliding window of an arbitrary fixed length, we combined sites adaptively based on their observed intercorrelations.30 Using the 99th percentile of the coverage estimates at non-CpG sites29 as the threshold for background noise, we excluded 730 522 blocks with low coverage (likely unmethylated) from further analysis (eFigure 1 in the Supplement). This left 4 344 016 blocks for association testing.
A variety of efforts were made to control for confounders. First, we regressed out possible assay-related technical artifacts such as the quantity of genomic DNA starting material, the quantity of methylation-enriched DNA captured, and the sample batch. In addition, we controlled for age and sex.
Second, after regressing out the measured confounders, we performed principal component analysis to capture the major remaining unmeasured confounders. Because existing software cannot handle the ultrahigh-dimensional MWAS data, we used our own software30 that allows for parallel processing, that uses C++ for CPU-intensive and input/output-intensive calculations, and that follows Gower35 by performing the eigen-decomposition of a much smaller transposed variant of the data matrix. Based on a scree test (eFigure 2 in the Supplement), the first 7 principal components were regressed out of the association analysis.
Third, we correlated principal component scores with a variety of variables to check whether additional covariates were required (see Table 2 in Aberg et al29). For example, these analyses showed that, in this fairly homogeneous sample, ancestry did not contribute substantially to variation in the methylome, and it was therefore not included as a covariate.
Blood consists of a variety of cell types. By using whole-blood samples, we are studying an “average” methylation pattern that will be dominated by the common types. This can produce false positives only if both (1) the relative abundance of common cell types differs across cases and controls, and (2) methylation patterns of common cell types differ. Ideally, we would have case-control MBD-seq data obtained from separated white blood cells36 to identify sites that are at risk for creating false positives. The principal component analysis provides an alternative in situations where cell-type heterogeneity affects many methylation sites.36-38 Participants with a similar cell-type composition will have more similar multilocus methylation patterns, and these patterns will be captured by the principal components. However, situations where few methylation sites are involved will remain uncorrected. We note, however, that most tissue samples will be heterogeneous, so similar risks are present when studying other tissues too.
We used ConsensusPathDB39-41 to generate protein-protein interaction (PPI) networks and perform pathway analyses based on the Reactome,42 Kyoto Encyclopedia of Genes and Genomes,43 and BioCarta databases. To create microRNA (miRNA) networks, we used the University of California, Santa Cruz, genome track TS miRNA site for GRCh37/hg19, which is based on TargetScan 5.1 (Bioinformatics and Research Computing). All blocks with q < 0.01 in the MWAS were matched to the closest gene ±20 kilobases. For each of the 4601 reference pathways present in ConsensusPathDB, incorporating 9859 known genes, a hypergeometric test was performed to study whether the overlap between the top MWAS genes and those present in each reference pathway was higher than expected by chance.
For the replication, we used targeted bisulfite pyrosequencing.44,45 We replicated the top 5 MWAS findings and 10 sites selected from the network analyses. Controlling the familywise error rate at the α level of .05 through a Bonferroni correction therefore gives a threshold of .05/15 = 3.3 × 10−3. We conservatively used the highest (least significant) P value if there were multiple (correlated) CpG sites in the same assay. Finally, we added a negative control by assaying a site with a high nonsignificant MWAS P value, and to assess the efficacy of the principal component analysis, we selected the 2 most significant findings obtained after performing the MWAS without principal components.
For network/pathways findings, a second “replication” opportunity existed by testing whether, after excluding the (top) findings used to identify the networks, the remaining genes from that network are also associated with case-control status in the MWAS (for miRNA networks, these tests are not suitable because miRNA likely regulates genes that may have different functions). For this purpose, we performed permutation tests.
Our Figure shows the MWAS Manhattan plot with 139 tests with q < 0.01, meaning that less than 1% of the 139 findings are expected to be false discoveries (eFigure 3 in the Supplement).46,47 The P values for these sites ranged from 10−7 to 10−11, with 25 sites reaching significance after we used the highly conservative Bonferroni correction (threshold P = 1.15 × 10−8). Our test statistic inflation parameter λ of 1.12 was higher compared with what is commonly observed in genome-wide association studies. This λ value is unlikely an artifact. After we performed a square root transformation to normalize the data and mitigate the effects of possible outliers, λ did not change. Furthermore, increasing the stringency of the quality control resulted in higher rather than lower λ values (eFigure 4 in the Supplement). Instead, this λ value reflects that methylation studies are more akin to gene expression studies that typically show many correlated effects with relatively large effect sizes.
Of the 139 MWAS findings, 112 overlapped with genes. Table 2 shows that regardless of whether we used PPI networks, pathway databases, or miRNA target networks, hypoxia was the dominant theme. For example, the PPI network centered on EPAS1 (previously known as hypoxia-inducible factor 2) includes 2 genes, both of which were detected in our MWAS. EPAS1 encodes a transcription factor induced as oxygen levels fall and is known to specifically interact with ETS1, another center for a PPI network among our findings, which is involved in the regulation of vascular development in the neonatal mouse brain.48 Furthermore, transcription coactivator EP300 is necessary for hypoxia-induced transcriptional activation and is upregulated in low-oxygen conditions.49 Using reference biological pathways, we detected the hypoxia-inducible factor 1 alpha (HIF1A) transcription factor network. HIF1A, together with ARNT, forms hypoxia-inducible factor (HIF), which regulates hypoxia-inducible genes.50 In addition, AKT signaling is an important modulator of HIF activity,51 and signaling by Rho GTPases has been linked to hypoxia response, particularly in the vascular system.52 Finally, miRNA miR-217 regulates heme oxygenase 1, an enzyme responsive to hypoxic conditions.53
Other findings shown in Table 2 converge on immune system themes. A prominent example is IgA1 (encoded by IGHA1), which is highlighted by our PPI network analyses. Although several genes associated with this network (RUNX3, CREB1, and SMAD3) are involved in multiple pathways, FCAR is highly specific to IgA because it encodes the receptor for the Fc fragment of IgA. In blood, FCAR interacts with IgA to initiate inflammatory reactions and phagocytosis. Fcγ-mediated phagocytosis, related to the action of IgG, was also among the top findings in our pathway analysis.
Table 3 shows the replication results (for design features of pyrosequencing assays and for full replication results, see eTables 2 and 3, respectively, in the Supplement). Except for the control sites, the direction of effects was the same in the replication as in the MWAS. Although all 5 top findings had replication P values of less than .05, only FAM63B remained significant after applying our very conservative correction for multiple testing. FAM63B was our top MWAS finding with a P = 6.3 × 10−11 (q = 2.1 × 10−4). The replication assay contained 3 CpG sites. The highest P value (2.3 × 10−10) of these 3 CpG sites was below our multiple testing threshold of P = 3.3 × 10−3. Table 2 shows that FAM63B is part of 4 networks regulated by miRNA. Three types of these miRNA (miR-218, miR-9, and miR-504) can be linked to neuronal differentiation and dopaminergic gene expression.54-56
All genes selected from hypoxia pathways had a nominal P < .05. Whereas the most hypoxia-specific gene (ARNT) replicated after correcting for multiple testing, the most specific immune response–related gene, FCAR, was only nominally significant. To perform our second “replication” effort, we first removed the top MWAS findings that were used to detect the networks/pathways in the initial analyses and then performed 10 000 permutations to test whether the other genes in the implicated networks showed enrichment for small P values in the MWAS. For hypoxia networks created using PPIs, none of the test statistics obtained after permutation had a value more extreme than the observed test statistic (eFigure 5 in the Supplement). This implies a P < 1.0 × 10−4 (= 1/10 000), indicating that the MWAS results for the remaining group of network genes were more significant than expected under the null hypothesis. For the pathway analyses, the permutation test was also highly significant (P <.001; eFigure 5 in the Supplement). For the immune system, we combined PPI network and pathway results to avoid small sets of genes. None of the permutation test statistics had a value more extreme than the observed test statistic (P < 1.0 × 10−4; eFigure 6 in the Supplement).
Interestingly, a site in RELN had an MWAS q value of less than 0.1. RELN has previously been associated with schizophrenia via messenger RNA expression studies57,58 and, although some inconclusive results exist,25 is one of the most prominent schizophrenia candidate genes in methylation studies.19,22 Furthermore, support for a strong inverse correlation between RELN expression and promoter methylation has been observed in mice59 and humans.60Table 3 shows that the RELN site also replicated. Similar to previous findings,19,22 we observed increased levels of methylation in schizophrenia cases. Traditionally, methylation studies of RELN have focused on the promoter region. Our best finding was located in the first intron and did not directly overlap the previous findings.
Table 3 shows that our negative control did not replicate, nor did the 2 most significant sites obtained after we performed an MWAS without regressing out the principal components. This suggests that the principal components were useful to prevent false positives. Regressing out the covariates from Table 1 did not alter results (eTables 4, 5, and 6 in the Supplement). For example, cigarette smoking can result in impaired oxygen release to tissues,61 and nicotine can upregulate HIF1A.62 However, we did not observe correlations between the methylation of genes in hypoxia networks and smoking, nor did the inclusion of smoking status as a covariate change the replication results (eTable 4 in the Supplement).
Our top MWAS finding (FAM63B) replicated with a P = 2.3 × 10−10. It was part of the networks regulated by miRNA that can be linked to neuronal differentiation and dopaminergic gene expression,54-56 functions of potential relevance for schizophrenia. Many of our other top MWAS results could be linked to hypoxia and sometimes infection. Replicated findings also implicated RELN, one of the most frequently studied candidates in methylation studies of schizophrenia.22 Interestingly, RELN is regulated by HIF1/2a and can therefore also be linked to hypoxia.63,64
The hypoxia findings were very robust. Regardless of whether we used PPI networks, pathway databases, or an miRNA target gene database, hypoxia was a dominant theme. The hypoxia genes replicated in independent samples using a different technology. Furthermore, genes that were not among the top findings in the MWAS but were in the hypoxia pathways were also significantly enriched for small P values. Although the scope and quality of our phenotype data were limited, smoking or other covariates did not account for the hypoxia findings. Although we can only speculate about the cause, we note that a substantial amount of literature exists showing that hypoxia during fetal development increases the risk of schizophrenia.3 It is known that environmentally induced methylation changes can be preserved over a prolonged period of time.65,66 One intriguing hypothesis is that early hypoxia events alter methylation profiles in blood DNA, traces of which are preserved in the adult patient.
Many MWAS results reflected environmental insults. Because environmental effects cannot alter sequence variation, these phenomena cannot be detected with genome-wide association studies or exome-sequencing studies. Although there was some thematic overlap (eg, genome-wide association studies have implicated genes involved in immune response67), this likely explains why we found genes that were different from those found in studies of sequence variants. To find overlapping loci, different analytical strategies may be required. For example, the DNA sequence can regulate methylation patterns,9,68-70 and we are currently conducting analyses to find loci where these regulatory mechanisms may be disrupted in schizophrenia. Because these analyses combine sequence variants with methylation patterns, they are more likely to yield results that overlap with genome-wide association study findings. Methylation signatures of environmental insults may not impact gene expression in blood. Thus, whole transcriptome studies may not be able to capture the phenomena detected in this study; therefore, methylation studies provide unique possibilities compared with other technologies.
Our results demonstrate how methylation studies in whole blood can advance schizophrenia research. First, they suggest that a record of pathogenic events may be preserved in the methylome. Etiologically distinct disease subtypes may be distinguishable from each other with respect to prognosis, course, or response to treatment.71 The possibility of identifying these subtypes using methylation markers that tend to have large effect sizes and can be measured with cost-effective assays, using DNA from blood that is stable and easy to collect, would be of great clinical importance. Second, methylation studies can generate testable hypotheses about disease mechanisms. For example, the hypoxia findings show how methylation studies can point to disease-causing factors. As postulated by the “signature model,”5 the causal factors may affect schizophrenia through processes that have nothing to do with methylation (eg, possible causal mechanisms include disruption of the laminar organization of the cerebral cortex72). Our RELN finding possibly demonstrates the second possible model, in which the methylation status of disease-relevant sites in the brain is mirrored by the corresponding sites in the blood. Thus, previous studies19,22 have implicated methylation sites in RELN in schizophrenia using postmortem brain samples. The fact that we find this gene in whole blood provides a possible illustration of the “functional mirror-site model.”5
A variety of efforts were taken to control for potential confounders. Our results suggested genes related to hypoxia, the immune system, and brain function rather than genes that, for example, are potentially relevant to medication and life style differences. This suggests that our efforts worked satisfactorily. For biomarkers other than genetic variants, there is always the inherent risk of confounding effects. Experiments studying model systems in controlled environments (eg, cell culture) would be the next step to rule out confounders completely.
Studies have suggested that 30 to 60 million reads per sample may be sufficient to reveal valuable information for whole-genome methylation analysis.33,73 We obtained, on average, 68.0 million reads, of which 32.4 million high-quality reads (47.6%) remained after stringent quality control. The MWAS was performed on “blocks” that summed reads across correlated CpG sites to improve the reliability of the measurements. This appeared to be sufficient to detect methylation markers that replicated in independent samples. It is possible, however, that increasing the number of reads would allow the detection of sites (eg, in CpG-poor regions) that could currently not be measured reliably.
In summary, to our knowledge, the present study is one of the first MWASs of disease with a large sample size using a technology that provided good coverage of methylation sites across the genome. Our results demonstrate how methylation studies can suggest new avenues to increase our understanding of disease and yield biomarkers that can be used to potentially improve disease management.
Submitted for Publication: April 22, 2013; final revision received June 22, 2013; accepted July 29, 2013.
Corresponding Author: Edwin J. C. G. van den Oord, PhD, Center for Biomarker Research and Personalized Medicine, Virginia Commonwealth University, PO Box 980533, Richmond, VA 23298-0581 (firstname.lastname@example.org).
Published Online: January 8, 2014. doi:10.1001/jamapsychiatry.2013.3730.
Author Contributions: Dr van den Oord had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Aberg, McClay, Chen, Xie, Gao, Sullivan, van den Oord.
Acquisition of data: Aberg, Hudson, Harada, Hultman, Magnusson, van den Oord.
Analysis and interpretation of data: Aberg, McClay, Nerella, Clark, Kumar, Chen, Khachane, Xie, Hudson, Gao, Hultman, Magnusson, van den Oord.
Drafting of the manuscript: Aberg, McClay, Nerella, Clark, Kumar, Hudson, Gao, Hultman, van den Oord.
Critical revision of the manuscript for important intellectual content: Aberg, Clark, Chen, Khachane, Xie, Harada, Sullivan, Magnusson, van den Oord.
Statistical analysis: Aberg, McClay, Clark, Chen, Gao, van den Oord,
Obtained funding: Aberg, Hultman, Sullivan, Magnusson, van den Oord.
Administrative, technical, or material support: Aberg, McClay, Nerella, Khachane, Hudson, Harada, Hultman, Magnusson, van den Oord.
Study supervision: Aberg, Khachane, Hultman, van den Oord.
Conflict of Interest Disclosures: None reported.
Funding/Support: This study was supported by the National Institute of Mental Health (grant RC2MH089996) and is part of a larger project entitled “A Large-Scale Schizophrenia Association Study in Sweden” that is supported by grants from the National Institute of Mental Health (grant MH077139) and the Stanley Medical Research Institute. The institutions involved in this project are the Karolinska Institutet, the Icahn School of Medicine at Mount Sinai in New York, the University of North Carolina at Chapel Hill, Virginia Commonwealth University, the Broad Institute in Cambridge, Massachusetts, and the US National Institute of Mental Health in Bethesda, Maryland.
Role of the Sponsor: The sponsors had no role in the design and conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: Library construction and next-generation sequencing were performed by EdgeBio. We thank the Swedish Schizophrenia Consortium and Life Technologies for their advice.
Create a personal account or sign in to: