Figure 1. Histograms of P values from the family-based association analyses. Results are given for subjects of European (A and E), Asian (B and F), and African (C and G) ancestry. The combined analyses (D and H) present results from the test statistics across the 3 groups while taking the direction of effects into account. Plots are given for single-nucleotide polymorphisms not including (A-D) and including (E-H) the major histocompatibility complex region.
Figure 2. Comparison of single-nucleotide polymorphisms selected through data integration (blue), that have a top genome-wide association study (GWAS) meta-analysis P value (red), or both (green). P values in the family-based replication study are calculated on a −log10 scale.
Aberg KA, Liu Y, Bukszár J, et al. A comprehensive family-based replication study of schizophrenia genes. JAMA Psychiatry. Published online April 9, 2013. doi:10.1001/jamapsychiatry.2013.288.
eTable 1. The 18 genome-wide association study (GWAS) samples included in our meta-analysis
eTable 2. Number of SNPs selected for various reasons
eTable 3. Gene-based permutation P values
eFigure 1. Number of cases vs controls (A) and genotyped vs imputed SNPs (B)
eFigure 2. SNP imputation r2 by study
eFigure 3. SNP imputation r2 by lambda
eFigure 4. Scree plot PC analyses
eFigure 5. Ancestry across studies (A) and per study sample (B)
eFigure 6. QQ plot (A) and Manhattan plot (B) GWAS meta-analysis before and after correction with PCs
eFigure 7. Ancestral clustering of replication samples
eFigure 8. LD between SNPs from Table 1 located on chromosome 6 (A), chromosome 10 (B), and chromosome 22 (C) and from Table 2 located on chromosome 6 (D)
eFigure 9. Performance of P value vs MIND-based selection evaluated by simulation (A), cross-validation (B), and results from current replication study (C)
eMethods 1. Meta-analysis.
eMethods 2. Family-based replication
eMethods 3. Validation of MIND
Aberg KA, Liu Y, Bukszár J, et al. A Comprehensive Family-Based Replication Study of Schizophrenia Genes. JAMA Psychiatry. 2013;70(6):573–581. doi:10.1001/jamapsychiatry.2013.288
Importance Schizophrenia (SCZ) is a devastating psychiatric condition. Identifying the specific genetic variants and pathways that increase susceptibility to SCZ is critical to improve disease understanding and address the urgent need for new drug targets.
Objective To identify SCZ susceptibility genes.
Design We integrated results from a meta-analysis of 18 genome-wide association studies (GWAS) involving 1 085 772 single-nucleotide polymorphisms (SNPs) and 6 databases that showed significant informativeness for SCZ. The 9380 most promising SNPs were then specifically genotyped in an independent family-based replication study that, after quality control, consisted of 8107 SNPs.
Setting Linkage meta-analysis, brain transcriptome meta-analysis, candidate gene database, OMIM, relevant mouse studies, and expression quantitative trait locus databases.
Patients We included 11 185 cases and 10 768 control subjects from 6 databases and, after quality control, 6298 individuals (including 3286 cases) from 1811 nuclear families.
Main Outcomes and Measures Case-control status for SCZ.
Results Replication results showed a highly significant enrichment of SNPs with small P values. Of the SNPs with replication values of P < .01, the proportion of SNPs that had the same direction of effects as in the GWAS meta-analysis was 89% in the combined ancestry group (sign test, P < 2.20 × 10−16) and 93% in subjects of European ancestry only (P < 2.20 × 10−16). Our results supported the major histocompatibility complex region, showing a 3.7-fold overall enrichment of replication values of P < .01 in subjects of European ancestry. We replicated SNPs in TCF4 (P = 2.53 × 10−10) and NOTCH4 (P = 3.16 × 10−7) that are among the most robust SCZ findings. More novel findings included POM121L2 (P = 3.51 × 10−7), AS3MT (P = 9.01 × 10−7), CNNM2 (P = 6.07 × 10−7), and NT5C2 (P = 4.09 × 10−7). To explore the many small effects, we performed pathway analyses. The most significant pathways involved neuronal function (axonal guidance, neuronal systems, and L1 cell adhesion molecule interaction) and the immune system (antigen processing, cell adhesion molecules relevant to T cells, and translocation to immunological synapse).
Conclusions and Relevance We replicated novel SCZ disease genes and pathogenic pathways. Better understanding the molecular and biological mechanisms involved with schizophrenia may improve disease management and may identify new drug targets.
Schizophrenia (SCZ) is a major public health problem1 that ranks ninth in the global burden of illness.2 Of a large set of prenatal and antenatal risk factors, having a first-degree relative with SCZ is one of the most important,3 and genetic factors account for most of this familial risk.4 Identifying the specific genetic variants that increase susceptibility is crucial to improve our understanding of SCZ and has the potential to address the urgent need for new drug targets.5
Although SCZ genetics have proven difficult, recent genome-wide association study (GWAS) mega- and meta-analyses have suggested several promising loci.6,7 Our aim was to perform a comprehensive family-based replication study of promising SCZ loci. Because the results from the recent mega-analysis of the Psychiatric GWAS Consortium were not available at the time we started the present study,6 we first conducted a meta-analysis of 18 SCZ GWASs that included 21 953 subjects of European ancestry. The samples used for this meta-analysis overlapped almost 90% with those of the Psychiatric GWAS Consortium study. In addition to selecting for replication the SNPs with the best P values in the GWAS meta-analysis, we selected the best SNPs after integrating existing informative SCZ data sources with the meta-analysis results. Such a convergent functional genomic approach can improve statistical power to detect biologically more meaningful and reproducible effects.8 In total, we selected 9380 SNPs for genotyping in an independent sample of 6298 subjects, including 3286 cases, from 1811 nuclear families. The use of nuclear families is a critical feature because almost all GWASs to date involved case-control samples, and their findings have been criticized as being possible population stratification artifacts.9 Because our family-based replication study is robust against such false-positive findings, it provides an unprecedented opportunity to shed more light on recent GWAS findings.
Our SCZ GWAS meta-analysis involved 18 studies. After stringent quality control, 1 085 772 genotyped and imputed SNPs were available for 21 953 subjects of European descent (11 185 cases and 10 768 controls). To account for possible population stratification within each of the GWASs, we included the first 3 principal components from the EigenSoft package (Helix Systems)10 plus any additional components that predicted case-control status (P ≤ .05). Additional details about the study samples and the methods can be found in eMethods 1, eTable 1, and eFigures 1 through 6.
Single-nucleotide polymorphisms were selected for a variety of reasons (eFigure 3 and eTable 2), including having the best P values in the meta-analysis and after data integration using Mathematically based Integration of Heterogeneous Data (MIND)11 (validation studies are described in eMethods 3). MIND estimates the (posterior) probability that an SNP is associated with the disease after taking all data (GWAS meta-analysis and external data sets) into account. It empirically “weighs” other data according to the strength of its disease-relevant information. We used the following databases that contained significant SCZ information: SzGene12 database, summarizing the results of 1617 studies reporting on 952 SCZ candidate genes (excluding findings from the GWAS used in our meta-analysis); top regions from a meta-analysis of 32 independent genome-wide linkage scans of 3255 pedigrees with 7413 SCZ cases13; a gene expression meta-analysis of 12 controlled studies across 6 different microarray platforms using postmortem brain tissue from SCZ cases and controls14; the OMIM database of disease genes; human orthologs of mouse genes associated with behavioral phenotypes relevant to neuropsychiatric outcomes15; and SNPs strongly associated with variation in transcript abundance in the cortex (http://eqtl.uchicago.edu). A total of 8107 of the 9380 selected SNPs were successfully genotyped with the Illumina iSelect assay (http://www.illumina.com/products/infinium_iselect_custom_genotyping_beadchips.ilmn), with a call rate of 99.94%.
Approximately 89% of the replication samples were families from the National Institute of Mental Health repository, and the remaining 11% were collected by one of us (L.E.D.). None of these samples was used in the prior analyses, so this replication effort was completely independent. Each family included at least 1 subject with a DSM-III-R diagnosis of SCZ. After quality control, 6298 individuals (3286 cases) from 1811 families remained.
For the analyses, we first subdivided subjects into ancestral groups based on identity-by-state sharing, as estimated using the genotyped SNPs in parents or, if they had not undergone genotyping, 1 randomly selected sibling per family. Three well-distinguished clusters, where the major self-reported ethnicity in each group was African (1262 individuals from 438 families), European (2740 individuals from 794 families), and Asian (2296 individuals from 579 families) ancestry, are shown in eFigure 7. Next, we used UNPHASED software,16 which is robust to population structure when the data are complete and has only minor loss of robustness when data are missing. To further minimize the risk of population stratification effects, we first performed the analyses within each ancestral group and then combined the 3 test statistics to obtain an overall replication P value. We limited the association testing to markers with a minor allele frequency of greater than 0.05 within each group. We preserved the direction of effects (ie, sign) so that an allele being overtransmitted in one group and undertransmitted in another would have no effect in this combined analysis (eMethods 2). Additional details are given in eTable 2 and eFigure 8.
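The sign-preserving combination described above can be illustrated with a minimal Python sketch. It assumes a Stouffer-type combination of signed Z scores; the exact statistic and weighting used in the study are described in eMethods 2 and are not reproduced here, so the function names, the optional `weights` argument, and the example values are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()

def signed_z(p_two_sided, direction):
    """Convert a two-sided P value and an effect direction (+1 or -1) to a signed Z score."""
    z = -_NORMAL.inv_cdf(p_two_sided / 2.0)  # magnitude, always >= 0
    return z if direction > 0 else -z

def combine_signed(p_values, directions, weights=None):
    """Combine per-group results into one Z score and two-sided P value.

    Assumption: a Stouffer-type combination. Equal weights are used unless
    weights (for example, square roots of group sample sizes) are supplied.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    zs = [signed_z(p, d) for p, d in zip(p_values, directions)]
    z_comb = sum(w * z for w, z in zip(weights, zs)) / sqrt(sum(w * w for w in weights))
    p_comb = 2.0 * _NORMAL.cdf(-abs(z_comb))
    return z_comb, p_comb

# Hypothetical example: the same allele overtransmitted in all 3 groups.
z_same, p_same = combine_signed([0.05, 0.05, 0.05], [+1, +1, +1])
# Hypothetical example: opposite directions in 2 groups cancel out.
z_opp, p_opp = combine_signed([0.01, 0.01], [+1, -1])
```

In this sketch, concordant directions reinforce each other, whereas an allele overtransmitted in one group and undertransmitted in another contributes signals that cancel.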
To test whether multiple susceptibility alleles with small effects were organized into pathways, we used ConsensusPathDB, a human-centric meta-database of functional biological data compiled from 30 separate public sources of biological interactions.17-19 A list of 265 genes overlapping or flanking (±25 kilobases [kb]) the SCZ-associated SNPs with P < .05 in our analysis was included in these analyses. To account for multiple testing, we controlled the false discovery rate20 at the 0.01 level (eMethods 2 and eTable 3).
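As an illustration of this type of analysis, the following Python sketch pairs a hypergeometric over-representation test with the Benjamini-Hochberg step-up procedure. Both are common choices and are assumptions here: the test used by the pathway tool and the false discovery rate estimator cited in the text may differ, and the function names and numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N background genes,
    K of which belong to the pathway (over-representation test)."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

def benjamini_hochberg(p_values, alpha=0.01):
    """Return a list of booleans, True where a test is declared
    significant while controlling the false discovery rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Hypothetical example: 4 pathway P values tested at a false discovery rate of 0.01.
flags = benjamini_hochberg([0.001, 0.5, 0.004, 0.9], alpha=0.01)
```

The step-up rule declares significant every test ranked at or below the largest rank whose P value passes its threshold, which is why the second loop marks all smaller P values as well.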
In MIND, sources of prior information used to select SNPs through data integration will only affect results to the extent that they contain disease-relevant information (ie, to the extent that genes with good P values in the sources of prior information also have good P values in the meta-analysis) (eMethods 3). Because most of the databases we used pertain to genes, bias in terms of selecting SNPs in genes as opposed to intergenic regions is expected. However, the selection was entirely based on the empirical support for SNPs in those genes. Furthermore, the pathway analyses were performed using the replication results. Therefore, our pathway findings are unlikely to reflect prior notions about SCZ-relevant pathways for which there is no empirical support (eFigure 9).
Figure 1 shows histograms of the P values from the association tests. Under the null hypothesis assuming no replication, the histogram bars would have equal heights. The skewed histograms therefore indicated considerable enrichment of small P values. To quantify this enrichment, we divided the observed median test statistic value by the expected median under the null hypothesis. This index, which has an expected value of 1.00 if SNPs do not replicate, equaled 1.19, 1.12, and 1.07 for subjects from European, Asian, and African ancestry, respectively, and was 1.15 for the (signed) combined analysis that required the same direction of effects in all 3 groups. Next, we tested whether this enrichment for small P values was statistically significant (eMethods 3). Owing to the large number of SNPs, large sample size, and family-based association tests allowing for missing genotypes, it was computationally not feasible to perform a sufficiently large number of permutations to obtain empirical P values. Instead, we obtained the lower and upper bounds assuming no linkage disequilibrium (LD) and very high LD, respectively, among the SNPs. When we assumed no LD, the P value was so small that the statistical test in the R package returned a 0; when we assumed an extremely high LD between the SNPs, we obtained P = 2.0 × 10−4. Thus, even in the most conservative scenario, the test indicated significant enrichment of small P values in the replication. We also performed sign tests that examined whether the direction of effects was similar in the GWAS meta-analysis and the replication study. Of the SNPs with replication values of P < .01, the proportion that had the same direction of effect as in the GWAS meta-analysis was 89% (P < 2.20 × 10−16; 95% CI, 82%-94%) for the combined ancestry group and 93% (P < 2.20 × 10−16; 95% CI, 88%-97%) for European subjects. Again, in both cases the statistical test in the R package returned a 0, indicating that this pattern was highly unlikely to occur by chance. The mean odds ratios of these SNPs were 1.2 and 1.3 in the combined and European study samples, respectively. Overall, our findings confirm the polygenic nature of SCZ and show that we replicated a substantial number of susceptibility alleles with small effects.7,21,22
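The enrichment index and the sign test described above can be sketched in Python as follows. The sketch assumes that the test statistics follow a chi-square distribution with 1 degree of freedom under the null hypothesis and that the sign test is a one-sided exact binomial test; neither detail is stated in the text, and the function names and example counts are hypothetical.

```python
from math import comb
from statistics import NormalDist, median

def expected_null_median_chi2_1df():
    """Median of a chi-square(1) variable: the square of the 75th
    percentile of the standard normal distribution."""
    return NormalDist().inv_cdf(0.75) ** 2

def enrichment_index(test_statistics):
    """Observed median test statistic divided by its expected median
    under the null hypothesis (about 1.00 if SNPs do not replicate)."""
    return median(test_statistics) / expected_null_median_chi2_1df()

def sign_test_p(n_same, n_total):
    """One-sided exact binomial P(X >= n_same) when each SNP has a 50%
    chance of matching the direction of effect by chance."""
    tail = sum(comb(n_total, k) for k in range(n_same, n_total + 1))
    return tail / 2 ** n_total

# Hypothetical example: 89 of 100 SNPs share the direction of effect.
p_sign = sign_test_p(89, 100)
```

An index above 1.00 indicates that the test statistics are larger than expected by chance, which corresponds to the skew toward small P values seen in the histograms.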
We anticipated that SNPs replicate better in the same ancestral group in which they showed their initial association signals. The GWAS meta-analysis was the main source of SNP selection and involved European subjects only, which likely explains why SNPs replicated relatively better in the European group. Although non–major histocompatibility complex (MHC) SNPs replicated in all ancestral groups, the SNPs in the MHC region did not: we observed a large 3.7-fold enrichment of values of P < .01 in European samples, but findings dissipated in non-European samples. This pattern seems consistent with the exceptionally large LD differences across ancestral groups in this region resulting from its evolutionary significance (eg, it harbors many genes affecting the immune response).
For SNPs that do not have effects, we would not expect to see any difference between markers selected with or without data integration in the replication study. To compare the 2 approaches, we therefore first selected all SNPs with P values of less than .01 to enrich for markers with effects. Figure 2 shows that SNPs selected through data integration replicated as well as SNPs selected on the basis of having the top-ranked P values in the meta-analysis without data integration. Some of the SNPs selected through data integration even had P values with ranks as high as 25 000 in the meta-analysis. The fact that such SNPs would not have been selected on the basis of their P value but do replicate demonstrates the value of considering existing data sources.
We applied a stringent definition of replication requiring the same SNP to have the same direction of effect in the replication study and meta-analysis. Table 1 (combined analysis) and Table 2 (European analysis) report the SNPs that replicated at P ≤ .005. The P values for all SNPs can be downloaded from http://www.people.vcu.edu/~ejvandenoord/. Unlike in the meta-analysis and in the combination of P values from the 3 ancestral groups, sample size was ignored when calculating the overall P value across the replication study and meta-analysis. In the GWAS meta-analysis, the effect sizes will be overestimated and P values will be too optimistic owing to the winner's curse.23-25 Furthermore, because the sample size was much larger in the meta-analysis, these P values would have dominated the combined P value. Ignoring sample sizes avoids overly optimistic P values while still providing some quantification for the combined evidence of a specific SNP across the meta-analysis and replication study. Some SNPs in Table 1 showed substantial allele frequency differences across ancestral groups. However, in addition to using nuclear families, the UNPHASED analyses were performed within each ancestral group to provide a second layer of protection against possible stratification effects.
Table 3 shows the results from the pathway analyses. Where the same gene combination from our input list indicated multiple pathways, we show only the most significant instance to eliminate redundancy. Fourteen pathways were significant at a false discovery rate of 0.01. The 3 most significant pathways were axon guidance (P = 5.26 × 10−6), developmental biology (P = 1.29 × 10−5), and neuronal systems (P = 1.37 × 10−5). Larger pathways, such as these 3 top findings, which each include more than 250 genes, frequently incorporate several more specific themes, and this can be further observed in Table 3. For example, axon guidance includes the smaller L1 cell adhesion molecule (L1CAM) pathway interaction (P = 6.75 × 10−5), which was also significant in our analysis.
In our study we replicated SNPs in TCF4 (P = 2.53 × 10−10 in the European analysis) and NOTCH4 (P = 3.16 × 10−7 in the combined and P = 5.22 × 10−7 in the European analyses) that are among the top 10 most promising SCZ candidate genes.12 Other loci previously showing association with SCZ include GRIK326 (P = 3.48 × 10−7) and BRD127 (P = 1.53 × 10−7) in the combined analyses and FEZ128 (P = 3.21 × 10−6) in the European analysis. In the combined analyses, we replicated (P = 2.90 × 10−7) SNPs in an approximately 230-kb region on chromosome 10q24, which was recently reported to be associated in part with SCZ.6 This region encompasses an uncharacterized open reading frame (C10orf32) as well as AS3MT, CNNM2, and NT5C2. AS3MT may play a role in arsenic metabolism.29 CNNM2 is abundantly expressed in brain and functions as a divalent metal ion transporter,30 whereas NT5C2 hydrolyzes purine nucleotides and is involved in maintaining cellular nucleotide balance.31 A notable novel finding in the combined analysis (P = 1.69 × 10−5) is BCL2, which has been suggested as a marker for neuronal differentiation.32 Lower levels of BCL2 have been observed in the temporal cortex of patients with SCZ compared with controls.33
After TCF4, the second most significant finding (P = 3.51 × 10−7) in the European replication involved POM121L2, a gene with unknown function but in a location that has previously been associated with SCZ.7 In addition to NOTCH4 and POM121L2, several other genes in the MHC region replicated. The LD between the MHC SNPs reported in Table 2 (eFigure 8) showed a complex pattern and ranged from 0 to high LD (eg, SNPs in NFKBIL1, PRRC2A, and BAG6). Many of these genes are involved in immune response, although recent evidence suggests a possible role in neuronal signaling and activity-dependent changes in synaptic connectivity.34 This finding was true for some SNPs outside the MHC as well. For example, several genes with significant findings on chromosome 19 (eg, CLC) code for proteins that belong to the galectin family, which regulates immune response.
In the pathway analysis, 43 SCZ-associated genes were included in the 14 most significant pathways. Many of these are related to neuronal function and the immune system, 2 functions of potential relevance for SCZ, which are discussed in this section. The most significant pathway finding, axon guidance, includes genes involved in the process by which neurons send out axons to reach the correct targets. Growing axons sense guidance cues in the environment and respond by undergoing cytoskeletal changes that determine the direction of axon growth. Several highly conserved families of axon guidance molecules and their receptors have been identified.35 Among the SCZ-associated genes in our study that are part of this pathway are ROBO2 (roundabout, axon guidance receptor, homologue 2), which is a receptor for the axon guidance molecule SLIT2. The related gene SLIT3 is also among our findings. As mentioned already, L1CAM interactions is a subpathway of “axon guidance” that was also significantly enriched for SCZ-associated genes. The L1CAM family includes 4 structurally related proteins36 of which neurofascin (NFASC) is among our SCZ-associated genes that overlap with the L1CAM interaction reference pathway. Neurofascin has been shown to participate in neurite outgrowth and stabilization of neuronal structures, the latter particularly through interaction with ankyrins.37 This interaction was also observed in our pathway analysis (interaction between L1 and ankyrins, P = 2.78 × 10−5). Ankyrin 1, 2, and 3 (ANK1, ANK2, and ANK3) are part of our SCZ-associated genes. 
Ankyrins are adaptor proteins that couple membrane proteins, such as voltage-gated sodium channels, to the developing cytoskeleton.38 The observed theme of cell adhesion molecules from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database (P = 5.97 × 10−4) replicates a previous pathway analysis of large-scale SCZ genetic studies, even to the point that we observe many of the same specific genes, including CNTNAP2 and NRXN2.39
The generic neuronal system pathway in the Reactome database consists of several subpathways, such as transmission across chemical synapses, which was also among our highly significant findings. Thus, in addition to the pathways involved in neuronal growth and projection mentioned in the previous paragraphs, we observe associations with chemical neurotransmission. Among the genes in this pathway are several nicotinic acetylcholine receptors (CHRNA5, CHRNA2, and CHRNA3) that are also significant in their own subpathway (P = 2.21 × 10−4); voltage-gated calcium channels, such as CACNB2 and CACNB4, γ-aminobutyric acid B2 receptor (GABBR2); and a glutamate transporter (SLC1A3). Calcium signaling in particular has previously been identified as a core theme in the etiology of SCZ through large-scale genetic studies. Additional notable genes present in the transmission across chemical synapses pathway include the cyclic adenosine monophosphate response element binding protein 1 (CREB1) and the N-ethylmaleimide–sensitive factor (NSF). CREB1 is a transcription factor involved in mediating gene regulation after signaling events,40 whereas NSF is involved in vesicle trafficking and membrane fusion.41
A final, relevant theme among the significant pathways is the immune system. Antigen processing and presentation pathways from the KEGG and Biocarta databases were significantly enriched for SCZ-associated genes. Although the composition of these pathways was somewhat different between the 2 databases, the core genes of TAP1, TAP2 and HLA-DRA were common to both. The transporter associated with antigen processing (ie, TAP) is composed of a heterodimeric complex of TAP1 and TAP2. It is a key element in the immune recognition of cells compromised by virus infection or malignant transformation. The transporter is crucial in MHC class I antigen presentation by translocating proteasomal degradation products into the lumen of the endoplasmic reticulum for loading onto MHC class I molecules.42 An additional pathway related to the immune system, translocation of ZAP-70 kinase to the immunological synapse, was also significant. The zeta chain–associated protein kinase 70-kDa (ZAP-70) is an integral part of the adaptive immune system because it initiates signaling at the immunological synapse between a T cell and an antigen-presenting cell.43 Previous large-scale genetic studies have implicated the MHC region on chromosome 6,44 and most of the genes that we observe in the present study to be involved in these immune system–related pathways, such as the TAP and HLA genes, map to the MHC region. Therefore, the evidence that ties the MHC region, and by implication the immune system, to the etiology of SCZ is increasing.
We integrated results from a meta-analysis of 18 SCZ GWASs with existing informative SCZ databases to select SNPs for a family-based replication study. Results suggested a considerable enrichment of small P values in the replication study. Test results showed that this enrichment was statistically significant. Furthermore, of the SNPs with replication values of P < .01, the proportion of SNPs that had the same direction of effect as in the GWAS meta-analysis was about 90%, which is highly unlikely to occur by chance. Finally, analyses suggested several significant pathways, which again suggested that SNPs replicated. Because the group of selected SNPs replicated as a whole, it follows that individual SNPs in the replication study replicate. A complication was that rather than a few SNPs having large effects, many SNPs appeared to have small effects. Although our pathway analyses of these SNPs implicated specific pathogenic processes, some caution is required before making strong statements about the replication status of individual SNPs. Other studies have reached a similar conclusion that for SCZ, many SNPs with small effects may be involved and these SNPs replicate as a group.21,45 Our article adds to these studies. The facts that we used a family-based replication (thereby minimizing the probability that population substructure accounts for the replication) and tested a much smaller set of SNPs pinpoint more precisely the genes that are likely involved.
Some of the SNPs selected through data integration had ranks up to 25 000 in the meta-analysis. Although these SNPs would never have been selected on the basis of their P value, they replicated as well as SNPs selected on the basis of having the top-ranked P values in the meta-analysis without data integration. This demonstrates the value of considering existing data sources. This conclusion also emerges from a study by Ayalew et al,46 who identified and prioritized genes involved in SCZ by gene-level integration of GWAS data with other genetic and gene expression studies. Some of the top findings in their study were the previously reported SCZ candidate gene TCF4 and the pathways involved in synaptic connectivity and glutamate signaling. Following the same concept but using a more statistical approach, we also identified SNPs in TCF4 and identified pathways related to cellular connectivity and signaling. Given that the number of (publicly) available databases and the tools to curate these data are increasing rapidly, future studies should be able to capitalize even more on data integration.
In addition to SNPs in two of the most promising SCZ candidate genes, we replicated several recently identified susceptibility genes for SCZ in an independent family-based replication sample. Our family-based replication also suggests that previous claims implicating the MHC region as a susceptibility region for SCZ cannot be discarded as population stratification artifacts. Pathway analyses of the many small effects reveal several biological themes involved in brain function, immune response, and biological functions of potential importance for the development of SCZ. The present investigation is, to our knowledge, the first family-based replication that confirms several of the recent mega- and meta-analyses top findings in a completely independent study sample using a different genotyping assay than what was used in the initial detection.
Correspondence: Edwin J. van den Oord, PhD, Center for Biomarker Research and Personalized Medicine, Virginia Commonwealth University, 1112 E Clay St, McGuire Hall, Room 209B, Richmond, VA 23298 (email@example.com).
Submitted for Publication: August 7, 2012; final revision received January 14, 2013; accepted January 14, 2013.
Published Online: April 9, 2013. doi:10.1001/jamapsychiatry.2013.288
Author Contributions: Dr van den Oord had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. All authors contributed their critical reviews of the manuscript during its preparation and approved submission of the final manuscript. Drs Aberg and Liu contributed equally to the work.
Conflict of Interest Disclosures: None reported.
Funding/Support: This study was supported by grants R01HG004240, R01MH078069, and 1R01MH097283 (Drs Bukszár, Khachane, Aberg, and McClay) and grant R01MH080403 (Drs Sullivan and Liu) from the National Human Genome Research Institute.
Role of the Sponsors: The National Institute of Mental Health financially supported the design and conduct of the study but was not involved in manuscript review or approval.
Additional Information: Dr Ophoff acted as an author on behalf of the GROUP (Genetic Risk and Outcome in Psychosis) Consortium. Members of the consortium include René S. Kahn, MD, Don H. Linszen, MD, Jim van Os, MD, PhD, Durk Wiersma, PhD, Richard Bruggeman, MD, Wiepke Cahn, MD, Lieuwe de Haan, MD, Lydia Krabbendam, PhD, and Inez Myin-Germeys, PhD.