Dashed lines indicate the individuals who were sequenced. A square represents a male individual; a circle, a female individual; shading, an affected individual; and a slash mark, a deceased individual.
eText. Supplemental Text
eTable 1. Annotation of All Segregating Variants in the Primary Analysis
eTable 2. Variant-Level Meta-analysis of All 84 Segregating Variants
eTable 3. Gene-Burden Level Meta-analysis of All 82 Genes Identified by the Segregation Analyses
eFigure. Power Analyses
Goes FS, Pirooznia M, Parla JS, Kramer M, Ghiban E, Mavruk S, Chen Y, Monson ET, Willour VL, Karchin R, Flickinger M, Locke AE, Levy SE, Scott LJ, Boehnke M, Stahl E, Moran JL, Hultman CM, Landén M, Purcell SM, Sklar P, Zandi PP, McCombie WR, Potash JB. Exome Sequencing of Familial Bipolar Disorder. JAMA Psychiatry. 2016;73(6):590-597. doi:10.1001/jamapsychiatry.2016.0251
Importance
Complex disorders, such as bipolar disorder (BD), likely result from the influence of both common and rare susceptibility alleles. While common variation has been widely studied, rare variant discovery has only recently become feasible with next-generation sequencing.
Objective
To use a combined family-based and case-control approach to exome sequencing in BD, with multiplex families as an initial discovery strategy, followed by association testing in a large case-control meta-analysis.
Design, Setting, and Participants
We performed exome sequencing of 36 affected members with BD from 8 multiplex families and tested rare, segregating variants in 3 independent case-control samples consisting of 3541 BD cases and 4774 controls.
Main Outcomes and Measures
We used penalized logistic regression and 1-sided gene-burden analyses to test for association of rare, segregating damaging variants with BD. Permutation-based analyses were performed to test for overall enrichment with previously identified gene sets.
Results
We found 84 rare (frequency <1%), segregating variants that were bioinformatically predicted to be damaging. These variants were found in 82 genes that were enriched for gene sets previously identified in de novo studies of autism (19 observed vs 10.9 expected, P = .0066) and schizophrenia (11 observed vs 5.1 expected, P = .0062) and for targets of the fragile X mental retardation protein (FMRP) pathway (10 observed vs 4.4 expected, P = .0076). The case-control meta-analyses yielded 19 genes that were nominally associated with BD based either on individual variants or a gene-burden approach. Although no gene was individually significant after correction for multiple testing, this group of genes continued to show evidence for significant enrichment of de novo autism genes (6 observed vs 2.6 expected, P = .028).
Conclusions and Relevance
Our results are consistent with the presence of prominent locus and allelic heterogeneity in BD and suggest that very large samples will be required to definitively identify individual rare variants or genes conferring risk for this disorder. However, we also identify significant associations with gene sets composed of previously discovered de novo variants in autism and schizophrenia, as well as targets of the FMRP pathway, providing preliminary support for the overlap of potential autism and schizophrenia risk genes with rare, segregating variants in families with BD.
Family, twin, and adoption studies have provided strong evidence for the importance of genetic factors in the etiology of bipolar disorder (BD). Yet, despite an estimated 0.7 to 0.8 heritability,1 identifying the specific genetic causes of BD has proved challenging. Numerous genome-wide linkage scans have been performed in BD, but the limited replication across studies suggests that variants of major effect are unlikely to exist. On the other hand, genome-wide association studies of BD have recently implicated a number of variants of small effect in very large samples.2
While an important role for rare single-nucleotide variants in complex diseases has been proposed on theoretical grounds,3,4 empirical data have only recently begun to emerge. In autism, large-scale de novo studies5,6 have identified genes with recurrent, highly damaging mutations that cluster in pathways involved in transcriptional regulation, chromatin modification, and synaptic function. Initial studies in schizophrenia have been underpowered to implicate rare variants in a specific gene, although they have found convergent evidence for association of both damaging de novo and singleton variants in gene sets of the postsynaptic density (PSD) proteins, calcium channels, targets of the fragile X mental retardation protein (FMRP), and chromatin remodeling genes.7-9
Several case-control and family-based sequencing studies of BD are ongoing, but only a limited number have been published.10-13 Two family studies12,13 of the Amish population found evidence for partial segregation of a number of variants but little convergence on a specific gene or a linkage location. Similar results were found by Collins et al,10 who performed exome sequencing of a large pedigree without conclusively identifying variants of large effect, and by Cruceanu et al,11 who studied 25 pedigrees with lithium-responsive BD and also found limited evidence for cosegregation at the level of variants or genes across families. The largest study14 of BD to date performed whole-genome sequencing of 200 individuals from 41 families with BD, finding evidence for an excess of rare variants in pathways associated with γ-aminobutyric acid and calcium channel signaling.
The results of these early studies suggest that the pattern of complex inheritance in BD seen with common variants may also hold for rare variants, with potentially many risk alleles of modest effect distributed across large numbers of genes and noncoding regions. Given the inherent challenges of studying rare variants with currently available sample sizes,15 we used a hybrid approach to exome sequencing in BD by sequencing 8 multiplex families as an initial discovery strategy, followed by a case-control meta-analysis of 3541 BD cases and 4774 controls.
Question Can exome sequencing in families with bipolar disorder (BD) and case-control individuals reveal rare genetic variants associated with illness?
Findings This case-control study found 84 rare, damaging variants that segregated with BD. These variants were located in genes enriched for ones previously identified in studies of autism and schizophrenia, as well as the fragile X mental retardation protein (FMRP) pathway. Case-control meta-analyses yielded 19 genes nominally associated with BD and continued enrichment of autism genes.
Meaning This study provides preliminary support for the overlap of potential autism and schizophrenia risk genes, as well as targets of the FMRP pathway, with rare genetic variants found in families with BD.
Cases for the family sample were ascertained at The Johns Hopkins University as part of a genetic linkage study of BD,16 which has been approved by The Johns Hopkins University Institutional Review Board. Written informed consent was obtained from all participants. The sequenced family sample was selected to include pedigrees with more than 4 affected members carrying the diagnosis of BD type I (BD-I), BD type II (BD-II), or schizoaffective disorder, bipolar type. One additional family with multiple cases of BD-II was also selected. The 8 pedigrees are shown in the Figure. They include a mean of 6.9 affected individuals, and 36 of 55 were sequenced.
The primary case-control sample (1135 cases and 1142 controls) was generated from an exome sequencing study of BD and controls, referred to as the Rare Bipolar Loci Identification Through Synaptome Sequencing (RareBLISS) exome study (described more fully in the eText in the Supplement). DNA from both cases and controls was isolated from lymphoblastoid cell lines. All cases and controls were of self-reported European ancestry.
We performed exome capture using capture arrays (NimbleGen EZ Exome, version 1 and version 2; Roche), followed by standard alignment and variant calling with the Genome Analysis Toolkit by McKenna et al17 (a full description is given in the eText in the Supplement). Identified variants were annotated with ANNOVAR (http://www.openbioinformatics.org/annovar/)18 using reference assembly (RefSeq, release 65; http://www.ncbi.nlm.nih.gov/news/05-20-2014-RefSeq-65/). For annotation of potentially damaging variants, we followed the example of a recent schizophrenia exome sequencing study7 in defining 3 successively more inclusive annotation categories based on 5 bioinformatics algorithms (SIFT, PolyPhen-2 HVAR, PolyPhen-2 HDIV, LRT, and MutationTaster) provided in the Database for Nonsynonymous SNPs and Their Functional Predictions (http://sites.google.com/site/jpopgen/dbNSFP).19 The 3 categories were characterized as nonsynonymous broad (evidence of damaging effect by any 1 of 5 different bioinformatics algorithms), nonsynonymous strict (evidence of damaging effect by all 5 different bioinformatics algorithms), and disruptive (canonical splice site, nonsense, or frameshift mutations).
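The three annotation categories can be illustrated with a minimal Python sketch. This is not the study's annotation pipeline: the function name, the effect labels, and the input layout are hypothetical, and the nesting of categories (disruptive variants also counted as NSstrict and NSbroad) is an assumption based on the description of the categories as successively more inclusive.

```python
# Illustrative sketch only; names and input layout are hypothetical.
ALGORITHMS = ("SIFT", "PolyPhen2_HVAR", "PolyPhen2_HDIV", "LRT", "MutationTaster")
DISRUPTIVE_EFFECTS = {"splice_site", "nonsense", "frameshift"}


def annotation_categories(effect, damaging_calls):
    """Return the annotation categories that a variant falls into.

    effect: variant consequence label, e.g. "missense" or "nonsense".
    damaging_calls: dict mapping algorithm name -> True if predicted damaging.
    """
    if effect in DISRUPTIVE_EFFECTS:
        # Assumption: the categories are nested, so a disruptive variant
        # also belongs to the two more inclusive categories.
        return {"disruptive", "NSstrict", "NSbroad"}
    categories = set()
    if effect == "missense":
        n_damaging = sum(bool(damaging_calls.get(a)) for a in ALGORITHMS)
        if n_damaging >= 1:
            categories.add("NSbroad")  # damaging by any 1 of the 5 algorithms
        if n_damaging == len(ALGORITHMS):
            categories.add("NSstrict")  # damaging by all 5 algorithms
    return categories
```

Under these assumptions, a missense variant flagged by SIFT alone would be NSbroad only, whereas one flagged by all 5 algorithms would be both NSbroad and NSstrict.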
For the family-based analysis, we used a filtering approach to identify low-frequency (minor allele frequency [MAF] <1%) damaging variants (defined by the nonsynonymous broad annotation) that segregated with all affected relatives while allowing for one missing genotype per family. This strategy was motivated by a hypothesis that risk variants of moderate to high penetrance would be shared within families but not necessarily across different families given the failure of prior linkage investigations to identify replicable findings.
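The filtering logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the code used in the study: the record layout, field names, and example variants are hypothetical, and genotypes are reduced to carrier (1), non-carrier (0), or missing (None).

```python
def segregates(affected_genotypes, max_missing=1):
    """True if every genotyped affected relative carries the variant.

    affected_genotypes: one entry per sequenced affected relative:
    1 = carrier, 0 = non-carrier, None = missing genotype.
    """
    n_missing = sum(g is None for g in affected_genotypes)
    called = [g for g in affected_genotypes if g is not None]
    return n_missing <= max_missing and bool(called) and all(g == 1 for g in called)


def candidate_variants(variants, maf_threshold=0.01):
    """Return IDs of rare, damaging (NSbroad) variants that segregate in a family."""
    return [
        v["id"]
        for v in variants
        if v["maf"] < maf_threshold
        and v["ns_broad"]
        and segregates(v["genotypes"])
    ]


# Hypothetical variants from one family with 4 sequenced affected relatives.
family = [
    {"id": "var1", "maf": 0.002, "ns_broad": True, "genotypes": [1, 1, 1, None]},
    {"id": "var2", "maf": 0.002, "ns_broad": True, "genotypes": [1, 0, 1, 1]},
    {"id": "var3", "maf": 0.050, "ns_broad": True, "genotypes": [1, 1, 1, 1]},
    {"id": "var4", "maf": 0.002, "ns_broad": False, "genotypes": [1, 1, 1, 1]},
    {"id": "var5", "maf": 0.002, "ns_broad": True, "genotypes": [1, None, None, 1]},
]
print(candidate_variants(family))  # prints ['var1']
```

Only var1 passes: var2 is absent in one affected relative, var3 is too common, var4 is not predicted to be damaging, and var5 has more than one missing genotype.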
From the family analysis, we identified rare and damaging segregating variants that were tested in the RareBLISS case-control sample. For single-variant tests, we used logistic regression with the Firth penalized likelihood method, which can incorporate discrete and continuous covariates and provides more robust effect size estimates when data are sparse.20 We included as covariates 4 ancestry-based principal components (derived from common variants extracted from the exome sequencing data set using the software Eigensoft [https://github.com/DReichLab/EIG]21) and variables indexing differences in capture kits and sequencing platforms. For gene-level tests, we used PLINK/SEQ (https://atgu.mgh.harvard.edu/plinkseq/) to test the overall burden of rare, damaging variants in cases vs controls. We carried out gene-level tests under 3 frequency categories (MAF <1%, MAF <0.1%, and singletons) and 3 annotation categories (nonsynonymous broad [NSbroad], NSstrict, and disruptive), for a total of 9 tests.
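To illustrate why a penalized likelihood helps with sparse data, the following is a minimal pure-Python sketch of Firth-type logistic regression fitted by Fisher scoring on the modified score. It is not the software used in the study; the function names are hypothetical, and the sketch returns coefficients only (no standard errors or P values). The design matrix is assumed to include an intercept column.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def firth_logistic(X, y, max_iter=200, tol=1e-10):
    """Firth-penalized logistic regression coefficients.

    X: list of rows (first column should be 1.0 for the intercept).
    y: list of 0/1 outcomes (1 = case, 0 = control).
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X' W X
        info = [[sum(w[i] * X[i][a] * X[i][c] for i in range(n)) for c in range(k)]
                for a in range(k)]
        # Hat-matrix diagonal: h_i = w_i * x_i' (X' W X)^-1 x_i
        h = [w[i] * sum(xa * va for xa, va in zip(X[i], solve(info, list(X[i]))))
             for i in range(n)]
        # Firth-modified score: X' (y - p + h * (0.5 - p))
        score = [sum(X[i][a] * (y[i] - p[i] + h[i] * (0.5 - p[i])) for i in range(n))
                 for a in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

With a rare variant carried by, say, 5 cases and 0 controls, ordinary maximum likelihood gives an infinite odds ratio, whereas the penalized estimate stays finite. For a single binary predictor with an intercept, it coincides with the familiar correction of adding 0.5 to each cell of the 2 × 2 table.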
We further examined the variants and genes implicated by the family analysis in 2 additional case-control samples (eText in the Supplement). The first comprised 1018 cases with BD and 2220 controls from Sweden.22 The second consisted of an interim data freeze from the Bipolar Research in Deep Genome and Epigenome Sequencing (BRIDGES) Study, comprising whole-genome sequencing results for 1388 cases with BD and 1412 controls, all of European ancestry (eText in the Supplement). Phenotypic details are provided in the eText in the Supplement. The analyses of both additional data sets were performed in a manner similar to what was done in RareBLISS using penalized logistic regression for single-variant analyses and PLINK/SEQ burden tests for the gene-level association tests. Because joint analysis has been shown to be more powerful than replication,23 we performed variant-level and gene-level tests of association, followed by fixed-effects meta-analysis of the results across all 3 samples using METAL (http://www.sph.umich.edu/csg/abecasis/metal/).24
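As a rough illustration of how per-study results are combined in a fixed-effects meta-analysis, the sketch below uses inverse-variance weighting of log odds ratios. METAL supports both sample size-based and standard error-based weighting, and the text does not state which scheme was used, so the choice here is an assumption. The function name and the example estimates are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-study log odds ratios; standard_errors: their standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return pooled_beta, pooled_se, z, p


# Hypothetical per-study estimates for one variant (not values from the study).
beta, se, z, p = fixed_effects_meta([0.40, 0.55, 0.30], [0.35, 0.40, 0.30])
print(f"pooled OR = {math.exp(beta):.2f}, P = {p:.3f}")
```

Studies with smaller standard errors receive more weight, so a variant observed in only 1 or 2 of the 3 samples contributes less information than one observed in all of them.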
Genes with segregating variants were found to include a number of genes previously observed in de novo investigations of autism and schizophrenia, and some were also localized to the PSD. To test for specific enrichment of these gene sets, we used the curated list from prior studies5,8 that summarized genes with de novo nonsense and missense variants in autism (n = 1781), schizophrenia (n = 670), and intellectual disability (n = 141) studies, as well as genes encoding proteins found in the PSD (n = 1398) and the FMRP pathway (n = 795). We subsequently tested whether genes with a segregating variant were enriched for any of these 3 categories by randomly selecting an equal number of genes captured by our exome study while matching by the following 3 potentially confounding metrics: cumulative exon length (±20%), sequence coverage (±20%), and a gene-specific corrected measure of intolerance to missense variation (missense z score). The latter represents a standardized measure of the deviation between observed and expected missense variants found in the Exome Aggregation Consortium database.25 We performed 10 000 permutations and counted the number of times that randomly selected genes were found in each of the 3 gene sets. We then compared our observed counts of overlap with the 3 gene sets with this null distribution to obtain empirical P values. As an additional step to evaluate the potential role of background variation in our results, we also obtained a curated list of genes (n = 1215) found to harbor de novo variants (missense and loss of function) in well siblings from simplex families with autism.5 We tested for enrichment of these “control” de novo genes using the same matched permutation procedure.
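The matched permutation procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: the data layout and function names are hypothetical, the tolerance used to match on missense z score (`z_window`) is invented for the example because the text gives no value, matched genes are drawn with replacement, and the empirical P value uses a +1 correction that the text does not specify.

```python
import random


def is_match(target, candidate, rel_tol=0.20, z_window=0.5):
    """True if candidate resembles target on the 3 matching metrics."""
    return (
        abs(candidate["exon_len"] - target["exon_len"]) <= rel_tol * target["exon_len"]
        and abs(candidate["coverage"] - target["coverage"]) <= rel_tol * target["coverage"]
        # Assumption: z scores are matched within an absolute window.
        and abs(candidate["mis_z"] - target["mis_z"]) <= z_window
    )


def enrichment_p(observed_genes, gene_info, gene_set, n_perm=10_000, seed=0):
    """Matched permutation test for overlap between observed genes and a gene set.

    gene_info: dict gene -> {"exon_len", "coverage", "mis_z"} for all captured genes.
    Returns (observed_overlap, expected_overlap, empirical_p).
    """
    rng = random.Random(seed)
    # Each observed gene matches itself, so no pool is ever empty.
    pools = {
        g: [h for h in gene_info if is_match(gene_info[g], gene_info[h])]
        for g in observed_genes
    }
    observed_overlap = sum(g in gene_set for g in observed_genes)
    n_extreme = 0
    total_overlap = 0
    for _ in range(n_perm):
        drawn = [rng.choice(pools[g]) for g in observed_genes]
        overlap = sum(h in gene_set for h in drawn)
        total_overlap += overlap
        n_extreme += overlap >= observed_overlap
    expected_overlap = total_overlap / n_perm
    empirical_p = (n_extreme + 1) / (n_perm + 1)
    return observed_overlap, expected_overlap, empirical_p
```

The expected overlap returned here corresponds to the "expected" counts reported alongside the observed counts, and the empirical P value is the fraction of permutations with an overlap at least as large as the one observed.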
We first searched for variants of moderate to high penetrance in 8 multiplex families by sequencing at least 4 affected members in each family (Figure). A total of 7551 variants were found to segregate as heterozygotes in at least 1 family, ranging from 511 to 1683 variants per family, reflecting the size of the pedigrees and the decreased likelihood of sharing variants with an increased number of sequenced cases (Table 1). Filtering for rare (MAF <1%), damaging variants as defined by the NSbroad category led to the identification of 84 variants that segregated in at least 1 family, including 23 variants that further met the NSstrict criteria and 5 variants that were disruptive. The 84 segregating variants were found in 82 independent genes (shown fully in eTable 1 in the Supplement), with 2 genes (LAMA4 and OBSCN) having 2 segregating variants. The 2 segregating variants in LAMA4 were in the same family, while the 2 variants in OBSCN were found in independent families. However, both genes are large and evolutionarily unconstrained,26 increasing the likelihood that they could be false-positives.
To obtain convergent evidence for association with BD, we examined whether segregating variants showed consistent evidence for association in 3 ongoing case-control studies of BD in individuals of European ancestry. These included the RareBLISS (1135 cases and 1142 controls), Sweden (1018 cases and 2220 controls), and BRIDGES (1388 cases and 1412 controls) studies. Each study was analyzed separately, and the results were combined in a fixed-effects meta-analysis.
We performed penalized logistic regression of the 84 segregating variants in each of the 3 data sets, followed by a fixed-effects meta-analysis that included 3541 cases and 4774 controls (eTable 2 in the Supplement). Of the 84 variants, 49 were present in at least 1 data set, 23 were present in all 3, and 35 were not found in any of them. Variants with a meta-analytic association P < .10 and an odds ratio greater than 1 are listed in Table 2, which includes 3 variants with nominally significant findings (P < .05) in the MLK4, APPL2, and HSP90AA1 genes.
Given the limited power to identify an association with single variants and the high likelihood of allelic heterogeneity in causal genes,27 we sought additional evidence for association using gene-based (“burden”) association tests of the 82 genes that included at least 1 rare, segregating variant from the family analysis. One-sided burden tests were performed in PLINK/SEQ under 3 frequency classes (MAF <1%, MAF <0.1%, and singletons) and 3 annotation classes (NSbroad, NSstrict, and disruptive). Association analyses were performed independently in the 3 case-control samples, followed by a fixed-effects meta-analysis. The results of all of the tests performed for the 82 genes are summarized in eTable 3 in the Supplement, with the top findings (P < .05) summarized in Table 3. No individual P value survived full correction for multiple testing (P < 6.78 × 10−5), but 16 genes showed nominal evidence for association (P < .05). Five of these genes were previously found to have de novo damaging variants in investigations of autism (RPGRIP1L, FRAS1, AHNAK, KDM5B, and SLC12A4), while 3 of the genes (SLC4A1, APPL2, and AHNAK) encode proteins localized to the PSD, which has been recently implicated in rare variant studies of schizophrenia.5,7,8,28
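The multiple-testing threshold quoted above appears consistent with a Bonferroni correction of α = .05 across 82 genes tested under 3 frequency classes and 3 annotation classes. The text does not name the correction method, so treating it as Bonferroni is an assumption; the arithmetic can be checked with a short sketch (the function name is hypothetical).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


n_genes = 82
n_frequency_classes = 3   # MAF <1%, MAF <0.1%, singletons
n_annotation_classes = 3  # NSbroad, NSstrict, disruptive

n_tests = n_genes * n_frequency_classes * n_annotation_classes
threshold = bonferroni_threshold(0.05, n_tests)
print(n_tests, f"{threshold:.2e}")
```

This gives 738 tests and a threshold of approximately 6.78 × 10−5, matching the value reported in the text.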
The above observations led us to ask whether segregating variants were located within 3 particular gene sets (autism, schizophrenia, and PSD) at a rate that exceeded chance expectation. When all 82 genes with segregating variants under the NSbroad model were considered, there were 10 PSD genes, while there were 19 and 11 genes previously identified as harboring de novo missense or nonsense variants in investigations of autism and schizophrenia, respectively (eTable 1 in the Supplement). We tested for overrepresentation of the 82 genes identified by the initial segregation analysis in these 3 gene sets and found evidence for enrichment in the autism set (19 observed vs 10.9 expected, P = .0066) and schizophrenia set (11 observed vs 5.1 expected, P = .0062). The results for the PSD set (10 observed vs 6.9 expected, P = .15) were consistent with chance expectation.
In a further analysis, we tested whether the segregating genes with nominal evidence for association in the case-control meta-analyses, in either the variant-level or gene-level tests (n = 19) (Table 2 and Table 3), also showed evidence for enrichment in the same 3 gene sets. This permutation analysis confirmed significant enrichment of the de novo autism gene set (6 observed vs 2.6 expected, P = .028) but not the PSD set (4 observed vs 1.7 expected, P = .09) or the de novo schizophrenia set (1 observed vs 1.0 expected, P = .65).
Given the significance of the autism de novo gene set, we also tested for enrichment of the FMRP pathway and found significant evidence for enrichment in the 82 segregating genes (10 observed vs 4.4 expected, P = .0076) but not in the subset of 19 genes with nominal evidence of association in the meta-analysis. We further examined the de novo intellectual disability gene set but found no evidence of enrichment (0 observed, P > .99). We note that the significant results for the 82 segregating variants survive correction for the 5 gene sets tested, while the smaller subset of genes with additional nominal significance in the meta-analysis does not.
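The statement about correction for the 5 gene sets can be checked against the reported P values. The sketch below assumes a Bonferroni-style threshold of .05/5 = .01, which the text does not state explicitly. Only P values that are reported as exact numbers are included; the intellectual disability result (P > .99) and the FMRP result in the 19-gene subset (no P value given) are omitted.

```python
ALPHA = 0.05
N_GENE_SETS = 5  # autism, schizophrenia, PSD, FMRP, intellectual disability

# Assumption: Bonferroni correction across the 5 gene sets.
threshold = ALPHA / N_GENE_SETS

# Empirical P values as reported in the text.
reported_p = {
    "82 genes": {"autism": 0.0066, "schizophrenia": 0.0062, "FMRP": 0.0076, "PSD": 0.15},
    "19 genes": {"autism": 0.028, "PSD": 0.09, "schizophrenia": 0.65},
}

surviving = {
    group: sorted(name for name, p in p_values.items() if p < threshold)
    for group, p_values in reported_p.items()
}
print(surviving)
```

Under this assumption, the autism, schizophrenia, and FMRP results for the 82 genes fall below the corrected threshold, whereas the autism result in the 19-gene subset (P = .028) does not.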
To evaluate the potential confounding role of “baseline” or expected rates of variation, we tested whether genes implicated by de novo variants found in control siblings were enriched among our segregating variants. The results based on the 82 genes with segregating variants were consistent with chance expectation (10 observed vs 7.6 expected, P = .21).
Our study represents one of the first large-scale exome sequencing efforts in BD and one of the first to combine a family-based and case-control design. In each of our selected multiplex families, we found several rare, segregating variants of predicted damaging effect, leading us to seek supportive evidence for association in 3 large-scale, case-control exome sequencing studies. Variant-level and gene-burden analysis provided supportive nominal evidence (P < .05) for 19 of these genes, although neither variant-based nor gene-burden results met study-wide thresholds for statistical significance. However, we found support for enrichment of segregating variants in genes identified by de novo studies of autism and schizophrenia, with additional evidence for the autism gene set enrichment from case-control data.
The 19 genes implicated by segregating variants that showed the strongest evidence for case-control association with BD (Tables 2 and 3) included more members of the de novo autism gene set than expected. This enrichment was based on the following 6 genes: HSP90AA1, RPGRIP1L, FRAS1, AHNAK, KDM5B, and SLC12A4. The most strongly implicated of these genes is KDM5B, in which 2 nonsense and 2 missense de novo mutations have been found in sporadic cases with autism.5 KDM5B (also known as JARID1B) encodes a histone H3 lysine 4 (H3K4) demethylase that has been linked to neural differentiation in embryonic stem cells.29 Intriguingly, the recent Psychiatric GWAS Consortium pathway-based analysis of common genetic variation identified histone H3K4 methylation as the most strongly associated pathway with BD.30 Moreover, histone H3K4 methylation was also found to be the most strongly associated pathway in a cross-disorder analysis of BD, schizophrenia, and major depressive disorder, raising the possibility that it may increase susceptibility to a broad number of mental disorders.
While genetic overlap between BD and schizophrenia has been well documented by family studies31 and by genome-wide association studies,32 there are also emerging data to suggest the presence of etiological overlap between BD and autism. In particular, analyses of the Swedish national registers have yielded evidence for an increased risk of autism in individuals with BD (relative risk [RR], 13.2) and in their first-degree relatives (RR, 2.0-4.0), with a coheritability estimate of 65%.33 Similarly, Swedish registry data have also shown the inverse, with an increased risk of BD in individuals with autism (RR, 6.6) and their siblings (RR, 1.8).34 Overlap between BD and autism is also seen in investigations of rare copy number variation (CNV), with a recent meta-analysis of CNV studies showing evidence for association with BD of 3 CNVs (1q21.1 dup, 3q29 del, and 16p11.2 dup) originally implicated in both autism and schizophrenia.35,36
Our study should be seen in light of a number of important limitations. First, a major challenge of rare variant studies is the increasing recognition that very large sample sizes may be necessary to perform a fully powered case-control study.15 Although our hybrid family-based study, followed by case-control association, was designed to improve power by limiting the genomic search space to variants and genes identified by an initial segregation analysis, power analyses of the combined case-control sample continued to show that the meta-analysis was underpowered to detect the types of effect sizes and allele frequencies found in our study (eFigure in the Supplement). Second, our study exclusively focused on exome variation and thus could not detect any role of rare noncoding variants. Third, although our coverage of the exome is typical of most studies of this type, it is incomplete at approximately 80%, which is an inevitable limitation of current exome capture and sequence technology. Fourth, while we have used widely accepted bioinformatics tools to classify variants as damaging, these tools are probabilistic and imprecise and will likely miss the effect of variants that are tissue specific. Fifth, in focusing only on fully segregating variants, we have not considered less penetrant variants that may be involved in disease susceptibility because these variants would be even more difficult to detect with our available sample size. In sensitivity analyses, we performed a broad analysis of variants shared among 2 or more affected family members and did not find evidence for any association meeting correction for multiple testing or for any similar enrichment compared with the analysis presented in this study. Sixth, our study has relied solely on traditional clinical phenotypes and has not characterized individuals in ways that may align more closely with disease pathophysiology. 
It is likely that a new generation of “genotype-first” studies will be needed to delineate which specific phenotypes will constitute molecular subtypes of BD.37
In summary, although our study remains underpowered to implicate rare variants in individual genes, we have found preliminary evidence for the overlap of potential autism and schizophrenia risk genes with our segregating variants. These results provide further data on shared genetic susceptibility across the major psychiatric disorders.
Submitted for Publication: September 30, 2015; final revision received January 14, 2016; accepted January 16, 2016.
Corresponding Author: Fernando S. Goes, MD, Department of Psychiatry and Behavioral Sciences, The Johns Hopkins School of Medicine, Meyer 4-119A, 600 N Wolfe St, Baltimore, MD 21287 (email@example.com).
Published Online: April 27, 2016. doi:10.1001/jamapsychiatry.2016.0251.
Author Contributions: Drs Goes and Potash had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Goes, Pirooznia, and Parla are co–first authors and contributed equally to this work. Drs Zandi, McCombie, and Potash contributed equally as senior authors.
Study concept and design: Goes, Pirooznia, Hultman, Landén, Zandi, Potash.
Acquisition, analysis, or interpretation of data: Goes, Pirooznia, Parla, Kramer, Ghiban, Mavruk, Chen, Monson, Willour, Karchin, Flickinger, Locke, Levy, Scott, Boehnke, Stahl, Moran, Hultman, Purcell, Sklar, Zandi, McCombie, Potash.
Drafting of the manuscript: Goes, Pirooznia, Parla, Kramer, Ghiban, Karchin, McCombie, Potash.
Critical revision of the manuscript for important intellectual content: Goes, Pirooznia, Mavruk, Chen, Monson, Willour, Flickinger, Locke, Levy, Scott, Boehnke, Stahl, Moran, Hultman, Landén, Purcell, Sklar, Zandi, Potash.
Statistical analysis: Goes, Pirooznia, Chen, Karchin, Flickinger, Locke, Scott, Stahl, Purcell, Zandi.
Obtained funding: Goes, Boehnke, Landén, Sklar, Zandi, Potash.
Administrative, technical, or material support: Pirooznia, Parla, Ghiban, Mavruk, Monson, Levy, Stahl, Hultman, Landén, Sklar, Potash.
Study supervision: Willour, Scott, Boehnke, Landén, Sklar, Zandi, Potash.
Conflict of Interest Disclosures: Dr McCombie reported participating in meetings sponsored by Illumina and Pacific Biosciences (which had no decision-making roles related to this study) over the past 4 years, reported receiving travel reimbursement and honoraria for presentations, and reported being a founder and shareholder of Orion Genomics, which focuses on plant genomics and cancer genetics. No other disclosures were reported.
Funding/Support: This work was supported by grants R00MH86049 (Dr Goes), K01MH093809 (Dr Pirooznia), R01MH087992 (Dr McCombie), MH09414501 and MH105653 (Dr Boehnke), and R01MH087979 (Dr Potash) from the National Institute of Mental Health and by a National Alliance for Research in Schizophrenia and Affective Disorders (NARSAD) Young Investigator Award (Dr Stahl).
Role of the Funder/Sponsor: The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: Aravinda Chakravarti, PhD, provided helpful analytical contributions. We thank the Bipolar Research in Deep Genome and Epigenome Sequencing (BRIDGES) study and affiliated study members (Goncalo Abecasis, DPhil, Gerome Breen, PhD, William G. Iacono, PhD, Matt McGue, PhD, Melvin G. McInnis, MD, Richard M. Myers, PhD, Carlos N. Pato, MD, PhD, and John B. Vincent, PhD) for prepublication data based on the sequencing of samples from multiple studies, including the STEP-DB sample from the National Institute of Mental Health repository. For the Swedish bipolar disorder exome sequencing study, Steve McCarroll, PhD, helped generate the sequencing data, and we acknowledge the use of the Swedish National Quality Register for Bipolar Disorder (BipoläR) and thank the clinical collaborators, data collectors, and facilitators in the St Göran Project (Stockholm, Sweden) and in the Department of Medical Epidemiology and Biostatistics at Karolinska Institutet, Sweden, for their help with recruitment of participants. Finally, we particularly thank the individuals and families who volunteered and thus made this work possible.