Runs of homozygosity intersect the EXOC4 gene (A) and the CTNNA3 gene (B). Light blue bars represent ROHs in cases with Alzheimer disease; dark blue bars, ROHs in controls. The arrows represent the physical ROH consensus.
eFigure 1. B-allele frequency (freq) and log R ratio plots in GenomeStudio
eFigure 2. Linkage disequilibrium analysis using the Caribbean Hispanic controls of European origin (n = 326)
eTable 1. List of identified nominally significant consensus regions (EMP1 <0.05) in the entire dataset
eTable 2. List of identified nominally significant consensus regions (EMP1 <0.05) in the European subgroup
eTable 3. List of identified nominally significant consensus regions (EMP1 <0.05) in the African subgroup
eTable 4. List of coding variants reported in dbSNP137 for the EXOC4 and CTNNA3 associated consensus regions
eTable 5. The samples with ROHs at the EXOC4 locus (28 AD cases and 8 controls)
eTable 6. Genotypes of 12 individuals (11 AD cases and 1 control) with ROHs at the EXOC4 locus
Ghani M, Sato C, Lee JH, Reitz C, Moreno D, Mayeux R, St George-Hyslop P, Rogaeva E. Evidence of Recessive Alzheimer Disease Loci in a Caribbean Hispanic Data Set: Genome-wide Survey of Runs of Homozygosity. JAMA Neurol. 2013;70(10):1261-1267. doi:10.1001/jamaneurol.2013.3545
The search for novel Alzheimer disease (AD) genes or pathologic mutations within known AD loci is ongoing. The development of array technologies has helped to identify rare recessive mutations among long runs of homozygosity (ROHs), in which both parental alleles are identical. Caribbean Hispanics are known to have an elevated risk for AD and tend to have large families with evidence of inbreeding.
To test the hypothesis that the late-onset AD in a Caribbean Hispanic population might be explained in part by the homozygosity of unknown loci that could harbor recessive AD risk haplotypes or pathologic mutations.
We used genome-wide array data to identify ROHs (>1 megabase) and conducted global burden and locus-specific ROH analyses.
A whole-genome case-control ROH study.
A Caribbean Hispanic data set of 547 unrelated cases (48.8% with familial AD) and 542 controls collected from a population known to have a 3-fold higher risk of AD vs non-Hispanics in the same community. Based on a Structure program analysis, our data set consisted of African Hispanic (207 cases and 192 controls) and European Hispanic (329 cases and 326 controls) participants.
Alzheimer disease risk genes.
Main Outcomes and Measures
We calculated the total and mean lengths of the ROHs per sample. Global burden measurements among autosomal chromosomes were investigated in cases vs controls. Pools of overlapping ROH segments (consensus regions) were identified, and the case to control ratio was calculated for each consensus region. We formulated the tested hypothesis before data collection.
Results
In total, we identified 17 137 autosomal regions with ROHs. The mean length of the ROH per person was significantly greater in cases vs controls (P = .0039), and this association was stronger with familial AD (P = .0005). Among the European Hispanics, a consensus region at the EXOC4 locus was significantly associated with AD even after correction for multiple testing (empirical P value 1 [EMP1], .0001; EMP2, .002; 21 AD cases vs 2 controls). Among the African Hispanic subset, the most significant but nominal association was observed for CTNNA3, a well-known AD gene candidate (EMP1, .002; 10 AD cases vs 0 controls).
Conclusions and Relevance
Our results show that ROHs could significantly contribute to the etiology of AD. Future studies would require the analysis of larger, relatively inbred data sets that might reveal novel recessive AD genes. The next step is to conduct sequencing of top significant loci in a subset of samples with overlapping ROHs.
Alzheimer disease (AD) is the most common form of dementia, characterized by neurotoxic aggregates of tau and β-amyloid peptides in the brain.1 Known genetic factors account for only up to 40% of the risk for AD, half of which is attributable to APOE and approximately 3% to dominant mutations in APP, PSEN1, and PSEN2.2,3 Genome-wide association studies (GWASs) have identified associations between late-onset AD (age >65 years) and single-nucleotide polymorphisms (SNPs) in 9 loci (CLU, PICALM, BIN1, MS4A4/MS4A6E, CR1, CD2AP, CD33, EPHA1, and ABCA7)4-8 in which the pathogenic variants have not yet been revealed apart from CR1.9 In addition, a candidate gene approach discovered a reproducible association between AD and SORL1,10 and a recent exome study revealed AD risk variants in TREM2.11 Finally, the G4C2-repeat expansion in C9orf72 has been recently reported in a few patients with clinical AD12,13; however, this association remains to be validated.14
Intriguingly, a recent study on the concordance of AD among parents and offspring suggested that approximately 90% of early-onset AD cases are likely the result of autosomal recessive inheritance15; however, the p.A673V substitution in APP is the only known recessive AD mutation.16 Recessive inheritance has not been widely investigated for complex traits. The lack of inbred families in most North American or European data sets has made mapping of recessive loci challenging; however, the development of array technologies has recently helped to identify rare recessive mutations among long runs of homozygosity (ROHs), in which both parental alleles are identical.17,18 In addition, ROHs can harbor imprinted chromosomes19,20 or risk haplotypes that predispose to a disorder in a homozygous state.21,22 Runs of homozygosity could be inherited from a common ancestor many generations back,23 and longer ROHs are expected in closely related individuals (identical by descent) or inbred populations.24,25 Runs of homozygosity greater than 1 megabase (Mb) are relatively frequent in the general population and could arise without inbreeding as a result of common extended haplotypes at loci with rare recombination events.26 Small ROHs (<1 Mb) are too frequent (especially in inbred populations) to search for rare recessive loci, and therefore most ROH studies use a 1-Mb cutoff.27,28
At present, ROHs have been associated with a risk for rheumatoid arthritis,21 Parkinson disease,29 and schizophrenia.30 Recently, genome-wide measurements of ROHs (>1 Mb) were studied in 2 outbred AD data sets of North American and European origin.27,28 In both studies, the global burden analysis of ROHs did not reveal a significant association with AD. The only significant result in locus-specific analyses was obtained for a consensus ROH region on chr8p12 in the North American data set (P = .017; 40 AD cases vs 9 controls),27 but no loci survived correction for multiple testing in the European data set.28 Surprisingly, homozygosity mapping of a small data set from an isolated Arab community in Israel (Wadi Ara) detected that the controls were more inbred than the AD cases.31 In addition, a whole-genome study of 2 affected siblings from a consanguineous AD family revealed several shared ROHs; however, lack of data from unaffected family members complicated the result interpretation.32
In the present study, we investigated a late-onset AD data set of a Caribbean Hispanic population for global burden measurements of ROHs and the specific loci that could harbor recessive AD mutations or risk haplotypes. This data set was previously analyzed in an SNP-based GWAS that replicated several AD genes (APOE, CLU, PICALM, and BIN1).33 We also evaluated the data set for the incidence of rare (>100 kilobase [kb]) copy number variations, but only a nominal association with AD was detected for a duplication on chr15q11.34
The Caribbean Hispanic population in our study originates mainly from the Dominican Republic or from Puerto Rico, which share a complex heritage.35 The native populations of both islands dwindled owing to disease, warfare, and forced labor after the Spanish conquered them in the 15th century, and large numbers of slaves were later imported from West Africa.35-37 At present, the population of the Dominican Republic is 9.8 million; that of Puerto Rico, 3.7 million.38
Caribbean Hispanics are known to have a 3-fold-higher risk for AD compared with non-Hispanics in the same community39 and tend to have large families (≥6 siblings) with evidence of inbreeding, which makes this population very appropriate for an ROH study. A previous study found a single-founder PSEN1 mutation (p.G206A) that was unique to Caribbean Hispanics and responsible for 42% of early-onset AD families in this population.35 However, the high incidence of late-onset AD in this population is not explained by PSEN1.35,39 Hence, we tested the hypothesis that the late-onset AD in the Caribbean Hispanic population might be in part explained by the homozygosity of unknown loci.
This study was approved by the institutional review boards of Columbia University and the University of Toronto. We examined a Caribbean Hispanic case-control data set.33,34 Briefly, the data set was composed of 542 unrelated control subjects (68.5% female) and 547 unrelated AD cases (70.7% female), including 267 cases who had at least 1 relative with AD, representing familial AD (FAD). Family history of AD was also recorded for 47 controls (8.7%); however, none had any AD symptoms at the time of examination (mean age, 82 years). The mean (SD) age at onset of AD was 80 (8) years, and the mean (SD) age at the last examination of the controls was 79 (6) years. Analysis with the STRUCTURE program (http://pritch.bsd.uchicago.edu/software.html) had previously determined that 207 AD cases were of African ancestry and 329 cases were of European ancestry; 192 controls were of African ancestry and 326 controls were of European ancestry (11 cases and 24 controls were of unknown ancestry).33 The proportion of both subgroups was similar in cases and controls: African Hispanics made up 37.8% of cases and 35.4% of controls, whereas European Hispanics made up 60.1% of cases and 60.1% of controls.
All DNA samples were isolated from whole blood and genotyped on the Illumina HumanHap 650Y array (Illumina, Inc). The genotyping success rate was greater than 99% for all samples, providing a reliable source to estimate the ROHs. We included in this study only samples that previously passed stringent SNP-based quality control.33 The raw genotype data have been submitted to the Gene Expression Omnibus, a public functional genomics data repository at the National Center for Biotechnology Information (GSE33528).
Runs of homozygosity were identified using a whole-genome association analysis toolset (PLINK; http://pngu.mgh.harvard.edu/~purcell/plink/), which has a reliable algorithm for ROH detection.40 As previously described, we used a sliding window of 50 SNPs across the genome with no more than 1 heterozygous SNP allowed in the window, and the minimal ROH size was defined as 1 Mb with an SNP density of at least 1 SNP per 50 kb.27,28 We calculated the total and mean lengths of the ROHs per sample and the total number of ROHs for each sample. Global burden measurements among autosomal chromosomes were investigated in cases vs controls with a 1-tailed test (10 000 permutations) for the number of ROHs and for the total and mean ROH lengths per individual (as in published ROH studies27,28). We used a 1-tailed test because the Caribbean Hispanic population has a high incidence of AD39 and is more suitable for the detection of risk and not of protective alleles.
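The detection criteria above can be illustrated with a minimal sketch. It is a simplification written for this text and not PLINK's actual implementation: a SNP is treated as covered if at least 1 window of 50 consecutive SNPs containing it has no more than 1 heterozygous call, and runs of covered SNPs are kept only if they span at least 1 Mb at a density of at least 1 SNP per 50 kb. The function name `find_rohs` and its arguments are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_rohs(
    positions: Sequence[int],      # sorted base-pair positions on one chromosome
    is_het: Sequence[bool],        # True where the sample is heterozygous
    window: int = 50,              # SNPs per sliding window
    max_het: int = 1,              # heterozygous calls tolerated per window
    min_bp: int = 1_000_000,       # minimal ROH length (1 Mb)
    max_bp_per_snp: int = 50_000,  # density: at least 1 SNP per 50 kb
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for each run passing the filters (simplified)."""
    n = len(positions)
    if n < window:
        return []

    # Prefix sums give the heterozygote count of any window in O(1).
    prefix = [0]
    for h in is_het:
        prefix.append(prefix[-1] + int(h))

    # Mark every SNP that lies in at least one sufficiently homozygous window.
    covered = [False] * n
    for start in range(n - window + 1):
        if prefix[start + window] - prefix[start] <= max_het:
            for i in range(start, start + window):
                covered[i] = True

    # Join consecutive covered SNPs and apply the length and density filters.
    segments = []
    i = 0
    while i < n:
        if not covered[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and covered[j + 1]:
            j += 1
        length = positions[j] - positions[i]
        n_snps = j - i + 1
        if length >= min_bp and length / n_snps <= max_bp_per_snp:
            segments.append((positions[i], positions[j]))
        i = j + 1
    return segments
```

Because a window may contain 1 heterozygous SNP, a run in this sketch can end on a heterozygous call; a production tool would handle boundaries, missing genotypes, and gaps between SNPs more carefully.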
Pools of overlapping ROH segments were identified for the entire data set, and we calculated the ratio of cases to controls for each consensus region, defined as a shared segment of greater than 100 kb (>3 SNPs) among cases and controls. Runs of homozygosity were checked globally among the entire data set and each subgroup using the PLINK regional test against the consensus regions. The PLINK grouping function was also applied to cluster the ROH segments if they were at least 99% identical by genotype.
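As an illustration of how overlapping segments reduce to a consensus region with a case to control count, the following sketch pools segments on a single chromosome and narrows each pool to the interval shared by all of its members. It is a simplified greedy procedure assumed for illustration, not the PLINK pooling algorithm; the `Roh` record and `consensus_regions` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Roh:
    sample: str
    is_case: bool
    start: int  # base-pair start on a single chromosome
    end: int    # base-pair end


def consensus_regions(
    rohs: List[Roh], min_bp: int = 100_000
) -> List[Tuple[int, int, int, int]]:
    """Return (start, end, n_cases, n_controls) for each pool of overlapping ROHs."""
    pools = []
    for r in sorted(rohs, key=lambda x: x.start):
        if pools:
            pool = pools[-1]
            # Interval still shared by every member if r joins the pool.
            new_start = max(pool["start"], r.start)
            new_end = min(pool["end"], r.end)
            if new_end - new_start >= min_bp:
                pool["start"], pool["end"] = new_start, new_end
                pool["members"].append(r)
                continue
        # No sufficient overlap with the current pool: open a new one.
        pools.append({"start": r.start, "end": r.end, "members": [r]})

    result = []
    for pool in pools:
        n_cases = sum(m.is_case for m in pool["members"])
        n_controls = len(pool["members"]) - n_cases
        result.append((pool["start"], pool["end"], n_cases, n_controls))
    return result
```

With 3 hypothetical segments spanning 1.0-3.0 Mb, 1.5-2.8 Mb, and 2.0-4.0 Mb, the shared interval is 2.0-2.8 Mb, and the case and control counts for that interval are what a regional test would compare.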
The standardized, pairwise Lewontin disequilibrium coefficient was used to estimate the strength of linkage disequilibrium between SNPs using the Haploview software package.41 In addition, we estimated inbreeding (F) using the SNPs from the Illumina HumanHap 650Y array to determine identity by descent as estimated by the genetic relationship matrix, which was implemented in the Genome-wide Complex Trait Analysis.42,43
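For readers unfamiliar with the Lewontin coefficient, the calculation for 1 pair of biallelic SNPs can be sketched as follows. The sketch assumes that phased haplotype counts are already available, which is a simplification: software such as Haploview must first estimate haplotype frequencies from unphased genotypes. The function name `d_prime` is hypothetical.

```python
def d_prime(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Signed Lewontin D' from counts of the 4 haplotypes at 2 biallelic SNPs."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n    # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / n    # frequency of allele B at the second SNP

    # Raw disequilibrium: observed haplotype frequency minus expectation
    # under independence.
    d = p_AB - p_A * p_B

    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    return 0.0 if d_max == 0 else d / d_max
```

Linkage disequilibrium plots usually display the absolute value; values near 1 indicate little evidence of historical recombination between the 2 SNPs.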
In total, we identified 17 137 autosomal regions with ROHs greater than 1 Mb (8912 in 542 controls and 8225 in 547 AD cases), including 4041 ROHs in 267 FAD cases (Table). The global burden measurements of rate and total size of ROHs were not associated with AD. However, AD was significantly associated with a larger mean ROH size per person: 2133 kb in cases vs 1934 kb in controls (P = .0039). This association was stronger with FAD (P = .0005). Also, the total size of ROHs per person was marginally longer in FAD cases (40 Mb) vs controls (34 Mb) (P = .05), which could be the result of consanguineous marriages in this island population. Indeed, in our cohort, we observed a noticeable inbreeding comparable to second-cousin marriages (mean F = 0.020). The mean degree of inbreeding was even higher in the 2 subgroups, which was more evident in the European Hispanics (F = 0.055) than in the African Hispanics (F = 0.046).
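The case vs control comparison of mean ROH size rests on a 1-tailed permutation test. A minimal sketch of such a test is given below, assuming the per-person mean ROH lengths are already computed; it illustrates the principle only and does not reproduce the PLINK burden test. The function name and the (hits + 1)/(permutations + 1) convention for the empirical P value are assumptions of this sketch.

```python
import random
from typing import Sequence


def perm_test_mean_diff(
    cases: Sequence[float],
    controls: Sequence[float],
    n_perm: int = 10_000,
    seed: int = 0,
) -> float:
    """1-tailed empirical P value for mean(cases) > mean(controls)."""
    rng = random.Random(seed)
    n_cases = len(cases)
    n_controls = len(controls)
    observed = sum(cases) / n_cases - sum(controls) / n_controls

    pooled = list(cases) + list(controls)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to permuting case labels.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_cases]) / n_cases - sum(pooled[n_cases:]) / n_controls
        if diff >= observed:
            hits += 1

    # Adding 1 to numerator and denominator avoids reporting P = 0.
    return (hits + 1) / (n_perm + 1)
```

The test is 1-tailed because only permutations in which cases exceed controls by at least the observed difference are counted.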
The global burden analysis was extended to the population subgroups, as defined by the Structure program. Alzheimer disease was significantly associated with larger mean lengths of ROH in the European Hispanic subgroup (P = .013), which also had a 2-fold greater total ROH size per individual compared with the African Hispanic subgroup (Table).
We estimated which genes were intersected by ROHs more frequently in cases vs controls. This locus-specific analysis of the entire data set did not reveal significant findings after correction for multiple testing. The most significant nominal association with AD was observed for the ROHs at chr7q33 intersecting EXOC4 (NM_021807.3) (empirical P value 1 [EMP1], .0006; 28 cases vs 8 controls). In the European Hispanic population, we observed a significant association between AD and EXOC4-intersecting ROHs even after correction for multiple tests (EMP1, .0001; EMP2, .009; 21 AD cases vs 2 controls) (Figure, A). The presence of EXOC4-intersecting ROHs was confirmed by manual inspection of the B-allele frequency and log R ratio plots in GenomeStudio Data Analysis Software (Illumina, Inc) (Supplement [eFigure 1]). Although no locus survived correction for multiple tests in the African Hispanic population, CTNNA3 (NM_013266.2) with its nested gene (LRRTM3 [NM_178011.3]) on chr10q21.3 revealed the strongest nominal association with AD (EMP1, .002). Runs of homozygosity intersecting CTNNA3 were detected in 10 cases but not in controls (Figure, B). Based on previous copy number variation analysis, the homozygosity at the EXOC4 and CTNNA3 loci cannot be explained by the presence of large deletions.34
We did not detect any significant findings for ROHs intersecting SORL1, TREM2, APOE, or the 9 AD loci identified by recent large GWASs.4-8 Among known AD loci, the only nominal association with AD was detected in the European Hispanic subgroup for CLU (EMP1, .038; 7 cases vs 1 control). Of note, although AD in our cohort was strongly associated with the APOE ε4/ε4 genotype,33 we found no difference in the frequencies of APOE-intersecting ROHs between cases (5 individuals, including 3 with 3/3, 1 with 2/2, and 1 with 4/4 genotypes) and controls (4 individuals, all with genotype 3/3). Hence, in a Caribbean Hispanic population, the frequency of extended APOE haplotypes is low, as in other reported populations.27,28
In the entire data set, we identified 1415 consensus regions (defined as the region shared by all individuals carrying an ROH at a given locus), but none of them was associated with AD after correction for multiple testing (Supplement [eTable 1]). However, in the European Hispanic population, a homozygous segment overlapping EXOC4 was significantly associated with AD even after correction for multiple testing (EMP1, .0001; EMP2, .002; 21 cases vs 2 controls). This consensus region was flanked by rs7793621 and rs7792010 located at chr7:132988570-133681101 (hg19) (Supplement [eFigure 2 and eTable 2]). In the African Hispanic population, the most significant nominal association was observed for the consensus region overlapping CTNNA3 (EMP1, .004), but no region survived correction for multiple testing (Supplement [eTable 3]). Consensus regions of CTNNA3 and EXOC4 have several rare, potentially functional coding variations, including missense and nonsense SNPs, according to dbSNP137 (Supplement [eTable 4]).
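The distinction between EMP1 (pointwise) and EMP2 (corrected for all regions tested) can be illustrated with a max-statistic permutation scheme. The sketch below is an assumption-laden illustration and not the PLINK regional test: the statistic per region is simply the number of case carriers minus the number of control carriers, and the function name and inputs are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def region_empirical_p(
    carriers: List[Set[int]],   # per region: indices of samples carrying an ROH
    is_case: Sequence[bool],    # case (True) or control (False) for each sample
    n_perm: int = 10_000,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Return (pointwise, max-statistic corrected) empirical P values per region."""
    rng = random.Random(seed)

    def stats(labels: Sequence[bool]) -> List[int]:
        out = []
        for c in carriers:
            n_case = sum(labels[i] for i in c)
            out.append(n_case - (len(c) - n_case))  # cases minus controls
        return out

    observed = stats(is_case)
    labels = list(is_case)
    point_hits = [0] * len(carriers)
    corrected_hits = [0] * len(carriers)

    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case-control status across samples
        perm = stats(labels)
        top = max(perm)      # best statistic over all regions in this permutation
        for r, obs in enumerate(observed):
            if perm[r] >= obs:
                point_hits[r] += 1       # same region does as well by chance
            if top >= obs:
                corrected_hits[r] += 1   # any region does as well by chance

    pointwise = [(h + 1) / (n_perm + 1) for h in point_hits]
    corrected = [(h + 1) / (n_perm + 1) for h in corrected_hits]
    return pointwise, corrected
```

Because the maximum over all regions is at least as large as the statistic for any single region, the corrected value can never be smaller than the pointwise value, which is why a region may be nominally significant yet fail correction when many consensus regions are tested.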
In addition, we evaluated the samples with ROHs at EXOC4 for genotypic identity using the PLINK grouping function. The largest subgroup with more than 99% identical genotypes consisted of 11 cases (including 9 European Hispanics) and 1 control of African Hispanic ancestry (Supplement [eTable 5]). Manual inspection of the genotypes of these 12 individuals (Supplement [eTable 6]) confirmed the shared haplotype that extends from the 5′-untranslated region to exon 14 of EXOC4, including the Sec8-exocyst domain (PF04048). The controls of European Hispanic origin (n = 326) were used to study the linkage disequilibrium structure around EXOC4. The shared haplotype observed in the 9 European Hispanics belongs to a linkage disequilibrium block that contains only EXOC4 (Supplement [eFigure 2]).
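The idea of grouping ROH carriers by genotypic identity can be sketched as follows. This is a simplified greedy procedure assumed for illustration, not the PLINK grouping function: each sample is compared with the first member of every existing group across the same ordered set of SNPs, and joins the first group with which it shares at least 99% of genotype calls. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def identity(g1: Sequence[Optional[str]], g2: Sequence[Optional[str]]) -> float:
    """Proportion of matching genotype calls, ignoring SNPs missing in either sample."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def group_by_identity(
    genotypes: Dict[str, Sequence[Optional[str]]], threshold: float = 0.99
) -> List[List[str]]:
    """Greedily cluster samples whose genotypes match a group's first member."""
    groups: List[List[str]] = []
    for sample, calls in genotypes.items():
        for group in groups:
            reference = genotypes[group[0]]
            if identity(calls, reference) >= threshold:
                group.append(sample)
                break
        else:
            # No existing group is similar enough: start a new one.
            groups.append([sample])
    return groups
```

Samples falling in the same group are candidates for carrying the same homozygous haplotype, which is what manual inspection of the genotypes would then confirm.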
Our results suggest the existence of recessive AD risk loci in the Caribbean Hispanic population. We detected an association between AD and a larger genome-wide mean ROH size (P = .0039), which was stronger with FAD (P = .0005). The previous studies on global burden measurements of ROHs did not identify any association with AD, likely owing to the outbred nature of the investigated North American and European data sets.27,28 In contrast, our study had greater power to detect association with ROHs because we studied a more homogeneous population with noticeable inbreeding. Of note, total ROH size was 2 times longer in the European Hispanic subset than in the African Hispanic subset, indicating closer relatedness in the European Hispanic population, in which we detected the association between AD and a larger mean ROH size (P = .013).
The association of ROHs with AD could reflect the cumulative effects of multiple ROHs widely distributed throughout the genome (as in schizophrenia44) or the contribution of specific loci harboring recessive mutations and/or risk haplotypes in a subset of patients with AD. In the gene-based approach, the most significant result (EMP2, .009) was observed in the European Hispanics for EXOC4, encoding a component of the exocyst complex involved in the trafficking of the N-methyl-d-aspartate receptor.45-47 In the African Hispanic population, the strongest but nominal association was found for CTNNA3, which is a previously reported functional and positional AD gene candidate (also known as VR22 or α-3-catenin). The CTNNA3 protein is a binding partner of β-catenin, and the complex of α/β-catenin interacts with presenilin 1, promoting cadherin processing.48-52 The CTNNA3 gene is mapped to 10q22.2 within an AD locus (OMIM 605526).53,54 Furthermore, CTNNA3 was suggested as an APOE-dependent AD gene55-57 and reported to influence the level of β-amyloid 42 in late-onset AD families.58 Finally, an association between AD and a block of CTNNA3 SNPs in our Caribbean Hispanic data set was recently described.59 Hence, the present study further supports the role of CTNNA3 in AD.
The EXOC4 and CTNNA3 proteins are part of a pathway responsible for the “stabilization and expansion of the E-cadherin adherens junction,” according to the Pathway Interaction Database (http://pid.nci.nih.gov/index.shtml). Also, both proteins are implicated in the “Arf6 trafficking events” pathway, involved in endocytosis and vesicular transport by acting on membrane lipid composition and actin organization.60,61
In the analysis of ROH consensus regions, only the EXOC4 locus among European Hispanics survived correction for multiple testing. In total, we observed 61 consensus regions nominally associated with AD, which could not be explained by the presence of large copy number variations.34 Some of these regions are mapped near known GWAS-significant SNPs (<1 Mb). For instance, the consensus region at chr8p21.1 (11 cases vs 1 control) is near the CLU gene.4,5,7,62 Two other consensus regions are close to the significant SNPs detected in the African American GWAS: rs10850408 (chr12q24.21) and rs2221154 (chr3p24.1).63 The regions on chr3p26.1 and chr4p15.1 were close to significant SNPs identified in a meta-analysis of AD age at onset (rs271066 and rs10517270, respectively).64 Finally, among the largest consensus regions (>1 Mb) on chr21q21.1, chr4p15.2, and chr8p23.3, the locus on chr21q21.1 (EMP1, .004; 8 AD cases vs 0 controls) contains the neural cell adhesion molecule 2 gene (NCAM2). Of note, the SNPs in NCAM2 were reported to be associated with β-amyloid levels in cerebrospinal fluid.65
In summary, we found that ROHs could significantly contribute to the etiology of AD in a population with noticeable inbreeding. Future studies would require the analysis of larger data sets, including association tests with regression analysis incorporating the principal components of population structure and covariates (eg, age and sex). To characterize the molecular defects underlying AD, the next step is to conduct deep sequencing of top significant loci in a subset of samples with overlapping ROHs, followed by segregation studies in families affected by potential pathologic mutations.
Accepted for Publication: May 15, 2013.
Corresponding Authors: Ekaterina Rogaeva, PhD, and Peter St George-Hyslop, MD, Tanz Centre for Research in Neurodegenerative Diseases, University of Toronto, Tanz Neuroscience Building, 6 Queen’s Park Crescent W, Room 227, Toronto, ON M5S 3H2, Canada (firstname.lastname@example.org and email@example.com).
Published Online: August 26, 2013. doi:10.1001/jamaneurol.2013.3545.
Author Contributions: Study concept and design: St George-Hyslop, Rogaeva.
Acquisition of data: Ghani, Sato, Moreno, Mayeux, Rogaeva.
Analysis and interpretation of data: Ghani, Lee, Reitz, Rogaeva.
Drafting of the manuscript: Ghani, Lee, Reitz, St George-Hyslop, Rogaeva.
Critical revision of the manuscript for important intellectual content: Sato, Lee, Reitz, Moreno, Mayeux, St George-Hyslop.
Statistical analysis: Ghani, Lee, Reitz.
Obtained funding: Ghani, Sato, Moreno, Mayeux, St George-Hyslop, Rogaeva.
Administrative, technical, and material support: Mayeux, Rogaeva.
Study supervision: St George-Hyslop, Rogaeva.
Conflict of Interest Disclosures: None reported.
Funding/Support: This work was supported by grants R37-AG15473 (Dr Mayeux) and P01-AG07232 from the National Institute on Aging, National Institutes of Health; by the Blanchett Hooker Rockefeller Foundation; by the Charles S. Robertson Gift from the Banbury Fund (Dr Mayeux); by the W. Garfield Weston Foundation (Dr Rogaeva); by the Canadian Institutes of Health Research; by the Wellcome Trust; by the Medical Research Council; by the National Institutes of Health; by the National Institute of Health Research; by the Ontario Research Fund; and by the Alzheimer Society of Ontario (Dr St George-Hyslop).