Population structure of a Caribbean Hispanic population. The dark gray dots represent Hispanic white individuals, while the black dots represent Hispanic African individuals. The light gray dots represent individuals from other Central American countries. The Figure was generated using STRUCTURE.35
Manhattan plot of allelic association analysis in a Caribbean Hispanic population, showing the results of the genome-wide association analysis. One single-nucleotide polymorphism stands out with the strongest signal, and multiple single-nucleotide polymorphisms have P values less than 9 × 10−6.
Association between CUGBP2 and late-onset Alzheimer disease (LOAD) among homozygous APOE ε4 carriers in Caribbean Hispanic subjects vs National Institute on Aging Late-Onset Alzheimer's Disease study European American subjects. Two models were used to examine the relation between CUGBP2 and LOAD, conditional on APOE ε4 status. Model 1 compares homozygous APOE ε4 carriers with all others. Model 2 compares homozygous APOE ε4 carriers with homozygous APOE ε3 carriers, and the remaining subjects were excluded from that analysis. bp indicates base pair. The base pair locations on the x-axis are not to scale.
Joseph H. Lee, Rong Cheng, Sandra Barral, Christiane Reitz, Martin Medrano, Rafael Lantigua, Ivonne Z. Jiménez-Velazquez, Ekaterina Rogaeva, Peter H. St. George-Hyslop, Richard Mayeux. Identification of Novel Loci for Alzheimer Disease and Replication of CLU, PICALM, and BIN1 in Caribbean Hispanic Individuals. Arch Neurol. 2011;68(3):320–328. doi:10.1001/archneurol.2010.292
Numerous genome-wide association studies (GWAS) have been published for late-onset Alzheimer disease (LOAD).1-13 Aside from APOE, additional candidate susceptibility genes and loci identified in GWAS of LOAD have included GAB2, GALP, 14q32.13, LOC651924, PGBD1, TNK1, CR1, CLU, PICALM, and BIN1.14,15 In addition, variants in SORL1 identified by Rogaeva et al16 have been replicated in several independent cohorts and were significantly associated with LOAD in a meta-analysis.17 Difficulties inherent to the genetics of complex diseases remain with these studies, including etiologic heterogeneity, gene × environment and gene × gene interactions, and methylation, and much work remains to be done. For example, the strength of association, or effect size, as measured by odds ratios (ORs), varies widely across studies and is generally small. Nonetheless, these GWAS have identified a number of candidate genes that need to be replicated and whose functional roles need to be determined. Despite the increasing number of identified susceptibility variants, a relatively large proportion of genetic variance remains unexplained.18 This reflects both the complexity of the underlying genetics and the inadequacy of heritability as a measure of genetic contribution. Similar phenomena have been observed in other common, complex genetic diseases and have given rise to the term genetic dark matter in GWAS.19,20
In the current study, we report the results of a GWAS in unrelated patients with LOAD and controls of Caribbean Hispanic ancestry. We selected this population for 2 reasons. First, the prevalence and incidence rates of LOAD are higher than in white, non-Hispanic individuals living in the same community.21 Second, we had previously identified numerous large families multiply affected by LOAD. We first examined unrelated cases and controls in the Caribbean Hispanic individuals. We then sought replication of the associations using the publicly available GWAS data from the National Institute on Aging Late-Onset Alzheimer's Disease (NIA-LOAD) Family Study (E. M. Wijsman, PhD, N. Pankratz, PhD, Y. Choi, PhD, J. H. Rothstein, MS, K. Faber, MS, R.C., J.H.L., T. D. Bird, MD, D. A. Bennett, MD, R. Diaz-Arrastia, MD, A. M. Goate, DPhil, M. Farlow, MD, B. Ghetti, MD, R. A. Sweet, MD, T. M. Foroud, PhD, and R.P.M.; for the NIA-LOAD/NCRAD Family Study Group. “Genome-wide Association of Familial Late-Onset Alzheimer's Disease Replicates BIN1 and CLU and Nominates CUGBP2 in Interaction with APOE,” unpublished data). This approach also allowed us to assess the role of genetic admixture in the Caribbean Hispanic population. To our knowledge, this is the only GWAS of Alzheimer disease that focuses exclusively on a Caribbean Hispanic population.
We studied 1093 unrelated Caribbean Hispanic individuals comprising 549 cases and 544 controls (Table 1). These participants were selected from 2 studies: the Washington Heights–Inwood Columbia Aging Project (WHICAP) study and the Estudio Familiar de Influencia Genetica de Alzheimer (EFIGA) study. The WHICAP study is a population-based epidemiologic study of randomly selected elderly individuals residing in northern Manhattan, New York. It comprises 3 ethnic groups: non-Hispanic white, Caribbean Hispanic, and African American.21 For the current study, we restricted inclusion to individuals who self-reported as Hispanic of Caribbean origin. We excluded the non-Hispanic white and African American participants. In addition, we selected 1 affected individual from each family participating in the EFIGA study of Caribbean Hispanic families with LOAD.22 Both studies used the same clinical diagnostic methods.
The participants originated from the Dominican Republic and Puerto Rico. Approximately 60.3% of the affected individuals were participants in the WHICAP epidemiologic study, and the remaining 39.7% of the affected individuals were from the EFIGA study. All unaffected individuals were participants in the WHICAP epidemiologic study. For the familial cases, we selected 1 proband from each family to create a cohort of unrelated individuals. We selected persons with definite or probable LOAD rather than those with possible LOAD to limit the effects of comorbidity.
Data were available from medical, neurological, and neuropsychological evaluations23 collected from 1999 through 2007. The standardized neuropsychological test battery covered multiple domains and included the Mini-Mental State Examination,24 the Boston Naming Test,25 the Controlled Word Association Test26 from the Boston Diagnostic Aphasia Evaluation,27 the Wechsler Adult Intelligence Scale–Revised similarities subtest,28 the Mattis Dementia Rating Scale,29 the Rosen Drawing Test,30 the Benton Visual Retention Test,31 the multiple-choice version of the Benton Visual Retention Test,31 and the Selective Reminding Test.32
The diagnosis of dementia was established on the basis of all available information gathered from the initial and follow-up assessments and medical records. The diagnosis of LOAD was based on the National Institute of Neurological Disorders and Stroke–Alzheimer's Disease and Related Disorders Association criteria.33
Single-nucleotide polymorphisms (SNPs) were genotyped at the Illumina Genotyping Service Center, San Diego, California, using Illumina HumanHap 650Y chips. These chips yielded 658 610 genotyped SNP markers. Quality control of the SNP genotypes was performed using PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/). We excluded SNPs with any of the following characteristics:

- missing genotype rate greater than 20%;
- minor allele frequency less than 1%; or
- Hardy-Weinberg equilibrium test34 P value less than .0001 in controls.

The 650Y chip includes additional SNPs for Yoruban individuals. However, the Illumina SNP chips are optimized for white populations, so we initially used less stringent quality control criteria than other studies. This choice was also intended to reduce the likelihood of false-negative results. To limit the possibility that positive signals were caused by SNPs with poor call rates, we then tightened the missing genotype rate threshold to 5%. This screen reduced the total number of analyzed SNPs by 0.26%. None of the SNPs of main interest (ie, P < 9 × 10−6, shown in Table 2) had low genotyping rates. After all quality control measures, we analyzed 627 380 autosomal SNPs.
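The SNP-level filters described above can be sketched in Python as follows. This is an illustrative sketch, not the PLINK implementation. The SnpCounts container and the function names are hypothetical. The Hardy-Weinberg check uses a simple 1-df chi-square approximation, whereas PLINK may apply an exact test. The thresholds default to the final values stated in the text: missing rate 5%, minor allele frequency 1%, and Hardy-Weinberg P < .0001 in controls.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one biallelic SNP (hypothetical input format)."""
    n_aa: int           # homozygous for allele A
    n_ab: int           # heterozygous
    n_bb: int           # homozygous for allele B
    n_missing: int = 0  # samples without a genotype call


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Hardy-Weinberg P value from a 1-df chi-square goodness-of-fit test."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(all_samples: SnpCounts, controls: SnpCounts,
              max_missing: float = 0.05, min_maf: float = 0.01,
              min_hwe_p: float = 1e-4) -> bool:
    """True if a SNP survives the missing-rate, MAF, and control-only HWE filters."""
    called = all_samples.n_aa + all_samples.n_ab + all_samples.n_bb
    total = called + all_samples.n_missing
    if called == 0 or all_samples.n_missing / total > max_missing:
        return False
    freq_a = (2 * all_samples.n_aa + all_samples.n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False
    return hwe_chisq_p(controls.n_aa, controls.n_ab, controls.n_bb) >= min_hwe_p
```

For example, `passes_qc(SnpCounts(50, 100, 50, 2), SnpCounts(25, 50, 25))` returns True. Passing `max_missing=0.20` mimics the initial, less stringent screen.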
We applied 2 methods to estimate the ancestry proportions of each subject, and thus population stratification, in this case-control data set. The first was STRUCTURE (version 2.2).35 The second was an identity-by-state–based clustering method implemented in PLINK (version 1.05)36 (eAppendix). Briefly, we used 500 unlinked SNPs for the STRUCTURE analysis35 and all available SNPs (n = 627 380 autosomal SNPs) for the PLINK analysis to assess underlying population structure. To better represent the separation from geographic source populations, we augmented the 1093 Hispanic samples with 210 HapMap subjects (http://www.hapmap.org): 60 European American, 60 Yoruban, and 90 East Asian individuals. The cluster assignments from STRUCTURE were comparable with those from PLINK (data not shown). For all subsequent association analyses, we used the cluster information obtained from the PLINK analysis to correct for population stratification. The genomic inflation factor (λ) showed little inflation (1.0378 after correction for population stratification; eFigure 1).
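The genomic inflation factor is conventionally defined as the median observed 1-df association χ2 statistic divided by the median of the χ2 distribution with 1 df, approximately 0.455. A value near 1, such as the 1.0378 reported above, indicates little residual stratification. The sketch below implements that definition. It assumes the reported λ follows this median-based convention, and the function names are hypothetical.

```python
import statistics
from statistics import NormalDist

# Median of a chi-square distribution with 1 df: the square of the 75th
# percentile of N(0, 1), because X = Z**2 and P(|Z| <= z) = 0.5 at z = inv_cdf(0.75).
CHISQ1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # approximately 0.4549


def genomic_inflation(chisq_stats: list[float]) -> float:
    """Median-based genomic inflation factor (lambda) for 1-df test statistics."""
    return statistics.median(chisq_stats) / CHISQ1_MEDIAN


def chisq_from_p(p: float) -> float:
    """Convert a 2-sided P value back to the equivalent 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z
```

To compute λ from a list of genome-wide P values, the statistics can be rebuilt with `genomic_inflation([chisq_from_p(p) for p in pvalues])`.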
We conducted single-point allelic association analysis using the Mantel-Haenszel χ2 test statistic. This test assesses SNP-disease association conditional on the population subclusters estimated from the PLINK analysis described earlier (Table 2). In addition, we used PLINK to perform multivariate logistic regression analysis adjusted for age, sex, education, and population stratification (Table 3). For the analysis of all subjects only, we also adjusted for the presence or absence of APOE ε4 along with these 4 covariates. To determine whether the associations were statistical artifacts, we derived empirical P values from 1 million permutations for the top 23 SNPs that showed the strongest support for association with LOAD. For this purpose, we randomly shuffled affection status across subjects to create the null distribution. We then assessed the likelihood of false-positive results for each SNP.
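The stratified allelic test and the label-shuffling procedure can be illustrated with the sketch below. It assumes genotypes coded as 0/1/2 allele dosages and one cluster label per subject; both are hypothetical input formats. It computes the Mantel-Haenszel χ2 statistic without continuity correction and shuffles affection status across all subjects, as described above. PLINK's own permutation scheme and empirical P value convention may differ in detail. The default here is 1000 permutations rather than 1 million, to keep the example fast.

```python
import math
import random
from collections import defaultdict


def cmh_chisq(strata):
    """Cochran-Mantel-Haenszel chi-square (1 df, no continuity correction).

    `strata` is a list of 2x2 allele-count tables ((a, b), (c, d)):
    rows are cases/controls, columns are allele 1/allele 2.
    """
    num = 0.0
    var = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # stratum carries no information
        row1, col1 = a + b, a + c
        num += a - row1 * col1 / n  # observed minus expected case allele-1 count
        var += row1 * (c + d) * col1 * (b + d) / (n * n * (n - 1))
    return num * num / var if var > 0 else 0.0


def allele_tables(genotypes, affected, cluster):
    """Build one allele-count table per ancestry cluster from 0/1/2 dosages."""
    counts = defaultdict(lambda: [[0, 0], [0, 0]])
    for g, aff, k in zip(genotypes, affected, cluster):
        row = 0 if aff else 1
        counts[k][row][0] += g      # copies of allele 1
        counts[k][row][1] += 2 - g  # copies of allele 2
    return [tuple(map(tuple, t)) for t in counts.values()]


def chisq1_p(x):
    """Upper-tail P value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2))


def empirical_p(genotypes, affected, cluster, n_perm=1000, seed=1):
    """Permutation P value: shuffle affection status and recompute the CMH statistic."""
    observed = cmh_chisq(allele_tables(genotypes, affected, cluster))
    rng = random.Random(seed)
    labels = list(affected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if cmh_chisq(allele_tables(genotypes, labels, cluster)) >= observed:
            hits += 1
    # (hits + 1) / (n_perm + 1) keeps the estimate away from exactly 0
    return (hits + 1) / (n_perm + 1)
```

Because the statistic is recomputed within the same clusters after each shuffle, the null distribution retains the stratified structure of the observed test.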
We prioritized candidate SNPs by selecting those with a nominal P value of 9 × 10−6 or lower. This cut point does not reach the Bonferroni-corrected genome-wide significance threshold, which is a family-wise α of .05 divided by the number of SNPs tested. Nonetheless, it helped us to prioritize SNPs of importance. To determine whether the findings from the Caribbean Hispanic individuals could be replicated in an independent data set, we examined the publicly available GWAS data from the NIA-LOAD study (Wijsman et al, unpublished data) (Table 2). The demographic and clinical characteristics of the NIA-LOAD participants included in the GWAS are detailed in their report (Wijsman et al, unpublished data). Briefly, the study first examined self-reported European American individuals: 2124 individuals from the NIA-LOAD study and 325 individuals from the National Cell Repository for Alzheimer's Disease (NCRAD) study. These were augmented with 1186 unrelated individuals from the NIA-LOAD study and 204 individuals from the NCRAD database. These self-reported European American individuals were subsequently clustered into 3 groups (northern European, Ashkenazi Jewish, and southern European) based on a principal component analysis, and subsequent analyses took ethnic background into consideration. In the present study, we compared the results from this GWAS in Caribbean Hispanic individuals against the results from 3 subanalyses in the NIA-LOAD GWAS:

- case-control analysis of unrelated individuals;
- family-based analysis stratified by APOE; and
- family-based analysis stratified by ethnicity.

Table 2 presents the P values for each SNP. We also list SNPs located within 5 kilobases that have a nominal P value less than .05.
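The Bonferroni-corrected threshold follows directly from the number of SNPs analyzed after quality control. The arithmetic below reproduces the 7.97 × 10−8 cut point cited in the Results. It also shows how far the 9 × 10−6 prioritization cut point lies above that threshold. The variable names are illustrative.

```python
N_SNPS = 627_380  # autosomal SNPs analyzed after quality control
ALPHA = 0.05      # family-wise error rate


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


genome_wide = bonferroni_threshold(ALPHA, N_SNPS)
print(f"{genome_wide:.3g}")          # 7.97e-08
print(round(9e-6 / genome_wide))     # 113: the screening cut point is ~113-fold less stringent
```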
We subsequently identified a set of self-reported Caribbean Hispanic individuals in the NIA-LOAD data set. It comprised an additional 116 unrelated patients with LOAD and 70 unrelated controls who were not included in previous analyses. We compared allele frequencies for randomly selected SNPs common to the Illumina 650Y and 610K SNP chips. This comparison checked whether the 2 Caribbean Hispanic data sets were comparable and whether SNP calling was consistent between the 2 chips. Allele frequencies did not differ significantly between the 2 data sets.
We performed separate analyses focusing on SNPs in candidate genes identified in previous GWAS, namely CR1, CLU, PICALM, and BIN1. Significant associations with these genes had been reported and replicated in 3 previous studies.7,9,13 For these genes, we performed 4 analyses (Table 4):

- a Mantel-Haenszel χ2 test taking into account population stratification;
- an APOE ε4–stratified analysis restricted to individuals with at least 1 copy of ε4;
- an APOE ε4–stratified analysis restricted to individuals without ε4; and
- a Mantel-Haenszel χ2 test taking into account the presence or absence of APOE ε4.

In addition to those 4 genes, we followed up a novel genetic association from the NIA-LOAD GWAS (Wijsman et al, unpublished data). That study found CUGBP2 to be significantly associated with LOAD among the subset of homozygous APOE ε4 carriers. Herein, we evaluated this association using 2 different models that take APOE ε4 genotype into account (Table 4). Under model 1, homozygous APOE ε4 carriers were considered to have the putative genotype, and all others were considered not to have it. Under model 2, homozygous APOE ε4 carriers were considered to have the putative genotype. Homozygous APOE ε3 carriers (ε3 being the most common isoform) were considered to have the reference (wild-type) genotype. The remaining subjects were excluded from the analysis.
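The 2 genotype codings used to follow up CUGBP2 can be written as small helper functions. This sketch assumes APOE genotypes are stored as strings such as "e3/e4". The string format and the function names are hypothetical, chosen only to make the inclusion and exclusion rules explicit.

```python
from typing import Optional


def _alleles(apoe: str) -> list[str]:
    """Split a genotype string such as 'e4/e3' into a sorted allele list."""
    return sorted(apoe.split("/"))


def model1_group(apoe: str) -> int:
    """Model 1: 1 for homozygous APOE e4 carriers, 0 for everyone else."""
    return 1 if _alleles(apoe) == ["e4", "e4"] else 0


def model2_group(apoe: str) -> Optional[int]:
    """Model 2: 1 for e4/e4, 0 for e3/e3, None (excluded) for all other genotypes."""
    alleles = _alleles(apoe)
    if alleles == ["e4", "e4"]:
        return 1
    if alleles == ["e3", "e3"]:
        return 0
    return None


def model2_subset(subjects):
    """Keep only (subject_id, group) pairs that model 2 includes."""
    coded = ((sid, model2_group(g)) for sid, g in subjects)
    return [(sid, grp) for sid, grp in coded if grp is not None]
```

Under model 2, a heterozygous ε3/ε4 subject is dropped, whereas model 1 keeps that subject in the reference group.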
Seventy percent of the participants were women. The mean (SD) age at onset of LOAD was 79.98 (8.0) years, and 18.2% of the subjects were carriers of an APOE ε4 allele. The mean (SD) age at last examination of the controls was 78.87 (6.4) years. The population stratification analysis divided the 1093 Hispanic individuals into 3 groups (Figure 1):

- 658 (60.2%) were likely to be of European white ancestry;
- 401 (36.7%) were likely to be of African ancestry; and
- 34 (3.1%) clustered with neither of these 2 groups and were from other Latin American countries.
None of the SNPs reached the genome-wide significance threshold (nominal P ≤ 7.97 × 10−8). The results from the population stratification–adjusted single-point analysis are shown in a Manhattan plot (Figure 2). Twenty-three SNPs had P values less than 9 × 10−6 in at least 1 of 3 analyses (Table 2):

- all subjects combined;
- carriers of the APOE ε4 allele; and
- noncarriers of the APOE ε4 allele.

Of these, the strongest evidence for association was observed for rs9945493 on 18q23 (P = 1.7 × 10−7; OR, 0.33; 95% confidence interval, 0.21-0.51). For each SNP, we calculated ORs and 95% confidence intervals as well as empirical P values based on 1 million permutations (Table 3). Consistent with other GWAS, ORs for all subjects ranged from 0.33 for rs9945493 to 1.87 for rs1117750.
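For reference, a crude allelic OR and its 95% confidence interval can be computed from a 2 × 2 table of allele counts using Woolf's log-scale method. The sketch below illustrates this calculation only. The ORs in Table 3 come from covariate-adjusted logistic regression, so this unadjusted calculation will not reproduce them. Any counts passed to it here are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_odds_ratio(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Crude allelic odds ratio with a Woolf (log-scale) confidence interval.

    a, b: copies of the tested / other allele in cases
    c, d: copies of the tested / other allele in controls
    Returns (odds_ratio, lower_bound, upper_bound).
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf's method needs a continuity correction")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)      # 1.96 for a 95% interval
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)
```

The interval is symmetric on the log scale, so the product of its 2 bounds equals the squared OR.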
We then examined the same 23 SNPs from Table 2 in an independent data set. For each of our 3 analyses, we compared the results against data from the NIA-LOAD GWAS, which was restricted to self-reported European American individuals (Wijsman et al, unpublished data). Five of the 23 SNPs had a nominal P value less than .05 in at least 1 of the 3 analyses in the NIA-LOAD GWAS (Table 2, footnote e):

- rs4669573 and rs10197851 on 2p25.1;
- rs11711889 on 3q25.2;
- rs1117750 on 7p21.1; and
- rs7908652 on 10q23.1.

rs4669573 is located within the HPCAL1 (hippocalcin-like 1) gene, and the ODC1 gene is located 100 kilobases away. rs1117750 and several flanking SNPs that supported allelic association are located within the DGKB (diacylglycerol kinase, β 90 kDa) gene. Lastly, rs7908652 is located proximal to multiple genes, including GHITM (growth hormone inducible transmembrane protein), C10orf99 (chromosome 10 open reading frame 99), PCDH21 (protocadherin 21), LRIT2 (leucine-rich repeat, immunoglobulin-like, and transmembrane domains 2), LRIT1 (leucine-rich repeat, immunoglobulin-like, and transmembrane domains 1), and RGR (retinal G protein-coupled receptor) (eFigure 2).
For CLU, rs881146 (Pnominal = .00213; Table 4, footnote c) was significantly associated with LOAD in the population stratification–adjusted analysis and among APOE ε4 carriers (Table 4). However, rs11136000 in CLU was not associated with LOAD in our study, although both Harold et al7 and Lambert et al9 had reported its association with LOAD in European and American white individuals. For PICALM, rs17159904 was marginally associated with LOAD in the population stratification–adjusted and APOE-adjusted analyses. For BIN1, we observed a positive association in ε4 carriers for rs7561528 (Pnominal = .00536).
We evaluated an interaction model between APOE and CUGBP2 to follow up the putative gene × gene interaction found in the NIA-LOAD study (Wijsman et al, unpublished data) (Figure 3). In that study, rs201119 in the CUGBP2 gene was significantly associated with LOAD only among individuals with a homozygous ε4 genotype (Pnominal = 1.52 × 10−8). This SNP was not significantly associated with LOAD when all subjects were considered (Pnominal = .726 for allelic association and P = .2607 for genotype association). Because our sample was smaller than that of the NIA-LOAD GWAS, we applied 2 somewhat different models. These models tested whether the allelic association between CUGBP2 and LOAD was restricted to carriers of APOE ε4 and absent in noncarriers. For this purpose, we fitted an interaction model using PLINK in both the Caribbean Hispanic and NIA-LOAD samples. As shown in Figure 3, we observed a modest interaction in the Caribbean Hispanic individuals between the genotype at rs201119 in the CUGBP2 gene and the APOE ε4 genotype (Pnominal = .04898 under model 2). This is the SNP that showed the original allelic association in the NIA-LOAD GWAS samples. For the same SNP, the NIA-LOAD samples had a P value of .00012 under model 1 and .00016 under model 2, supporting the association under our models in both data sets. However, when we examined all SNPs in CUGBP2 in both data sets, the strongest signals came from 2 different regions (Figure 3). In the Caribbean Hispanic samples, rs2242451 showed the strongest support under model 2 (Pnominal = .00324). In the NIA-LOAD samples, the strongest signal came from rs201119 and adjacent SNPs.
We report several novel candidate loci that may harbor putative disease variants in Caribbean Hispanic individuals with LOAD. We also confirmed associations between LOAD and 3 of the 4 previously reported genes. The 4 novel loci (5 SNPs) include multiple genes, and further examination is necessary to verify their involvement in LOAD. We replicated the allelic association between LOAD and CUGBP2 in homozygous carriers of the APOE ε4 allele reported by Wijsman and colleagues (Wijsman et al, unpublished data). We studied this gene because, in the NIA-LOAD GWAS, the strongest signal in homozygous ε4 carriers came from the region on chromosome 10p14 that contains CUGBP2. CUGBP2 has 1 isoform that is expressed predominantly in neurons, and experimental evidence suggests its involvement in apoptosis in the hippocampus.37 It is also involved in posttranscriptional RNA binding and in pre–messenger RNA alternative splicing. Based on structural similarity, it has been speculated that this gene may be involved in increasing COX2 messenger RNA levels. The current study does support association with LOAD, but the pattern of associated SNPs differed between the 2 cohorts. Differences in genetic architecture between non-Hispanic and Hispanic populations are the most likely explanation for this discrepancy.
We found that the 4 candidate loci that were strongly associated with LOAD and replicated in the NIA-LOAD cohort are located near genes that could be biologically relevant to LOAD. HPCAL1 on 2p25.1 encodes a calcium-binding protein expressed in the brain. It has been associated with hypertension in Japanese individuals,38 and hypertension is in turn associated with LOAD risk. The 10q23.1 region includes 3 genes that are expressed in the brain and have been reported by Grupe et al39: PCDH21 (believed to be involved in neuronal maintenance), LRIT1, and RGR.
We replicated associations between LOAD and SNPs in 3 of the 4 genes that were previously reported to be significant at the genome-wide level, namely CLU, PICALM, and BIN1. However, the specific SNPs associated with LOAD in these genes were not always the same in the Caribbean Hispanic individuals as in the European American data set. Nonetheless, the allelic associations extend to an ethnically distinct population, which strengthens the overall support for these 3 genes.
CLU, believed to be involved in the modulation of inflammation and lipid metabolism, was associated with LOAD in carriers of ε4 (P = .00213). More than a decade ago, we examined CLU (also known as APOJ) as a risk factor for LOAD because it shares functional roles with APOE, including cholesterol binding and involvement in inflammation or injury.40 Tycko and colleagues40 examined a small set of coding polymorphisms in APOJ and observed a positive association for homozygosity at 1 polymorphism. This association was no longer significant when all subjects with at least 1 copy of the APOE ε4 allele were excluded. They also observed a significant difference in allele frequencies by race. Similarly, the present study shows different linkage disequilibrium patterns between the Caribbean Hispanic individuals and the NIA-LOAD cohort (eFigure 3). The inconsistent findings across studies could therefore be attributed to any of the following factors, alone or in combination:

- an interaction between APOE and APOJ;
- small sample sizes; or
- differences in the ethnic background of the participants.

The present study observed an association between CLU and LOAD in the presence of APOE ε4 (Table 4). This finding is consistent with the much larger study by Lambert and colleagues9 but not with the study by Harold et al.7
BIN1, a gene expressed in the central nervous system and reported to activate a caspase-independent apoptotic process, was also associated with LOAD only in carriers of ε4 (P = .00536). PICALM is reported to be involved in neurotransmitter release, thereby affecting memory functions.41,42
Together, these findings suggest that these 3 genes contribute to the overall LOAD phenotype. However, the measures of association are unlikely to be consistent across data sets, for 2 reasons. First, allele frequencies differ among racial groups. Second, significant differences in the distribution of vascular and inflammatory risk factors can alter the observed genotype-phenotype relations, even after adjustment for other known risk factors such as age, sex, and education.43
The current study has some limitations. First, the study is based on a modest sample of Caribbean Hispanic individuals and does not have the power to detect rare variants with weak effects, so some risk variants may have been missed. With the original GWAS set, the study has 80% power, genome-wide, to detect:

- alleles with a frequency of 0.35 or higher when the OR is 1.5; and
- alleles with a frequency of 0.25 or higher when the OR is 1.7.

When we combined both Caribbean Hispanic data sets (ie, our GWAS and the Caribbean Hispanic subset of the NIA-LOAD GWAS), the study has 80% power genome-wide to detect SNPs with somewhat lower allele frequencies:

- 0.3 or higher for an OR of 1.5; and
- 0.2 or higher for an OR of 1.7.

Power calculations assumed an additive model and a SNP minor allele frequency comparable with the allele frequency of the putative variant (http://pngu.mgh.harvard.edu/~purcell/gpc/cc2.html). Second, independent replication of the candidate SNPs in Caribbean Hispanic individuals who share a comparable genetic architecture would have further strengthened the validity of the findings. In such a population, the likelihood of replicating the same allele at the same SNP would be higher than in other ethnic groups. For this reason, we added a small set of Caribbean Hispanic individuals from the NIA-LOAD GWAS data set who were evaluated using the same diagnostic tools. However, the sample size remained relatively modest. We also evaluated the candidate SNPs in an independent sample of European American individuals with a different genetic background (NIA-LOAD GWAS). In that sample, allelic associations for the same SNPs were often modest, but different SNPs within the same gene supported allelic association.
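The following sketch illustrates how power depends jointly on allele frequency and OR. It uses a normal approximation for a 2-sided allelic (2 × 2) test, in which each subject contributes 2 alleles. It is not the Genetic Power Calculator's method. It ignores disease prevalence and linkage disequilibrium between marker and causal variant, and the user must choose the significance level α. The Bonferroni value .05/627 380 used in the usage note is an assumption. The output is therefore not expected to match the power figures quoted above.

```python
import math
from statistics import NormalDist


def allelic_power(p0: float, odds_ratio: float, n_cases: int, n_controls: int,
                  alpha: float) -> float:
    """Approximate power of a 2-sided allelic test under a normal approximation.

    p0: risk-allele frequency in controls. The case frequency p1 follows from the
    allelic odds ratio: p1 / (1 - p1) = odds_ratio * p0 / (1 - p0).
    """
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    n1, n2 = 2 * n_cases, 2 * n_controls  # allele counts
    pbar = (n1 * p1 + n2 * p0) / (n1 + n2)
    se0 = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))    # SE under the null
    se1 = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n2)  # SE under the alternative
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in the direction of the effect;
    # the opposite tail is negligible at genome-wide alpha and is ignored.
    return nd.cdf((abs(p1 - p0) - z_crit * se0) / se1)
```

For example, `allelic_power(0.35, 1.5, 549, 544, 0.05 / 627_380)` gives the approximate power for the case and control counts of the original GWAS set. Raising the OR to 1.7, or the allele frequency from 0.25 to 0.35, increases it.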
However, associations observed in a cohort with a different ethnic background strengthen the evidence for association, for 2 reasons. First, it is not unexpected for multiple variants within a gene to be associated with a disease (eg, PSEN1). Second, the findings may be generalizable to a wider set of populations. These findings need to be further evaluated using functional genetics approaches to establish the validity of the observed associations.
We used a dense set of SNPs to survey the genome, identify novel loci, and assess support for allelic association with BIN1, CLU, and PICALM. The current cohort extends previous GWAS of non-Hispanic white populations by exploring allelic association in an admixed cohort with a different set of genetic and environmental risk factors. The confirmation in the present study further strengthens the associations between variants in these genes and LOAD. It also supports a role for other genetic (eg, APOE) and environmental factors in modulating the effects of these variants, especially when each variant may have only a small effect size. We also identified novel candidate genes (eg, HPCAL1, DGKB) in a Caribbean Hispanic cohort and replicated the associations in an independent, ethnically different data set. These genes need to be examined further in independent data sets.
Correspondence: Richard Mayeux, MD, MSc, Sergievsky Center, 630 W 168th St, New York, NY 10032.
Accepted for Publication: September 2, 2010.
Published Online: November 8, 2010. doi:10.1001/archneurol.2010.292
Author Contributions: Drs Lee, Cheng, Barral, and Reitz contributed equally to this work. Study concept and design: Lee, Barral, St. George-Hyslop, and Mayeux. Acquisition of data: Medrano, Lantigua, Jiménez-Velazquez, Rogaeva, and Mayeux. Analysis and interpretation of data: Lee, Cheng, Barral, Reitz, and Mayeux. Drafting of the manuscript: Lee, Reitz, Jiménez-Velazquez, and Mayeux. Critical revision of the manuscript for important intellectual content: Lee, Cheng, Barral, Medrano, Lantigua, Rogaeva, St. George-Hyslop, and Mayeux. Statistical analysis: Lee, Cheng, and Barral. Obtained funding: Jiménez-Velazquez, Rogaeva, St. George-Hyslop, and Mayeux. Administrative, technical, and material support: Medrano and Lantigua. Study supervision: Lee, Medrano, Lantigua, and Mayeux.
Financial Disclosure: None reported.
Funding/Support: This work was supported by grants R37-AG15473 and P01-AG07232 from the National Institutes of Health and the National Institute on Aging, the Blanchette Hooker Rockefeller Foundation, Charles S. Robertson Gift from the Banbury Fund (Dr Mayeux), the Canadian Institutes of Health Research, the Howard Hughes Medical Institute, the Canadian Institutes of Health Research–Japan Science and Technology Trust, the Alzheimer Society of Ontario, Wellcome Trust, and Ontario Research Fund (Dr St. George-Hyslop). Dr Reitz was also supported by Paul B. Beeson Career Development Award K23AG034550.
Online-Only Material: The eAppendix and eFigures are available at http://www.archneurol.com.