Hypothetical tiered design strategy for a large-scale genome-wide association study of Alzheimer disease (AD). SNP indicates single-nucleotide polymorphism.
Waring SC, Rosenberg RN. Genome-Wide Association Studies in Alzheimer Disease. Arch Neurol. 2008;65(3):329-334. doi:10.1001/archneur.65.3.329
Copyright 2008 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
The genetics of Alzheimer disease (AD) to date support an age-dependent dichotomous model whereby earlier age of disease onset (< 60 years) is explained by 3 fully penetrant genes (APP [NCBI Entrez gene 351], PSEN1 [NCBI Entrez gene 5663], and PSEN2 [NCBI Entrez gene 5664]), whereas later age of disease onset (≥ 65 years) representing most cases of AD has yet to be explained by a purely genetic model. The APOE gene (NCBI Entrez gene 348) is the strongest genetic risk factor for later onset, although it is neither sufficient nor necessary to explain all occurrences of disease. Numerous putative genetic risk alleles and genetic variants have been reported. Although all have relevance to biological mechanisms that may be associated with AD pathogenesis, they await replication in large representative populations. Genome-wide association studies have emerged as an increasingly effective tool for identifying genetic contributions to complex diseases and represent the next frontier for furthering our understanding of the underlying etiologic, biological, and pathologic mechanisms associated with chronic complex disorders. There have already been success stories for diseases such as macular degeneration and diabetes mellitus. Whether this will hold true for a genetically complex and heterogeneous disease such as AD is not known, although early reports are encouraging. This review considers recent publications from studies that have successfully applied genome-wide association methods to investigations of AD by taking advantage of the currently available high-throughput arrays, bioinformatics, and software advances. The inherent strengths, limitations, and challenges associated with study design issues in the context of AD are presented herein.
Alzheimer disease (AD) is the most common cause of dementia and the most prevalent neurodegenerative disorder associated with aging.1 Alzheimer disease is a heterogeneous disorder with a complex etiology owing to genetic and environmental influences as causal or risk modifiers. The neuropathologic hallmarks of disease are extracellular amyloid plaques and intracellular neurofibrillary tangles of hyperphosphorylated tau protein.2 Only 10% of AD cases occurring before 60 years of age (early-onset AD) are due to rare, fully penetrant (autosomal dominant) mutations in 3 genes: Aβ precursor protein (APP)3 on chromosome 21, presenilin 1 (PSEN1)4 on chromosome 14, and presenilin 2 (PSEN2)5,6 on chromosome 1. In contrast, most cases of AD are later in onset (≥ 65 years of age) (late-onset AD), are nonfamilial, and are likely the result of highly prevalent genetic variants with low penetrance.7 To date, the only established genetic risk factor for late-onset AD remains the apolipoprotein E gene (APOE), specifically the ε4 allele, which is moderately penetrant, accounting for up to 50% of cases.8
However, a robust literature reports numerous putative genetic risk alleles and promising genetic variants. Recent reports from individual studies reveal significant associations with the sortilin-related receptor (SORL1 [NCBI Entrez gene 6653])9,10 and glycine-rich protein 2–associated binding protein 2 (GAB2 [NCBI Entrez gene 9846])11 on chromosome 11; death-associated protein kinase 1 (DAPK1 [NCBI Entrez gene 1612]),12 ubiquilin 1 (UBQLN1 [NCBI Entrez gene 299798]),13 and adenosine triphosphate–binding cassette transporter 1, subfamily A (ABCA1 [NCBI Entrez gene 19]),14 on chromosome 9; and low-density lipoprotein receptor–related protein 6 (LRP6 [NCBI Entrez gene 4040])15 on chromosome 12. All of these putative variants still lack replication in large representative populations but have relevance to neuropathologic mechanisms and pathways that may be associated with AD pathogenesis (Table 1).
A large meta-analysis from the AlzGene database16 representing 1055 polymorphisms and 355 genes reported in the literature as of August 2006 revealed the following 13 additional potential AD-susceptibility genes: angiotensin I converting enzyme (ACE [NCBI Entrez gene 1636]); cholinergic receptor, nicotinic, beta 2 (CHRNB2 [NCBI Entrez gene 1141]); cystatin C (CST3 [NCBI Entrez gene 1471]); estrogen receptor 1 (ESR1 [NCBI Entrez gene 2099]); glyceraldehyde-3-phosphate dehydrogenase, spermatogenic (GAPDHS [NCBI Entrez gene 26330]); insulin-degrading enzyme (IDE [NCBI Entrez gene 3416]); 5,10-methylenetetrahydrofolate reductase (MTHFR [NCBI Entrez gene 4524]); nicastrin (NCSTN [NCBI Entrez gene 23385]); prion protein (PRNP [NCBI Entrez gene 5621]); presenilin 1 (PSEN1); transferrin (TF [NCBI Entrez gene 7018]); mitochondrial transcription factor A (TFAM [NCBI Entrez gene 7019]); and tumor necrosis factor (TNF [NCBI Entrez gene 7124]).17 All are associated with relevant biological mechanisms and pathways but await replication to further elucidate their utility as significant markers for AD.
Understanding the role that genetic defects play in the pathogenesis of AD has been a major focus of investigation for several years. These collective efforts have provided valuable insights into the genetic and molecular mechanisms associated with AD. Until recently, most reports have been from linkage analyses and studies that have examined the association of single-nucleotide polymorphisms (SNPs), usually in a single candidate gene.16 However, the completion of the Human Genome Project,18 along with the development of public databases to catalog variation between individuals (SNPs) and advances in high-throughput, high-density genotyping technology, have led to a sharp increase in the number of studies now examining a large number of SNPs simultaneously in hypothesis-independent designs. Indeed, genome-wide association studies (GWASs) have emerged as an increasingly effective tool for identifying genetic contributions to complex diseases and represent the next frontier for furthering our understanding of the underlying etiologic, biological, and pathologic mechanisms associated with chronic complex disorders. There have been early success stories for diseases such as macular degeneration19-21 and diabetes mellitus.22,23 Whether this will hold true for a genetically complex and heterogeneous disease such as AD is not known, although indications from early reports are encouraging.
This review considers recent publications from studies that have successfully applied genome-wide association methods to investigations of AD by taking advantage of the currently available high-throughput arrays, bioinformatics, and software advances. The inherent strengths, limitations, and challenges associated with study design issues are presented herein to help guide future GWASs in AD.
The recent decline in the cost of genotyping coupled with technological advances that now allow simultaneous examination of up to 1 000 000 SNPs in a single high-throughput assay has led to an explosion of GWASs of common chronic diseases, including AD. Initial studies focusing on AD have taken advantage of already amassed robust data from case-control and longitudinal cohorts with reliable information on the phenotype of interest (eg, age at onset, diagnosis, and cognitive profile) and DNA available for genotyping.24-27
A GWAS of neuropathologically confirmed AD cases and control subjects from the United States and the Netherlands revealed a significant SNP (rs4420638) in linkage disequilibrium with the APOE ε4 variant on chromosome 19, thus providing support for the APOE locus as the major susceptibility gene for late-onset AD, with an odds ratio significantly greater than that for any other locus in the human genome.24 That study also supported the feasibility of using the most recently available ultrahigh-density GWAS (502 627 SNPs) for AD and other heritable phenotypes. A subsequent study on the same population, enriched with clinical samples and divided into APOE ε4 carriers and noncarriers for discovery and replication, demonstrated a modifying effect on the risk of AD associated with GAB2 among APOE ε4 carriers but not among noncarriers.25
The first GWAS to appear in the literature focused on putative functional variants (SNPs) in AD and examined 17 343 markers in a robust sample of 1808 late-onset cases of AD and 2062 controls from the United States and the United Kingdom. Results revealed 19 significant markers associated with the risk of AD. Three of these were galanin-like peptide precursor (GALP [NCBI Entrez gene 85569]) on chromosome 19q13.42, nonreceptor tyrosine kinase (TNK1 [NCBI Entrez gene 8711]) on chromosome 17p13.1, and phosphoenolpyruvate carboxykinase (PCK1 [NCBI Entrez gene 5105]) on chromosome 20q13.31, all with potential relevance to AD.26 The fact that 3 of the 19 SNPs were in linkage disequilibrium with the APOE ε4 locus lends validity to the study design. However, replication in large appropriately designed studies is necessary for confirmation and further elucidation of their relevance in AD pathogenesis.
A GWAS from a genetically isolated population in the Netherlands of 103 patients with late-onset AD and 170 first-degree relatives from a complex pedigree of 4645 members focused on genetic variants associated with cognitive endophenotypes in addition to risk of AD.27 This was the first study to show significant linkage to 3q23 markers associated with AD. Findings also confirmed previous linkage to 1q25, 10q22-24, and 11q25 associated with AD. Using cognitive function as the endophenotype, the authors27 concluded that RGSL2 (NCBI Entrez gene 84227), RALGPS2 (NCBI Entrez gene 55103), and C1orf49 (NCBI Entrez gene 84066) on chromosome 1; HTR7 (NCBI Entrez gene 3363), MPHOSPH1 (NCBI Entrez gene 9585), and CYP2C (NCBI Entrez gene 1559) on chromosome 10; and NMNAT3 (NCBI Entrez gene 349565) and CLSTN2 (NCBI Entrez gene 64084) on chromosome 3 are likely to be the relevant genes. They failed to confirm SORL1 on chromosome 11 but reported OPCMS (NCBI Entrez gene 4978) and HNT (NCBI Entrez gene 50863) to be the relevant genes. Another GWAS of endophenotypes with relevance to AD from the Framingham group focused on changes in brain volumetric measures on magnetic resonance imaging and cognitive performance in 705 individuals free of dementia and stroke.28 They found significant associations between SORL1 (rs1131497) and abstract reasoning and between CDH4 (rs1970546) and total cerebral brain volume. They also reported polymorphisms with 28 of 163 candidate genes for AD, stroke, and memory impairment for the endophenotypes studied. Overall, these results indicate that genetic variants may be detectable with subclinical phenotypes known to increase the risk of developing clinical neurological disease.28
Although these studies represent an excellent start, with several new associations already detected within the past year, they constitute only the beginning, as many of these putative associations await validation in sufficiently large and representative populations before their utility as genetic markers for disease is completely understood. An important consideration is that the results thus far show very modest effect sizes, with relative risks (or odds ratios) of less than 2.0 and most in the range of 1.1 to 1.5. As expected with most common complex diseases with onset later in life,29 late-onset AD likely results from genetic variants with modest effect size that, while statistically significant, are neither sufficient nor necessary to explain all occurrences of disease.30 It is therefore essential that the synthesis and interpretation of findings distinguish weak but consistent associations that are biologically plausible from spurious associations that may result from bias and inferior study design.
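To make the scale of these effect sizes concrete, the following minimal sketch computes an allelic odds ratio and a log-scale (Woolf) 95% confidence interval from a 2 × 2 table of allele counts. The function name and the counts are hypothetical and are not taken from any study cited here.

```python
import math

def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are counts of risk (alt) and reference (ref) alleles in
    cases and controls; z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical allele counts (2000 cases and 2000 controls, 2 alleles each).
or_est, ci_low, ci_high = odds_ratio_ci(900, 3100, 800, 3200)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

With these illustrative counts the odds ratio is roughly 1.16, within the 1.1 to 1.5 range described above. An interval that excludes 1.0 at the nominal level says nothing about whether the association would survive correction for the number of SNPs tested in a GWAS.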
One of the most important and challenging steps in designing large-scale GWASs on currently available high-density genotyping platforms is assembling sufficiently large study populations that will allow robust gene discovery and replication. Sample size and power calculations for GWASs are statistically rigorous and beyond the scope of this review. Interested readers are referred to recent publications on the subject that take into account the density and coverage of newer available genotyping platforms and software.31,32 These reports and others point to the importance of issues that must be taken into account when planning a study.
Power to detect a causal variant increases with sample size and magnitude of effect (relative risk or odds ratio) and can vary by genotyping platform and the population studied. The effect of stratification on APOE and the effect of population stratification need to be taken into account. Owing to the large number of genetic variants (SNPs) examined and the lack of independence among SNPs in linkage disequilibrium with each other, multiple testing and type I error are even greater threats to validity for the modest effect sizes expected. In addition, power calculations should take into account misclassification error, not only of the genotype but also of the phenotype (clinical misclassification), particularly when the trait of interest is disease status, as in the case of AD (as discussed in the “Phenotype” subsection).
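The dependence of power on sample size, effect size, and the number of tests can be illustrated with a rough sketch. It assumes a simple two-sided allelic test (normal approximation for a difference in allele frequencies) and a Bonferroni threshold that treats all SNPs as independent, which is conservative when SNPs are in linkage disequilibrium. It ignores population stratification and genotype or phenotype misclassification. The function name and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist

def allelic_power(odds_ratio, ctrl_freq, n_cases, n_controls, n_tests, alpha=0.05):
    """Approximate power of a two-sided allelic test at a Bonferroni threshold."""
    norm = NormalDist()
    # Bonferroni: divide the family-wise alpha by the number of SNPs tested.
    z_crit = norm.inv_cdf(1 - (alpha / n_tests) / 2)
    # Risk-allele frequency in cases implied by the odds ratio.
    ctrl_odds = ctrl_freq / (1 - ctrl_freq)
    case_odds = odds_ratio * ctrl_odds
    case_freq = case_odds / (1 + case_odds)
    # Each person contributes 2 alleles.
    se = math.sqrt(
        case_freq * (1 - case_freq) / (2 * n_cases)
        + ctrl_freq * (1 - ctrl_freq) / (2 * n_controls)
    )
    shift = abs(case_freq - ctrl_freq) / se
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

# Hypothetical scenario: OR 1.3, control allele frequency 0.20, 500 000 SNPs.
power_small = allelic_power(1.3, 0.20, 1000, 1000, 500_000)
power_large = allelic_power(1.3, 0.20, 5000, 5000, 500_000)
print(f"1000 cases/1000 controls: power = {power_small:.2f}")
print(f"5000 cases/5000 controls: power = {power_large:.2f}")
```

Under these assumptions, a study of 1000 cases and 1000 controls has very little power to detect an odds ratio of 1.3 at a genome-wide threshold, whereas 5000 of each has high power. Dedicated power software that models platform coverage and linkage disequilibrium should be preferred for actual study planning.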
To overcome these inherent issues, large samples with well-characterized phenotypes and tiered strategies for discovery and replication have proved to be the most successful approach for studies appearing in the recent literature. Most studies, such as those funded through the Genetic Association Information Network33 and the Wellcome Trust Case Control Consortium,34 have included 2000 to more than 10 000 cases and controls to identify truly significant results. Considering the current cost of genotyping large-scale samples and the availability of large numbers of well-characterized research subjects, some of whom have already been included in GWASs, a similar approach is both timely and feasible for AD, albeit not without challenges and limitations.
Although the initial analyses for most GWASs of AD may use disease status as the phenotypic trait of interest, testing for associations with secondary traits such as clinical and cognitive features, biomarkers including neuroimaging, and neuropathological features has considerable merit. Because of established criteria for a diagnosis of AD and the need for standardization across sites for large multicenter studies, considerable effort goes into making a clinical diagnosis of AD. However, establishing an appropriate control continues to be a challenge. The presence of AD neuropathologic changes in brains of nondemented individuals at autopsy is not uncommon, and a number of patients with a clinical diagnosis of probable AD also have cerebrovascular comorbidities that may influence cognition. This has important implications for designing GWASs, because most genetic defects will be associated with neurobiological and biochemical processes that lead directly to clinical AD through neurodegenerative changes (eg, formation of neurofibrillary tangles and amyloid deposition) or indirectly through other mechanisms (eg, cerebrovascular disease). Indeed, the potential for biological heterogeneity among individuals with clinical AD and among nondemented controls presents a challenge in the design of a GWAS and in the interpretation of the findings. Therefore, strategies to minimize the effects of misclassification error associated with the phenotype of interest need to be developed for implementation in future large-scale studies.
Ideally, a GWAS for AD would begin with a discovery sample that is sufficiently large with a wealth of quantitative and qualitative information available for phenotypic structuring. Another large sample with similar phenotypic structure would then be used to further narrow the most significant SNPs resulting from discovery genotyping to identify a much smaller number of significant SNPs. Those SNPs would then be examined in a larger sample with similar phenotypic structure for replication (Figure). Finally, the most significant SNPs identified would then need to be subjected to further analyses (gene expression, sequencing, fine mapping, and functional studies) for final confirmation of relevant putative causal variants. Guidelines for replication of genotype-phenotype associations and GWASs have been published recently.35,36
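The filtering effect of such a tiered strategy can be sketched with simple arithmetic. The example below assumes that every SNP is null, that each stage uses an independent sample, and that only SNPs passing a stage's significance threshold are carried forward. The per-stage thresholds and the starting number of SNPs are hypothetical and are not taken from the Figure.

```python
def expected_null_survivors(n_snps, stage_alphas):
    """Expected number of null SNPs remaining after each successive stage.

    Assumes independent samples per stage, so a null SNP passes a stage
    with probability equal to that stage's significance threshold.
    """
    survivors = float(n_snps)
    remaining = []
    for alpha in stage_alphas:
        survivors *= alpha
        remaining.append(survivors)
    return remaining

# Hypothetical design: 500 000 SNPs, three stages, each tested at alpha = 0.01.
stages = expected_null_survivors(500_000, [0.01, 0.01, 0.01])
for number, count in enumerate(stages, start=1):
    print(f"After stage {number}: about {count:g} false-positive SNPs expected")
```

In this illustration, roughly 5000 null SNPs would pass the first stage by chance, about 50 the second, and fewer than 1 the third, which is why true associations with modest effects need adequate power at every stage to survive alongside the shrinking pool of false positives.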
Although the numbers are arbitrary, because of the number of traits to be examined and the modest effect size expected for most traits, it is clear that multiple large samples for discovery and replication are necessary to ensure a successful GWAS.37,38 This will require forming consortia that will afford pooling across studies because it is unlikely that a single study will have adequate power to unequivocally identify novel genetic risk factors for AD. To take advantage of available data on a large number of patients with AD and controls, plans to form consortia such as the National Institute on Aging (NIA) Genome-Wide Association Study Consortium are already under way. These collective efforts will attempt to design and complete GWASs in the near future to provide an excellent and timely foundation for numerous studies to follow. These initial efforts will likely include GWASs already completed or in progress such as the Translational Genomics Research Initiative study of AD, the National Institute of Mental Health AD sample, the Framingham Study, and the Texas Alzheimer's Research Consortium, among other publicly funded and public-private initiatives in progress or planned. Valuable resources such as the National Cell Repository of AD, the NIA National Alzheimer Coordinating Center, and ongoing studies such as the Adult Changes in Thought, the Washington Heights–Inwood Columbia Aging Project, and the Chicago Health and Aging Project should also be considered, along with ongoing biomarker and neuroimaging studies such as the Alzheimer Disease Neuroimaging Initiative. All of these resources would be expected to have readily available DNA from a large sample of well-characterized research participants and could provide a robust pooled sample that is immediately available to initiate a large-scale GWAS.
The potential contribution from large family studies such as the NIA Late Onset Alzheimer Disease centers, particularly for discovery, should also be evaluated. Although the “common disease–common variant” hypothesis39 provided the conceptual framework for the HapMap Project on which the commercially available high-density genotyping platforms are based, multiple rare variants would not be identified by the tagging SNPs. However, their role in susceptibility to common complex diseases may be significant.40,41
This discussion focuses on replication-based multistage design analysis. However, some groups recommend joint analysis as an alternative strategy for GWASs, whereby data from both stages are analyzed together. This strategy may be even more efficient and achieve greater power when a large proportion of the study sample is genotyped in stage 1 and a relatively large proportion of markers is selected for stage 2. An excellent review comparing joint analysis with replication-based analysis is provided by Skol et al.42 The choice of analysis for multistage design is an important study design issue that must be taken into account to minimize genotyping cost and maximize statistical power within the context of the available sample size.
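As an illustration of what analyzing both stages together can mean, the sketch below combines per-stage z scores for a single SNP, weighting each by the square root of the fraction of the total sample genotyped in that stage. This sample-fraction weighting is an assumption made here for illustration; the text above does not specify the statistic, and the function name and inputs are hypothetical.

```python
import math

def joint_z(stage1_fraction, z_stage1, z_stage2):
    """Combine two stage-specific z scores into one joint statistic.

    stage1_fraction is the proportion of all samples genotyped in stage 1.
    The squared weights sum to 1, so two independent standard normal
    z scores combine into a statistic with unit variance.
    """
    if not 0.0 < stage1_fraction < 1.0:
        raise ValueError("stage1_fraction must be strictly between 0 and 1")
    return (
        math.sqrt(stage1_fraction) * z_stage1
        + math.sqrt(1.0 - stage1_fraction) * z_stage2
    )

# Hypothetical SNP with moderate evidence in each stage, half the sample in each.
combined = joint_z(0.5, 3.0, 3.0)
print(f"Joint z = {combined:.2f}")
```

Two stage-specific z scores of 3.0 combine to a joint statistic above 4, showing how pooled evidence can exceed what either stage provides alone. Because a SNP reaches stage 2 only after passing the stage 1 threshold, the significance threshold for the joint statistic must account for that selection and cannot simply be read from a standard normal table.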
Assembling large representative samples for discovery and replication is certainly not without challenges. The extent to which individual institutional review board–approved protocols allow sharing of banked DNA and genetic information must be verified. Locating and assembling large DNA sample sets and subsequently genotyping them is a large and expensive undertaking with considerable logistical and technical issues. Cataloguing all available data required to support the various endophenotypes to be tested initially and transforming these data into a standardized format for entry into a unified database require considerable planning, technical support, and funding. As evidenced by the success of studies already completed, these initial large-scale studies are expected to produce a number of candidate polymorphisms and haplotypes that will require replication and confirmation in other samples. Expanding current studies now to include comparably collected information on anticipated significant endophenotypes and collection of biospecimens and neuroimaging will require tremendous resources and planning. Research teams populated with subject matter experts representing the entire spectrum of expertise required must be in place to ensure the success of such an undertaking over the next 5 years and beyond; achieving this therefore becomes a matter of funding and prioritization of research agendas among all respective stakeholders.
Although there are challenges to assembling a large representative population for discovery and replication studies, the prospect of new insights that will further our understanding of the role of genetic defects in the neuropathogenic mechanisms that lead to AD makes the GWAS an attractive and powerful tool worth the effort and investment. However, there is tremendous work yet to be done, not only in designing appropriately large and representative studies that will lead to numerous new gene discoveries but also in translating these discoveries to advance clinical practice. Indeed, the time has come to leverage the wealth of advancing methods in genetics, neuroscience, bioinformatics, statistics, and genetic epidemiology that will lead to better treatment outcomes, drug discovery, and prognosis for the millions of individuals already affected by or at risk of developing AD.
Correspondence: Stephen C. Waring, DVM, PhD, Department of Epidemiology, The University of Texas School of Public Health, 1200 Herman Pressler, RAS-E629, Houston, TX 77030 (email@example.com).
Accepted for Publication: November 24, 2007.
Author Contributions: Study concept and design: Waring and Rosenberg. Drafting of the manuscript: Waring. Critical revision of the manuscript for important intellectual content: Waring and Rosenberg.
Financial Disclosure: None reported.
Funding/Support: This study was supported by P30AG12300-14 from the NIA, National Institutes of Health (Alzheimer's Disease Center, The University of Texas Southwestern Medical Center [Dr Rosenberg]), and by funds from the state of Texas through the Texas Council on Alzheimer’s Disease and Related Disorders (Drs Waring and Rosenberg).
Additional Contributions: The NIA Genetics of Alzheimer's Disease Workgroup discussants provided some of the conceptual framework presented herein.