Summary of unadjusted and apolipoprotein E ε4 (APOE ε4) (the only known genetic risk factor associated with late-onset Alzheimer disease) conditional 2-point linkage analysis in the dominant (A) and recessive (B) models. LOD indicates logarithm of odds.
Multipoint linkage analysis in probable and possible late-onset Alzheimer disease (LOAD) (A) and in probable LOAD only (B). In A, loci at 2p25.3, 3q28, 7p21, and 14q23 achieved logarithm of odds (LOD) scores greater than 1.0; all chromosomes are included. In B, 8 loci had LOD scores greater than 1.0, and 4 of these loci exceeded LOD scores of 2.0. The highest LOD scores were at positions 2p25.3 (LOD, 2.9), 2q34 (LOD, 2.75), 3q28 (LOD, 2.75), and 6q21 (LOD, 2.1). cM indicates centimorgan.
Lee JH, Cheng R, Santana V, Williamson J, Lantigua R, Medrano M, Arriaga A, Stern Y, Tycko B, Rogaeva E, Wakutani Y, Kawarai T, St George–Hyslop P, Mayeux R. Expanded Genomewide Scan Implicates a Novel Locus at 3q28 Among Caribbean Hispanics With Familial Alzheimer Disease. Arch Neurol. 2006;63(11):1591-1598. doi:10.1001/archneur.63.11.1591
To identify novel candidate regions for late-onset Alzheimer disease (LOAD) and to confirm linkage in previously identified chromosomal regions.
Family-based linkage analysis.
Probands with familial LOAD identified in clinics in the Dominican Republic, Puerto Rico, and the United States.
We conducted a genome scan in 1161 members primarily clinically diagnosed as having LOAD; these members were from 209 families of Caribbean Hispanic ancestry.
Main Outcome Measures
We analyzed 376 microsatellite markers with an average intermarker distance of 9.3 centimorgans. We conducted linkage analysis using possible and probable LOAD, and we performed affecteds-only 2-point linkage analyses assuming either an autosomal dominant or a recessive model. Subsequently, we conducted a multipoint affected sibling pair linkage analysis.
Two-point parametric linkage analysis identified a locus at 3q28 with a genomewide empirical P value of .03 (logarithm of odds [LOD], 3.09) in a dominant model for probable and possible LOAD. Other regions suggestive of linkage included 2p25.3 (LOD, 1.77), 7p21.1 (LOD, 1.82), and 9q32 (LOD, 1.94). Under a recessive model, we also identified loci at 5p15.33 (LOD, 1.86), 12q24.21 (LOD, 2.43), 14q22.3 (LOD, 2.53), and 14q23.1 (LOD, 2.16) as suggestive for linkage. Restricted to probable LOAD, many of these loci continued to meet criteria suggestive for linkage, as did loci at 2p25.3 (LOD, 2.72), 3q28 (LOD, 2.28), 6p21.31 (LOD, 2.19), and 7p21.1 (LOD, 2.05). APOE conditional analysis indicated that the observed linkage at 3q28 was independent of the APOE ε4 allele. Multipoint nonparametric affected sibling pair linkage analysis provided confirmation of suggestive linkage for most, but not all, loci.
Seven loci with LOD scores greater than 2.0 were identified among multiple affected Caribbean Hispanic families with LOAD. The highest LOD score was found at chromosome 3q28. At least 2 other independent studies have observed support for significant linkage at chromosome 3q28, highlighting this region as a locus for further genetic exploration.
The ε4 variant of the apolipoprotein E (APOE) gene remains the only known genetic risk factor associated with late-onset Alzheimer disease (LOAD).1 Daw et al2 predicted that there may be as many as 4 additional genetic variants that influence the age at onset of LOAD. Several genomewide genetic linkage surveys also suggest additional LOAD loci.3-16 Despite these efforts, to our knowledge, no single gene has been found to show consistent associations in multiple data sets. Nevertheless, broadly overlapping loci conferring modest susceptibility to LOAD have been reported in families from North America or Europe on chromosomes 12p11 to 12q13,13,16,17 10q21 to 10q25,4,6,18 and 9p21 to 9p22.12,14,19,20 Within these regions, analyses have implicated several candidate genes, but most lack confirmation in independent studies or their replication has been inconsistent. Susceptibility loci for complex traits are often difficult to replicate because follow-up studies include too few families.21 Nevertheless, replication of linkage or association remains the critical step in the validation of genetic studies.
In a previous genomewide study of Caribbean Hispanic families,10 confirmatory evidence for linkage to chromosome 12p22 and 10q,10 and evidence for a novel locus on 18q,10 were reported. In the present report, we describe the second phase of this study, with the results of a follow-up genome scan that included additional families of the same ethnic origin.
We included 1161 individuals from 209 Caribbean Hispanic families participating in a family study of AD. Of those families, 101 were included in the earlier study, whom we refer to as phase 1 families10; for the second scan, we added 108 families that included 611 individuals, whom we refer to as phase 2 families (Table 1). Three families from the first genome scan study were not included because many family members lacked adequate DNA for a genome scan at that time. There were approximately 5 members within each collected family. The sampling design and detailed characteristics of the participants have been described elsewhere.23
A physician, typically a gerontologist (M.M.), internist (R.L.), or neurologist (R.M.), examined all patients and participating family members and obtained blood at the examination. To be included in the study, the proband and a living sibling were required to meet the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association24 criteria for probable or possible AD. The Clinical Dementia Rating Scale was used to rate disease severity.25 Brain imaging and other laboratory study results were reviewed when they were available, and offered when medically required for diagnosis. The battery of neuropsychological tests used was developed and evaluated extensively in Hispanics.26-28 All diagnoses were established at a consensus conference that included physicians and the neuropsychologists. Eleven patients died during the study, and subsequent autopsies confirmed the clinical diagnoses in each person.
In the present genome scan, we conducted linkage analysis using 2 phenotypes for affection: (1) possible and probable LOAD (combined) and (2) probable LOAD only (restricted). Individuals with other forms of dementia or with mild cognitive impairment were considered unknown.
The institutional review board of Columbia-Presbyterian Medical Center and Columbia University Health Sciences and the Bioethics National Committee for Research in the Dominican Republic approved the study. We obtained informed consent from the participants directly or from a family member (surrogate) when the individual had dementia.
A total of 376 autosomal microsatellite markers with an average intermarker distance of 9.3 centimorgans were genotyped at the Center for Medical Genetics, Marshfield Medical Research Foundation, Marshfield, Wis. Marker heterozygosity ranged from 0.53 to 0.92 (average, 0.77). Maps from the Marshfield Medical Research Foundation (http://research.marshfieldclinic.org/genetics/) and Ensembl (http://www.ensembl.org/index.html) were used for locus order and intermarker distance. Order was confirmed by comparison with the UCSC Genome Browser (http://genome.ucsc.edu/).
We first checked and corrected nonpaternity problems using computer software (RELATIVE29 and RELTEST30). We checked for mendelian inconsistencies in marker data using other computer software (PEDCHECK).31 We estimated allele frequencies from data that included all genotyped individuals using a computer program (SIB-PAIR; http://www2.qimr.edu.au/davidD/sib-pair.html). APOE genotyping was performed as described in a previous study, with slight modifications.32,33
To simplify the presentation, we report the results from the combined phase 1 and phase 2 families using the combined and restricted LOAD phenotypes.
We conducted 2-point analyses using a computer program (MLINK) from a computer package (FASTLINK),34 producing 2-point logarithm of odds (LOD) scores. Because the mode of inheritance for LOAD is unknown, we tested dominant and recessive transmissions under an affecteds-only model.35 We assumed genetic parameters used in other similar genomewide linkage analyses: a disease allele frequency of 0.001 and a penetrance of 0.001 for gene carriers and 0 for noncarriers.
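As an illustration only, the idea behind a 2-point LOD score can be sketched for the simplest textbook case of fully informative, phase-known meioses. This is not the pedigree likelihood that MLINK evaluates under the penetrance model above; the function names and the grid of recombination fractions are assumptions made for the example.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD at recombination fraction theta for phase-known meioses.

    Compares the likelihood of the data at theta with the likelihood
    under free recombination (theta = 0.5), on a log10 scale.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    nonrecombinants = meioses - recombinants
    log_lik_theta = (recombinants * math.log10(theta)
                     + nonrecombinants * math.log10(1.0 - theta))
    log_lik_null = meioses * math.log10(0.5)
    return log_lik_theta - log_lik_null


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Scan theta = 0.01, 0.02, ..., 0.50 and return (best theta, best LOD)."""
    grid = [i / 100 for i in range(1, 51)]
    best_theta = max(grid, key=lambda t: two_point_lod(recombinants, meioses, t))
    return best_theta, two_point_lod(recombinants, meioses, best_theta)
```

In this simplified setting the LOD is 0 at a recombination fraction of 0.5 by construction, and positive values indicate that the data are more likely under linkage than under free recombination.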
We also repeated the 2-point analysis, conditional on the APOE ε4 allele. For this purpose, we considered an individual with LOAD and at least 1 APOE ε4 allele affected. An individual was considered unknown if the individual had AD but did not have at least 1 APOE ε4 allele. Unaffected individuals were coded as unaffected regardless of their APOE status (ε4-positive conditional analysis). To determine the effect of a locus, independent of APOE, we conducted an analogous conditional linkage analysis, in which an individual with LOAD was considered affected in the absence of an APOE ε4 allele (ε4-negative conditional analysis). This conditional analysis eliminated the need to stratify families into APOE ε4 positive or negative families.
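The recoding rules for the 2 conditional analyses can be sketched as follows. This is a minimal illustration, not the code used in the study: the names Status and recode_for_apoe, the string labels for the 2 analyses, and the numeric codes are hypothetical. The handling of affected ε4 carriers in the ε4-negative analysis (set to unknown) is an assumption made by analogy with the ε4-positive rule, because the text does not state it explicitly.

```python
from enum import Enum


class Status(Enum):
    # Illustrative affection codes; the numeric values are an assumption.
    UNKNOWN = 0
    UNAFFECTED = 1
    AFFECTED = 2


def recode_for_apoe(status: Status, e4_alleles: int, analysis: str) -> Status:
    """Recode affection status for an APOE e4 conditional analysis.

    analysis is "e4_positive" or "e4_negative" (hypothetical labels).
    """
    if analysis not in ("e4_positive", "e4_negative"):
        raise ValueError("analysis must be 'e4_positive' or 'e4_negative'")
    if status is not Status.AFFECTED:
        # Unaffected individuals stay unaffected regardless of APOE status;
        # individuals already coded unknown stay unknown.
        return status
    carrier = e4_alleles >= 1
    if analysis == "e4_positive":
        # Affected only if LOAD and at least 1 e4 allele; otherwise unknown.
        return Status.AFFECTED if carrier else Status.UNKNOWN
    # e4-negative: affected only in the absence of an e4 allele.
    # Assumption: affected carriers are treated as unknown here.
    return Status.UNKNOWN if carrier else Status.AFFECTED
```

Because the recoding is applied to individuals, every family can remain in both analyses, which is why no stratification of families by ε4 status is needed.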
We also conducted multipoint nonparametric affected sibling pair linkage analysis using computer software (GENEHUNTER, version 2.1).36,37 Specifically, we used the weighted “all pairs” option and set the increment function to scan at a distance of 1.0 centimorgan throughout the genetic map. Affected sibling pair analysis calculates the probabilities of sharing 0, 1, or 2 alleles (z0, z1, or z2, respectively) identical by descent between sibling pairs, because loci that are involved in susceptibility to a trait will have probabilities (z0, z1, and z2) that differ from the expected mendelian proportions.38,39 Under the assumption of dominance variation, we limited the sharing probabilities to the “possible triangle,” as described elsewhere,40,41 to ensure biological consistency. This is defined as follows: z0 + z1 + z2 = 1, z1 ≤ 0.5, and z1 ≥ (2 × z0).
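The "possible triangle" constraints stated above can be checked directly. The following is a minimal sketch; the function name and the numerical tolerance are assumptions made for the example, and it only tests whether a given set of sharing probabilities is admissible rather than performing the constrained maximization carried out by the linkage software.

```python
def in_possible_triangle(z0: float, z1: float, z2: float,
                         tol: float = 1e-9) -> bool:
    """Return True if (z0, z1, z2) satisfies the possible-triangle constraints.

    Constraints: z0 + z1 + z2 = 1, z1 <= 0.5, and z1 >= 2 * z0,
    with each probability nonnegative.
    """
    sums_to_one = abs(z0 + z1 + z2 - 1.0) <= tol
    nonnegative = min(z0, z1, z2) >= -tol
    return (sums_to_one and nonnegative
            and z1 <= 0.5 + tol
            and z1 >= 2.0 * z0 - tol)
```

The mendelian expectation under no linkage (0.25, 0.5, 0.25) sits at a corner of the triangle, so it passes the check, whereas a point such as (0.3, 0.5, 0.2) fails because z1 is less than 2 × z0.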
To adjust for multiple testing, we computed empirical P values using the genotyping information and marker allele frequencies from the data to simulate genotypes under the assumption of no linkage. We simulated 100 replicates with probable and possible LOAD as the phenotype using computer software (SIMULATE).42 We then used a computer program (MLINK) to analyze the simulated data sets under the same analytical models as we applied in the original data set. Based on our empirical data, suggestive linkage was defined as a 2-point LOD score of 1.77 (dominant) or 1.79 (recessive), significant linkage was defined as an LOD score of 2.81 (dominant) or 2.67 (recessive), and highly significant linkage was defined as an LOD score of 3.73 (dominant) or 3.59 (recessive). These definitions were based on guidelines suggested by Lander and Kruglyak,43 in which suggestive linkage was defined as statistical evidence that the linkage would occur once at random in a genome scan, significant linkage was defined as statistical evidence that linkage would occur once in 20 such genome scans or P<.05 (ie, empirical P=.05), and highly significant linkage was defined as statistical evidence that linkage would occur once in 100 scans or P<.01 (ie, empirical P=.01).
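The logic of deriving empirical genomewide criteria from simulated null scans can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure as implemented with SIMULATE and MLINK: each simulated replicate is assumed to be available as a list of per-marker LOD scores, all function names are hypothetical, and every marker exceeding a threshold is counted as a separate hit, whereas the guidelines cited above refer to linked regions.

```python
def per_scan_maxima(null_scans: list[list[float]]) -> list[float]:
    """Highest LOD score in each simulated (no-linkage) genome scan."""
    return [max(scan) for scan in null_scans]


def empirical_p_value(observed_lod: float, null_maxima: list[float]) -> float:
    """Proportion of null scans whose maximum LOD reaches the observed LOD."""
    exceed = sum(1 for m in null_maxima if m >= observed_lod)
    return exceed / len(null_maxima)


def expected_hits_per_scan(threshold: float,
                           null_scans: list[list[float]]) -> float:
    """Average number of markers per null scan with LOD >= threshold.

    A threshold for which this value is about 1 corresponds to the
    'once at random in a genome scan' definition of suggestive linkage.
    """
    hits = sum(1 for scan in null_scans for lod in scan if lod >= threshold)
    return hits / len(null_scans)
```

Under this sketch, a threshold for significant linkage would be the LOD score at which empirical_p_value is about .05, and a threshold for highly significant linkage the score at which it is about .01.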
Among the 1161 family members in the 209 families, 65.9% were women (Table 1). The average age at onset was 73.2 years, and 50.4% of the participants met the criteria for probable or possible LOAD. Other forms of dementia or no consensus diagnosis were determined in 7.4% of family members.
Compared with the phase 1 set of 101 families, the phase 2 set of 108 families had a similar age at onset and there were no differences in the proportion of women. There were differences in the proportion of affected family members and in those with probable or possible LOAD (Table 1), but the number of individuals per family was approximately the same. The frequency of APOE ε4 did not differ between the first and second set of families.
Four microsatellite markers achieved LOD scores that were suggestive for linkage in the combined families in the dominant model, and 3 were suggestive for linkage when the phenotype was restricted to probable LOAD (Table 2). While markers at 2p25.3 and 3q28 were suggestive in both dominant models, only the marker at 3q28 met the criteria for significant linkage (Table 2). In the recessive model, 5 markers had LOD scores suggestive for linkage in the combined phenotypes and 3 markers were suggestive using the restricted phenotype. Of these markers, only 2p25.3 and 3q28 also met the criteria for significant linkage and were either suggestive or significant for linkage regardless of the phenotype definition or the model specified. Additional data available (eTable) show the locations of 26 microsatellite markers with LOD scores greater than 1.0 in the phase 1 and phase 2 families and by the diagnoses (probable and possible LOAD, or probable LOAD only). Review of the eTable also shows the individual LOD scores for each model and each phenotype definition in the phase 1 and phase 2 families.
We repeated the analysis conditioning on the presence or absence of an APOE ε4 allele (Figure 1). The LOD scores for the APOE conditional analysis were much lower than those from the unadjusted analysis. The APOE conditional model specifically tests for a “joint effect” of the marker and the APOE ε4 allele. This results in a reduction of linkage score in the unadjusted model, which indicates the independent effects of the marker. Among markers with LOD scores greater than 1.0 in Table 2, 2 (D6S1051 and D14S750) had APOE conditional LOD scores greater than 1.0, suggesting that APOE influence on most of the markers was weak. A few markers (eg, D3S2387, D8S592 and D22S532) were significant for APOE conditional analysis only.
Four loci (2p25.3, 3q28, 7p21.1, and 14q23.1) achieved LOD scores greater than 1.0 (Figure 2A). These loci corresponded well with the results of the single-point analysis previously described. When the diagnosis was restricted to include only probable LOAD, the results differed (Figure 2B). Eight loci had LOD scores greater than 1.0, and 4 of these loci exceeded LOD scores of 2.0. The highest LOD score was on chromosome 2 at 2p25.3 (LOD, 2.9), which was significant for linkage. Additional loci suggestive for linkage included 2q34 (LOD, 2.75), 3q28 (LOD, 2.75), and 6q21 (LOD, 2.1). There were slight, but no significant, differences in the phase 1 and phase 2 families compared with the combined families (data not shown).
To model genetic heterogeneity, we conducted heterogeneity LOD (HLOD) analysis for the chromosomal loci with substantial evidence of linkage, namely, chromosomes 2, 3, 6, 7, 12, and 14. Overall, the HLOD scores for probable LOAD (range, 0.59-2.34) were higher than those for probable and possible LOAD (range, 0.09-3.64). However, the highest HLOD of 3.64 was observed at 7p (28 centimorgans) for possible and probable LOAD. Individual plots for the multipoint linkage analysis for all 22 chromosomes and the plots for HLOD analysis for chromosomes 2, 3, 6, 7, 12, and 14 are available on request.
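For readers unfamiliar with the statistic, an HLOD under the standard admixture model can be sketched as follows. This is an illustration only, and the text does not specify how the HLOD scores were computed: the sketch assumes that a LOD score is available for each family at a single map position, that the proportion of linked families (alpha) is estimated by a simple grid search, and that the function name is hypothetical.

```python
import math


def hlod(family_lods: list[float], steps: int = 100) -> tuple[float, float]:
    """Return (alpha, HLOD) maximizing the admixture LOD over a grid of alpha.

    For a proportion alpha of linked families, each family contributes
    log10(alpha * 10**LOD_i + (1 - alpha)) to the total.
    """
    best_alpha, best_score = 0.0, 0.0  # alpha = 0 always gives a score of 0
    for i in range(1, steps + 1):
        alpha = i / steps
        score = sum(math.log10(alpha * 10.0 ** lod + (1.0 - alpha))
                    for lod in family_lods)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score
```

With alpha fixed at 1 the expression reduces to the ordinary summed LOD score, so the HLOD is never lower than the homogeneity LOD and never negative, which is why a subset of families with negative scores lowers the summed LOD more than it lowers the HLOD.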
During the past 10 years, 10 genomewide linkage studies3,5,7,9,10,12-15,19 of LOAD have been published. These studies suggest that the most consistent evidence for linkage for LOAD occurs at 6p, 9q, 10q, 12p, and 19q.44 Linkage at 19q most likely represents APOE, but other genetic variants may reside at this locus as well.45 With the exception of APOE, investigators have encountered difficulty in refining the exact locations for each of the putative loci identified in genomewide scans. This difficulty has been attributed to clinical and genetic heterogeneity, population variability, and sample size requirements for replication of initial linkage findings. Several attractive candidate genes have been identified within these intervals, including α-2-macroglobulin, because of its ability to mediate the clearance and degradation of amyloid β, the major component of β-amyloid deposits; catenin (cadherin-associated protein), α-3 (CTNNA3), which binds with β catenin, which in turn interacts with presenilin 1 (PSEN1); plasminogen activator, urokinase (PLAU), which converts plasminogen to plasmin and may be involved in amyloid precursor protein processing; insulin-degrading enzyme (IDE), a protease involved in the termination of the insulin response and cleavage of amyloid β; glutathione S-transferase omega-1 (GSTO1), which is involved in apoptosis and the inflammatory response; glutathione S-transferase omega-2 (GSTO2), which is involved in oxidative stress; and glyceraldehyde-3-phosphate dehydrogenase (GAPDH), which is involved in apoptosis and binds to amyloid precursor protein and amyloid β. However, independent replication remains inconsistent.46-54
For this study, we augmented a previous genomewide scan10 by doubling the number of families included. The cohort represents a unique collection of families from the Dominican Republic and Puerto Rico. It was previously reported that LOAD is more frequent in this population than in non-Hispanic populations of European or US ancestry55,56 and that the risk of LOAD associated with APOE ε4 in Caribbean Hispanics is lower than in non-Hispanic populations in Europe or the United States.57
We have identified several loci meeting genomewide criteria for suggestive linkage and 2 loci (2p25.3 and 3q28) that would meet the criteria for significant linkage suggested by Lander and Kruglyak43 for LOAD among multiple affected Caribbean Hispanic families. The highest LOD score was at 3q28, meeting genomewide criteria for significant linkage in the 2-point analysis and for suggestive linkage in the multipoint analysis. The same conclusion was reached when a more restrictive diagnosis was used in the follow-up analyses. For most markers with significant or suggestive linkage, including the markers at 3q28, the influence of the ε4 allele of APOE was minimal.
Noticeably positive scores with markers from 3q28 have been previously reported. Hiltunen et al8 used linkage disequilibrium mapping to associate specific microsatellite markers with familial LOAD in a case-control study in the geographically distinct population of Finland. The association reached a genomewide significance level of P<.05, and a haplotype that included several markers near 3q28 was associated with LOAD. More recently, a genetic linkage study9 of dementia in the Amish reported LOD scores of 1.89 and 2.16 at position 3q28, which were among the highest LOD scores in that investigation. Candidate genes at 3q28 include somatostatin (SST) and an autosomal homologue of the fragile X mental retardation gene (FXR1). It is remarkable that 3 independent studies of LOAD, a case-control study and 2 family-based studies, of individuals from different ethnic backgrounds have identified the same locus as possibly harboring a genetic variant that increases LOAD susceptibility. Clearly, additional genetic investigations, including fine mapping of this region, may lead to the identification of the LOAD-linked genetic variant.
Less striking was the suggestive linkage at 5p15.33 in our study. In the 2-point analysis, the LOD score was 1.5 (dominant model) and 1.9 (recessive model). Pericak-Vance et al14 reported an LOD score of 2.2 (recessive model) and Hiltunen et al8 found a strong association with marker D5S807 in their Finnish case-control study. However, the LOD score in our study decreased below 1.0 when the analysis was restricted to probable LOAD. Additional genetic mapping will be required to determine whether this locus is important.
Among the suggestive linkage observed at 2p25.1, 6p21.31, 7p21.1, 10q26.2, and 14q23.1, only the chromosome regions containing 6p21.31 and 14q23.1 have been previously reported. By using the National Institute of Mental Health families, Blacker et al5 found suggestive linkage at 6p21.1 with marker D6S1017, as did Hiltunen et al8 in the case-control association in a Finnish cohort. Supplementing the National Institute of Mental Health families with families from the Indiana Alzheimer Disease Center National Cell Repository (now referred to as the National Cell Repository for Alzheimer Disease), Myers et al12 also reported suggestive linkage that was significant only among individuals with an APOE ε4 allele. At least 2 studies3,5 have reported suggestive linkage at 14q22, which is proximal to PSEN1. PSEN1 is involved in γ-secretase activity, resulting in proteolytic cleavage of amyloid precursor protein. Mutations in PSEN1 are the most frequent cause of early-onset, familial AD. However, it is unlikely that an allelic variant in the promoter region of PSEN1 represents linkage to 14q22.3, even though CC homozygosity in the promoter of PSEN1 was associated with LOAD in 1 study.58 We are not aware of prior reports suggesting linkage at 2p25. While this result may represent a false positive, it is notable that the LOD score for this locus increased when the diagnosis was restricted to probable LOAD and was not modified in the APOE conditional analysis. Zubenko et al59 found suggestive linkage at 12q24, as we observed in the 2-point recessive model. Despite several reports of linkage,13,17,22,60 to our knowledge, no genetic variant has been identified. Furthermore, there seem to be 2 independent AD loci at 12p and 12q.5,12,13,19,22
Unlike previous reports,5,12,13,19 we did not find evidence for linkage at 9p, where several other reports have provided support for linkage. Differences in the ethnic background of the population we have studied and the possibility of variability in genetic markers used may partially explain the absence of linkage. Alternatively, the markers we used may not have been close enough to the sites to detect the linkage signal reported by other groups. For example, we observed no evidence for linkage at 19q13, near the APOE locus. However, a strong association was previously detected between LOAD and the APOE ε4 allele in our familial AD data set of Caribbean Hispanics.23
We observed differences in the results for multipoint linkage analysis under 2 different definitions, namely, combined (probable and possible) LOAD vs probable LOAD only, illustrating the difficulties of gene identification in complex neuropsychiatric disorders. Closer examination of our results revealed that the restricted definition (probable LOAD only) reduced the number of affected individuals in many families. The excluded families in this restricted analysis had contributed negative scores toward the overall support for linkage in the combined (probable and possible LOAD) phenotype. One possible explanation for decreased LOD scores in analyses with the less restrictive diagnostic definition is a higher level of genetic heterogeneity. Support for this view came from our multipoint HLOD analysis, which showed results that were substantially higher for several chromosomes. However, that multipoint analysis was consistent with the 2-point analyses and is unlikely to be the main reason for the discrepancy.
This report describes the results of a genomewide screen of 209 families with multiple affected family members of Caribbean Hispanic ancestry that included 1161 individuals. Despite the fact that ethnic background differs from previously published studies, many of the linkage signals overlap with those reported from studies of whites in the United States and Western Europe. On the other hand, this report differs from other genetic linkage studies in that we observed little evidence for linkage at 9p, 10q, or 19q. Given the strength of the linkage signal in the present study and others, 3q28 is a promising chromosomal region for further genetic exploration.
Correspondence: Richard Mayeux, MSc, MD, Gertrude H. Sergievsky Center, Columbia University, 630 W 168th St, New York, NY 10032.
Accepted for Publication: May 12, 2006.
Author Contributions: Study concept and design: Lee, St George–Hyslop, and Mayeux. Acquisition of data: Santana, Williamson, Lantigua, Medrano, Stern, Tycko, St George–Hyslop, and Mayeux. Analysis and interpretation of data: Lee, Cheng, Arriaga, Stern, Rogaeva, Wakutani, Kawarai, St George–Hyslop, and Mayeux. Drafting of the manuscript: Lee, Williamson, Medrano, Arriaga, Rogaeva, Wakutani, Kawarai, St George–Hyslop, and Mayeux. Critical revision of the manuscript for important intellectual content: Lee, Cheng, Santana, Lantigua, Arriaga, Stern, Tycko, Rogaeva, Wakutani, St George–Hyslop, and Mayeux. Statistical analysis: Lee, Cheng, Arriaga, and Mayeux. Obtained funding: Rogaeva, St George–Hyslop, and Mayeux. Administrative, technical, and material support: Santana, Williamson, Lantigua, Stern, Kawarai, and St George–Hyslop. Study supervision: Lee, Santana, Medrano, Stern, Tycko, and St George–Hyslop.
Financial Disclosure: None reported.
Funding/Support: This study was supported by grant R37 AG15473 from the National Institute on Aging, National Institutes of Health; grant HV48141 from the National Heart, Lung, and Blood Institute; the Canadian Institutes of Health Research; and the Charles S. Robertson Gift for Research on Alzheimer's Disease from the Banbury Fund.
Additional Information: The eTable is available.