Daoud H, Valdmanis PN, Gros-Louis F, Belzil V, Spiegelman D, Henrion E, Diallo O, Desjarlais A, Gauthier J, Camu W, Dion PA, Rouleau GA. Resequencing of 29 Candidate Genes in Patients With Familial and Sporadic Amyotrophic Lateral Sclerosis. Arch Neurol. 2011;68(5):587–593. doi:10.1001/archneurol.2010.351
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease characterized by the selective loss of motor neurons in the spinal cord and motor cortex, resulting in progressive muscle weakness and atrophy. It typically leads to death within 3 to 5 years following onset.1
Among cases of ALS, most are sporadic ALS (SALS), while approximately 5% of patients have a positive family history with clear mendelian inheritance (familial ALS [FALS]). Mutations in the superoxide dismutase 1 gene (SOD1) are the most common known cause of ALS. They account for 15% to 20% of cases of FALS, which is most frequently inherited in an autosomal dominant manner, and 1% to 2% of all cases.2 While several mechanisms have been proposed to explain the toxic effects of mutant SOD1, the mechanism by which SOD1 leads to selective motor neuron death remains unclear.3 More recently, mutations in TARDBP and FUS, encoding 2 multifunctional DNA or RNA binding proteins, were identified in patients with FALS4-7 and were subsequently reported in patients with SALS.8,9 Several other rare causative genes linked to FALS have been identified, including ALS2, SETX, VAPB, ANG,10 and, more recently, OPTN.11
Although the discovery of these genes has led to significant new insights into the causes of ALS, the basic pathogenic mechanism and the genetic cause of most ALS cases remain unknown. This emphasizes our need to identify additional ALS-causing genes so that we can better understand the pathogenesis of this disease.
A set of genes specifically expressed at different stages of development in mouse corticospinal motor neurons has been identified, including 29 biologically relevant genes.12 These genes are of particular interest as they might be involved in different aspects of corticospinal motor neuron development and are likely to govern the ability of developing brain cells to connect properly to the spinal cord, which may have significant relevance to ALS. Four of these 29 genes are located within or near previously identified loci for ALS and 1 of these (NEFH) has previously been implicated in ALS,13,14 suggesting that these newly identified genes may be good candidates for ALS.
Therefore, in an effort to identify novel ALS disease-causing genes, we have selected human orthologs for these 29 corticospinal motor neuron mouse genes and carried out a mutation screening of the entire coding regions by direct sequencing in 190 unrelated patients with ALS (80 with FALS and 110 with SALS).
A total of 190 patients with ALS (80 with FALS and 110 with SALS) and 190 unrelated, neurologically healthy individuals matched for age and ethnicity were included in this study. All individuals were recruited through clinics in France and Quebec, Canada. All patients were diagnosed with probable or definite ALS according to El Escorial criteria.15 The clinical characteristics of this cohort are summarized in Table 1. All patients with FALS tested negative for mutations in SOD1, TARDBP, FUS, VAPB, and ANG. Informed written consent was obtained from each participant, and the study was approved by the ethics committees and institutional review boards of the relevant institutions. Blood samples were obtained from patients and control subjects, and genomic DNA was extracted from peripheral blood cells using standard methods.
We identified the full messenger RNA sequences of all potential transcriptional isoforms and determined the exon and intron structures based on alignment with the UCSC Genome Browser (Hg18 build; http://genome.ucsc.edu/) (Table 2). These genes have been classified into 5 different groups based on expression profiles suggestive of a specific role in corticospinal motor neuron development. Primers were designed using the ExonPrimer software (http://ihg2.helmholtz-muenchen.de/ihg/ExonPrimer.html) from the UCSC Genome Browser to amplify 200- to 700–base pair (bp) fragments that cover coding exons plus 50 bp from each of the flanking introns. Polymerase chain reaction (PCR) was performed in 384-well plates using 5 ng of genomic DNA and AmpliTaq Gold DNA polymerase (Applied Biosystems, Foster City, California) as per the manufacturer's instructions. The PCR products were sequenced on a 3730XL DNA analyzer (Applied Biosystems). A fragment was considered successfully sequenced if the sequence quality of more than 90% of the traces was sufficient to permit analysis.
Sequence variants were called automatically using PolyPhred version 5.04 (http://droog.gs.washington.edu/polyphred) and PolySCAN version 3.0 (http://genome.wustl.edu/tools/genome_center_software/polyscan). We verified all identified mutations by manual inspection of the trace files using Mutation Surveyor software (SoftGenetics, State College, Pennsylvania). All potential truncating variants were further investigated by resequencing the appropriate sample from a newly amplified PCR product using forward and reverse primers. For variants identified in FALS cases, we tested the cosegregation by sequencing the corresponding fragment in available additional family members. For novel nonsynonymous variants, we resequenced the corresponding fragment in 190 control subjects. Finally, the potential disruptiveness of missense mutations as well as the conservation of the altered amino acids were analyzed by bioinformatics programs PANTHER (http://www.pantherdb.org/tools/csnpScoreForm.jsp), PolyPhen (http://genetics.bwh.harvard.edu/pph/), and SIFT (http://sift.jcvi.org/www/SIFT_BLink_submit.html) (eAppendix).
A total of 278 PCR primer pairs were designed to amplify the coding exons of the 29 genes included in this study. The PCR conditions could be optimized for 257 amplicons, corresponding to an average success rate of 92.45% (eTable 1). Given that the overall success rate was 92.45% and that we screened 15 168 codons (Table 2), we estimate the amount of coding sequence screened per sample to be 42 kilobases and the total amount of coding sequence screened in our ALS cohort to be approximately 8 megabases.
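The coverage estimate above follows from simple arithmetic, which the short sketch below reproduces. It assumes 3 bases per codon and applies the amplicon success rate uniformly to the coding sequence; the variable names are illustrative and do not come from the study.

```python
# Figures stated in the text
primer_pairs_designed = 278
amplicons_optimized = 257
codons_screened = 15168
patients = 190

# Fraction of designed amplicons for which PCR conditions could be optimized
success_rate = amplicons_optimized / primer_pairs_designed

# Assumption: 3 bases per codon, success rate applied uniformly
coding_bp = codons_screened * 3
bp_per_sample = coding_bp * success_rate
total_bp = bp_per_sample * patients

print(f"success rate: {success_rate:.2%}")
print(f"per sample:   {bp_per_sample / 1e3:.1f} kilobases")
print(f"whole cohort: {total_bp / 1e6:.2f} megabases")
```

Under these assumptions the per-sample figure rounds to 42 kilobases and the cohort total to about 8 megabases, in line with the estimates reported above.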
We focused our analysis on coding variants because coding exons are believed to harbor a substantial amount of functional variations that cause mendelian diseases.16 We identified a total of 142 different coding variants, of which 78 were synonymous changes and 64 were nonsynonymous changes (Table 3 and eTable 2). No variants that affect highly conserved consensus splice sites and no insertions and deletions were found. Synonymous and nonsynonymous variants were found in approximately equal proportion, which is consistent with the genetic variations recently found following complete sequencing of 12 human exomes.17
Among the 64 nonsynonymous variants identified, 24 were previously described in the dbSNP database (National Center for Biotechnology Information, Bethesda, Maryland) and were not considered further (eTable 2). The other 40 novel nonsynonymous variants are reported in Table 4. Of these, 10 were found in control subjects, suggesting that these variants are not associated with ALS. Among the remaining 30 variants, 21 have been found exclusively in SALS cases, 8 have been found exclusively in FALS cases, and only 1 has been found in both SALS and FALS cases. On the other hand, 10 genes exhibited no variant, 10 genes had only 1 variant each, and 9 genes had more than 1 variant.
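The filtering steps described above can be sketched as a small classification function. This is only an illustration of the logic, not the pipeline used in the study: the record fields (`in_dbsnp`, `in_controls`, `cohorts`) are hypothetical names, and the three example records are variants named elsewhere in the text that belong to the 30 patient-only variants.

```python
from collections import Counter

def classify(variant):
    """Return the filtering stage at which a nonsynonymous variant ends up."""
    if variant["in_dbsnp"]:
        return "known (dbSNP)"
    if variant["in_controls"]:
        return "novel, also in controls"
    if variant["cohorts"] == {"SALS"}:
        return "SALS only"
    if variant["cohorts"] == {"FALS"}:
        return "FALS only"
    return "SALS and FALS"

# Hypothetical records for three variants named in the text
examples = [
    {"gene": "CRYM", "change": "Arg169Cys", "in_dbsnp": False,
     "in_controls": False, "cohorts": {"SALS"}},
    {"gene": "CDH13", "change": "Ala103Val", "in_dbsnp": False,
     "in_controls": False, "cohorts": {"FALS"}},
    {"gene": "LUM", "change": "Leu199Pro", "in_dbsnp": False,
     "in_controls": False, "cohorts": {"SALS", "FALS"}},
]
labels = [classify(v) for v in examples]

# Totals reported in the text for each stage
reported = Counter({
    "known (dbSNP)": 24,
    "novel, also in controls": 10,
    "SALS only": 21,
    "FALS only": 8,
    "SALS and FALS": 1,
})
novel = sum(reported.values()) - reported["known (dbSNP)"]
patient_only = novel - reported["novel, also in controls"]

print(labels)
print(sum(reported.values()), novel, patient_only)
```

The reported totals are internally consistent: the five stages sum to 64 nonsynonymous variants, of which 40 are novel and 30 are seen only in patients.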
We identified 21 different nonsynonymous variants in SALS cases, including 20 missense variants and 1 nonsense variant (Table 4). Four missense variants (Arg65Cys, Gly113Arg, Arg246Trp, and Glu367Gln) were found in the CDH13 gene, which encodes a calcium-dependent cell-cell adhesion glycoprotein. None of these affect a functional domain of the protein, and only Arg246Trp is predicted to be damaging by the bioinformatics programs.
Two missense variants (Ile234Thr and Pro578Leu) as well as 1 nonsense variant (Arg1191X) were found in the DIAPH3 gene. The Ile234Thr variant lies in the guanosine triphosphatase–binding domain, which is required for protein activation, but it is not predicted to affect protein function. The Pro578Leu variant affects a proline residue located in a proline stretch and is predicted to affect protein function only by PANTHER and PolyPhen. The Arg1191X variant is the only nonsense variation identified in this study and removes the last 3 amino acids of this protein.
Two missense variants (Lys867Asn and Glu918Gly) in the NEFH gene, encoding the heavy neurofilament protein, were identified in 2 patients with SALS. These do not affect a conserved residue, and neither is predicted to affect the protein function.
Two missense variants were identified in each of the OMA1, RAMP3, and CDH22 genes. The His65Tyr and Glu272Gly variants found in OMA1 are predicted to be tolerated and do not affect highly conserved residues. Likewise, in RAMP3, the Leu111Ser and Glu117Lys missense variants do not affect a functional domain of the encoded protein and are predicted to be tolerated. By contrast, the Glu92Lys and Thr134Met missense variants found in CDH22 affect highly conserved residues located in the first cadherin domain, and the Thr134Met missense variant is predicted to be damaging by PANTHER and PolyPhen. Of the 6 remaining missense variants identified in patients with SALS, only the Arg169Cys variant found in the CRYM gene, encoding a taxon-specific crystallin protein that binds thyroid hormone, is predicted to be damaging by all 3 programs used and affects a highly conserved residue, which is conserved in 34 of 35 vertebrates from humans to fish.
Eight missense variants were identified in patients with FALS (Table 4). We were unfortunately unable to test the cosegregation of 4 of these missense variants (Ala103Val in CDH13, Glu32Val and Pro229Ser in BCL11B, and Arg346His in NEFH) as no additional family members were available. Three missense variants (Gly188Asp in FEZF2, Glu954Val in CNTN6, and His507Tyr in GRB14) were also found in unaffected siblings, suggesting that these are not associated with ALS in these families. The Arg1042His missense variant identified in DIAPH3 was also found in the only available unaffected brother, suggesting that this missense variant could be causative with incomplete penetrance or could be unrelated to the patient's ALS phenotype.
Only 1 missense variant, Leu199Pro in LUM, was identified in both FALS and SALS cases (1 patient with FALS and 3 patients with SALS). Unfortunately, we were unable to test the cosegregation of this mutation in the familial case as no additional family members were available. This missense variant affects a highly conserved residue located within a leucine-rich repeat domain and is predicted to be deleterious by all 3 programs.
Finally, no variation was found in C5ORF23, CD55, LBD2, STK39, TMEM163, OPN3, NTNG1, IGFBP4, ITM2A, and S100A10, suggesting that these genes may not have a genetic role in ALS.
Given that nonsynonymous variants are more likely to have an effect on protein function and that synonymous variants are likely to represent neutral benign polymorphisms, we evaluated the distribution of nonsynonymous and synonymous variants in our ALS cohort using 2 classifications. The first distinguishes novel variants from known variants that were previously reported in the dbSNP database. The second distinguishes unique variants, which were observed in only 1 patient with ALS, from frequent variants. Of the 142 coding sequence variants, 82 (57.7%) were novel. However, the distribution of novel vs known variants was not significantly different between nonsynonymous and synonymous variants (P = .19). By contrast, grouping variants into unique and frequent variants revealed a significant difference (P = .01) in the distribution of nonsynonymous and synonymous variants, with more unique nonsynonymous variants identified than expected (Table 3).
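The novel vs known comparison can be laid out as a 2 × 2 table using counts given in the text. In the sketch below, the synonymous cells are derived by subtraction rather than stated directly, so they should be checked against Table 3. The text does not name the statistical test used, so no P value is recomputed here.

```python
# Counts stated in the text
total_variants = 142
nonsynonymous, synonymous = 64, 78
novel_total = 82
novel_nonsynonymous = 40

# Counts derived by subtraction (assumption: not stated directly in the text)
novel_synonymous = novel_total - novel_nonsynonymous
known_nonsynonymous = nonsynonymous - novel_nonsynonymous
known_synonymous = synonymous - novel_synonymous

table = {
    "nonsynonymous": {"novel": novel_nonsynonymous, "known": known_nonsynonymous},
    "synonymous": {"novel": novel_synonymous, "known": known_synonymous},
}

# Share of novel variants within each class
share_novel = {
    kind: row["novel"] / (row["novel"] + row["known"])
    for kind, row in table.items()
}

print(table)
print({kind: f"{share:.1%}" for kind, share in share_novel.items()})
print(f"novel overall: {novel_total / total_variants:.1%}")
```

The derived count of 24 known nonsynonymous variants agrees with the number of nonsynonymous variants reported as already present in dbSNP.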
We report here the resequencing of 29 candidate genes in a cohort of 190 patients with ALS. We identified 142 coding variants and prioritized and genotyped those most likely to have a detrimental effect. To identify mutations in genes that cause or predispose to ALS, we focused our analysis on nonsynonymous variants as these are more likely to have a substantial effect on protein function. We identified 40 novel nonsynonymous variants, including 39 missense variants and 1 nonsense variant. Most importantly, we showed a significant excess of unique nonsynonymous variants compared with unique synonymous variants in our cohort of patients with ALS. This excess suggests that these unique nonsynonymous variants may include ALS-predisposing mutations because converging evidence shows that a large number of individually rare, highly pathogenic mutations cause ALS.4,8,9 However, we believe that only a small fraction of these unique nonsynonymous variants identified here are likely to predispose to ALS. Thus, the identification of ALS genes in which unique missense variants are responsible for the disease is challenging. We therefore adopted a multifaceted approach based on the functional prediction of missense variants, the conservation of the altered amino acid, and the cosegregation of the variants identified in familial cases to highlight missense variations potentially involved in ALS. Genes with missense variants that also scored highly based on these secondary criteria were considered excellent candidates. Using these combined strategies, we were able to identify genes with possible involvement in ALS.
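The secondary criteria can be illustrated with a simple filter over a few variants described in the Results. This is a hypothetical sketch and not the authors' scoring procedure: the field names and the rule "damaging in all 3 programs and affecting a highly conserved residue" are assumptions chosen for illustration, and cosegregation is omitted because it could not be tested for most of these variants.

```python
# Hypothetical records built from statements in the Results.
# damaging_calls = number of programs (PANTHER, PolyPhen, SIFT)
# that predict the variant to be damaging.
variants = [
    {"gene": "LUM", "change": "Leu199Pro", "damaging_calls": 3, "conserved": True},
    {"gene": "CRYM", "change": "Arg169Cys", "damaging_calls": 3, "conserved": True},
    {"gene": "CDH22", "change": "Thr134Met", "damaging_calls": 2, "conserved": True},
    {"gene": "NEFH", "change": "Lys867Asn", "damaging_calls": 0, "conserved": False},
]

def is_top_candidate(variant, programs=3):
    """Assumed rule: damaging in every program and a highly conserved residue."""
    return variant["damaging_calls"] == programs and variant["conserved"]

top = [v for v in variants if is_top_candidate(v)]

for v in top:
    print(v["gene"], v["change"])
```

With this assumed rule, only the LUM and CRYM variants pass, which matches the two genes singled out in the Discussion.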
The LUM gene could be considered an excellent candidate for ALS as the Leu199Pro missense variant was identified in 4 patients with ALS but not in 190 control subjects, affects a highly conserved residue located within a leucine-rich repeat domain, and is predicted to be deleterious based on bioinformatics programs. This gene encodes lumican, a keratan sulfate proteoglycan involved in collagen fibril organization, epithelial cell migration, and tissue repair.18 Based on our combined approach, the LUM gene constitutes a promising candidate for ALS; however, screening of this gene in additional cases will help determine its role in ALS pathogenesis.
We identified 5 different missense variants in the CDH13 gene, which is a member of the cadherin superfamily and encodes a calcium-dependent cell-cell adhesion glycoprotein.19 We were unable to assess the cosegregation of the Ala103Val variant found in a patient with FALS as no additional family members were available. The other missense variants were found in SALS cases; among these, only the Arg246Trp variant is predicted to be deleterious. Therefore, mutations in this gene may play a role in ALS, but further investigation is warranted.
Three missense variants and 1 nonsense variant were identified in the DIAPH3 gene, which encodes a protein involved in actin cytoskeleton reorganization. The Arg1042His variant was found in a patient with FALS and an unaffected brother who had passed the average age at onset for ALS, suggesting that this variant could represent a partially penetrant allele or could simply be a rare benign polymorphism. The 2 other missense variants, Ile234Thr and Pro578Leu, were found in patients with SALS and affect the guanosine triphosphatase–binding domain and a proline stretch, respectively. The Ile234Thr variant is predicted to be tolerated, whereas Pro578Leu is predicted to be deleterious by PANTHER and PolyPhen. The Arg1191X nonsense variant was found in a patient with SALS and occurs at the C-terminal part of the protein, removing the last 3 amino acids. However, we do not believe this variant is pathogenic as another truncating variant (Leu1190X) was identified in 4 control subjects. Nonetheless, screening of this gene in additional patients is needed to determine its role in ALS.
Four missense variants were also found in the NEFH gene. Three were found in patients with SALS and 1 was found in a patient with FALS. No missense variants found in patients with SALS (Ala40Val, Lys867Asn, and Glu918Gly) are predicted to affect the protein function. We were unable to assess the cosegregation of the Arg346His variant found in the patient with FALS as no additional family members were available. Nevertheless, deletions of NEFH were previously identified in both patients with FALS and those with SALS,13,14 and the transgenic mice that overexpress NEFH develop motor neuron pathology.20 Altogether, this suggests that point mutations in NEFH may also play a role in the pathogenesis of ALS in some cases.
Two missense variants were identified in each of the BCL11B, OMA1, RAMP3, and CDH22 genes. In BCL11B, the 2 missense variants were found in 2 patients with FALS, but we were unable to assess cosegregation as no additional family members were available. In the 3 other genes, missense variants were identified in patients with SALS, and none is predicted to be damaging by all 3 bioinformatics programs. Still, sequencing of these genes in additional patients with ALS is needed to help determine their role in ALS.
Finally, among the remaining missense variants identified in SALS cases, only Arg169Cys found in the CRYM gene scored high based on our combined strategy as it affects a highly conserved residue and is predicted to be damaging by the 3 programs. CRYM encodes the nicotinamide adenine dinucleotide phosphate–regulated thyroid hormone–binding protein, a taxon-specific crystallin protein that binds thyroid hormone for possible regulatory or developmental roles. Mutations in CRYM have been identified as a cause of nonsyndromic deafness.21 However, increased expression of CRYM has been recently observed in microglial cells of a transgenic mouse carrying the Leu126delTT mutation in SOD1, which exhibited distinct ALS-like motor symptoms and pathological findings.22 This overexpression was not found in control mice, suggesting a link with the pathogenesis of FALS. Thus, the Arg169Cys missense variant identified here might be considered as contributing to ALS. However, as for the other missense variants identified in SALS cases, we are aware that it is still difficult to claim this missense variant causes disease and that its pathogenicity regarding ALS remains to be established.
In summary, we have sequenced 29 candidate genes in 190 patients with ALS with an average success rate of 92.45%. This rate constitutes a limitation of our study as we could have missed some potentially important variants. Nevertheless, the analysis of these genes led to the identification of promising novel ALS genes such as LUM and CRYM. Moreover, this study also highlights some analytical challenges of large-scale sequencing screens to detect disease-causing variants. Consequently, all promising variants identified through this study require careful evaluation and cannot be regarded as disease causing on their own. Further screening and functional studies of these variants are needed to confirm their implication in ALS. As the sequencing of the complete human genome or exome in large numbers of individuals has been shown to be feasible,23,24 the use of such approaches in the field of ALS could accelerate the identification of novel ALS genes. Overall, this could open new avenues for research into the pathogenesis of this disease and offer leads to the development of new treatment strategies.
Correspondence: Guy A. Rouleau, MD, PhD, FRCPC, Centre of Excellence in Neuromics, Centre hospitalier de l’Université de Montréal Research Center, 2099 Alexandre De-Seve St, Montreal, QC H2L 2W5, Canada (email@example.com).
Accepted for Publication: October 21, 2010.
Published Online: January 10, 2011. doi:10.1001/archneurol.2010.351
Author Contributions: Study concept and design: Daoud, Gros-Louis, Dion, and Rouleau. Acquisition of data: Daoud, Valdmanis, Spiegelman, Henrion, Diallo, Desjarlais, and Gauthier. Analysis and interpretation of data: Daoud, Valdmanis, Belzil, Spiegelman, and Camu. Drafting of the manuscript: Daoud. Critical revision of the manuscript for important intellectual content: Valdmanis, Gros-Louis, Belzil, Spiegelman, Henrion, Diallo, Desjarlais, Gauthier, Camu, Dion, and Rouleau. Obtained funding: Gros-Louis and Rouleau. Administrative, technical, and material support: Daoud, Valdmanis, Belzil, Spiegelman, Henrion, Diallo, Desjarlais, Gauthier, and Camu. Study supervision: Dion and Rouleau.
Financial Disclosure: None reported.
Funding/Support: This study was supported by the Muscular Dystrophy Association and the ALS Association. Dr Daoud is supported by a postdoctoral fellowship and Ms Belzil is supported by a doctoral fellowship from the Canadian Institutes of Health Research. Dr Rouleau holds a Canada Research Chair in Genetics of the Nervous System and the Jeanne-et-J-Louis-Lévesque Chair in Genetics of Brain Diseases.
Role of the Sponsors: The sponsors had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional Contributions: We thank the patients and their families involved in this study. Annie Levert, Annie Raymond, BSc, Pascale Thibodeau, BSc, Karine Lachapelle, Philippe Jolivet, and Sylvia Dobrzeniecka, MSc, provided technical assistance.