Procedures for microarray-based resequencing. A robotics system was introduced to manipulate numerous polymerase chain reactions (PCRs). SAPE indicates streptavidin-phycoerythrin; B, biotin.
Whole view of microarray TKYALS01. A heterozygous point mutation of SOD1, L106V, was unambiguously identified.
Validation of accuracy of sequence determination. A, Scan images of homozygous and heterozygous SOD1 L126S mutations. Each column shows a base position, and each row shows a base call (the determination of base(s) at each position). In the center position, the base call was T in the control, whereas in the patients, the base calls were homozygous C (upper panel) or heterozygous C/T (middle panel). B, Scan image of hemizygous ABCD1 del2146-2157 mutation. Signals of the deletion site and surrounding probes were virtually undetectable. C, Scan image of heterozygous SOD1 L129 (del459TT) mutation. The base calls at the mutation site of the patient were the same as those of the control. The probes at and around the mutation site, however, showed decreased signal intensities compared with those of the controls. D, Signal intensities at and around the deletion site of heterozygous SOD1 L129 (del459TT) mutation. Signal intensities at the deletion site (del TT) and the flanking bases were approximately half those of the controls. bp indicates base pairs.
Novel mutations identified in patients with familial and sporadic amyotrophic lateral sclerosis (ALS). A, Scan image of heterozygous SOD1 K3E point mutation. The A to G heterozygous point mutation was identified in a patient with familial ALS. B, Scan image of heterozygous DCTN1 R997W point mutation. The C to T heterozygous point mutation was identified in a patient with sporadic ALS. C, Conservation of DCTN1 amino acid sequences in different animal species. The arginine residue at codon 997 is highly conserved among species (shown in red), including a synonymous basic amino acid, lysine (shown in pink), in chicken and Xenopus laevis. Nonconserved amino acids are shown in green.
Summary of comprehensive resequencing of causative and disease-related genes in patients with sporadic amyotrophic lateral sclerosis (ALS). A, Classification of sequence alterations identified in this study. Thirty-three sequence alterations including 2 mutations and 31 variants were found in 1 megabase resequencing. Of the 31 variants, 9 (29%) were novel. These novel variants were listed in neither the Single Nucleotide Polymorphism database (http://www.ncbi.nlm.nih.gov/SNP/index.html) nor the Japanese Single Nucleotide Polymorphism database (http://snp.ims.u-tokyo.ac.jp/). B, Locations of novel mutations and variants. Novel mutations, nonsynonymous variants, synonymous variants, and noncoding variants are shown in arrows with respective colors at each location in the corresponding genes. The structure of each gene was obtained from Entrez Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene). * Indicates variants that were only found in patients with sporadic ALS.
Takahashi Y, Seki N, Ishiura H, Mitsui J, Matsukawa T, Kishino A, Onodera O, Aoki M, Shimozawa N, Murayama S, Itoyama Y, Suzuki Y, Sobue G, Nishizawa M, Goto J, Tsuji S. Development of a High-Throughput Microarray-Based Resequencing System for Neurological Disorders and Its Application to Molecular Genetics of Amyotrophic Lateral Sclerosis. Arch Neurol. 2008;65(10):1326-1332. doi:10.1001/archneur.65.10.1326
Background: Comprehensive resequencing of the causative and disease-related genes of neurodegenerative diseases is expected to enable (1) comprehensive mutational analysis of familial cases, (2) identification of sporadic cases with de novo or low-penetrant mutations, (3) identification of rare variants conferring disease susceptibility, and ultimately (4) better understanding of the molecular basis of these diseases.
Objective: To develop a microarray-based high-throughput resequencing system for the causative and disease-related genes of amyotrophic lateral sclerosis (ALS) and other neurodegenerative diseases.
Design: Validation of the system was conducted in terms of the signal-to-noise ratio, accuracy, and throughput. Comprehensive gene analysis was applied for patients with ALS.
Participants: Ten patients with familial ALS, 35 patients with sporadic ALS, and 238 controls.
Results: The system detected point mutations with 100% accuracy and completed the resequencing of 270 kilobase pairs in 3 working days with greater than 99.9% accuracy of base calls, or the determination of base(s) at each position. Analysis of patients with familial ALS revealed 2 SOD1 mutations. Analysis of the 35 patients with sporadic ALS revealed a previously known SOD1 mutation, S134N, a novel putative pathogenic DCTN1 mutation, R997W, and 9 novel variants including 4 nonsynonymous heterozygous variants consisting of 2 in ALS2, 1 in ANG, and 1 in VEGF that were not found in the controls.
Conclusions: The DNA microarray–based resequencing system is a powerful tool for high-throughput comprehensive analysis of causative and disease-related genes. It can be used to detect mutations in familial and sporadic cases and to identify numerous novel variants potentially associated with genetic risks.
With recent progress in human molecular genetics, many causative genes of inherited neurological diseases have been identified. In 2007, 667 neurological diseases were registered in the Online Mendelian Inheritance in Man database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM) as diseases with identified causative genes. It should be noted that there are substantial nonallelic genetic heterogeneities in hereditary neurodegenerative diseases, including amyotrophic lateral sclerosis (ALS), Parkinson disease, Alzheimer disease, and hereditary spastic paraplegia. Thus, there is a strong demand for comprehensive mutational analysis of multiple causative genes in daily clinical practice.
Most neurodegenerative diseases are sporadic and their molecular etiologies remain unknown. Although genome-wide association studies (GWAS) using common variants of single nucleotide polymorphisms have been undertaken to identify the loci of disease-susceptibility genes, genetic risks associated with rare variants may not be captured by GWAS.1 Identification of multiple rare variants, however, would need comprehensive resequencing of candidate genes. Furthermore, sporadic diseases may be caused by de novo mutations or low-penetrant mutations in the causative genes. Taken together, development of a comprehensive resequencing system of causative genes will be indispensable, not only to provide mutational analyses of multiple causative genes for familial diseases, but also to explore the molecular basis of sporadic diseases.
A DNA microarray–based resequencing method has been invented to enable rapid and accurate nucleotide sequence analysis of multiple genes spanning 30 to 300 kilobase pairs.2,3 We used this method to develop a comprehensive high-throughput resequencing system focusing on ALS as well as other neurodegenerative diseases. We herein describe the development of the microarray-based comprehensive resequencing system and its application to ALS genetics to validate the above-described concepts. We also discuss the implications of comprehensive resequencing for the molecular dissection of neurological diseases.
We have designed a microarray, TKYALS01, that primarily focuses on the causative genes of and genes related to ALS (Table 1). The sequences tiled on the microarray included the sequences of all of the exons and 12 flanking base pairs (bp) of the splice junctions. Promoter sequences were also included in the tiled sequences for genes whose expression levels were presumed to modify the disease processes.4,5 In addition, another microarray, TKYPD01, was designed to focus on genes relevant to Parkinson disease, autosomal-dominant hereditary spastic paraplegias, and adrenoleukodystrophy (data not shown).
Because the principle of the resequencing microarray is based on sequencing by hybridization (SBH), it is crucially important to avoid cross-hybridization to increase the accuracy of resequencing. For this purpose, we conducted an “in silico” screening to compare the tiled sequences with a sliding 25-nucleotide window to detect the sequences with an identity exceeding 22 bases in the tiled sequences and optimized the design of the microarrays and polymerase chain reaction (PCR) primers.
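The in silico screen described above can be illustrated with a minimal sketch. It assumes an ungapped, forward-strand comparison of every 25-nucleotide window against every other window and flags pairs that share more than 22 identical bases. The function names, the dictionary input, and the brute-force pairwise comparison are illustrative assumptions, not the procedure actually used in this study; a real screen would also need to consider reverse complements and would use an indexed search for tiled sequences of this size.

```python
from itertools import combinations

WINDOW = 25        # window length stated in the text
MAX_IDENTITY = 22  # windows sharing more than this many bases are flagged


def windows(name, seq, size=WINDOW):
    """Yield (name, offset, subsequence) for every full-length window."""
    seq = seq.upper()
    for i in range(len(seq) - size + 1):
        yield name, i, seq[i:i + size]


def identity(a, b):
    """Count positions at which two equal-length windows carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def find_cross_hybridization(tiled, size=WINDOW, max_identity=MAX_IDENTITY):
    """Return (name1, offset1, name2, offset2, identity) for flagged window pairs.

    `tiled` maps a hypothetical sequence name to its nucleotide string.
    Overlapping windows of the same sequence are skipped, because they
    trivially share bases and are not cross-hybridization candidates.
    """
    all_windows = [w for name, seq in tiled.items()
                   for w in windows(name, seq, size)]
    hits = []
    for (n1, i1, w1), (n2, i2, w2) in combinations(all_windows, 2):
        if n1 == n2 and abs(i1 - i2) < size:
            continue  # overlapping windows within one tiled sequence
        score = identity(w1, w2)
        if score > max_identity:
            hits.append((n1, i1, n2, i2, score))
    return hits
```

Any pair returned by `find_cross_hybridization` marks a region where the microarray design or the PCR primers would need to be reconsidered.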
Thirty-five patients with sporadic ALS and 10 patients with familial ALS, 7 with autosomal dominant mode of inheritance and 3 with affected siblings, were enrolled in this study. The diagnosis of ALS was based on El Escorial and the revised Airlie House diagnostic criteria. A total of 238 control genomic DNA samples were also used.
Thirty-six genomic DNA samples with previously determined mutations of SOD1 (OMIM 147450), the causative gene of familial ALS,6-9 or those of ABCD1 (OMIM 300371), the causative gene of adrenoleukodystrophy,10-13 were anonymized and subjected to analysis without prior information on the mutations.
All of the genomic DNA samples were obtained with written informed consent, and this research was approved by the institutional review board of the University of Tokyo.
Specific PCR primers were designed using the Primer3 Web site (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) (eTable). Touch-down PCR protocols were used to enhance the specificity of PCR amplification (eTable). Each PCR product was quantified using PicoGreen (Molecular Probes, Eugene, Oregon), pooled equimolarly into 1 tube using a robotic system, BioMek FX (Beckman Coulter, Fullerton, California), and subjected to SBH according to the manufacturer's instructions (Affymetrix, Santa Clara, California) (Figure 1). The undetermined base calls were further analyzed by manual inspection of the signals. The resequencing of ANG (OMIM 105850) and the confirmation of all of the sequence variants determined by SBH were conducted by direct nucleotide sequence analysis using an automated DNA sequencer and BigDye Terminator version 3.1 (Applied Biosystems, Foster City, California). Analyses of the frequency of variants in the controls were conducted by denaturing high-performance liquid chromatography (Transgenomic, Omaha, Nebraska).
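The equimolar pooling step can be sketched as a simple calculation: a mass concentration from the PicoGreen assay is converted to a molar concentration using the amplicon length, and the volume needed to deliver a fixed molar amount is derived from it. The sketch below assumes an average mass of 660 g/mol per base pair of double-stranded DNA, a widely used approximation; the amplicon names, concentrations, and target amount are hypothetical and are not taken from the eTable.

```python
BP_MOLAR_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def fmol_per_ul(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration in ng/uL to fmol/uL for a given length."""
    return conc_ng_per_ul * 1e6 / (BP_MOLAR_MASS * length_bp)


def pooling_volumes(amplicons, target_fmol):
    """Return the volume (uL) of each amplicon that contains target_fmol.

    `amplicons` maps a hypothetical amplicon name to a tuple of
    (concentration in ng/uL, length in bp).
    """
    return {
        name: target_fmol / fmol_per_ul(conc, length)
        for name, (conc, length) in amplicons.items()
    }


# Hypothetical example: two amplicons at the same mass concentration.
example = {"amplicon_500bp": (33.0, 500), "amplicon_1000bp": (33.0, 1000)}
volumes = pooling_volumes(example, target_fmol=50.0)
```

At equal mass concentration, the longer amplicon contains fewer molecules per microliter, so a proportionally larger volume is pooled.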
To evaluate the signal-to-noise ratio, all of the PCR amplicons for TKYALS01 except those for SNCG were subjected to hybridization to TKYALS01 and scanning. Simultaneous hybridization of the mixed PCR amplicons did not interfere with the signals and SOD1 mutations were unambiguously identified (Figure 2). Furthermore, the areas where the probes for SNCG were tiled did not show any detectable signals, indicating that cross-hybridization was negligible.
As shown in Figure 3A, all of the point mutations were correctly identified, confirming the accuracy of SBH for detection of point mutations. The locations of the hemizygous ABCD1 insertion/deletion mutations were also easily identified because the signals of the insertion/deletion sites and surrounding probes were undetectable (Figure 3B). Determination of the exact base changes required direct nucleotide sequence analysis. In contrast, none of the 4 heterozygous insertion/deletion mutations of SOD1 were unambiguously detected without prior information on the mutations. Only the SOD1 heterozygous deletion mutation del459TT was detectable by carefully evaluating the signal intensities (Figure 3C) because the signal intensities were moderately decreased at the deletion sites and the 12 flanking bases (Figure 3D).
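The intensity-based detection of a heterozygous deletion described above can be sketched as a search for runs of consecutive positions at which the patient signal falls well below the control signal. The ratio threshold of 0.6 and the minimum run length of 5 positions are illustrative assumptions chosen to reflect a signal of roughly half the control level; they are not parameters reported in this study, and the sketch assumes that intensities have already been normalized between arrays.

```python
def low_signal_runs(patient, control, max_ratio=0.6, min_run=5):
    """Return (start, end) index pairs, end exclusive, of low-signal runs.

    A position is "low" when patient/control <= max_ratio. Only runs of at
    least min_run consecutive low positions are reported, so that isolated
    dips at single probes are ignored.
    """
    if len(patient) != len(control):
        raise ValueError("patient and control must have the same length")
    runs = []
    start = None
    for i, (p, c) in enumerate(zip(patient, control)):
        low = c > 0 and p / c <= max_ratio
        if low and start is None:
            start = i  # a new candidate run begins here
        elif not low and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last position.
    if start is not None and len(patient) - start >= min_run:
        runs.append((start, len(patient)))
    return runs
```

Candidate regions flagged in this way would still require direct nucleotide sequence analysis to determine the exact base change.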
By employing robotics to manipulate numerous PCRs, as many as 271 625 bp were resequenced in 3 working days, with a total of 271 445 bp (99.93%) correctly called, confirming the high throughput of this system.
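The reported percentage follows directly from the two base counts, as the short calculation below shows. The variable names are illustrative; the remaining bases are simply those not counted as correctly called, and the text does not say how many were miscalled and how many were left uncalled.

```python
total_bp = 271_625    # bases resequenced in 3 working days
correct_bp = 271_445  # bases correctly called

call_accuracy = correct_bp / total_bp
not_correct_bp = total_bp - correct_bp  # miscalled or uncalled bases

print(f"accuracy: {call_accuracy:.2%}")            # accuracy: 99.93%
print(f"bases not correctly called: {not_correct_bp}")  # 180
```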
The molecular diagnosis of 10 patients with familial ALS using this system revealed 2 SOD1 mutations, including 1 novel mutation, K3E (Figure 4A), and 1 previously identified mutation, L106V. The novel mutation was not identified in the 238 controls (476 chromosomes). The novel SOD1 mutation was found in a 70-year-old man presenting with progressive distal-dominant muscle atrophy, weakness in all extremities, and positive Babinski signs.
In the 35 patients with sporadic ALS, we identified a previously known SOD1 mutation, S134N, and a novel putative pathogenic DCTN1 (OMIM 601143) mutation, R997W (Figure 4B). These mutations were not present in the 238 controls (476 chromosomes). The amino acid residue R997 of DCTN1 was located in a region conserved among different animal species (Figure 4C). The patient with the DCTN1 mutation was a 68-year-old man presenting with progressive muscle atrophy, weakness in all of his extremities, and postural tremor in the upper extremities, with onset at the age of 67 years. Neurological examination on admission at 68 years of age revealed diffuse muscle atrophy, weakness, fasciculation, and hyporeflexia in all extremities. Weakness of neck flexion was also noted. His intelligence was normal. Neither bulbar nor pyramidal signs were observed. Electromyography showed diffuse active neurogenic changes compatible with progressive lower motor neuron degeneration. His parents remained healthy beyond 80 years of age.
The comprehensive analysis of the 35 patients with sporadic ALS also revealed 31 sequence alterations in addition to the 2 mutations described above (Table 2 and Table 3). Nine of the 31 variants (29%) were novel (Figure 5A), including 4 nonsynonymous heterozygous variants consisting of 2 in ALS2 (OMIM 606352), 1 in ANG, and 1 in VEGF (OMIM 192240) (Figure 5B) that were present in the ALS patients but not in 238 controls (476 chromosomes).
The effect of the microarray-based high-throughput resequencing system is 3-fold. First, it enables comprehensive mutational analyses of multiple causative genes for the diagnosis of familial cases. Because of nonallelic genetic heterogeneities and clinical variability, it is often difficult to focus on particular genes depending solely on the phenotypes. In this situation, the comprehensive analysis of causative genes is often superior to categorical approaches based on clinical information. The second effect is the identification of mutations in causative genes in sporadic cases (Figure 4). Thus, comprehensive resequencing of the causative genes may reveal mutations with reduced penetrance or de novo mutations in a portion of patients with sporadic ALS. The system has a great advantage in screening numerous genes in many patients with sporadic ALS.
The third effect is the discovery of rare variants potentially involved in disease susceptibility. The current approaches for identifying genetic risks of ALS are mainly based on GWAS employing common single-nucleotide polymorphisms, which generally provide relatively low odds ratios.14,15 The extensive resequencing of relevant genes is expected to complement GWAS by identifying rare variants that contribute to the development of diseases with substantially high odds ratios.16-19 Large-scale resequencing projects to uncover functional and regulatory variants are currently in progress, identifying numerous novel variants.20 Indeed, nonsynonymous heterozygous variants in ALS2, ANG, and VEGF were found in patients with ALS but not in the 238 controls (Figure 5B). To confirm the significance of these rare variants in disease pathogenesis, large-scale case-control studies and functional analyses of individual mutant proteins will be required.
The advantage of SBH lies in resequencing particular sets of genes. Once the microarrays are designed, the sequencing is inexpensive and the system can be efficiently used for the repetitive interrogation of the same genome region. To further enhance the throughput of the resequencing system based on SBH, improvement in the detection capability for heterozygous insertion/deletion mutations is required. It seems theoretically possible to overcome this issue by optimizing hybridization conditions and detecting changes in the signal intensity patterns.21
The DNA microarray–based high-throughput resequencing system for comprehensive analysis of causative and disease-related genes contributes to the identification of causative mutations not only in familial ALS cases but also in some sporadic cases with low-penetrant mutations or de novo mutations, and to the identification of numerous rare variations potentially associated with diseases. This system serves as a milestone for translating the technological innovation of high-throughput resequencing directly into clinical practice.
Correspondence: Shoji Tsuji, MD, PhD, Department of Neurology, Graduate School of Medicine, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8655, Japan (firstname.lastname@example.org).
Accepted for Publication: March 21, 2008.
Author Contributions: Dr Tsuji had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Murayama, Itoyama, Goto, and Tsuji. Acquisition of data: Takahashi, Seki, Matsukawa, Kishino, Aoki, Shimozawa, Murayama, Suzuki, Sobue, Nishizawa, and Goto. Analysis and interpretation of data: Takahashi, Ishiura, Mitsui, Goto, and Tsuji. Drafting of the manuscript: Takahashi, Seki, Matsukawa, Kishino, Aoki, Itoyama, Suzuki, Sobue, Nishizawa, Goto, and Tsuji. Critical revision of the manuscript for important intellectual content: Takahashi, Ishiura, Mitsui, Onodera, Shimozawa, Murayama, Goto, and Tsuji. Obtained funding: Tsuji. Administrative, technical, and material support: Takahashi, Seki, Ishiura, Kishino, Aoki, Shimozawa, Murayama, Suzuki, Sobue, Nishizawa, and Tsuji. Study supervision: Itoyama, Goto, and Tsuji.
Financial Disclosure: None reported.
Funding/Support: This study was supported in part by KAKENHI (Grant-in-Aid for Scientific Research) on Priority Areas; Applied Genomics, the 21st Century Center of Excellence Program, Center for Integrated Brain Medical Science, and Scientific Research (A) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan; a Grant-in-Aid for the Research Committee for Ataxic Diseases of the Research on Measures for Intractable Diseases from the Ministry of Health, Labour, and Welfare, Japan; and a grant from the Takeda Foundation.