Exome Sequencing and the Identification of New Genes and Shared Mechanisms in Polymicrogyria

This genetic association study assesses germline genetic causes of polymicrogyria in a large cohort and considers novel polymicrogyria gene associations.


eMethods

Exome Sequencing and SNV Variant Annotation
Libraries from DNA samples (>250 ng of DNA, at >2 ng/µL) were created with an Illumina Nextera or Twist exome capture (~38 Mb target) and sequenced (150 bp paired reads) to cover >80% of exonic targets at a minimum of 20x, with a mean target coverage of >100x. Sample identity quality assurance checks were performed on each sample. The ES data were de-multiplexed, and each sample's sequence data were aggregated into a single Picard BAM file. ES data were processed through a customized pipeline based on Picard, using base quality score recalibration and local realignment at known indels. The BWA aligner was used to map reads to human genome build 38 (hg38). SNVs and insertions/deletions (indels) were jointly called across all samples using the Genome Analysis Toolkit (GATK) HaplotypeCaller package, version 3.5. Default filters were applied to SNV and indel calls using the GATK Variant Quality Score Recalibration (VQSR) approach. Annotation was performed using Variant Effect Predictor (VEP).
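The input and coverage criteria quoted above can be expressed as a simple per-sample check. The following is a minimal sketch, not the pipeline actually used: the `SampleQC` record, its field names, and the failure labels are hypothetical, and only the four thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the methods text (all are strict lower bounds).
MIN_DNA_NG = 250.0           # total input DNA, ng
MIN_CONC_NG_PER_UL = 2.0     # DNA concentration, ng/uL
MIN_FRACTION_AT_20X = 0.80   # fraction of exonic targets covered at >=20x
MIN_MEAN_COVERAGE = 100.0    # mean target coverage, x


@dataclass
class SampleQC:
    """Hypothetical per-sample QC record; field names are illustrative."""
    sample_id: str
    dna_ng: float
    conc_ng_per_ul: float
    fraction_targets_20x: float
    mean_target_coverage: float


def qc_failures(s):
    """Return the labels of the criteria a sample fails (empty list = pass)."""
    failures = []
    if not s.dna_ng > MIN_DNA_NG:
        failures.append("dna_input")
    if not s.conc_ng_per_ul > MIN_CONC_NG_PER_UL:
        failures.append("dna_concentration")
    if not s.fraction_targets_20x > MIN_FRACTION_AT_20X:
        failures.append("coverage_20x")
    if not s.mean_target_coverage > MIN_MEAN_COVERAGE:
        failures.append("mean_coverage")
    return failures


# Made-up example values, for illustration only.
good = SampleQC("S1", dna_ng=300, conc_ng_per_ul=5,
                fraction_targets_20x=0.92, mean_target_coverage=120)
bad = SampleQC("S2", dna_ng=200, conc_ng_per_ul=5,
               fraction_targets_20x=0.75, mean_target_coverage=120)

print(qc_failures(good))  # []
print(qc_failures(bad))   # ['dna_input', 'coverage_20x']
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easier to see why a sample would need to be re-prepared or re-sequenced.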

Copy Number Variant Detection
Copy-number variants (CNVs) were discovered from ES following GATK-gCNV best practices 1. Briefly, read coverage was first calculated for each exome using GATK CollectReadCounts. Then, all samples were subdivided into batches for gCNV model training and execution, with batches determined using a principal components analysis (PCA) of sequencing read counts. After batching, one gCNV model was trained per batch using GATK GermlineCNVCaller on a subset of training samples, and the trained model was then applied to call CNVs for each sample in that batch. Finally, all raw CNVs were aggregated across all batches and post-processed using quality- and frequency-based filtering to produce a final CNV callset. The combined variant call set (SNVs and CNVs) was then uploaded to seqr 2 for collaborative analysis.

Variant Interpretation
Guided by the principles of the American College of Medical Genetics and Genomics (ACMG) 3, we considered variants with a frequency no greater than 0.01 for a recessive model or 0.0001 for a dominant or de novo model, using gnomAD v2.1 4 as the general population reference, and evaluated segregation within families as appropriate. For each variant meeting these frequency criteria, we evaluated the predicted consequence, conservation of the relevant amino acid across multiple orthologs, in silico functional prediction algorithms (e.g., CADD score), and the predicted effect on splicing (if intronic) using Alamut Visual Plus. We also considered any existing associations with disease, such as those reported in ClinVar, HGMD, or the published clinical literature.
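The frequency criteria can be illustrated with a short filter. This is a minimal sketch under stated assumptions: the function name, the model labels, and the handling of variants absent from gnomAD (treated here as frequency 0) are illustrative choices, not details given in the text. Only the two cutoffs are taken from it.

```python
# Maximum population frequency per inheritance model, as stated in the text.
MAX_FREQ = {
    "recessive": 0.01,
    "dominant": 0.0001,
    "de_novo": 0.0001,
}


def passes_frequency_filter(gnomad_freq, model):
    """Return True if the variant is rare enough for the inheritance model.

    gnomad_freq may be None when the variant is absent from the reference
    population; treating that as a frequency of 0 is an assumption here.
    """
    freq = 0.0 if gnomad_freq is None else gnomad_freq
    return freq <= MAX_FREQ[model]  # "no greater than" means <=


# Made-up frequencies, for illustration only.
print(passes_frequency_filter(0.005, "recessive"))  # True
print(passes_frequency_filter(0.005, "dominant"))   # False
print(passes_frequency_filter(None, "de_novo"))     # True
```

Variants passing this step would still need the segregation, conservation, in silico, and literature review described above. The frequency cutoff only narrows the candidate list.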

Diagnostic Yield Comparison to Commercial Gene Panels
To assess the difference in diagnostic yield between commercially available brain malformation panels and ES for PMG specifically, we calculated in silico the putative diagnostic yields of several commercial brain malformation panels in our cohort (GeneDx: 102 genes, comprehensive brain malformation panel, August 2021; Invitae: 163 genes, test code 5506, August 2021; NHS/UK: 45 genes, version 2.148; and University of Chicago: 130 genes, test code 1130, August 2021). Families in which we identified CNVs (n=4) or novel candidate PMG genes (n=9) were not counted. The genes included in each panel are listed in Supplementary Table 4.
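The in silico comparison amounts to asking, for each eligible family, whether the gene implicated by ES appears on a given panel. The sketch below shows one way to compute this. The record layout, the function name, the toy families, and the toy panel contents are all hypothetical and do not reflect the actual cohort or any real panel.

```python
def panel_yield(families, panel_genes):
    """Return (diagnosed, eligible) counts for one gene panel.

    families: list of dicts with keys
        'gene'   - gene implicated by ES, or None if the family is unsolved
        'is_cnv' - True if the finding is a CNV
        'novel'  - True if the gene is a novel candidate PMG gene
    CNV families and novel-gene families are excluded, as in the text.
    """
    eligible = [f for f in families if not f["is_cnv"] and not f["novel"]]
    diagnosed = sum(1 for f in eligible if f["gene"] in panel_genes)
    return diagnosed, len(eligible)


# Toy data for illustration only; not the study cohort.
families = [
    {"gene": "PIK3R2", "is_cnv": False, "novel": False},
    {"gene": "ATP1A3", "is_cnv": False, "novel": False},
    {"gene": "SCN3A", "is_cnv": False, "novel": False},
    {"gene": None, "is_cnv": False, "novel": False},          # unsolved
    {"gene": None, "is_cnv": True, "novel": False},           # CNV, excluded
    {"gene": "CANDIDATE1", "is_cnv": False, "novel": True},   # novel, excluded
]
toy_panel = {"PIK3R2"}  # hypothetical panel content

diagnosed, eligible = panel_yield(families, toy_panel)
print(f"{diagnosed}/{eligible} eligible families diagnosed by the toy panel")
```

Passing the set of all known PMG genes found by ES as `panel_genes` gives the corresponding ES yield over the same denominator, which keeps the panel and ES figures directly comparable.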

Network Visualization of PMG Genes
To visualize functional relationships between genes implicated in PMG, we loaded all genes associated with PMG in our patient cohort into stringApp, which uses the STRING database of known and predicted protein-protein interaction data to construct and visualize the protein-protein interaction network 5. We then manually assigned functional modules based on this network and literature review.

eResults

Clinical Gene Panel Comparisons
Panel sequencing for genetic characterization of MCDs is frequently used before ES in the clinical setting because of the expense and analytical challenges of ES, and because panels may capture most of the diagnostic yield for conditions that have relatively few genetic causes (e.g., lissencephaly or holoprosencephaly). However, gene panels have a substantially lower diagnostic yield for PMG than for other MCDs because of the heterogeneity of PMG-associated genes and the ever-expanding list of implicated genes that are not always included on panels 6. We assessed the performance of four commercially available gene panels used in the evaluation of MCDs on our PMG cohort, excluding from this calculation families with identified pathogenic CNVs and families in whom we made novel PMG-gene associations, as those may not have been reported as such on a clinical ES. The diagnostic yield of ES in our cohort, excluding the noted families, would have been 76/262 (29.0%), representing 39 known PMG genes, which is substantially higher than that of the commercial gene panels, even before novel gene discovery or CNV analysis. All four of the commercially available panels (eTable 4) performed similarly, in line with current guideline estimates of the yield of diagnostic panel sequencing studies in PMG (~15-20%) 6. Of the diagnoses made by ES that the examined panels would miss, many belonged to the category of ion-conducting proteins (e.g., ATP1A3, SCN3A), accounting for a third to a half of missed diagnoses across the panels (Invitae: 13/27; University of Chicago: 11/30; GeneDx: 16/33; NHS/UK: 16/40).
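The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only numbers reported in this supplement (275 families, 4 CNV families, 9 novel-gene families, 76 ES diagnoses, and the per-panel missed-diagnosis counts). The variable names and the `percent` helper are illustrative.

```python
TOTAL_FAMILIES = 275
CNV_FAMILIES = 4
NOVEL_GENE_FAMILIES = 9
ES_DIAGNOSED = 76

# Denominator after excluding CNV and novel-gene families.
eligible = TOTAL_FAMILIES - CNV_FAMILIES - NOVEL_GENE_FAMILIES

# Per panel: (missed diagnoses in ion-conducting protein genes,
#             total diagnoses the panel would miss)
MISSED = {
    "Invitae": (13, 27),
    "University of Chicago": (11, 30),
    "GeneDx": (16, 33),
    "NHS/UK": (16, 40),
}


def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


es_yield = percent(ES_DIAGNOSED, eligible)
ion_share = {panel: percent(n, d) for panel, (n, d) in MISSED.items()}

print(eligible)   # 262
print(es_yield)   # 29.0
for panel, share in ion_share.items():
    print(f"{panel}: {share}% of missed diagnoses are ion-conducting genes")
```

The resulting shares range from 36.7% to 48.5%, consistent with the statement that ion-conducting protein genes account for a third to a half of the diagnoses missed by each panel.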

STRING Analysis of PMG Associated Genes
eFigure: STRING protein-protein interaction analysis groups PMG-associated genes into different modules. The number of lines denotes the number of types of evidence for interaction, including reports in published literature, interactions reported in proteomic databases, and gene transcript co-expression.

Organizing PMG Etiologies
In this study, we searched for genetic etiologies for PMG in 275 families, identified several novel associations, and organized heterogeneous genetic etiologies into a functional framework. Future genotype-phenotype studies of narrower groups, like excellent previous characterizations of specific groups of PMG patients with PIK3R2 mutations 7 or tubulinopathies 8, will further delineate differences between the categories identified. Cohorts like this provide an invaluable resource to the clinical and research communities focused on the care of patients with MCD and those investigating the genetic underpinnings of brain development. Studies anchored in PMG genetics have not only demonstrated the necessity of genetic testing for diagnosis of MCDs but have also informed our understanding of brain development by uncovering roles for diverse cellular components, including growth signaling pathway enzymes, glycosylating enzymes, cytoskeletal components, and ion-conducting proteins.
PMG stands in contrast to other cortical malformations, in which relatively few genetic etiologies (if not mutations in a few genes) cause most cases. For example, variants in just 17 genes, virtually all associated with microtubule-binding proteins, account for 81% of all lissencephaly cases, with LIS1 alone accounting for 40% of documented cases 9. Also, more than half of primary microcephaly cases are caused by centrosome gene variants 10. In contrast, genetic causes of PMG have been harder to catalogue because of their heterogeneity, and the overall diagnostic yield in PMG versus other MCDs will certainly be lower because PMG can result from non-genetic etiologies, including in utero viral infections (estimated at 20-30% of cases) 6 and in utero ischemia.

Ion-conducting Proteins are a Common Cause of PMG
A recent excellent retrospective cohort study of PMG patients by Stutterd et al. 11 used panel sequencing of 328 established or candidate brain malformation genes to estimate the frequency of germline mutations in 123 PMG patients, excluding CNVs, and determined a diagnostic yield of 20.3%. The discrepancy between the diagnostic yield in Stutterd et al.'s cohort (20.3%) and the rate of genetic explanations (not necessarily diagnostic yield) in our cohort (33%) is reasonably explained by the expansion of our sequencing breadth and ability for gene discovery; if we simply restricted our findings to the genes they included in their panel, our rate of molecular explanation would have been similar to their reported yield. The genes that make up this yield difference are concentrated mostly in a single category: genes encoding ion-conducting proteins. Likewise, the genes that account for the gap between commercially available MCD panels and the yield of ES in our cohort are heavily concentrated in the same category. The mechanism by which ion channels influence cortical development is an active area of research 12, and may involve modulation of cell proliferation, migration, and differentiation 13.

Table 1. Genes Included in tNGS PMG Panel

eTable 3. Brain Malformation Gene Panels Analyzed Compared to ES