Whole-Exome Sequencing Among Chinese Patients With Hereditary Diffuse Gastric Cancer

Key Points
Question What is the frequency of germline alterations in CDH1, which has been reported as a susceptibility gene present in 25% to 50% of patients with hereditary diffuse gastric cancer (HDGC), and is there a genetic basis underlying disease susceptibility in the remaining 50% to 75% of patients with HDGC?
Findings In this cohort study of 284 Chinese patients with HDGC, the frequency of CDH1 germline alterations was low (2.8%), and germline alterations, insertions, and deletions were most frequently found in MUC4, ABCA13, ZNF469, FCGBP, IGFN1, RNF213, and SSPO. Double-hit events in genes such as CACNA1D were observed among patients with HDGC.
Meaning This study's findings challenge the previously reported high frequency of CDH1 germline alterations in HDGC and suggest that double-hit events may serve as important mechanisms for HDGC tumorigenesis; the study also provides a genetic landscape and identifies new susceptibility genes for HDGC.

Private variants were defined by the following criteria: (i) nonsense variants, splice-site variants, or frameshift INDELs; (ii) heterozygous in the germline; (iii) a minor allele frequency (MAF) below 0.5% in the 1000 Genomes Project 2 or the CMDB 3; (iv) present in only one patient; (v) a mappability score above 0.5; and (vi) no additional genomic locus matched in a BLAT-based query 4.
Since most known high-penetrance disease-associated variants are located in coding regions, only genetic variants located in these regions were considered. Moreover, as the functional impact of most missense variants is unclear, we focused on truncating variants (nonsense mutations, splice-site mutations, and frameshift INDELs), which usually result in loss of function of the encoded protein.
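The six criteria above can be read as a single conjunctive filter. The following is a minimal sketch only: the `Variant` record, its field names, and the reading of criterion (iii) as "below the cut-off in both databases" are assumptions made for illustration and are not taken from the study's pipeline.

```python
from dataclasses import dataclass

# Truncating classes named in criterion (i); the labels are hypothetical.
TRUNCATING = {"nonsense", "splice_site", "frameshift_indel"}


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative."""
    effect: str          # predicted consequence class
    genotype: str        # "het" or "hom" in the germline sample
    maf_1000g: float     # MAF in the 1000 Genomes Project
    maf_cmdb: float      # MAF in the CMDB
    n_carriers: int      # number of patients in the cohort carrying it
    mappability: float   # mappability score at the locus
    blat_hits: int       # number of genomic loci matched by BLAT


def is_private(v: Variant, maf_cutoff: float = 0.005,
               mappability_cutoff: float = 0.5) -> bool:
    """Apply criteria (i) to (vi) as one conjunctive filter."""
    return (
        v.effect in TRUNCATING                 # (i) truncating variant
        and v.genotype == "het"                # (ii) heterozygous germline
        and v.maf_1000g < maf_cutoff           # (iii) rare in 1000 Genomes
        and v.maf_cmdb < maf_cutoff            #       and rare in CMDB (assumed)
        and v.n_carriers == 1                  # (iv) seen in one patient only
        and v.mappability > mappability_cutoff # (v) mappability above 0.5
        and v.blat_hits == 1                   # (vi) no additional BLAT locus
    )
```

For example, a heterozygous nonsense variant carried by one patient, absent from both databases, and mapping uniquely would pass, whereas the same variant annotated as missense would not.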
Somatic SNVs and INDELs were called using Mutect2 with default parameters. We also extracted the reads mapping to each SNV/INDEL, remapped them to the human reference hg38 using BLAT, and removed mutations whose reads mapped to more than one position in hg38. Considering the oxidative deamination in FFPE samples, we evaluated the effect of C>T/G>A artefacts in our data. We separated SNVs into low-frequency and high-frequency groups based on a variant allele frequency (VAF) cut-off of 0.1 and compared the fractions of C>T/G>A substitutions in the two groups. There was no significant difference in this fraction between SNVs with VAF < 0.1 (29.26%) and those with VAF > 0.1. Mutations were annotated using ANNOVAR. Then MutSigCV (v1.41) was used to define significantly mutated genes (SMGs) in the somatic HDGC cohort. A gene with a q-value less than 0.05 was considered to be significantly mutated (Table S1). Mutational signature analysis was performed using maftools v2.2.10 5. Three signatures were decomposed by the non-negative matrix factorization (NMF) method. Each signature was compared against known signatures derived from the COSMIC database (https://cancer.sanger.ac.uk/cosmic/signatures_v2) based on cosine similarity. To show the frequently mutated genes in the oncoplot, we filtered out mutations that are common (minor allele frequency (MAF) > 0.01) in databases including 1000GP, ESP6500, ExAC and CMDB. MSI status was determined by MSIsensor (v0.5) 6, and samples with an MSI score > 10 were considered MSI-H.
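The comparison of decomposed signatures against COSMIC signatures rests on cosine similarity between two mutation-spectrum vectors. The sketch below illustrates that measure only; the helper names and the short toy vectors are hypothetical (real signature profiles have 96 trinucleotide channels), and it does not reproduce the maftools implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length non-negative vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero profile
    return dot / (norm_a * norm_b)


def best_match(signature, references):
    """Return (name, similarity) of the most similar reference profile."""
    scored = ((name, cosine_similarity(signature, ref))
              for name, ref in references.items())
    return max(scored, key=lambda pair: pair[1])


# Toy 4-channel profiles, for illustration only.
toy_references = {
    "ref_A": [0.7, 0.1, 0.1, 0.1],
    "ref_B": [0.1, 0.1, 0.1, 0.7],
}
name, score = best_match([0.6, 0.2, 0.1, 0.1], toy_references)
```

Because cosine similarity depends only on the shape of the profile and not on its scale, a signature expressed as raw counts and the same signature expressed as proportions give the same score.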
To validate the detected germline and somatic mutations, we performed ultra-deep targeted sequencing of 100 genes, comprising the 55 most frequently germline-mutated genes and the 45 somatically mutated genes described above. Targeted sequencing reads were aligned to hg19 using BWA with the same parameters as for WES, and "samtools mpileup" was used to obtain both the sequencing depth and the mutant read depth at each mutation site. Mutation sites with limited sequencing depth (read count < 50) were removed, and sites with a mutant read depth of more than 3 were treated as validated (Table S5).
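The two thresholds in this validation step can be expressed as a small decision rule. This is a sketch under the thresholds stated above; the function name and the three status labels are hypothetical.

```python
def classify_site(depth, mutant_reads, min_depth=50, min_mutant_reads=3):
    """Classify one mutation site from targeted-sequencing pileup counts.

    depth: total reads covering the site
    mutant_reads: reads supporting the mutant allele
    """
    if depth < min_depth:
        return "removed"        # limited sequencing depth (< 50 reads)
    if mutant_reads > min_mutant_reads:
        return "validated"      # more than 3 mutant reads
    return "not_validated"


# Hypothetical (depth, mutant_reads) pairs for three sites.
statuses = [classify_site(d, m) for d, m in [(30, 10), (800, 120), (500, 2)]]
```

Sites labelled "removed" are excluded from the denominator when a validation rate is computed, so the rate is the number of validated sites divided by the number of sites with sufficient depth.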
SCNAs were detected using Control-FREEC v11.1. The GISTIC2 algorithm was used to infer recurrently amplified or deleted genomic regions in the HDGC cohort. G-scores were calculated for sequencing regions based on the frequency and amplitude of the amplifications or deletions affecting each gene. The "high-level amplification (or deletion) thresholds of segment mean" provided by GISTIC2 were used to define "amplification (AMP)" and "deletion (DEL)" at the gene level.
© 2022 Liu ZX et al. JAMA Network Open.

Double-hit event analysis of HDGC
First, the purity of each tumor sample was estimated using the ABSOLUTE (v1.0.6) algorithm. Both SCNA segmentation files and SNV files in VCF format were used as input to ABSOLUTE. In line with the recommended best practice, all ABSOLUTE solutions were reviewed by three bioinformaticians, and the final solution was selected by majority vote.
Loss of heterozygosity (LOH) regions were detected using Control-FREEC v11.1, which calculates the tumor coverage relative to germline (logR) and the B-allele frequency (BAF) at each 1000GP SNP site and then defines the LOH regions.
A double-hit event was defined as a heterozygous germline mutation that becomes homozygous because of LOH, which may amplify the impact of the germline mutation and potentially relate to tumorigenesis. We selected high-quality heterozygous germline mutations located in LOH regions and extracted the sequencing depth and mutant read count at each selected germline mutation site in the tumor samples using the Samtools "mpileup" function with the parameter "-p 20 -P 20". We calculated the expected VAF in the bulk tissue sample with specific tumor purity using the following formula: We simulated sequencing at the observed read depth of each mutation site with the expected VAF 10^5 times and calculated the 95% confidence interval of the expected VAF (VAFexp) for each germline mutation site. Germline mutations whose detected VAF in the tumor sequencing data was greater than 0.5 and fell within the 95% confidence interval were defined as double-hit events (Table S2).
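The simulation step can be sketched as repeated sampling of mutant read counts at a fixed depth. In the sketch below, the expected VAF is taken as an input because its formula is not reproduced here, and modelling each read as an independent draw (a binomial model) is an assumption; the function names are hypothetical.

```python
import random


def simulate_vaf_interval(depth, vaf_exp, n_sim=100_000, seed=0):
    """Empirical 95% interval of the VAF observed at a given read depth.

    Each simulated read carries the mutant allele with probability vaf_exp
    (assumed binomial sampling). Returns the 2.5th and 97.5th percentiles.
    """
    rng = random.Random(seed)
    vafs = []
    for _sim in range(n_sim):
        mutant = sum(1 for _read in range(depth) if rng.random() < vaf_exp)
        vafs.append(mutant / depth)
    vafs.sort()
    lower = vafs[int(0.025 * (n_sim - 1))]
    upper = vafs[int(0.975 * (n_sim - 1))]
    return lower, upper


def is_double_hit(observed_vaf, depth, vaf_exp, n_sim=100_000, seed=0):
    """Observed tumor VAF must exceed 0.5 and lie inside the interval."""
    lower, upper = simulate_vaf_interval(depth, vaf_exp, n_sim, seed)
    return observed_vaf > 0.5 and lower <= observed_vaf <= upper
```

With the default of 10^5 simulations this pure-Python loop takes a few seconds per site at typical exome depths; a smaller `n_sim` is enough to try it out, for example `is_double_hit(0.8, 100, 0.8, n_sim=2000)`.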

Drug target analysis
All variants, including somatic mutations and gene-level CNAs, were queried for potential actionability in three drug databases: OncoKB 7, Cancer Genome Interpreter (CGI) 8, and CIViC 9. The common data model proposed by the ESMO Scale for Clinical Actionability of molecular Targets (ESCAT) 10, which includes 6 tiers of actionability, was employed. We integrated the three databases into this framework and added the OncoKB levels "R1" and "R2", which represent different levels of drug-resistance prediction, to the common evidence tiers. The mapping details are shown in Table S6.

Statistical analyses
Based on the established cohort, sequencing and data analysis were carried out according to the procedures presented in Figure S1. For the 117 single nucleotide variants and 48 INDELs in the 45 somatically mutated genes, 5.98% (7/117) and 14.58% (7/48) were removed because of limited sequencing depth (<50X), while 96.36% (106/110) and 92.68% (38/41) of the remaining sites were validated as consistent with WES, respectively (Table S5). Taken together, these results show that variant calling in this study was highly reliable.
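The percentages above follow directly from the reported counts, with the validation rate computed over the sites that remained after the depth filter. A short arithmetic check, using only the numbers stated in the text (the helper name is hypothetical):

```python
def pct(numerator, denominator):
    """Percentage rounded to two decimal places."""
    return round(100 * numerator / denominator, 2)


# Counts as reported: total sites, removed for low depth, validated.
snv_total, snv_removed, snv_validated = 117, 7, 106
indel_total, indel_removed, indel_validated = 48, 7, 38

snv_removed_pct = pct(snv_removed, snv_total)
indel_removed_pct = pct(indel_removed, indel_total)

# Validation rates use the sites that passed the depth filter.
snv_validated_pct = pct(snv_validated, snv_total - snv_removed)
indel_validated_pct = pct(indel_validated, indel_total - indel_removed)
```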