Association of Germline Variants in Natural Killer Cells With Tumor Immune Microenvironment Subtypes, Tumor-Infiltrating Lymphocytes, Immunotherapy Response, Clinical Outcomes, and Cancer Risk.

Importance
Only a small fraction of patients with cancer receiving immune checkpoint therapy (ICT) respond, which is associated with tumor immune microenvironment (TIME) subtypes and tumor-infiltrating lymphocytes (TILs).


Objective
To examine whether germline variants of natural killer (NK) cells, a key component of the immune system, are associated with TIME subtypes, the abundance of TILs, response to ICT, clinical outcomes, and cancer risk.


Design, Setting, and Participants
This genetic association study explored TIME subtypes and examined the association of the germline genomic information of patients with cancer with TIME subtypes, abundance of TILs, response to ICT, prognosis, and cancer risk. Clinical information, tumor RNA sequencing, and whole-exome sequencing (WES) data of paired normal samples of patients with 13 common cancers (n = 5883) were obtained from the Cancer Genome Atlas. The WES data of individuals with no cancer (n = 4500) were obtained from the database of Genotypes and Phenotypes. Data collection and analysis took place in March 2017.


Main Outcomes and Measures
Associations between the number of germline defective genes in NK cells and survival time and the abundance of TILs.


Results
Based on tumor RNA sequencing data, tumors were stratified into TIME-rich, TIME-intermediate, and TIME-poor subtypes. Tumors of TIME-rich subtype had more TILs (TIL-NK cells in TIME-rich head and neck squamous cell carcinoma [HNSC] tumors: t = 4.85; 95% CI of the difference, 0.01-0.03; P = 2.19 × 10-6) compared with TIME-intermediate HNSC tumors (t = 3.70; 95% CI of the difference, 0.01-0.03; P < .001), better prognosis (patients with HNSC: hazard ratio, 0.65; 95% CI, 0.41-1.02; P = .054) compared with TIME-intermediate and TIME-poor subtypes, and better ICT response (patients with melanoma: odds ratio [OR], 4.45; 95% CI, 0.99-27.08; P = .04). Patients with TIME-rich tumors had significantly fewer inherited defective genes in NK cells than patients with TIME-intermediate and TIME-poor tumors (patients with HNSC: OR, 0.49; 95% CI, 0.26-1.07; P = .005). Similarly, patients with cancer had significantly more inherited defective genes in NK cells than individuals with no cancer (patients with HNSC: OR, 19.09; 95% CI, 4.30-315.96; P = 6.21 × 10-4). Among 11 of 13 common cancers, the number of heritable defective genes in NK cells was significantly negatively associated with survival (patients with HNSC: hazard ratio, 1.77; 95% CI, 1.18-2.66; P = .005), abundance of TILs (patients with HNSC: R = -0.25; 95% CI, -0.65-2.17; P = 0.02), and response to ICT (patients with melanoma: OR, 4.45; 95% CI, 0.99-27.08; P = .04).


Conclusions and Relevance
These results suggest that individuals who have more inherited defective genes in NK cells had a higher risk of developing cancer and that these inherited defects were associated with TIME subtypes, recruitment of TILs, ICT response, and clinical outcomes. The findings have implications for identifying individuals at risk for developing cancer of many types based on germline variants of NK cells and for improving existing ICT and chimeric antigen receptor-T cell therapy by adoptive transfer of healthy NK cells to patients with TIME-intermediate and TIME-poor tumors.


eMethods 1. Collecting NK cell-specific (NK-specific) Genes
To select NK-specific genes, we used the gene signatures derived from a genomewide gene expression analysis of mouse lymphocytes between immune cells 1 and an immune gene regulatory network (ImmGen) 2 containing the gene expression profiles of more than 300 mouse immune cell types. NK cells were reported to be easily distinguished from B cells and adaptive T cells such as CD4+ T, CD8+ T and other innate immune cells but shared many up-regulated genes with NKT and γδ T cells. 1 The shared repertoire of surface receptors, signaling molecules and transcription factors expressed by NK cells and the innate-like T cells (i.e., NKT and γδ T cells) blurs the distinctions among these cell types. Thus, we collected ~200 NK-specific genes representing (1) NK-unique genes which were more highly up-reregulated in NK cells than innate-like T cells (i.e., NKT and γδ T cells) and other immune cells (i.e., B, CD4+ T, CD8+ T and others); (2) genes (termed as NK-NKT genes here) which were highly up-regulated in both NK and NKT cells, but not in other immune cells; (3) genes (i.e., termed as NK-γδ T genes here) which were highly up-regulated in both NK and γδ T cells, but not in other immune cells; and (4) genes (i.e., termed as NK-NKT-γδ T genes here) which were highly up-regulated in NK, NKT and γδ T cells, but not in other immune cells. These genes were able to distinguish innate populations (NK, NKT and γδ T cells) from adaptive T cells, B cells and other immune cells.
We manually checked these genes from literature and gene databases such as GeneCards (https://www.genecards.org) to remove the genes which were universally expressed. Furthermore, we conducted correlation analyses between the expression of each gene and the abundance of TIL-NK, TIL-NK CD56 bright or TIL-NK CD56 dim cells in each cancer type. For each cancer type, the genes were remained only if they had significantly positive correlations with the abundance of any of the three TIL-NK cell subsets (FDR-corrected p<0.05). Finally, we obtained 157 genes (i.e., 28 of them were NK-unique genes). These genes were highly expressed on either NK cell alone, NK-NKT, NK-γδ T or NK-NKT-γδ T cells. To exclude the genes that have a functional effect on tumor cells, we further screened the genes (i.e., from the 157 gene) which were within the bottom 30% of the expression value-ranked wholegenome genes in the TIME-rich tumors.   Functional variants were examined and annotated using the Combined Annotation Dependent Depletion (CADD) 4 with the default parameters. To consistent with the WES data analysis pipeline in TCGA, BWA (version-0.7.15) 5 was used to align with default parameters for the cancer-free individuals, and pipe into Samtools (version-0.1.8) 6 to sort. Additional adding read groups and duplicate removal were processed with Picard-tools (version-2.6.0). The resulted BAM files were processed with GATK (version-4.0.11.0) for realignment and base recalibration. eMethods 5. Immune Gene Set and Clustering Analysis for Determining TIME Subtypes Immune-related genes (n=1,384) including MHC system-related genes, 7 immunophenoscore-related genes, 8 ICT essential genes for immunotherapy 9 and cytotoxic T cell-resistant genes 10 were collected and identified as critical immunerelated genes (the gene pool G ). RNA-sequencing data of melanoma samples were used to conduct the following analysis: Step#1 Initialize the candidate set of key genes, that is, candidate G   Step#2 Randomly select 30% genes from the gene set random G .
Step#3 Replace the features of elements in the patient set P with random G to form the sample set random S .
Step#4 Group the samples random S by using the hierarchical clustering method. For each of the clustering, clValid 11 was used to evaluate the clustering stability and the most stable clustering number was recorded.
Step#5 Repeat Steps 2-4 100,000 times. Rank the most stable clustering numbers and select the most suitable clustering number 3.
Step#6 Extract the genes when clustering number is 3 and rank the genes.
Select the most informative genes and record them as the final set of key-gene candidates (1,294 genes).
We then used the 1,294 genes to conduct unsupervised clustering analyses of the RNA-seq data for each cancer type to define TIME subtypes. eMethods 6. Assigning Immune Checkpoint Therapy (ICT) Trial Samples into TIME Subtypes ICT-clinical trial SKCM and STAD samples were assigned to the TIME subtypes derived from TCGA-SKCM and TCGA-STAD samples, respectively. To assign an ICT-clinical trial sample into a TIME subtype, t-test statistics were first conducted between TIME-rich and TIME-intermediate tumors, and between TIME-intermediate and TIME-poor tumors based on the genes derived from the NK cell-mediated cytotoxicity pathway and the Wnt signaling pathway using RNA-seq data. Spearman's correlation was then conducted between each ICT-clinical trial sample and the TCGA samples in each TIME subtype based on the obtained significantly differential genes (p<0.05). Finally, k-nearest neighbor algorithm (KNN, k=5) was used to determine the subtype of each ICT-clinical trial sample.

eMethods 7. Randomization Tests of the NK-Defective Genes
For each cancer type, we identified a set of NK-defective genes. To test if a set of NK-defective genes could be randomly identified, we conducted randomization tests.
We first randomly selected 157 genes within the bottom 30% of the expression valueranked whole-genome genes of the TIME-rich tumors for a given cancer type (n=5,000 times) and then employed hypergeometric tests. When a negative correlation (p<0.05) between the gene number in the random genes and the abundance of TIL-NK cells appeared, the hypergeometric test was marked as one success. The number of successes was calculated within the 10,000 hypergeometric tests. To further validate the observation that more inherited defective genes in NK cells were in cancer patients than cancer-free individuals, we obtained the WES data of germlines for 12,380 cancer-free samples from dbGaP (phs000473) and determined their functional mutated genes. For each cancer type, we conducted hypergeometric tests for the NK cell-mediated cytotoxicity pathway, NK cell-associated phenotypes and the APP pathway shown in Figure 3 and eFigure 4 using the methods in our previous study. 12 Briefly, for a given cancer type, we compared cancer hallmark genes (~12,000 genes) using randomization tests to identify differentially functional germline mutated genes (p<0.05) between cancer and the cancer-free samples, and then conducted enrichment analyses of the NK cell-mediated cytotoxicity pathway, NK cell-associated phenotypes and APP pathway via conducting hypergeometric tests (differentially functional germline mutated genes vs the cancer hallmark genes).

eAppendix 1. Associations of Defective Genes in NK Cell-ITAM-Signaling Genes With Clinical Outcomes and Abudnance of TILs
To test whether ITAM signaling-genes in NK cells could have more inherited defects in TIME-poor tumors compared with TIME-rich tumors, we collected a set of known ITAM-dependent genes (~20 genes) in NK cells and found that some of the genes had significantly higher genetic defects in TIME-poor tumors than TIME-rich tumors (eTable 5). We further asked whether the combination of these significantly defective genes with the potentially NK cell-defective genes identified in each cancer type could improve the correlations (i.e., the negative correlations between the number of NK cell-defective genes and survival and TILs' abundance). We showed that the correlations were significantly improved (i.e., lower p values and correlation coefficients) in 11 out of the 12 cancer types except COAD. These results suggested that inheritably genetic defects in ITAM-signalling genes of the NK cells could highly associated with TILs' abundance and survival in all the cancers except COAD in a synergy manner. NK cells surrounding COAD tumors enabled to directly interact with environmental factors such as microbiome, drinks, and others so that COAD associated NK cells were much more complex than other cancers. eAppendix 2. Inherited Defective Genes in NK cells, Type I Diabetes, Long-term Depression Phenotypes, and Cancer Germline genomic analysis here suggested that Type I diabetes and long-term depression phenotypes were linked to some cancers. This is supported by non-genetic studies showing that diabetes is a risk factor for all-site cancer through a metaanalysis of 121 cohorts including 20 million individuals and one million events. 13 A 24-year follow-up study showed that depression increases the risk of cancer, 14 moreover, a meta-analysis of 16 studies (n=163,000) showed that cancer patients with anxiety and depression had a greater risk of dying from all types of cancer. 15 The research implied that the impairment of NK cell function is probably one of the common factors behind these links. For example, obesity has been known to impair NK cell function and then lead to an increased risk for severe infections and several cancer types, 16,17 while chronic family stress is consistently associated with decreases in NK cell cytotoxicity. 18 These results indicated that NK function impaired by either genetic defects or regulatory factors could increase cancer incidence.

eAppendix 3. Associations of the Inherited Defective Genes in APP and Wnt Pathways With Tumorigenesis and Metastasis
It has been shown that activated Wnt signaling pathway in tumors excluded the recruitment of CD8+ T cells into TIMEs. 19 Also, somatic mutations of the Wnt pathway in tumors could activate it to prevent T cells from being recruited into TIMEs. 20 Here we showed that in most of the cancer types the number of inherited those derived from defective NK cells. These results suggested the association between inherited dysregulations in Wnt pathway and TILs' recruitment in some cancer types, and its influences were much weaker than defective NK cells.
APP is a biological process which presents antigens to T cells, functional defects in APP could lead tumor cells to escape T cell surveillance. However, careful analyses in this study showed that the number of genetic defective genes in the APP pathway (i.e., the KEGG APP pathway includes HLA family genes, TAP1/2 and other genes) was not significantly associated with clinical outcomes and TILs' abundance.
However, significantly more inherited defective genes in the APP pathway were observed in cancer patients than cancer-free individuals in most cancers ( Figure 5 in the main text). These results suggested that individuals who bear more inherited defective genes in the APP pathway probably were at-risk of developing cancers.
These insights provided a potential opportunity to identify the subpopulation at-risk of developing cancers based on the inherited defective genes in NK cells and the APP pathway.

Cancer
In this study, we showed that inherited defective genes in NK cells shaped TIME subtypes, TILs' abundance, survival and cancer risk. Along this line, many open questions still remained, for example, defective genes in NK cells were largely shared by different cancer types, however, each cancer type has a few unique NK-defective genes. Given the fact that tissue/organ-resident NK cells are different and diverse, 21 additional studies will be needed to elucidate if defective genes are dominantly expressed in tissue/organ-resident NK cells. If so, adoptive transfer of tissue-resident NK cells could be considered to be more efficient in cancer prevention and TIL's recruitment. NK cells share gene expression programs with NKT and γδ T cells, each of which is a subset of innate-like T cells. NK cells, γδ T cells and NKT cells are cytotoxic cells, which trigger innate immune responses, provide the first level of eFigure 1. Heatmaps Showing the 3 Universal TIME Subtypes Heatmaps showing the three universal TIME subtypes derived from the unsupervised clustering by using the expression of the immune-checkpoint therapy essential genes. Columns and rows represent samples and genes, respectively. Cancer types include bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), kidney renal clear cell carcinoma (KIRC), lower grade glioma (LGG), lung squamous cell carcinoma (LUSC), pancreatic adenocarcinoma (PRAD), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA) and uterine corpus endometrial carcinoma (UCEC). Red, green and blue bars represent TIMErich, TIME-intermediate and TIME-poor subtype, respectively. eFigure 2. Abundance of the Tumor-Infiltrating Lymphocytes in TIME Subtypes in Cancers Rows represent tumor-infiltrating lymphocytes. eFigure 3. Kaplan-Meier Curves Between Patients With Cancer in TIME-Rich Subtype and TIME-Intermediate and TIME-Poor Subtypes. It revealed that survival for was significantly longer for TIME-rich patients than TIMEintermediate/-poor patients. A substantial fraction of samples in COAD was virus-infected tumors After removing them, only 87 COAD samples were remained for analysis. eFigure 4. Significantly Enriched Pathways by Comparing RNA-seq Data in TIME-Intermediate and TIME-Poor Subtypes Columns and rows represent KEGG pathways and cancer types, respectively. The digital numbers represent FDR-corrected p values. eFigure 5. Heatmaps of the Significantly Differential Functional Germline Variants Between the TIME-Rich and TIME-Intermediate/TIME-Poor Subtypes (Fisher's exact tests with FDR-corrected p<0.25). Only the functional germline variants (rows) which were less enriched in the TIME-rich subtype are shown. Red, green and blue bars represent TIME-rich, TIME-intermediate and TIME-poor subtype, respectively. Columns represent samples.

eFigure 6. Significantly Enriched Pathways by Comparing Functional Germline Variant
Between TIME-Rich and TIME-Intermediate/TIME-Poor Subtypes Columns and rows represent KEGG pathways and cancer types, respectively. The digital numbers represent FDR-corrected p values. Significantly functional differential germline variants (p<0.2) between TIME-rich and TIME-intermediate/-poor tumors were intersected with the immune-checkpoint therapy (ICT) essential genes, and then pathway enrichment analysis was conducted. eFigure 7. Significantly Enriched Pathways of the Significantly Differential Germline Variants Between the TIME-Intermediate and TIME-Poor Subtypes A heatmap showing the significantly enriched pathways derived from the significantly differential germline variants between TIME-intermediate and TIME-poor subtypes. Columns and rows represent KEGG pathways and cancer types, respectively. The digital numbers represent FDR-corrected p values. Significantly functional differential germline variants (p<0.2) between TIME-intermediate and TIME-poor tumors were intersected with the immune-checkpoint therapy (ICT) essential genes, and then pathway enrichment analysis was conducted.

eFigure 8. A Heatmap Showing the Significantly More Inherited NKD Genes in TIME-Intermediate/TIME-Poor Subtypes Than TIME-Rich Subtype in Cancers
The digital numbers represent FDR-corrected p values. Columns and rows represent cancer types and known inherited NKD (NK cell deficiency) genes, respectively. eFigure 11. Negative Correlations Between the Number of the Inheritable Defective Genes  and Abundance of TILs  NK, NKT, A CD8+ T, A CD4+ T, cDC, immature B, Activated B, MDSC, GD T, RT, CM  CD8+ T, I dendritic, Type 1 TH, Type 2 TH, Type 17 TH, TF helper, A dendritic, EM CD4+  T, EM CD8+ T, P dendritic and CM CD4+ T represent natural killer cell, natural killer T cell,  activated CD8+ T cell, activated CD4+ T cell, conventional dendritic cell, immature B cell, activated B cell, myeloid-derived suppressor cell, gamma delta T cell, regulatory T cell, central memory CD8+ T cell, immature dendritic cell, Type 1 T helper cell, Type 2 T helper cell, Type 17 T helper cell, T follicular helper cell, activated dendritic cell, effector memory CD4+ T cell, effector memory CD8+ T cell, plasmacytoid dendritic cell and central memory CD4+ T cell, respectively. *0.05<p-value <0.1; **0.01<p-value <0.05; and ***p-value<0.01.

eFigure 15. Heatmaps of the Significantly Differentially Functional Germline Variants Between Cancer and Cancer-Free Cohorts
Only the functional germline variants (rows) which are more enriched in cancer patients are shown. For each cancer types, the same number of the cancer-free individuals were randomly selected from the cancer-free cohort (n=4,500). Red and blue bars represent cancer and cancer-free individuals, respectively. Columns represent samples. eFigure 16. Significantly Enriched Pathways Derived from the Significantly Differential Germline Variants Between Individuals With No Cancer Patients With Cance rA heatmap showing the significantly enriched pathways derived from the significantly differential germline variants between cancer-free individuals and cancer patients in 13 cancer types. Columns and rows represent KEGG pathways and cancer types, respectively. The digital numbers represent FDR-corrected p values. Significantly functional differential germline variants (p<0.2) between cancer-free individuals and cancer patients were intersected with the immune-checkpoint therapy (ICT) essential genes, and then pathway enrichment analysis was conducted.