Genome-Wide Interaction Study of Dietary Intake and Colorectal Cancer Risk in the UK Biobank

Key Points
Question: Which variants and genes modify the association of dietary intake with colorectal cancer (CRC) risk, and what are the underlying pathways for diet-CRC associations?
Findings: In this nested case-control study including 4686 patients with incident CRC and 14 058 matched controls, 324 variants suggestively interacted with 11 dietary factors, and multiple variants of EPDR1 were found to interact with fish intake on CRC risk. Several pathways were detected for the associations of milk, cheese, tea, and alcohol consumption with CRC risk.
Meaning: The findings of this study provide evidence for possible pathways involved in the association between diet and CRC.

$$\operatorname{logit}(p_i) = \sum_{j=1}^{s} \alpha_j S_{ij} + \beta_1 G_i + \beta_2 E_i + \beta_3 (G_i \times E_i) + \gamma^{\top} C_i + \epsilon_i$$

where $p_i$ represents the probability that individual $i$ is a CRC case, $S_{ij}$ is the binary indicator variable for stratum $j$ among the $s$ matched risk sets, $G_i$ is the SNP alternative allele count, $E_i$ is the dietary intake frequency, $C_i$ is the vector of covariates (age, sex, and six principal component scores) with coefficients $\gamma$, and $\epsilon_i$ is the residual. $\alpha_j$ is the regression coefficient for the stratum indicator variables, and $\beta_1$, $\beta_2$, and $\beta_3$ are the regression coefficients for the SNP, dietary intake, and SNP×diet interaction term, respectively. The p value of $\beta_3$ was used to test for the presence of an interaction. We identified SNPs with suggestive (two-sided p<1×10⁻⁵) and significant (two-sided p<5×10⁻⁸) interactions with dietary factors on CRC risk. Genotype data were processed in plink2⁶ and imported into R (version 4.2.2)⁷ for association and interaction analyses.
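The interaction test described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' R code: it fits the logistic model by Newton-Raphson (iteratively reweighted least squares) with one indicator column per matched stratum and no separate intercept, then applies a Wald test to the coefficient of the SNP×diet term (β₃). The helper names `fit_logistic`, `interaction_test`, and `classify_interaction`, the two stand-in covariates, and all simulated data and effect sizes are hypothetical.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit an unpenalized logistic regression by Newton-Raphson (IRLS).

    Returns the coefficients and their estimated covariance matrix,
    i.e. the inverse Fisher information at the final estimate."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step from the score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)

def interaction_test(strata, snp, diet, covars, case):
    """Wald test of the SNP x diet coefficient (beta_3)."""
    levels = np.unique(strata)
    # One binary column per matched stratum (S_ij); together they replace the intercept.
    S = (strata[:, None] == levels[None, :]).astype(float)
    X = np.column_stack([S, snp, diet, snp * diet, covars])
    beta, cov = fit_logistic(X, case)
    k = S.shape[1] + 2                              # column index of the G x E term
    z = beta[k] / math.sqrt(cov[k, k])
    p_value = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p value
    return beta[k], p_value

def classify_interaction(p_value):
    """Label a p value with the suggestive and significant thresholds."""
    if p_value < 5e-8:
        return "significant"
    if p_value < 1e-5:
        return "suggestive"
    return "not significant"

# Hypothetical simulated data: 10 matched strata of 300 individuals each.
rng = np.random.default_rng(0)
n = 3000
strata = np.repeat(np.arange(10), 300)
snp = rng.binomial(2, 0.3, size=n).astype(float)       # alternative allele count
diet = rng.integers(0, 5, size=n).astype(float)        # intake frequency category
covars = np.column_stack([rng.normal(size=n),            # stand-in for age
                          rng.binomial(1, 0.5, size=n)])  # stand-in for sex
eta = -1.2 + 0.1 * snp + 0.05 * diet + 0.4 * snp * diet + 0.2 * covars[:, 0]
case = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta3, p3 = interaction_test(strata, snp, diet, covars, case)
print(f"beta_3 = {beta3:.3f}, p = {p3:.2e} ({classify_interaction(p3)})")
```

The explicit stratum-indicator form is used here only because it mirrors the model term by term; with many small matched sets, conditional logistic regression (for example `survival::clogit` in R) avoids estimating one α per stratum.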
Apart from genome-wide interaction analysis at the SNP level, gene-based and gene-set enrichment analyses have been suggested to provide greater statistical power⁸. In gene-based analysis, genetic variants are aggregated at the gene level to test the joint effect of all markers in the gene.
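To make the idea of aggregating variants to the gene level concrete, the sketch below combines SNP-level interaction p values within each gene using Fisher's method. This is a deliberately simplified stand-in, not MAGMA's algorithm: Fisher's method assumes the SNPs in a gene are independent, whereas MAGMA accounts for linkage disequilibrium between SNPs, so this sketch overstates significance for correlated variants. The gene labels and p values are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with an even
    number of degrees of freedom: exp(-x/2) * sum_{i < df/2} (x/2)**i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total

def fisher_gene_p(snp_pvalues):
    """Fisher's combined p value for the SNPs assigned to one gene
    (assumes independent SNPs, unlike MAGMA)."""
    stat = -2.0 * sum(math.log(p) for p in snp_pvalues)
    return chi2_sf_even_df(stat, 2 * len(snp_pvalues))

# Hypothetical SNP-level interaction p values grouped by gene.
snp_p_by_gene = {
    "GENE_A": [1e-4, 3e-3, 0.02],
    "GENE_B": [0.40, 0.70],
}
gene_p = {gene: fisher_gene_p(ps) for gene, ps in snp_p_by_gene.items()}
for gene, p in gene_p.items():
    print(gene, f"{p:.2e}")
```

Several moderately associated SNPs in GENE_A yield a gene-level p value far smaller than any single SNP's, which is the intuition behind the power gain of gene-based testing.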
In gene-set analysis, individual genes are aggregated into groups of genes that share certain biological pathways and functions⁸. Given a predefined list of genes, gene-set enrichment analysis searches for sets of genes that are significantly overrepresented⁸; these sets typically consist of genes that function together in a known biological pathway⁸.
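The notion of overrepresentation can be illustrated with the classic one-sided hypergeometric test. This is a sketch of overrepresentation analysis in general, not of MAGMA's gene-set analysis, which uses a regression-based competitive test on gene-level statistics. The background size of 18,041 genes is taken from the text; the pathway size, number of selected genes, and overlap are hypothetical.

```python
from math import comb

def overrepresentation_p(n_genes, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric test: probability of observing at least
    n_overlap genes from a set of size n_in_set when n_selected genes are
    drawn at random from n_genes background genes."""
    upper = min(n_in_set, n_selected)
    tail = sum(comb(n_in_set, k) * comb(n_genes - n_in_set, n_selected - k)
               for k in range(n_overlap, upper + 1))
    # Exact integer arithmetic until the final division.
    return tail / comb(n_genes, n_selected)

# Hypothetical example: 18,041 background genes, a 200-gene pathway,
# 50 genes selected as top gene-based hits, 5 of which fall in the pathway.
expected = 50 * 200 / 18041
p = overrepresentation_p(18041, 200, 50, 5)
print(f"expected overlap {expected:.2f}, observed 5, p = {p:.2e}")
```

Five pathway genes among 50 hits is roughly nine times the overlap expected by chance, which is what a small p value here reflects.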
To investigate whether SNP-diet interaction patterns tended to converge within genetic regions, we carried out a gene-based analysis using MAGMA as implemented in the web-based FUMA platform⁹. Summary statistics from the genome-wide interaction (GWI) analysis were used as input, and each SNP was assigned to a gene using the NCBI 37.3 gene definitions. Overall, input SNPs were mapped to 18,041 protein-coding genes; a Bonferroni-corrected two-sided p<2.77×10⁻⁶ (approximately 0.05/18,000) was considered significant in the gene-based analysis. MAGMA was further used to perform gene-set enrichment analysis of Gene Ontology terms and biological pathways from multiple sources⁹, with a Benjamini-Hochberg-adjusted two-sided p<0.05 considered significant.
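The two multiple-testing rules mentioned above can be sketched as follows: a Bonferroni per-test threshold for the gene-based analysis and Benjamini-Hochberg adjusted p values for the gene-set analysis. The helper names and the four raw p values are hypothetical; the gene count of 18,041 comes from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                        # walk from the largest p down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min                       # keeps adjusted p monotone
    return adjusted

print(f"{bonferroni_threshold(0.05, 18041):.2e}")        # prints 2.77e-06

# Hypothetical raw p values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print(adj, significant)
```

With the 18,041 mapped genes, the Bonferroni threshold evaluates to 2.77×10⁻⁶. The Benjamini-Hochberg step scales each p value by m/rank and then takes a running minimum from the largest p value downward, so adjusted values never decrease as raw p values increase.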
To further interpret significant interactions between dietary and genetic factors in CRC risk, we analyzed the association between dietary intake and CRC risk stratified by genetic markers among all 370,004 individuals who were cancer-free at baseline. Adapting the MAGMA gene-based approach⁸, we summarized individuals' cumulative SNP genotypes within specific genes by using principal component analysis to project each gene's SNP matrix onto its major principal components, classified individuals accordingly, and performed stratified analyses of diet-CRC associations within these subgroups.
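The genotype-summarizing step can be sketched as principal component analysis (via singular value decomposition) of the column-centered allele-count matrix for one gene's SNPs, followed by a diet-CRC comparison within genotype subgroups. Several parts are hypothetical simplifications: the simulated correlated SNPs, the median split on the first principal component, and the unadjusted odds ratio, whereas the study's stratified analyses reported hazard ratios from covariate-adjusted models.

```python
import numpy as np

def gene_pc_scores(genotypes, n_components=2):
    """Project an individuals x SNPs allele-count matrix for one gene onto its
    leading principal components via SVD of the column-centered matrix.

    Returns the PC scores and the explained-variance ratio of every component
    (the quantity shown in a scree plot)."""
    centered = genotypes - genotypes.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained_ratio = s**2 / np.sum(s**2)
    return scores, explained_ratio

def crude_odds_ratio(exposed, case):
    """Unadjusted odds ratio from a 2 x 2 table (exposure by case status)."""
    a = np.sum((exposed == 1) & (case == 1))
    b = np.sum((exposed == 1) & (case == 0))
    c = np.sum((exposed == 0) & (case == 1))
    d = np.sum((exposed == 0) & (case == 0))
    return (a * d) / (b * c)

# Hypothetical data: 6 correlated SNPs in one gene for 1000 individuals.
rng = np.random.default_rng(1)
n = 1000
base = rng.binomial(2, 0.3, size=n)
snps = []
for _ in range(6):
    # Each SNP mostly copies the shared genotype, mimicking strong LD.
    shift = rng.integers(-1, 2, size=n) * (rng.random(n) < 0.1)
    snps.append(np.clip(base + shift, 0, 2))
genotypes = np.column_stack(snps).astype(float)

scores, explained = gene_pc_scores(genotypes)
# Hypothetical rule: split individuals at the median of PC1. The sign of a PC
# is arbitrary, so "high" and "low" carry no allele direction by themselves.
group = np.where(scores[:, 0] > np.median(scores[:, 0]), "high", "low")

fish = rng.binomial(1, 0.5, size=n)    # hypothetical binary fish intake
case = rng.binomial(1, 0.1, size=n)    # hypothetical case status
or_by_group = {g: crude_odds_ratio(fish[group == g], case[group == g])
               for g in ("high", "low")}
print(explained[:3].round(3), or_by_group)
```

Because the simulated SNPs are in strong linkage disequilibrium, the first component captures most of the variance, which is why a single component can stand in for the cumulative genotype of a gene.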

eTable 1. Dietary Intake and Colorectal Cancer Risk in a Nested Case-Control Study From the UK Biobank

Summary of Genetic Variants Mapped to EPDR1 Gene
Factor Loading Matrix for Major EPDR1 Gene Principal Components Identified From SNP Matrix Using Principal Component Analysis
Associations Between Fish Intake and Colorectal Cancer Risk According to EPDR1
HR, hazard ratio; CI, confidence interval. The fully adjusted model included sex, smoking, drinking, body mass index, and physical activity as covariates. Bold font indicates a significant difference.
eTable 6. Dietary Intake Among Study Participants Who Completed the Touchscreen Questionnaire at Two or More Visit Assessments
Data are presented as counts (percentages). Differences in dietary intake between baseline and the latest visit were analyzed using the McNemar test.
Scree Plot for Explained Variance of Principal Components Derived From Multiple Variants Located in EPDR1 Gene Using Principal Component Analysis
eFigure 7.