Rare and Common Genetic Variation Underlying Atrial Fibrillation Risk

Key Points
Question: What is the combined contribution of rare and common genetic variation to atrial fibrillation (AF) risk?
Findings: In this genetic association study, rare genetic variants predicted to cause loss of function in 6 genes were associated with AF. Together, these rare variants and a polygenic risk score for AF were associated with a considerable risk of incident AF; rare variants were also associated with heart failure and cardiomyopathy, and with a higher risk of cardiomyopathy following AF diagnosis.
Meaning: The findings suggest that assessing both rare and common genetic variation may aid in AF prevention and risk stratification.

eFigure 10. Forest plot of hazard ratios for AF, cardiomyopathy, and HF (30-day grace period)
eReferences
This supplemental material has been provided by the authors to give readers additional information about their work.

Quality control, variant annotation, and phenotype definitions
We conducted further filtering of samples based on QC criteria listed in UK Biobank resource 531 (heterozygosity, missing rates, excess relatedness, and missing kinship inference). We excluded samples with disagreements between reported sex and genetically determined sex, and filtered for European ancestry based on the first six principal components of individuals self-reporting as "White", "Irish", or "Any other white background" (UK Biobank data field 21000, coding 1001, 1002, and 1003). We excluded variants with missingness >10% or a Hardy-Weinberg equilibrium test P<1x10^-15, and retained calls with a genotype quality >20, read depth >10, and call rate >90%.
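The variant- and call-level thresholds above can be sketched as simple predicates. This is a minimal illustration, not the pipeline used in the study: the `GenotypeCall` class and the function names are hypothetical, and the call-rate criterion is left out because the text does not say at which level it was applied.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MAX_VARIANT_MISSINGNESS = 0.10
HWE_P_THRESHOLD = 1e-15
MIN_GENOTYPE_QUALITY = 20
MIN_READ_DEPTH = 10


@dataclass
class GenotypeCall:
    """Hypothetical container for one genotype call."""
    genotype_quality: float
    read_depth: int


def keep_call(call: GenotypeCall) -> bool:
    """Retain calls with genotype quality >20 and read depth >10."""
    return (call.genotype_quality > MIN_GENOTYPE_QUALITY
            and call.read_depth > MIN_READ_DEPTH)


def keep_variant(missing_rate: float, hwe_p: float) -> bool:
    """Drop variants with missingness >10% or HWE P<1e-15."""
    return missing_rate <= MAX_VARIANT_MISSINGNESS and hwe_p >= HWE_P_THRESHOLD
```

For example, `keep_variant(0.05, 0.5)` passes both variant-level filters, whereas a variant with 20% missingness does not.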
Variant annotation was performed using dbSNP (version 4.1a) 1 and SnpEff (version 5.0) 2. pLoF variants were defined as variants leading to a premature stop codon or to the loss of a start or stop codon, frameshift variants leading to a premature stop codon, and variants disrupting canonical splice acceptor or donor sites. Only variants annotated as "high" impact were included as pLoF variants. We assessed splice-site variants using SpliceAI. Diabetes was defined by ICD-10 codes E10, E11, and E14 (UK Biobank data fields f.130706, f.130708, and f.130714).

Gene-based tests for rare missense variants
Compared with pLOF variants, the effects of missense variants on disease risk are more difficult to predict.
Traditional burden tests lose power when the effects of variants are bidirectional. Alternative methods that account for bi-directionality, like the Sequence Kernel Association Test (SKAT) 4, may lose power when only a small proportion of variants in a gene are associated with the investigated outcome (sparsity of causal variants).
We therefore performed a traditional burden test and followed up with the Omnibus Aggregated Cauchy Association Test (ACAT-O) 5 as a sensitivity analysis. The ACAT-O test is robust to both bi-directional variant effects and sparsity of causal variants and is therefore well-suited to examine missense variants.
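The combination step underlying ACAT-type tests can be illustrated with the Cauchy combination of P-values. The sketch below shows only that step, with equal weights assumed by default; it is not the full ACAT-O procedure, which combines several set-based tests, and the function name is hypothetical.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine P-values with the Cauchy combination rule.

    Each P-value is mapped to a Cauchy variate, the weighted mean is taken,
    and the result is mapped back to a P-value.
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)  # assumption: equal weights
    total = sum(weights)
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total
    return 0.5 - math.atan(statistic) / math.pi
```

Because the Cauchy transform is dominated by the smallest P-values, a single strong signal among many null variants still yields a small combined P-value, which is the property that makes this family of tests tolerant of sparse causal variants.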

Sensitivity analyses for gene-based tests
As a sensitivity analysis, we conducted a leave-one-variant-out (LOVO) analysis using the function integrated in REGENIE for all significant and suggestive associations. This approach constructs a series of masks for each gene, leaving one variant out per mask. A subsequent gene-based burden test was performed for each mask to detect whether any individual variant was the sole driver of the association (P>0.05 for the mask without that variant).
To assess whether any associations were primarily driven by ventricular cardiomyopathies, we also conducted another gene-based burden test, excluding all individuals with cardiomyopathies diagnosed before inclusion or during follow-up. Cardiomyopathies were defined by ICD-10 code I42 (UK Biobank data field 131338).
The association between pLOF variants in RPL3L and AF was primarily driven by the variant at position chr16:1945498:C>T (P=0.050 for mask without variant), and the association between missense variants in the UBE4B gene and AF was primarily driven by a single missense variant at position chr1:10107367:G>A (P=0.74 for mask without variant). Results of the LOVO analysis are summarized in Supplementary Data 4-5. Excluding individuals with cardiomyopathies did not substantially alter the results (Supplementary Data 6-7).
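The mask construction behind a LOVO analysis can be sketched in a few lines. This is an illustration of the idea only and does not call REGENIE; the function names and the variant identifiers in the usage note are hypothetical.

```python
def leave_one_variant_out_masks(variant_ids):
    """Return one mask per variant, each containing all other variants."""
    return {
        left_out: [v for v in variant_ids if v != left_out]
        for left_out in variant_ids
    }


def candidate_driver_variants(lovo_p_values, alpha=0.05):
    """Flag variants whose removal pushes the burden test above alpha.

    lovo_p_values maps the left-out variant to the burden-test P-value
    of the mask that excludes it.
    """
    return [variant for variant, p in lovo_p_values.items() if p > alpha]
```

With three hypothetical variants `["v1", "v2", "v3"]`, the first function yields three masks of two variants each; if the mask without `"v2"` gives P=0.4 while the others stay below 0.05, `"v2"` is flagged as a likely driver of the association.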

Replication of genetic findings
We included 138,131 participants from the Geisinger Health System MyCode cohort and 29,127 participants from the Mount Sinai BioMe Biobank. Atrial fibrillation cases were defined by International Classification of Diseases, 10th revision (ICD-10) code I48, obtained from electronic health records. Participants without any records of cardiac arrhythmia were used as controls.

DNA sequencing and genotyping data
The Regeneron Genetics Center performed high-coverage whole-exome sequencing using NimbleGen VCRome probes (Roche CA, USA) or a modified version of the xGen design from Integrated DNA Technologies (IDT). Sequencing was done using Illumina v4 HiSeq 2500 or NovaSeq instruments, achieving over 20x coverage for 96% of VCRome samples and 99% of IDT samples. Variants were annotated using snpEff and Ensembl v85 gene definitions, prioritizing protein-coding transcripts based on functional impact. The following variants were defined as protein truncating: insertions or deletions resulting in a frameshift; any variant causing a stop gained, start lost, or stop lost; and any variant affecting a splice acceptor or splice donor site. Common variant genotyping was performed on single nucleotide polymorphism (SNP) arrays as previously described 6. We retained genotyped variants with a minor allele frequency >1%, <10% missingness, and a Hardy-Weinberg equilibrium test P-value >10^-15. We imputed the genotyped variants based on the TOPMed reference panel 7, using the TOPMed imputation server 8,9. Further details are provided elsewhere 6,10,11.

Association analyses
We estimated associations of the burden of predicted loss-of-function variants in TTN, RPL3L, PKP2, CTNNA3, C10orf71, and KDM5B with atrial fibrillation by fitting additive genetic Firth bias-corrected logistic regression models using the software REGENIE, version 2+ 12.
Analyses were adjusted for age, age squared, sex, age-by-sex, and age squared-by-sex interaction terms; experimental batch-related covariates; the first 10 common variant-derived genetic principal components; the first 20 rare variant-derived principal components; and a polygenic score generated by REGENIE, which robustly adjusts for relatedness and population structure 12. Association results from Geisinger Health System MyCode and the Mount Sinai BioMe Biobank were meta-analyzed using fixed-effects inverse variance weighting.
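Fixed-effects inverse variance weighting pools per-cohort effect estimates by weighting each with the reciprocal of its squared standard error. The sketch below is a generic illustration on the log odds ratio scale; the function name and the input values in the usage note are hypothetical and are not results from the replication cohorts.

```python
import math
from statistics import NormalDist


def fixed_effects_ivw(betas, standard_errors):
    """Pool effect estimates with inverse variance weights.

    Returns the pooled estimate, its standard error, and a two-sided
    P-value from a normal approximation.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("betas and standard_errors must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return pooled_beta, pooled_se, p_value
```

For two hypothetical cohorts with log odds ratios 0.2 and 0.4 and equal standard errors of 0.1, the pooled estimate is 0.3 with a standard error of about 0.071; `math.exp(pooled_beta)` converts it back to an odds ratio.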

Protein abundance and RNA expression across cardiac cell types in human hearts
To evaluate protein abundance levels of TTN, RPL3L, PKP2, CTNNA3, C10orf71, and KDM5B, we used mass spectrometry (MS)-based protein abundance measurements from human left and right atrial tissue of seven individuals from one of our previous studies 13. Raw data were searched against the SwissProt human protein database containing canonical and isoform sequences using MaxQuant. Similarly, to evaluate in which cell types the proteins of interest are expressed in the human heart, we queried a publicly available single-nucleus RNA sequencing (snRNAseq) data set of 287,269 cells of the human heart published by Tucker et al. 14. Cytoplasmic cardiomyocyte clusters were removed, the remaining clusters were combined, and average RNA expression values per cell type were calculated as described by Tucker et al. 14. The average expression values of TTN, RPL3L, PKP2, CTNNA3, C10orf71, KDM5B, and MYBPC3 were extracted and scaled per gene by dividing by the maximum expression value over all cell types. Data were processed and visualized using Python 3.7.1 and Seaborn 0.9.0. Results are illustrated in eFigure 1.
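The per-gene scaling described above divides each gene's average expression by its maximum across cell types, so that every gene peaks at 1. A minimal sketch follows; the function name, the nested-dictionary layout, and the numbers in the usage note are assumptions for illustration.

```python
def scale_by_gene_max(expression):
    """Scale average expression per gene to its maximum over cell types.

    expression maps gene -> {cell_type: average expression}.
    Genes with no positive expression are returned as all zeros.
    """
    scaled = {}
    for gene, by_cell_type in expression.items():
        peak = max(by_cell_type.values())
        scaled[gene] = {
            cell_type: (value / peak if peak > 0 else 0.0)
            for cell_type, value in by_cell_type.items()
        }
    return scaled
```

For a hypothetical gene with average expression 4.0 in cardiomyocytes and 1.0 in fibroblasts, the scaled values are 1.0 and 0.25, which makes genes with very different absolute expression levels comparable on one heat map.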
As C10orf71 had not previously been associated with cardiovascular phenotypes, we obtained tissue-specific expression based on normalized consensus RNA-sequencing data from the Human Protein Atlas (www.proteinatlas.org) 15 and GTEx (www.gtexportal.org). The tissue-specific RNA expression of C10orf71 was visualized using R. Results are illustrated in eFigure 2.

Risk of heart failure and cardiomyopathy
We assessed hazard ratios for incident AF, HF, and cardiomyopathy as separate outcomes. To ascertain temporal trends in incident disease, we considered each individual outcome and all-cause mortality as competing events. The models were adjusted for sex, age, BMI, hypertension, and IHD at inclusion. We considered P<0.0056 as statistically significant (3 genetic exposures x 3 independent outcomes).
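The significance threshold is consistent with a Bonferroni correction over nine tests. The arithmetic is shown below under the assumption of a family-wise level of 0.05, which the text does not state explicitly.

```python
N_GENETIC_EXPOSURES = 3
N_OUTCOMES = 3
FAMILY_WISE_ALPHA = 0.05  # assumption: not stated explicitly in the text


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(FAMILY_WISE_ALPHA, N_GENETIC_EXPOSURES * N_OUTCOMES)
print(round(threshold, 4))  # 0.05 / 9, rounded to four decimals
```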
Among individuals diagnosed with AF during follow-up, we assessed the hazard ratios for incident HF and cardiomyopathy based on carrier status of a rare pLOF variant. Individuals who developed HF or cardiomyopathy before AF were excluded. Hazard ratios were estimated for HF and cardiomyopathy as separate, competing events.

eAppendix
eTable 1. Odds ratio for AF according to PRS and pLOF variants
eTable 2. Odds ratio for AF according to PRS and pLOF variants (unrelated individuals)
eTable 3. Variant carriers in study cohort for incident AF, HF and cardiomyopathy
eTable 4. Hazard ratios for incident AF according to genetic risk and clinical risk factors
eTable 5. Cumulative incidence of AF by age 80
eTable 6. Cumulative incidence of AF by age 70
eTable 7. Cumulative incidence of AF by age 60
eTable 8. Cumulative incidence of AF by age 80 (unrelated individuals)
eTable 9. Cumulative incidence of AF by age 80 (excluding TTN pLOF variants)
eFigure 1. Flowchart of study design
eFigure 2. Manhattan plot of gene-based test for rare pLOF variants
eFigure 3. Quantile-Quantile plot of gene-based test for rare pLOF variants
eFigure 4. Cardiac expression of AF associated genes
eFigure 5. Tissue-specific RNA expression of C10orf71
eFigure 6. Results from gene-based association test with AF in independent replication cohort
eFigure 7. Ten-year risk of AF (unrelated individuals)
eFigure 8. Ten-year risk of AF (excluding TTN pLOF variants)
eFigure 9. Forest plot of hazard ratios for AF, cardiomyopathy, and HF (unrelated individuals)

Splice-site variants with a SpliceAI score >0.8 were classified as pLoF 3. AF was defined by the International Classification of Diseases, 10th revision (ICD-10) code I48, corresponding to UK Biobank data field 131351. The AF diagnosis in the UK Biobank was based on hospital records, death records, and primary care records. Individuals without an AF diagnosis were used as controls. Individuals with an uncertain AF diagnosis (i.e., individuals with an AF diagnosis based only on self-reports or individuals diagnosed with atrial flutter [ICD-10 codes I48.3 and I48.4]) were assigned to the control group. Heart failure was defined by ICD-10 code I50 (UK Biobank data field 131354) and cardiomyopathy by ICD-10 code I42 (UK Biobank data field 131338). Ischemic heart disease was defined by ICD-10 codes I20, I21, I22, I24, and I25 (UK Biobank data fields 131296, 131298, 131300, 131304, and 131306). Hypertension was defined by ICD-10 code I10 (UK Biobank data field 131286) diagnosed at time of inclusion.
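The ICD-10 based phenotype definitions can be expressed as a small lookup. This is a sketch under stated assumptions: the dictionary, the function names, and the dotted code format are hypothetical, and an individual with both a non-flutter I48 code and a flutter code is treated here as an AF case, which the text does not specify.

```python
# ICD-10 code prefixes per phenotype, as listed in the text above.
PHENOTYPE_PREFIXES = {
    "atrial_fibrillation": ("I48",),
    "heart_failure": ("I50",),
    "cardiomyopathy": ("I42",),
    "ischemic_heart_disease": ("I20", "I21", "I22", "I24", "I25"),
    "hypertension": ("I10",),
}
# Atrial flutter codes that do not count as an AF diagnosis.
ATRIAL_FLUTTER_CODES = {"I48.3", "I48.4"}


def has_phenotype(icd10_codes, phenotype):
    """True if any code starts with one of the phenotype's prefixes."""
    prefixes = PHENOTYPE_PREFIXES[phenotype]
    return any(code.startswith(prefixes) for code in icd10_codes)


def is_af_case(icd10_codes):
    """True if at least one I48 code is not an atrial flutter code."""
    af_prefixes = PHENOTYPE_PREFIXES["atrial_fibrillation"]
    af_codes = [code for code in icd10_codes if code.startswith(af_prefixes)]
    return any(code not in ATRIAL_FLUTTER_CODES for code in af_codes)
```

Under this sketch, `is_af_case(["I48.3"])` is false, so an individual with atrial flutter only is treated as a control, in line with the definition above.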

The database search used MaxQuant v1.5.3.19. The proteinGroups.txt data were further processed and visualized using Python 3.7.1 and Seaborn 0.9.0. Reverse identifications, potential contaminants, and proteins only identified by site were removed, and LFQ protein intensities were extracted. One sample from the left atrium (H117-LA) showed a low number of protein identifications and a significantly lower overall protein intensity distribution and was thus removed from further analyses. Median protein intensity-based absolute quantification (iBAQ) values over all samples per atrium were calculated for each protein and visualized by means of a rank plot. KDM5B was not identified in the data set. Moreover, protein iBAQ values of TTN, RPL3L, PKP2, CTNNA3, and C10orf71 of each biological replicate were extracted and visualized using a box plot.
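The median-and-rank step behind the rank plot can be sketched as follows. The function name and the input layout (protein mapped to a list of iBAQ values across samples of one atrium) are assumptions for illustration, and the values in the usage note are invented.

```python
from statistics import median


def median_ibaq_ranking(ibaq_by_protein):
    """Rank proteins by median iBAQ across samples, highest first.

    Returns a list of (rank, protein, median_ibaq) tuples.
    """
    medians = {
        protein: median(values) for protein, values in ibaq_by_protein.items()
    }
    ordered = sorted(medians.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, protein, value)
        for rank, (protein, value) in enumerate(ordered, start=1)
    ]
```

Given hypothetical iBAQ values `{"P1": [10, 30, 20], "P2": [5, 1, 3]}`, protein `P1` ranks first with a median of 20 and `P2` second with a median of 3.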

eFigure 2. Manhattan plot of gene-based test for rare pLOF variants
X-axis denotes chromosomal position of the gene. Y-axis denotes -log10 of the P-value for the genetic associations with AF. Significant genes are labeled and colored in red.

eFigure 3. Quantile-Quantile plot of gene-based test for rare pLOF variants
X-axis denotes expected -log10 P-value, while Y-axis denotes the observed -log10 P-values. The lambda value (λ) indicates a measure of genomic inflation in the dataset.
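A common way to compute the genomic inflation factor is the ratio of the median observed chi-square statistic to the median of a chi-square distribution with 1 degree of freedom. The sketch below uses that definition as an assumption, since the text does not state how λ was calculated; the function name is hypothetical.

```python
from statistics import NormalDist, median

_STANDARD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STANDARD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Estimate lambda from two-sided P-values (each in (0, 1])."""
    chi2_stats = [
        _STANDARD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values
    ]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Values of λ close to 1 indicate that the bulk of the test statistics follows the null distribution, whereas values well above 1 suggest inflation, for example from residual population structure.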

eFigure 4. Cardiac expression of AF associated genes
Suppl. Figure 1A) relative abundance of protein products of the AF-associated genes identified in the study. Suppl. Figure 1B) relative abundance of protein products in left atria (LA) and right atria (RA) respectively. The product of KDM5B was not identified in the proteomics dataset. Suppl. Figure 1C) relative RNA expression across cell types, based on single-cell RNA expression data.
eTable 1. Odds ratio for AF according to PRS and pLOF variants
CI, confidence interval; pLOF variant, predicted loss-of-function variant in atrial fibrillation associated gene; PRS, polygenic risk score for atrial fibrillation.

eTable 2. Odds ratio for AF according to PRS and pLOF variants (unrelated individuals)
CI, confidence interval; pLOF variant, predicted loss-of-function variant in atrial fibrillation associated gene; PRS, polygenic risk score for atrial fibrillation.

eTable 3. Variant carriers in study cohort for incident AF, HF and cardiomyopathy
Carriers of rare pLOF variants in main study cohort after exclusion of individuals with prevalent AF, HF or cardiomyopathy. Fifteen individuals carried rare variants in two different genes. Variant carriers total denotes number of individuals with at least one pLOF variant.

eTable 4. Hazard ratios for incident AF according to genetic risk and clinical risk factors
BMI, body-mass index; CI, confidence interval; pLOF variant, predicted loss-of-function variant in atrial fibrillation associated gene; PRS, polygenic risk score for atrial fibrillation.

eTable 5. Cumulative incidence of AF by age 80

eTable 8. Cumulative incidence of AF by age 80 (unrelated individuals)
CI, confidence interval; pLOF variant, predicted loss-of-function variant in atrial fibrillation associated gene; PRS, polygenic risk score for atrial fibrillation.

eTable 9. Cumulative incidence of AF by age 80 (excluding TTN pLOF variants)
CI, confidence interval; pLOF variant, predicted loss-of-function variant in atrial fibrillation associated gene; PRS, polygenic risk score for atrial fibrillation.

The models were adjusted for sex, age at AF diagnosis, BMI, and hypertension or IHD at time of AF diagnosis. [...] period and started follow-up 30 days after AF diagnosis.

eFigure 1. Flowchart of study design
AF, atrial fibrillation; CM, cardiomyopathy; HF, heart failure; QC, quality control; WES, whole-exome sequencing.

© 2024 Vad OB et al. JAMA Cardiology.