The Potential of Genetics in Identifying Women at Lower Risk of Breast Cancer

Key Points
Question: Can genetic information identify women for whom it is safe to delay mammogram screening?
Findings: In this case-control study of 25,591 women, 2,338 (9.1%) were classified as having low genetic risk for breast cancer; these women exhibited significantly later onset of breast cancer compared with average-risk or high-risk counterparts, indicating a potential to defer mammogram screening by 5 to 10 years.
Meaning: Delaying the age to start mammogram screenings for women at low genetic risk could optimize health care resource allocation.


Participants
This study was based on the Healthy Nevada Project. The Healthy Nevada Project study was reviewed and approved by the University of Nevada, Reno Institutional Review Board (IRB, project 956068-12), and all participants provided informed consent. The initial dataset comprised 39,546 individuals. For this study, we included only participants who were inferred to be of female sex based on the genetic data and who had a longitudinal data length >0 in the electronic health records at Renown Health.

Clinical phenotypes from EHR
All participants included in this study had electronic health records (EHRs) available from Renown Health. Phenotypes were processed from Epic/Clarity EHR data. This study is centered on breast cancer, and we did not look at ovarian cancer despite the known impact of pathogenic variants in BRCA1 and BRCA2 on both breast and ovarian cancers. We also did not look at breast cancer subtypes (ER+ / ER-, PR+ / PR-, or triple negative), or any other types of cancer.

Exome+® sequencing
DNA was extracted from saliva. All samples were sequenced at Helix using the Exome+® assay. The Exome+® assay targets the exome as well as about 300,000 common SNPs outside of the exome that allow for imputation of all common SNPs genome-wide. In turn, this allows GWAS analysis and polygenic risk score calculation. Data were processed using a custom version of Sentieon and aligned to GRCh38, with variant calling and phasing algorithms following GATK best practices. Imputation of common variants in the Healthy Nevada Project (HNP) data was performed by pre-phasing samples and then imputing.

Variant annotation and classification
Variant annotation was performed with Ensembl Variant Effect Predictor-99 16 and LOFTEE 17.
We first selected a set of five genes for our study: BRCA1, BRCA2, PALB2, ATM and CHEK2.
These are the genes with the strongest association with breast cancer risk 3,8,19,20. Genes such as BARD1, RAD51C or RAD51D have also been associated with breast cancer and have been used in studies developing models to predict breast cancer risk in the population 5. Other genes such as TP53 or CDH1 are involved in cancer syndromes, and protein-truncating variants in these genes predispose to cancer, including breast cancer, with a high penetrance 21,22.
Lastly, protein-truncating variants in MAP3K1 7 have recently been associated with breast cancer in the largest exome-wide study of breast cancer to date. This is why we included these six genes (BARD1, CDH1, MAP3K1, RAD51C, RAD51D and TP53) in a supplementary analysis. The list of genes could be expanded further, and more studies will be necessary to identify the optimal set of genes to analyze when the goal is to identify women at lower risk of breast cancer.
We classified variants based on the decision tree shown in Figure S1. All of the variants classified as 'Pathogenic' are listed in Table S2. This includes information on the number of carriers for each variant, as well as the mean sequencing depth for each variant. All variants classified as a variant of uncertain significance (VUS) are reported in Table S4.
This method identified 2,362 women (9.2%) carrying a VUS in one of these genes. We took the conservative approach of excluding these individuals from the low-risk category. Future studies will be necessary to define the best way to classify rare variants that lead to exclusion of a carrier from the low-risk group. The name 'VUS' may not be appropriate either.
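As an illustration only, the conservative exclusion rule can be written as a small predicate. This is a sketch under stated assumptions, not the study's code: the function name, the argument names, and the use of a 10th-percentile PRS cutoff to define the low-risk group are hypothetical choices made for the example.

```python
def is_low_genetic_risk(has_pathogenic: bool,
                        has_vus: bool,
                        prs_percentile: float,
                        prs_cutoff: float = 10.0) -> bool:
    """Return True when a participant would fall in the low-risk group.

    ASSUMPTION: low risk requires no pathogenic variant, no VUS, and a PRS
    percentile at or below prs_cutoff (10.0 is an illustrative default).
    """
    if has_pathogenic or has_vus:
        # Conservative approach: any pathogenic or VUS carrier is excluded.
        return False
    return prs_percentile <= prs_cutoff
```

Under this sketch a VUS carrier is never classified as low risk, whatever her polygenic score.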

CNV annotation and classification
CNVs that passed quality control thresholds were annotated with overlapping MANE transcripts.
For this study, a copy number of 1 or 0 (instead of the expected 2) was considered pathogenic. We did not assign any CNV to the VUS category. Specifically, a copy number of 3 or more was not considered a VUS.
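The copy-number rule above is simple enough to state as code. The following is a minimal sketch; the function name and the returned labels are hypothetical and are not taken from the study pipeline.

```python
def classify_cnv(copy_number: int) -> str:
    """Classify a quality-controlled CNV call in a study gene by copy number."""
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    if copy_number in (0, 1):
        # Loss of one or both copies is treated as pathogenic.
        return "pathogenic"
    # Copy number 2 is the expected diploid state, and gains (3 or more)
    # are deliberately not flagged, not even as VUS.
    return "not_flagged"
```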

Polygenic risk score calculation
We used the 313-SNPs PRS model published in 2019 10. Full details on how we implemented this score and its performance in the Healthy Nevada Project cohort can be found in our paper regarding higher risk in women with a pathogenic variant in ATM and CHEK2 and a high polygenic risk 8. We reported that this PRS model achieved an AUC of 0.63 in participants in the Europe genetic similarity group and an AUC of 0.66 in the Americas genetic similarity group 8, which was consistent with the original publication of this PRS 10.
In a nutshell, we first converted the model from GRCh37 to GRCh38. We then calculated the score in each of the 25,591 women included in this study. Our cohort is diverse, and the distribution across genetic similarity groups (inferred from genetic data) was the following: N Africa = 499; N Americas = 3,728; N East Asia = 832; N Europe = 19,484; N Other = 929; and N South Asia = 119. There are known challenges in implementing one model across a diverse cohort, which is why we took the following steps. First, a genotype dosage was calculated for each variant in the score for each individual. The dosage was based on the genotype probability (GP) field resulting from the imputation pipeline. When an individual had no GP or no GT (genotype) for a specific variant, the dosage was based on the allele frequency of this variant in gnomAD v3 for the population closest to the genetic similarity of the participant. We then split the cohort into six groups based on genetic similarity and ranked individuals based on their PRS value. The distribution of the PRS values was slightly shifted for different groups. The median PRS was lowest in the Europe population and highest in the East Asia population. To reduce biases, we therefore assigned a percentile based on the ranking within the participant's genetic similarity distribution. Lastly, we regrouped all six genetic similarity groups into one cohort for later analyses that were based on percentiles.

© 2023 American Medical Association. All rights reserved.
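The dosage, scoring, and within-group percentile steps can be sketched as follows. This is an illustrative outline under stated assumptions rather than the pipeline used in the study: all function names are hypothetical, the effect allele is assumed to be the allele counted by the GP field, and the allele-frequency fallback is assumed to be the expected allele count, 2 x allele frequency.

```python
from collections import defaultdict

def dosage(gp, allele_freq):
    """Expected effect-allele count for one variant in one individual.

    gp: (P(0 copies), P(1 copy), P(2 copies)) from imputation, or None.
    allele_freq: frequency in the reference population closest to the
    participant. ASSUMPTION: the fallback dosage is 2 * allele_freq.
    """
    if gp is not None:
        return gp[1] + 2.0 * gp[2]
    return 2.0 * allele_freq

def polygenic_score(weights, gps, allele_freqs):
    """Weighted sum of dosages over the variants in the model."""
    return sum(w * dosage(gp, af)
               for w, gp, af in zip(weights, gps, allele_freqs))

def percentiles_within_group(scores, groups):
    """Map participant id -> percentile (0-100], ranked inside their group.

    scores: {participant_id: PRS}; groups: {participant_id: group label}.
    """
    members_by_group = defaultdict(list)
    for pid, group in groups.items():
        members_by_group[group].append(pid)
    percentiles = {}
    for members in members_by_group.values():
        ranked = sorted(members, key=lambda pid: scores[pid])
        for rank, pid in enumerate(ranked, start=1):
            percentiles[pid] = 100.0 * rank / len(ranked)
    return percentiles
```

Because percentiles are assigned inside each genetic similarity group, a shift in the raw score distribution between groups does not move anyone across percentile bins once the groups are pooled again.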

Survival analysis, hazard ratios and statistical tests
Kaplan-Meier survival curves were generated using the KaplanMeierFitter class from the lifelines Python library. Statistical differences between survival curves were assessed using the logrank_test function from the lifelines.statistics module.
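To make these two steps concrete, here is a from-scratch sketch of what a Kaplan-Meier fit and a two-group log-rank test compute. It is written in plain Python for readability and is not the lifelines implementation; the function names kaplan_meier and logrank are hypothetical, and age is assumed to be the time axis.

```python
import math

def kaplan_meier(ages, events):
    """Return [(age, survival)] at each age with at least one diagnosis.

    ages: age at diagnosis (event True) or at last follow-up (event False).
    """
    data = sorted(zip(ages, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        diagnosed = 0  # diagnoses at this age
        leaving = 0    # everyone leaving the risk set at this age
        while i < len(data) and data[i][0] == age:
            diagnosed += int(data[i][1])
            leaving += 1
            i += 1
        if diagnosed:
            survival *= 1.0 - diagnosed / at_risk
            curve.append((age, survival))
        at_risk -= leaving
    return curve

def logrank(ages_a, events_a, ages_b, events_b):
    """Two-group log-rank test; returns (chi-square, p value), 1 df."""
    rows = [(t, e, 0) for t, e in zip(ages_a, events_a)]
    rows += [(t, e, 1) for t, e in zip(ages_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({s for s, e, _ in rows if e}):
        n = sum(1 for s, _, _ in rows if s >= t)
        n_a = sum(1 for s, _, g in rows if s >= t and g == 0)
        d = sum(1 for s, e, _ in rows if s == t and e)
        d_a = sum(1 for s, e, g in rows if s == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

The percentage of women with a diagnosis by a given age, as plotted in the figures, is 100 x (1 - survival).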
Values at a given age (e.g., 45 years old, 5 years after the recommended age to start screening) were calculated using the 'predict' function of the fitted model.
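Reading a fitted curve at a given age amounts to a step-function lookup. The sketch below assumes the curve is available as a list of (age, survival probability) pairs sorted by age; that representation and the helper names are assumptions for illustration, not the lifelines API.

```python
from bisect import bisect_right

def survival_at(curve, age):
    """Survival probability at `age` on a right-continuous step curve.

    curve: [(age, survival), ...] sorted by age, one entry per event age.
    """
    event_ages = [t for t, _ in curve]
    idx = bisect_right(event_ages, age)
    # Before the first event age, nobody has been diagnosed yet.
    return 1.0 if idx == 0 else curve[idx - 1][1]

def percent_diagnosed_by(curve, age):
    """Percentage of women with a diagnosis by `age`."""
    return 100.0 * (1.0 - survival_at(curve, age))
```

For example, comparing percent_diagnosed_by for a low-risk curve at age 45 or 50 with the average-risk curve at age 40 is one way to express a screening delay in terms of equivalent risk.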
Hazard ratios were calculated using the CoxPHFitter class from the lifelines Python library. Plots were made using pyplot from the matplotlib Python library.

Supplement 1.
eMethods. Supplementary Methods
eFigure 1. Classification of Pathogenic Variants and Variants of Uncertain Significance
eFigure 2. Impact of Pathogenic Variants and VUSs on Breast Cancer Diagnosis
eFigure 3. Impact of Polygenic Risk on Breast Cancer Diagnosis
eFigure 4. Impact of Low Genetic Risk Based on 11 Genes and a 313-SNPs PRS on Breast Cancer Diagnosis
Supplement 2.
eTable 1. Description of the Healthy Nevada Project
eTable 2. List of All Pathogenic Variants in 11 Breast Cancer Genes
eTable 3. List of All Pathogenic CNVs in 11 Breast Cancer Genes
eTable 4. List of all VUSs in 11 Breast Cancer Genes
eTable 5. Association of the 313-SNPs Breast Cancer PRS and Breast Cancer in the Healthy Nevada Project Cohort
eTable 6. Impact of Delay of Screening for Those in the Low-Risk Group Defined Using 11 Genes and a PRS
eReferences. Supplemental References
This supplemental material has been provided by the authors to give readers additional information about their work.

eFigure 2:
Impact of pathogenic variants and VUSs on breast cancer diagnosis
(A) Analysis for the 5 genes: BRCA1, BRCA2, PALB2, ATM and CHEK2. Left panel: Kaplan-Meier curves showing the % of women with a breast cancer diagnosis by age based on whether they have: a pathogenic variant in one of the 5 genes (dark blue curve), a VUS in one of the 5 genes (light blue curve), or no P or VUS variant in these 5 genes (gray curve). 95% confidence intervals are represented in lighter shades. Right panel: Cox proportional hazard ratios and their 95% confidence intervals; Y-axis is on a log scale. Reference group is the group without a P or VUS variant in the 5 genes.
(B) Analysis for the 11 genes: BRCA1, BRCA2, PALB2, ATM, CHEK2, BARD1, CDH1, MAP3K1, RAD51C, RAD51D, and TP53. Left panel: Kaplan-Meier curves showing the % of women with a breast cancer diagnosis by age based on whether they have: a pathogenic variant in one of the 11 genes (dark blue curve), a VUS in one of the 11 genes (light blue curve), or no P or VUS variant in these 11 genes (gray curve). 95% confidence intervals are represented in lighter shades. Right panel: Cox proportional hazard ratios and their 95% confidence intervals; Y-axis is on a log scale. Reference group is the group without a P or VUS variant in the 11 genes.

eFigure 3:
Impact of polygenic risk on breast cancer diagnosis
(A) Kaplan-Meier curves showing the % of women with a breast cancer diagnosis by age based on their polygenic risk quantile. Blue shades indicate higher quantiles (higher polygenic risk). Green shades indicate lower quantiles (lower polygenic risk). Gray curve represents the average risk for women with a PRS between 41% and 60% of the PRS distribution.
(B) On the Y-axis: Cox proportional hazard ratios and their 95% confidence intervals calculated for each PRS group compared with the reference group, which is the 41%-60% group.
(C) Kaplan-Meier curves showing the % of women with a breast cancer diagnosis by age based on their polygenic risk. Green curve represents women with a low polygenic risk (the combined groups: <1%, 1%-5% and 6%-10%). Gray curve represents the average risk for women with a PRS between 41% and 60% of the PRS distribution. 95% confidence intervals are represented in lighter shades.

eFigure 4:
Impact of low genetic risk based on 11 genes and a 313-SNPs PRS on breast cancer diagnosis
(A) Schema detailing how genetic risk stratification was done.
(B) Kaplan-Meier curves showing the % of women with a breast cancer diagnosis by age based on their genetic risk group: average risk (gray curve), low risk (green curve). 95% confidence intervals are represented in light gray and light green. The KaplanMeierFitter class (to draw the curves) and the CoxPHFitter class (to calculate the hazard ratio) from the lifelines Python library were used.

Supplementary Tables
eTable 1: Description of the Healthy Nevada Project
eTable 2: List of all Pathogenic variants in 11 breast cancer genes
eTable 3: List of all Pathogenic CNVs in 11 breast cancer genes
eTable 4: List of all VUSs in 11 breast cancer genes
eTable 5: Association of the 313-SNPs breast cancer PRS and breast cancer in the Healthy Nevada Project cohort
eTable 6: Impact of delay of screening for those in the low-risk group defined using 11 genes and a PRS