Pathology of Tumors Associated With Pathogenic Germline Variants in 9 Breast Cancer Susceptibility Genes

Key Points
Question: What breast tumor characteristics are associated with rare pathogenic protein truncating or missense variants in breast cancer susceptibility genes?
Findings: In this case-control study involving 46 387 control participants and 42 680 women with a diagnosis of breast cancer, pathology features (eg, tumor subtype, morphology, size, TNM stage, and lymph node involvement) associated with rare germline (likely) pathogenic variants in 9 different breast cancer susceptibility genes were studied. Substantial differences in tumor subtype distribution by gene were found.
Meaning: The results of this study suggest that tumor subtypes differ by gene; these findings can potentially inform guidelines for gene panel testing, risk prediction in unaffected individuals, variant classification, and understanding of breast cancer etiology.

eMethods.
eTable 1. Description of studies included in this analysis
eTable 2. Immunohistochemistry and tumor grade-based surrogates for five intrinsic breast cancer subtypes
eTable 3. Numbers of cases and controls, and age distributions, by country of origin
eTable 4. Numbers of variant carriers by breast cancer susceptibility gene
eTable 5. Cross tabulation of ER, PR, HER2 and Grade data
eTable 6. Distribution of intrinsic tumor subtypes in women of all ages and in different age groups, by breast cancer susceptibility gene
eTable 7. Prevalence of PTV and MSV in breast cancer susceptibility genes by intrinsic subtypes of breast cancer among women of different age groups at diagnosis
eTable 8. Odds ratios for association between PTV and MSV carrier status and intrinsic subtypes refined by PR expression
eTable 9. Odds ratios for association between PTV and MSV carrier status and intrinsic subtypes of breast cancer following imputation using an EM algorithm
eFigure 1. Case-only analysis of phenotypic markers and prognostic features by gene (complete case analysis)
eFigure 2. Frequency histogram of intrinsic subtypes among noncarriers and carriers of PTVs and MSVs in 9 genes
eFigure 3. Frequency distribution of intrinsic subtypes among noncarriers and carriers of PTVs and MSVs in the 9 genes, in women aged ≤40 years
eFigure 4. Frequency distribution of intrinsic subtypes among noncarriers and carriers of PTVs and MSVs in the 9 genes, in women aged 41-60 years
eFigure 5. Frequency distribution of intrinsic subtypes among noncarriers and carriers of PTVs and MSVs in the 9 genes, in women aged >60 years
eFigure 6. Association odds ratios for MSV carrier status in BRCA1, BRCA2, TP53 and intrinsic subtypes of breast cancer

Studies and inclusion criteria
The BRIDGES study included samples from female breast cancer cases and unaffected controls, as described in Dorling et al. 1 and eTable 1. The analyses presented here are based on data from the subset of cases from population or hospital-based studies and controls that were sampled independently of family history (38 contributing studies). Only women aged between 18 and 79 years with no missing information on age were included.
Studies sampled controls from among women in the same population such that the age distribution was similar to that of the cases, without individual matching. Analyses were presented in terms of odds ratios (ORs). In the computation of cumulative risks, the ORs were assumed to approximate incidence rate ratios. This is an approximation because density-based sampling was not used; however, the difference is slight because study recruitment took place over a short period of time and the probability of a potential control becoming a case was small (the rare disease assumption).
Ethnicity was defined genetically using principal components analysis of the array genotype data where this was available, and otherwise by self-report. For Malaysia and Singapore, we excluded admixed individuals, defined as those not reaching a 50% threshold for a single ancestry (Chinese, Malay or Indian) based on genotyping. We also excluded individuals who were from a minority ancestry for that study (that is, non-east Asian individuals from the 4 Asian studies and non-European individuals from the European studies). Five countries were removed from imputation and subsequent regressions: France (missing grade); Thailand, Belarus and Canada (missing HER2 status); and Cyprus (missing tumor size).

Tumor Pathology Data
Pathology information was based on histology and immunohistochemistry results from medical records, rescored whole slides or tumor microarrays, curated in BCAC database v12. Data obtained from individual study centers were centrally harmonized and checked according to a standard data dictionary. ER, PR and HER2 status was obtained mostly from medical records, followed by immunohistochemistry performed on tumor tissue microarrays or whole-section tumor slides 2,3 . The cut-off for ER and PR positivity was 10% in most studies; some USA-based studies used a 1% cut-off. For HER2 scored by immunohistochemistry, most studies categorized scores of 0-2+ as negative and 3+ as positive. Some studies used FISH, CISH or SISH to confirm HER2 status. Most studies used the Scarff-Bloom-Richardson (SBR) system for grading tumors. The variable 'Stage' was collated by studies individually but largely reflects TNM staging. The European TNM staging (https://www.uicc.org/resources/tnm), which is very similar to the AJCC TNM staging, was used. Some studies from the USA used SEER staging; these were recoded as far as possible to TNM staging.
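The marker cut-offs described above can be illustrated with a minimal sketch. The function names, the treatment of a value exactly at the cut-off as positive, and the handling of missing values are assumptions made for illustration; they are not taken from the BCAC data dictionary.

```python
from typing import Optional


def receptor_status(percent_positive: Optional[float],
                    cutoff: float = 10.0) -> Optional[str]:
    """Dichotomise ER or PR from the percentage of positively stained cells.

    cutoff=10.0 mirrors the threshold used by most studies; pass cutoff=1.0
    for a study that applied the 1% rule. Treating a value exactly at the
    cut-off as positive is an assumption of this sketch.
    """
    if percent_positive is None:
        return None  # leave missing values for later imputation
    return "positive" if percent_positive >= cutoff else "negative"


def her2_status_from_ihc(score: Optional[int]) -> Optional[str]:
    """Map a HER2 immunohistochemistry score (0, 1, 2 or 3) to a status.

    Scores 0-2+ are treated as negative and 3+ as positive, as in the
    majority of studies; confirmation by FISH/CISH/SISH is not modelled.
    """
    if score is None:
        return None
    if score not in (0, 1, 2, 3):
        raise ValueError(f"unexpected HER2 IHC score: {score!r}")
    return "positive" if score == 3 else "negative"
```

With these definitions, a tumor with 5% ER staining is negative under the 10% rule but positive under the 1% rule, which is why the study-specific cut-off has to travel with the data during harmonization.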
Patterns of missingness in the pathology data are shown in eTables 5 and 6; pathology data were more likely to be missing among younger women, but there was no correlation between missingness and genotype.

Laboratory Methods, Variant calling and classification
Details of library preparation and sequencing procedures are described in Dorling et al. 1 Library preparation was conducted using the Fluidigm Juno 192.24 system. Amplified products were combined into barcoded libraries of 768 samples, which were run on a single lane of an Illumina HiSeq 4000. Samples were demultiplexed and then aligned to the reference genome (hg19) using BWA-MEM 4 . Each sample was sequenced to an average depth of 349 reads in the target region. Depth, along with base quality, was used as part of the secondary quality control filtering. Variant calling was performed using VarDict 5 ; further details of variant calling, filtering and quality control are given in Dorling et al. 1 Variants were defined as PTVs if they were frameshifting insertions/deletions, stop-gain single nucleotide variants or canonical splice variants, with the exception of variants in the last exon of each gene and some canonical splice variants that may not be protein truncating. We also analyzed rare missense variants in BRCA1, BRCA2 and TP53 classified as pathogenic according to clinical guidelines. For BRCA1 and BRCA2, we considered variants classified as (likely) pathogenic using the ENIGMA BRCA1/2 expert panel guidelines (https://enigmaconsortium.org/), or by clinical testing laboratory submitters to ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), which largely employ adaptations of the American College of Medical Genetics and Genomics (ACMG) guidelines 5 . For TP53, we considered a definition of (likely) pathogenic based on ACMG guidelines 6 , augmented by variants classified as (likely) pathogenic based on a published quantitative model for TP53 missense variant classification that utilizes a combination of bioinformatic prediction and the reported somatic:germline ratio for a given variant 1,7 .
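The PTV definition above can be expressed as a simple rule. The sketch below is illustrative only: the `Variant` record, its field names and the consequence labels are hypothetical, and the flag for splice variants judged not to be protein truncating stands in for a curation step that the text does not specify.

```python
from dataclasses import dataclass

# Consequence labels are hypothetical; a real pipeline would use its
# annotator's own vocabulary.
PTV_CONSEQUENCES = {"frameshift", "stop_gained", "canonical_splice"}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str
    in_last_exon: bool
    # Set by manual curation for canonical splice variants that may not
    # truncate the protein (assumed input, not derived here).
    splice_known_non_truncating: bool = False


def is_ptv(v: Variant) -> bool:
    """Return True if the variant meets the PTV definition in the text."""
    if v.consequence not in PTV_CONSEQUENCES:
        return False
    if v.in_last_exon:
        return False  # last-exon variants are excluded
    if v.consequence == "canonical_splice" and v.splice_known_non_truncating:
        return False  # curated exception for some splice variants
    return True
```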
Five women carrying PTVs in both BRCA1 and BRCA2 and 11 women carrying PTVs in more than one non-BRCA1/2 gene were excluded from these analyses. Women harbouring mutations in BRCA1 or BRCA2 plus a non-BRCA1/2 gene were included in the BRCA1 and BRCA2 analyses respectively, consistent with Dorling et al. 1 Specifically, 8 women carrying a BRCA1 PTV and a PTV in a non-BRCA1/2 gene and one woman carrying a BRCA1 PTV and a TP53 MSV were included only in the BRCA1 PTV analysis; 18 women carrying a BRCA2 PTV and a PTV in a non-BRCA1/2 gene and one woman carrying a BRCA2 PTV and a BRCA2 MSV were included only in the BRCA2 PTV analysis. As the number of such double mutations is very small compared with the total numbers of BRCA1 and BRCA2 carriers, sensitivity analyses of the association between intrinsic subtypes and BRCA1 or BRCA2 mutation status that excluded these women gave results that differed only trivially (data not shown).

Imputation using MICE and an EM-algorithm
To evaluate heterogeneity of risk by intrinsic tumor subtypes, we used Multiple Imputation by Chained Equations (MICE) to impute missing pathology variables. ER, PR, HER2, grade, tumor size, lymph node involvement, country, age and the presence or absence of a PTV or MSV in the BC genes were used to inform imputations. Missing data patterns and diagnostics for multiple imputation were inspected (data not shown). Intrinsic subtypes were constructed for each of 100 imputed datasets, and the results of multinomial regression for each imputed dataset were pooled.
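The pooling step can be sketched as follows. The text does not name the pooling rule; this example assumes Rubin's rules, the conventional choice after MICE, applied to one log(OR) coefficient at a time. The function name `pool_rubin` is hypothetical.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets (Rubin's rules).

    estimates: the m per-dataset log(OR) estimates
    variances: the m per-dataset squared standard errors
    Returns (pooled estimate, pooled standard error).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()              # pooled point estimate
    within = u.mean()             # average within-imputation variance
    between = q.var(ddof=1)       # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, float(np.sqrt(total))
```

With 100 imputations the inflation factor (1 + 1/m) is close to 1, so the pooled variance is approximately the sum of the within- and between-imputation components.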
For some analyses, we also used a polytomous regression approach (TOP), which iteratively imputes pathology characteristics using an EM algorithm and has improved power for identifying heterogeneous associations between risk loci and tumor subtypes. 8 When implementing TOP, we imputed only ER, PR, HER2 and grade. Countries with missing information for >10 individuals for two or more tumor markers were excluded from the analyses.
MICE is the most widely used imputation method and provided the flexibility required to conduct all the analyses. The EM approach should converge to the maximum likelihood estimate, whereas the MICE approach relies on random resampling. As MICE is a well-established method with robust properties, and the results were very consistent where both methods were used, we used MICE as the standard approach.

Estimating odds ratios for association between PTV/MSV carrier status and intrinsic subtypes
Multinomial logistic regression was used to estimate the odds ratios (ORs) associated with carrying any PTV (or pathogenic MSV) in each gene. Age interactions were evaluated by fitting an age x gene interaction term in the model. Subtype-specific age-interaction terms were meta-analyzed and Wald test p-values for the combined interaction ORs calculated.
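A minimal sketch of how subtype-specific age-interaction estimates could be combined and tested is given below. It assumes inverse-variance fixed-effect weighting and a two-sided Wald test based on the normal distribution; the function name and inputs are hypothetical.

```python
import math


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect meta-analysis with a Wald test.

    log_ors: subtype-specific interaction estimates on the log(OR) scale
    ses:     their standard errors
    Returns (pooled log(OR), pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value
```

The combined interaction OR is then `math.exp(pooled)`; precise subtype estimates dominate the pooled value because each weight is the reciprocal of the squared standard error.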

Calculation of cumulative risk of developing BC subtypes
Cumulative risks for each subtype were calculated by combining age-specific OR estimates with UK population incidence rates (2016) as baseline (https://www.cancerresearchuk.org/health-professional/cancerstatistics/statistics-by-cancer-type/breast-cancer/incidence-invasive), accounting for the competing risk of developing BC of a different subtype 9 . For these computations, the ORs were assumed to approximate the incidence rate ratios (i.e. the rare disease assumption). PTVs in ATM, BARD1, BRCA1, BRCA2, CHEK2, PALB2, RAD51C and RAD51D were included in the absolute risk model. Age-specific ORs were derived by assuming a linear trend in the log(OR) with age for all genes apart from ATM, BARD1, RAD51C, and RAD51D. For BRCA1, BRCA2, and PALB2, a model with the same age-trend in each subtype was used. For CHEK2, no age interaction was assumed for triple-negative disease, while for all other subtypes the model assumed the same age-trend in each subtype. Where the same age-trend was assumed, the effect size based on a (fixed-effect) meta-analysis of the subtype-specific age-interaction estimates was used. The interaction effect size was included in the multinomial logistic regression as an offset term to obtain the corresponding main-effect coefficients.
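The cumulative risk calculation can be sketched under stated assumptions: annual age bands, subtype-specific baseline incidence rates supplied by the user, ORs used directly as rate ratios, and diagnosis with any subtype removing a woman from the at-risk pool. Competing mortality is not modelled, and the function name is hypothetical; this is an illustration rather than the authors' implementation.

```python
import numpy as np


def cumulative_subtype_risk(baseline_incidence, odds_ratios):
    """Cumulative risk of each subtype by the end of each year of age.

    baseline_incidence: array (n_ages, n_subtypes), annual incidence rates
                        per person-year in the baseline group
    odds_ratios:        array of the same shape, age-specific ORs used as
                        incidence rate ratios (rare disease assumption)
    Returns an array (n_ages, n_subtypes) of cumulative risks.
    """
    lam = np.asarray(baseline_incidence, float) * np.asarray(odds_ratios, float)
    total = lam.sum(axis=1)  # rate of developing any subtype, per year
    # Probability of still being free of every subtype at the start of a year
    hazard_before = np.concatenate(([0.0], np.cumsum(total)[:-1]))
    free_at_start = np.exp(-hazard_before)
    # Share of each year's cases attributed to each subtype
    share = np.divide(lam, total[:, None],
                      out=np.zeros_like(lam), where=total[:, None] > 0)
    yearly = free_at_start[:, None] * (1.0 - np.exp(-total))[:, None] * share
    return np.cumsum(yearly, axis=0)
```

Because the at-risk pool shrinks with diagnoses of any subtype, the subtype-specific cumulative risks sum to the overall cumulative risk of breast cancer.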

Age- and gene-specific subtype proportions for BOADICEA
For analyses carried out for inclusion of tumor subtypes in the BOADICEA risk prediction algorithm 10 , the three subtypes currently considered were used: i) ER-positive, ii) triple-negative, and iii) ER-negative but not triple-negative. Age- and gene-specific subtype proportions for each tumor subtype in BOADICEA (eTable 14) were calculated by first estimating ORs for PTV carriers and the respective age-interactions for each subtype as described above. These estimates are relative to non-carriers of deleterious variants in any of the genes. Therefore, the relevant baseline subtype proportions were the proportions in non-carriers. For this, we used the non-carrier proportions in European cases in the BRIDGES analysis, to allow for possible differences in subtype proportions by ethnicity (the OR estimates were, however, from the whole dataset, as there is no evidence for differences in effect size by population).
Subtype proportions were first computed in 5-year intervals, and then smoothed using Lowess, with a bandwidth of 0.2, for ER-positive, triple-negative and ER-negative non-triple-negative separately. These estimates were then further smoothed to annual proportions by assuming a linear change in proportion between the midpoints of adjacent intervals.
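The final interpolation step can be illustrated as below. The sketch takes proportions that have already been smoothed per 5-year interval (the Lowess step is not reproduced) and interpolates linearly between interval midpoints. The midpoint convention (start + 2.5 years), holding values constant outside the first and last midpoints, and the function name are assumptions of this example.

```python
import numpy as np


def annual_proportions(interval_starts, proportions, width=5):
    """Interpolate 5-year subtype proportions to single years of age.

    interval_starts: starting age of each interval, e.g. [20, 25, 30, ...]
    proportions:     array (n_intervals, n_subtypes) of smoothed proportions
    Returns (ages, annual) where annual has shape (n_ages, n_subtypes).
    """
    starts = np.asarray(interval_starts, dtype=float)
    props = np.asarray(proportions, dtype=float)
    midpoints = starts + width / 2.0
    ages = np.arange(int(starts[0]), int(starts[-1]) + width)
    # np.interp is linear between midpoints and constant beyond the ends
    annual = np.column_stack(
        [np.interp(ages, midpoints, props[:, j]) for j in range(props.shape[1])]
    )
    return ages, annual
```

If the interval proportions sum to 1 across subtypes, the interpolated annual proportions do as well, because each is the same linear combination of neighbouring intervals.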
The proportions in each subtype were finally derived using a formula defined in terms of the proportion of cases at time t in subtype s, the incidence of subtype s for gene g at time t, and the relative risk (OR) at time t relative to gene category 0 (i.e. non-carriers).
PTV, protein truncating variants; MSV, missense variants; HR, hormone receptor; HER2, human epidermal growth factor receptor 2; TN, triple-negative; OR, odds ratio; CI, confidence interval.
MICE imputation was carried out as described in the Methods and intrinsic subtypes were constructed for each imputed dataset. The histogram represents the average proportion (over all 100 imputations). For some gene, subtype and age combinations, data are limited and the frequency is therefore imprecise. These results are also shown in eFigures 7-9; numbers underlying the proportions are shown in eTable 10.
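The displayed formula for the subtype proportions is not reproduced in this text. One form consistent with the three quantities it defines is given below as an assumption, not as the authors' exact expression. The symbols are introduced here for illustration: p_s^{(g)}(t) is the proportion of cases at time t in subtype s among carriers of gene g, \lambda_s^{(g)}(t) is the incidence of subtype s for gene g at time t, and RR_s^{(g)}(t) is the relative risk (OR) at time t relative to gene category 0 (non-carriers).

```latex
\begin{align}
\lambda_s^{(g)}(t) &= \lambda_s^{(0)}(t)\, RR_s^{(g)}(t), \\
p_s^{(g)}(t) &= \frac{\lambda_s^{(g)}(t)}{\sum_{s'} \lambda_{s'}^{(g)}(t)}
             = \frac{p_s^{(0)}(t)\, RR_s^{(g)}(t)}{\sum_{s'} p_{s'}^{(0)}(t)\, RR_{s'}^{(g)}(t)}.
\end{align}
```

The second equality follows by writing \lambda_s^{(0)}(t) = p_s^{(0)}(t)\,\lambda^{(0)}(t), so that the overall non-carrier incidence \lambda^{(0)}(t) cancels and only the non-carrier subtype proportions and the subtype-specific ORs are needed.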