Generalizability of Polygenic Risk Scores for Breast Cancer Among Women With European, African, and Latinx Ancestry

Key Points
Question: How do previously developed breast cancer polygenic risk scores (PRSs) perform in a clinical setting for women of different ancestries?
Findings: In this multicenter cohort study linking electronic medical records to genotyping data that included 39 591 women, PRSs were significantly associated with breast cancer risk in women of all ancestries, although the effect sizes were smaller in women with African ancestry.
Meaning: Previously developed PRS models for breast cancer risk performed well for women with European and Latinx ancestries in different clinical settings; these results suggest that larger studies are needed to develop and validate PRSs for women with African ancestry.

We restricted our analysis to women older than 18 years. We defined a woman as a breast cancer case if there was at least one occurrence of the female breast cancer diagnostic codes (eTable 1) or, if she had no breast cancer diagnostic codes, at least two occurrences (on distinct calendar days) of breast cancer history codes (eTable 2).
Controls were defined as women having none of the female breast cancer diagnostic or history codes (eTables 1 and 2). We further excluded participants without available EMR data. The breast cancer phenotyping algorithm has been validated by chart review, achieving a 95% positive predictive value for cases and negative predictive value for controls (https://phekb.org/phenotype/breast-cancer). Breast cancer cases included women with stage 0-IV breast cancer, including ductal carcinoma in situ (stage 0). Women with stage 0 breast cancer often require definitive treatment with complete surgical resection, radiation therapy, and adjuvant hormonal therapy, and were therefore included as cases. Women with a diagnosis of benign breast disease, including those with lobular carcinoma in situ, were excluded from the case definition.
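The case and control rules above can be expressed as a small per-woman classifier. This is a minimal sketch, not the validated PheKB algorithm: the code sets below are hypothetical placeholders standing in for eTables 1 and 2, and `CodeEvent` and `classify` are illustrative names. It assumes that a woman whose history codes fall on only one calendar day (and who has no diagnostic code) meets neither definition and is excluded, and it does not model the stage or benign-disease rules.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Hypothetical placeholder code sets; the real lists are given in eTables 1 and 2.
DIAGNOSTIC_CODES = {"C50.911", "174.9"}  # stand-ins for eTable 1
HISTORY_CODES = {"Z85.3", "V10.3"}       # stand-ins for eTable 2


@dataclass
class CodeEvent:
    code: str  # ICD-9/ICD-10 code recorded in the EMR
    day: date  # calendar day on which the code was recorded


def classify(age: float, events: list[CodeEvent] | None) -> str | None:
    """Return 'case', 'control', or None (excluded) for one woman."""
    if age <= 18 or events is None:  # age restriction; None means no EMR data
        return None
    if any(e.code in DIAGNOSTIC_CODES for e in events):
        return "case"  # at least one diagnostic code
    history_days = {e.day for e in events if e.code in HISTORY_CODES}
    if len(history_days) >= 2:
        return "case"  # history codes on at least two distinct calendar days
    if not history_days:
        return "control"  # none of the diagnostic or history codes
    return None  # history code(s) on a single day: neither case nor control
```

Counting distinct days rather than raw occurrences keeps duplicate entries from a single encounter from promoting a woman to case status.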
For all sites except Mount Sinai, we further classified breast cancer cases into subtypes by tumor estrogen receptor (ER) status, based on tumor registry data or on information extracted from a breast pathology report following a biopsy or surgical resection. If the breast pathology or tumor registry information was unavailable, or if the ER or progesterone receptor (PR) status was missing or unknown, we queried the medication list for hormonal therapies (eTable 3) recorded after the breast cancer diagnosis. We defined a breast cancer case as ER-positive if at least one such medication was listed.
If the breast pathology or tumor registry data were unavailable (or ER/PR status was missing) and no hormonal therapies were identified, we classified the breast cancer case as subtype unknown. We also extracted breast cancer family history information from the tumor registry or from the ICD-9/ICD-10 codes 'V16.3' or 'Z80.3'.
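The subtype and family-history rules above can be sketched as follows. This is an illustration under stated assumptions: the medication set is a hypothetical stand-in for eTable 3, the function names are illustrative, "after diagnosis" is read as strictly later than the diagnosis date, and the fallback is keyed on missing ER status only (PR status is not modeled).

```python
from __future__ import annotations

from datetime import date

# Hypothetical stand-in for the hormonal-therapy list in eTable 3.
HORMONAL_THERAPIES = {"tamoxifen", "anastrozole", "letrozole", "exemestane"}
# Family-history codes named in the text (ICD-9 V16.3, ICD-10 Z80.3).
FAMILY_HISTORY_CODES = {"V16.3", "Z80.3"}


def er_subtype(site: str, er_status: str | None, dx_date: date,
               meds: list[tuple[str, date]]) -> str | None:
    """Classify one case as 'ER-positive', 'ER-negative', 'unknown', or None."""
    if site == "Mount Sinai":
        return None  # subtypes were not classified at this site
    if er_status == "positive":
        return "ER-positive"  # from the registry or pathology report
    if er_status == "negative":
        return "ER-negative"
    # ER status unavailable or unknown: fall back to the medication list.
    # "After diagnosis" is taken here as strictly later than dx_date.
    if any(name.lower() in HORMONAL_THERAPIES and start > dx_date
           for name, start in meds):
        return "ER-positive"
    return "unknown"  # no hormonal therapy found; ER-negative cannot be inferred


def has_family_history(registry_flag: bool, codes: set[str]) -> bool:
    """Family history from the tumor registry flag or from V16.3 / Z80.3."""
    return registry_flag or bool(codes & FAMILY_HISTORY_CODES)
```

The fallback is asymmetric by design: a hormonal therapy is evidence for ER-positive disease, but its absence does not establish ER-negative disease, so such cases stay unknown.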

PRS Models
The first two PRS models evaluated in this study (BCAC-S and BCAC-L) were developed by the Breast Cancer Association Consortium (BCAC) based on approximately 90,000 breast cancer cases and 75,000 controls, all women of European ancestry (EA) 1. BCAC-S, which includes 313 variants, was developed using a hard-thresholding approach, and BCAC-L, which includes 3,820 variants, was developed using a lasso regression-based approach 1. We also included two PRS models developed in women of African ancestry (AA), WHI-AA 4 and ROOT 5. WHI-AA includes 75 variants and is based on a validation set of ~4,000 AA women enrolled in the Women's Health Initiative (WHI) (the OR column in the Supplementary S-H-ERP and BCAC-S-H-ERN were hybrid models consisting of the same 313 variants. The optimal effect size was obtained when a subset of variants in the base model was given subtype-specific weights wherever the variant's association with the breast cancer subtype was significant, while the remaining variants were given the effect size for overall breast cancer. Subtype-specific PRS models (BCAC-L-ERP and BCAC-L-ERN) were also constructed for the BCAC-L model, which included 5,218 variants.
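The hybrid weighting described above keeps the base model's variant set and changes only the weights. A minimal sketch, assuming each variant carries an overall log-odds ratio, a subtype-specific log-odds ratio, and a p-value for the subtype association; the field names and the fixed threshold `alpha` are hypothetical, whereas the source indicates the final configuration was chosen to optimize the PRS effect size.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VariantWeights:
    variant_id: str
    beta_overall: float  # log-OR for overall breast cancer
    beta_subtype: float  # log-OR for the target subtype (ER-positive or ER-negative)
    p_subtype: float     # hypothetical test of the subtype-specific association


def hybrid_weights(variants: list[VariantWeights],
                   alpha: float = 0.05) -> dict[str, float]:
    """Same variant set as the base model; only the weights change.

    Variants with a significant subtype association (p < alpha, a hypothetical
    criterion) get the subtype-specific weight; all others keep the overall weight.
    """
    return {
        v.variant_id: v.beta_subtype if v.p_subtype < alpha else v.beta_overall
        for v in variants
    }
```

Scoring with these weights then proceeds exactly as for the base model, so a hybrid score differs from BCAC-S only in its weight column.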

PRS Calculation
To estimate each PRS, we excluded from each PRS model strand-ambiguous variants (i.e., C/G and A/T), variants with allele mismatches even after strand flipping, and multiallelic variants (three or more alleles). We used PLINK 1.9 6 to calculate each PRS as a weighted sum using the --score flag.
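The variant filters and the weighted sum described above can be sketched in plain Python. This is an illustration, not the PLINK implementation: it handles single-nucleotide variants only, and it assumes each dosage counts the first allele listed for that variant in the genotype data (a hypothetical convention); `effect_allele_orientation` and `weighted_sum_prs` are illustrative names.

```python
from __future__ import annotations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def effect_allele_orientation(effect: str, other: str,
                              data_alleles: tuple[str, ...]) -> int | None:
    """+1 if the dosage counts the effect allele, -1 if it counts the other
    allele, None if the variant must be excluded."""
    if len(data_alleles) != 2:
        return None  # multiallelic (three or more alleles)
    a1, a2 = data_alleles
    if any(x not in COMPLEMENT for x in (effect, other, a1, a2)):
        return None  # this sketch handles single-nucleotide variants only
    if COMPLEMENT[effect] == other:
        return None  # strand-ambiguous A/T or C/G
    # Try the alleles as given, then after flipping to the opposite strand.
    for e, o in ((effect, other), (COMPLEMENT[effect], COMPLEMENT[other])):
        if (e, o) == (a1, a2):
            return 1
        if (e, o) == (a2, a1):
            return -1
    return None  # allele mismatch even after strand flipping


def weighted_sum_prs(model: dict[str, tuple[str, str, float]],
                     dosages: dict[str, float],
                     data_alleles: dict[str, tuple[str, ...]]) -> tuple[float, int]:
    """Return (score, number of variants used) for one individual."""
    score, used = 0.0, 0
    for vid, (effect, other, weight) in model.items():
        if vid not in dosages or vid not in data_alleles:
            continue  # variant not genotyped or imputed
        orient = effect_allele_orientation(effect, other, data_alleles[vid])
        if orient is None:
            continue
        d = dosages[vid]  # count of data_alleles[vid][0], in [0, 2]
        effect_count = d if orient == 1 else 2.0 - d
        score += weight * effect_count
        used += 1
    return score, used
```

One design point worth checking against the original pipeline: by default, PLINK 1.9's `--score` reports the score averaged over nonmissing alleles and reports a sum only with its `sum` modifier, so a plain weighted sum such as the one above matches the `sum` convention.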

Estimation of Absolute Risk of Breast Cancer
To explore the potential clinical utility of the PRS models that were significantly associated with breast cancer, we grouped individuals into tertiles of the PRS distribution. Using iCARE 7, we estimated the cumulative risk of breast cancer within each ancestry group for individuals at high (top tertile), moderate (middle tertile), and low (bottom tertile) PRS risk. To estimate absolute risks, we chose the PRS most strongly associated with breast cancer within each ancestry: the UKBB model in EA and AA women and the BCAC-L model in women with Latinx ancestry (LA). Ethnicity-, race-, and age-specific breast cancer incidence rates in the US were obtained from the Surveillance, Epidemiology, and End Results (SEER) Program 8.
Ethnicity-, race-, and age-specific mortality rates in the US, used to model competing risks, were extracted from the Centers for Disease Control and Prevention's WONDER online database (http://wonder.cdc.gov/ucd-icd10.html).
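The tertile grouping and competing-risk calculation described above can be illustrated with a simplified, iCARE-like computation. This is not the iCARE package or its API. It is a sketch that assumes annual hazards that are constant within one-year age intervals, treats the odds ratio per standard deviation of PRS as a relative risk, and calibrates the baseline by dividing the population incidence by the mean relative risk; all function names are hypothetical, and the rates in the test are made up rather than taken from SEER or WONDER.

```python
from __future__ import annotations

import math
import statistics


def tertile_labels(prs: list[float]) -> list[str]:
    """Assign each score to the low, moderate, or high tertile of the distribution."""
    lo_cut, hi_cut = statistics.quantiles(prs, n=3)
    return ["low" if x <= lo_cut else "high" if x > hi_cut else "moderate"
            for x in prs]


def absolute_risk(rr: float, incidence: dict[int, float],
                  mortality: dict[int, float], start_age: int, end_age: int) -> float:
    """Cumulative breast cancer risk over [start_age, end_age) with death as a
    competing risk; rates are annual hazards, constant within each year of age."""
    risk, cum_hazard = 0.0, 0.0
    for age in range(start_age, end_age):
        lam = rr * incidence[age]     # individual breast cancer hazard
        total = lam + mortality[age]  # hazard of leaving the at-risk state
        if total > 0:
            # P(alive and cancer-free at interval start) * P(cancer occurs first)
            risk += math.exp(-cum_hazard) * (lam / total) * (1.0 - math.exp(-total))
        cum_hazard += total
    return risk


def tertile_risks(prs: list[float], log_or_per_sd: float,
                  incidence: dict[int, float], mortality: dict[int, float],
                  start_age: int, end_age: int) -> dict[str, float]:
    """Mean absolute risk for each PRS tertile."""
    mu, sd = statistics.mean(prs), statistics.stdev(prs)
    rrs = [math.exp(log_or_per_sd * (x - mu) / sd) for x in prs]
    mean_rr = statistics.mean(rrs)
    # Simplified calibration: rescale population incidence so the average
    # individual hazard matches it (ignores depletion of high-risk women).
    baseline = {age: rate / mean_rr for age, rate in incidence.items()}
    labels = tertile_labels(prs)
    return {
        group: statistics.mean(
            absolute_risk(rr, baseline, mortality, start_age, end_age)
            for rr, label in zip(rrs, labels) if label == group)
        for group in ("low", "moderate", "high")
    }
```

Modeling death as a competing risk matters at older ages: a woman who dies of another cause can no longer develop breast cancer, so the cumulative risk is lower than it would be if breast cancer incidence alone were integrated.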