There were 4619 women with estrogen receptor (ER)–positive and 924 with ER-negative cancer; 5769 with ductal and 956 with lobular cancer; 2419 with grade 1 or 2 cancer and 791 with grade 3 cancer; 1217 with in situ cancer and 8735 with invasive disease; 284 with ERBB2-positive cancer and 10 022 with negative or unknown ERBB2 status. There were 440 women with bilateral breast cancer and 9866 with unilateral breast cancer. For other tumor characteristics, wherever possible the first cancer diagnosed in women with bilateral disease was used. CI indicates confidence interval; OR, odds ratio; SNP, single-nucleotide polymorphism. The area of each square is inversely proportional to the variance of the logarithm of the OR and hence proportional to the amount of statistical information available for that particular estimate. The dashed line indicates the overall OR for that SNP and is shown if the OR is significant at P<.05. All P values are for interaction.
Information on the characteristics of the studies is given in eTable 2. CI indicates confidence interval; ER, estrogen receptor; OR, odds ratio; SNP, single-nucleotide polymorphism. The area of each square is inversely proportional to the variance of the logarithm of the OR and hence proportional to the amount of statistical information available for that particular estimate. The dashed line indicates the overall OR for that SNP.
Odds ratios are plotted against the mean polygenic risk score in each fifth of the 9113 controls (≈1823 in each). The polygenic risk score is described in the “Methods” section. Among controls, the mean (SD) risk score is 0 (0.24). The mean (range) of the polygenic risk score within each successive quintile is −0.33 (−0.69 to −0.20), −0.13 (−0.20 to −0.07), 0 (−0.07 to 0.06), 0.13 (0.06 to 0.21), and 0.35 (0.21 to 0.98). For stability, the 3 central ER− groups were combined. FCI indicates floated confidence interval (based on the standard error of the log risk); ER, estrogen receptor; SNP, single-nucleotide polymorphism.
ER indicates estrogen receptor; SNP, single-nucleotide polymorphism. Error bars show 95% confidence intervals.
Reeves GK, Travis RC, Green J, Bull D, Tipper S, Baker K, Beral V, Peto R, Bell J, Zelenika D, Lathrop M; for the Million Women Study Collaborators. Incidence of Breast Cancer and Its Subtypes in Relation to Individual and Multiple Low-Penetrance Genetic Susceptibility Loci. JAMA. 2010;304(4):426-434. doi:10.1001/jama.2010.1042
Author Affiliations: Cancer Epidemiology Unit, University of Oxford, Oxford, United Kingdom (Drs Reeves, Travis, Green, and Beral and Mss Bull, Tipper, and Baker); Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), University of Oxford (Mr Peto); Department of the Regius Professor of Medicine, University of Oxford (Dr Bell); Centre National de Génotypage IG, Commissariat à l’énergie Atomique; and Fondation Jean Dausset-Centre d’Etude de Polymorphisme Humain, Paris, France (Drs Zelenika and Lathrop).
Context There is limited evidence on how the risk of breast cancer and its subtypes depends on low-penetrance susceptibility loci, individually or in combination.
Objective To analyze breast cancer risk, overall and by tumor subtype, in relation to 14 individual single-nucleotide polymorphisms (SNPs) previously linked to the disease, and in relation to a polygenic risk score.
Design, Setting, and Participants Study of 10 306 women with breast cancer (mean age at diagnosis, 58 years) and 10 393 women without breast cancer who in 2005-2008 provided blood samples for genotyping in a large prospective study of UK women; and meta-analysis of these results and of other published results.
Main Outcome Measures Estimated per-allele odds ratio (OR) for individual SNPs, and cumulative incidence of breast cancer to age 70 years in relation to a polygenic risk score based on the 4, 7, or 10 SNPs most strongly associated with risk.
Results Odds ratios for breast cancer were greatest for FGFR2-rs2981582 and TNRC9-rs3803662 and, for these 2 SNPs, were significantly greater for estrogen receptor (ER)–positive than for ER-negative disease, both in our data and in meta-analyses of all published data (pooled per-allele ORs [95% confidence intervals] for ER-positive vs ER-negative disease: 1.30 [1.26-1.33] vs 1.05 [1.01-1.10] for FGFR2; interaction P < .001; and 1.24 [1.21-1.28] vs 1.12 [1.07-1.17] for TNRC9; interaction P < .001). The next strongest association was for 2q-rs13387042, for which the per-allele OR was significantly greater for bilateral than unilateral disease (1.39 [1.21-1.60] vs 1.15 [1.11-1.20]; interaction P = .008) and for lobular than ductal tumors (1.35 [1.23-1.49] vs 1.10 [1.05-1.15]; interaction P < .001). The estimated cumulative incidence (95% confidence interval) of breast cancer to age 70 years among women in the top and bottom fifths of a polygenic risk score based on 7 SNPs was 8.8% (8.3%-9.4%) and 4.4% (4.2%-4.8%), respectively. For ER-positive disease the corresponding risks were 7.4% (6.9%-8.0%) and 3.4% (3.1%-3.8%), respectively; while for ER-negative disease they were 1.4% (1.2%-1.6%) and 1.0% (0.8%-1.2%). The findings did not differ materially according to the number of SNPs included in the polygenic risk model.
Conclusions The polygenic risk score was substantially more predictive of ER-positive than of ER-negative breast cancer, particularly for absolute risk.
Findings from genome-wide association studies (GWAS),1-7 together with analyses of specific candidate polymorphisms,8 have identified a number of variants that are definitely or probably associated with breast cancer risk. There is also increasing evidence that some genetic factors have different effects on different subtypes of breast cancer.9 For 14 single-nucleotide polymorphisms (SNPs) previously associated with the disease,1-4,8 we report a systematic examination of their relationship with breast cancer incidence, according to 6 tumor characteristics, among 10 306 breast cancer cases and 10 393 controls in a large UK prospective study. We also present meta-analyses that combine our results with those of published studies. Finally, we fit a log-additive model for polygenic risk.
In 1996-2001, 1.3 million women aged 50 through 64 years, who had been invited for routine breast cancer screening at centers throughout England and Scotland, were recruited into the Million Women Study. Participants provided information about reproductive factors, sociodemographic characteristics, self-assessed race/ethnicity, and other personal characteristics (full details are given elsewhere10 and at http://www.millionwomenstudy.org). All study participants are flagged on the National Health Service (NHS) Central Register for incident cancer, which yields data on cancer site (coded according to the International Statistical Classification of Diseases and Related Health Problems, 10th Revision11) and morphology (coded according to the International Classification of Diseases for Oncology12). In 2005-2008, a genetic susceptibility study was conducted among women with breast cancer and randomly selected controls; blood specimens for genotyping were obtained from about 50% of the cases.13 Information on histological subtype and invasiveness of the cancers comes from the NHS Central Registers and is virtually complete, but information on other tumor characteristics is less complete because it was derived from medical records and other sources, including questionnaire data. In particular, ERBB2 status was definitively known for relatively few tumors, since it has only recently been measured routinely in the United Kingdom. For these analyses, therefore, breast cancers were classified as ERBB2 positive or other; although some cancers in the “other” group may be ERBB2 positive, the majority are likely to be ERBB2 negative. The term bilateral breast cancer refers to cancer diagnosed in both breasts, either simultaneously or at different times. Approval for the work was granted by the Oxford and East Anglia Multi-center Research and Ethics Committee and the Eastern Multi-center Research and Ethics Committee.
Participants gave signed consent for the original study and separately for the genetic susceptibility study.
Genotyping for 14 SNPs (FGFR2-rs2981582 [Entrez Gene 2263], TNRC9-rs3803662 [Entrez Gene 27324], 2q35-rs13387042, MAP3K1-rs889312 [Entrez Gene 4214], 8q24-rs13281615, 2p-rs4666451, 5p12-rs981782, CASP8-rs1045485 [Entrez Gene 841], LSP1-rs3817198 [Entrez Gene 4046], 5q-rs30099, TGFB1-rs1982073 [Entrez Gene 7040], ATM-rs1800054 [Entrez Gene 472], TNRC9-rs8051542, and TNRC9-rs12443621) was conducted in 2008-2009 at the Centre National de Génotypage (CNG) in Paris, France, using the TaqMan assay (Applied Biosystems, Carlsbad, California). Primers and probes are available upon request from the authors. The 14 SNPs were chosen on the basis of reviews, meta-analyses, and results from GWAS published before March 2008, when genotyping commenced. Laboratory personnel who conducted the assays had no knowledge of the case/control status of specimens. The overall genotyping success rate was 97% and was at least 96% for each individual variant. The concordance of internal duplicate samples was greater than 99.9%, and no significant deviation from Hardy-Weinberg equilibrium was observed among the controls (14 comparisons: 13 with P > .10 and 1 with P = .04).
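The Hardy-Weinberg check described above compares genotype counts among controls with the counts expected from the estimated allele frequency. The paper does not say which test was used; the sketch below assumes the standard 1-degree-of-freedom Pearson χ² test (an exact test is a common alternative), and the function name `hwe_chi2` and the example counts are hypothetical.

```python
import math


def hwe_chi2(n_hom_a, n_het, n_hom_b):
    """Pearson chi-square test (1 df) of Hardy-Weinberg equilibrium.

    Arguments are control genotype counts for AA, Aa, and aa.
    Returns (chi2, P). Assumes the SNP is polymorphic (no zero expected counts).
    """
    n = n_hom_a + n_het + n_hom_b
    p = (2 * n_hom_a + n_het) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    observed = (n_hom_a, n_het, n_hom_b)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 genotype classes - 1 - 1 estimated allele frequency = 1 df.
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical control genotype counts for one SNP
print(hwe_chi2(3600, 4700, 1700))
```

With 14 SNPs tested, an isolated P value near .04 is roughly what chance alone would be expected to produce, which is consistent with the text's reading of the results.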
For each SNP, logistic regression was used to estimate adjusted per-allele odds ratios (ORs) for breast cancer in relation to genotype. All analyses were routinely adjusted for age at recruitment and for 10 regions, representing the areas covered by UK cancer registries. The relationship between each SNP and breast cancer risk was examined in the form of both genotype-specific ORs (heterozygote or homozygote) and per-allele ORs. For 3 SNPs, the minor (ie, less frequent) allele was associated with a decrease in risk; so that increases in risk could be compared across variants, the results for all SNPs are presented as per-allele ORs for the high-risk allele rather than the low-frequency allele. Differences in the per-allele ORs for specific types of breast cancer were examined using case vs case logistic regression (because the controls contribute no relevant information); where appropriate, the effect of adjustment for other tumor characteristics was also examined. Since there was already substantial prior evidence of an association between many of the SNPs examined and overall breast cancer risk, 95% confidence intervals (CIs) were used and P values were not corrected for multiple comparisons.
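In a per-allele (log-additive) logistic model, the per-allele OR is exp(b), where b is the coefficient on the number of copies (0, 1, or 2) of the allele being counted. The sketch below fits that model to grouped genotype counts by Newton-Raphson using only the Python standard library. It is a simplified, unadjusted illustration: the study's models also adjust for age and region, which this sketch omits, and the names `per_allele_or` and `flip_allele` are hypothetical, not taken from the authors' Stata code.

```python
import math


def _score_and_information(a, b, cases, controls):
    """Score vector and 2x2 information matrix for logit P(case) = a + b*g,
    where g = 0, 1, 2 is the number of copies of the counted allele."""
    ua = ub = iaa = iab = ibb = 0.0
    for g, (c, k) in enumerate(zip(cases, controls)):
        n = c + k
        p = 1.0 / (1.0 + math.exp(-(a + b * g)))  # fitted probability of being a case
        resid = c - n * p
        w = n * p * (1.0 - p)
        ua += resid
        ub += g * resid
        iaa += w
        iab += g * w
        ibb += g * g * w
    return ua, ub, iaa, iab, ibb


def per_allele_or(cases, controls, max_iter=50, tol=1e-10):
    """Unadjusted per-allele OR with a Wald 95% CI.

    cases[g], controls[g]: numbers of women carrying g = 0, 1, 2 copies.
    """
    a, b = math.log(sum(cases) / sum(controls)), 0.0  # start from the null model
    for _ in range(max_iter):
        ua, ub, iaa, iab, ibb = _score_and_information(a, b, cases, controls)
        det = iaa * ibb - iab * iab
        # Newton-Raphson step: (da, db) = I^-1 U for the 2x2 information matrix I
        da = (ibb * ua - iab * ub) / det
        db = (iaa * ub - iab * ua) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    _, _, iaa, iab, ibb = _score_and_information(a, b, cases, controls)
    se = math.sqrt(iaa / (iaa * ibb - iab * iab))  # (I^-1)[b, b] at the estimate
    return math.exp(b), math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)


def flip_allele(or_, lower, upper):
    """Re-express a per-allele OR and its CI in terms of the other allele."""
    return 1.0 / or_, 1.0 / upper, 1.0 / lower


# Hypothetical counts in which the odds of being a case double per allele copy
print(per_allele_or(cases=[100, 200, 400], controls=[100, 100, 100]))
```

`flip_allele` mirrors the reporting convention above: when the minor allele is protective, inverting the OR (and swapping the CI limits) expresses the same association per copy of the high-risk allele. For a case vs case comparison, passing, for example, ER-positive counts as `cases` and ER-negative counts as `controls` estimates the ratio of the two subtype-specific per-allele ORs, under the same unadjusted simplification.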
We also conducted a meta-analysis of published data according to tumor characteristics for the 3 SNPs most strongly associated with breast cancer risk, based on studies identified through searches of PubMed by January 31, 2010. Articles were eligible for inclusion if they presented data on the association between any of these 3 SNPs and the risk of breast cancer according to any of the tumor characteristics listed in Figure 1. Details of the search strategy used are given in the eMethods. Pooled ORs were estimated by calculating the inverse-variance weighted average of the study-specific log ORs. Tests of heterogeneity between pooled OR estimates for specific breast cancer subtypes were obtained by treating these pooled estimates as if they were statistically independent (ie, ignoring the fact that they had common controls), thereby yielding slightly conservative P values.
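The pooling and heterogeneity calculations described above can be sketched as follows. This is a minimal fixed-effect illustration, assuming each study reports an OR with a 95% CI that is symmetric on the log scale; the function names are hypothetical. Because published CIs are rounded, standard errors recovered this way are approximate.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def log_or_se(or_, lower, upper):
    """Log OR and its SE recovered from a reported OR and 95% CI."""
    return math.log(or_), (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_inverse_variance(estimates):
    """Fixed-effect pooled log OR: mean of study log ORs weighted by 1/SE^2.

    estimates: list of (log_or, se). Returns (pooled_log_or, pooled_se).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def difference_p(est1, est2):
    """Two-sided P for a difference between two pooled log ORs, treating them
    as independent. Shared controls induce a positive covariance, so ignoring
    it overstates the variance of the difference (a conservative test)."""
    (b1, s1), (b2, s2) = est1, est2
    z = (b1 - b2) / math.sqrt(s1 ** 2 + s2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Pooled FGFR2 estimates reported in the Results: ER-positive vs ER-negative
er_pos = log_or_se(1.30, 1.26, 1.33)
er_neg = log_or_se(1.05, 1.01, 1.10)
print(f"P for difference = {difference_p(er_pos, er_neg):.1e}")
```

Applied to the rounded pooled FGFR2 estimates, this gives z ≈ 8.3, consistent with the interaction P < .001 reported in the Results.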
To assess the breast cancer risk associated with multiple high-risk loci, we fitted a log-additive model for polygenic risk. Under this model, the information from 4, 7, or 10 SNPs was summarized for each woman by a single numerical polygenic risk score:

$$\text{Polygenic risk score} = \sum_{i=1}^{m} b_i \,(n_i - e_i),$$
where m is the total number of SNPs included in the risk score, ni is the number of high-risk alleles for the ith SNP, ei is the mean number of high-risk alleles for the ith SNP among controls, and bi is the log of the per-allele OR for the ith SNP.
Such a score can be thought of as the sum of the numbers of high-risk alleles a woman carries, with the count for each SNP weighted by the corresponding log per-allele OR, minus a common constant chosen so that the mean score among controls is zero. A positive score is, therefore, by definition above average and a negative score below average. The range of possible polygenic risk scores depends on which SNPs are included and on the per-allele OR for each. ORs for breast cancer were calculated for each fifth of the polygenic risk score, with fifths defined by the distribution of scores among controls. Detailed results are given here for a polygenic model based on the 7 SNPs most strongly associated with breast cancer risk; sensitivity analyses assess the effect of including only the 4 most important SNPs and of including as many as 10 SNPs. In further sensitivity analyses, the score was calculated using per-allele ORs for each SNP estimated from other published data rather than from our own data. All analyses were conducted with Stata version 11.0 (StataCorp, College Station, Texas).
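The score and the control-based fifths described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the per-allele ORs and control genotypes below are hypothetical, and the fifths are cut with Python's `statistics.quantiles`, whose interpolation rule may differ slightly from the one used in Stata.

```python
import math
from bisect import bisect_right
from statistics import fmean, quantiles


def control_means(control_genotypes):
    """e_i: mean number of high-risk alleles for each SNP among controls."""
    return [fmean(column) for column in zip(*control_genotypes)]


def polygenic_score(genotype, log_ors, means):
    """Sum over SNPs of b_i * (n_i - e_i)."""
    return sum(b * (n - e) for n, b, e in zip(genotype, log_ors, means))


def fifth(score, cut_points):
    """Fifth (1 = lowest, 5 = highest) defined by cut points from controls."""
    return bisect_right(cut_points, score) + 1


# Hypothetical per-allele ORs for 3 SNPs and genotypes (high-risk allele counts) of 10 controls
log_ors = [math.log(1.25), math.log(1.20), math.log(1.15)]
controls = [(0, 1, 2), (1, 1, 0), (2, 0, 1), (1, 2, 1), (0, 0, 0),
            (2, 2, 2), (1, 0, 1), (0, 1, 1), (1, 1, 1), (2, 1, 0)]

e = control_means(controls)
scores = [polygenic_score(g, log_ors, e) for g in controls]
cuts = quantiles(scores, n=5)  # 4 cut points separating fifths of the controls
print([fifth(s, cuts) for s in scores])

# Under the log-additive model, the OR between two women is exp(S1 - S2)
print(math.exp(polygenic_score((2, 2, 2), log_ors, e) - polygenic_score((0, 0, 0), log_ors, e)))
```

Subtracting e_i centers each SNP's contribution on the controls, so the mean control score is zero (up to rounding error) whatever ORs are used. The subtraction cancels in any difference of scores, so it does not change the ORs between women.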
In figures in which ORs are represented by squares and CIs by lines, the area of each square is inversely proportional to the variance of the log OR and hence directly proportional to the amount of statistical information underlying that OR. When analyses involve risk comparisons across more than 2 categories, variances are estimated by treating the ORs as floating absolute risks.14
According to the International Agency for Research on Cancer, the cumulative incidence of breast cancer to age 70 years in women from developed countries was about 6.3% in 1998-2002.15 Using this figure to represent incidence rates typical of western populations, we estimated cumulative breast cancer risk to age 70 years by genotype, assuming the allele frequencies and the genotypic ORs found here. Estimates of absolute risk were also calculated separately for ER-positive and ER-negative disease, assuming that 80% of breast cancers in women before age 70 years are ER-positive.16
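The text does not spell out the formula used to convert ORs into absolute risks. One simple approach, sketched here as an assumption, treats cumulative risk in each group as proportional to its OR (reasonable for a disease with a cumulative incidence of a few percent) and scales the risks so that their frequency-weighted average equals the population figure of 6.3%. The function names and the high-risk allele frequency of 0.4 are assumptions chosen for illustration. With a per-allele OR of 1.2, this gives 5.4%, 6.5%, and 7.8% for 0, 1, and 2 copies, matching the figures quoted in the Results, although that agreement does not establish that the authors used exactly this method.

```python
def hwe_genotype_freqs(p):
    """Frequencies of 0, 1, 2 high-risk alleles under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def absolute_risks(freqs, odds_ratios, population_risk):
    """Cumulative risk in each group, taken as proportional to its OR and
    scaled so that the frequency-weighted mean equals population_risk."""
    baseline = population_risk / sum(f * r for f, r in zip(freqs, odds_ratios))
    return [baseline * r for r in odds_ratios]


# Per-allele OR 1.2, so genotype ORs 1, 1.2, 1.44; assumed allele frequency 0.4
per_allele = 1.2
risks = absolute_risks(hwe_genotype_freqs(0.4), [1.0, per_allele, per_allele ** 2], 6.3)
print([round(r, 1) for r in risks])  # → [5.4, 6.5, 7.8]
```

The same function applies to fifths of the polygenic risk score by passing 5 equal frequencies of 0.2 together with the OR for each fifth.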
A total of 10 306 women with breast cancer and 10 393 women without the disease were included in these analyses. The mean age at recruitment of cases and controls was 56.6 (SD, 4.7) and 55.9 (SD, 4.6) years, respectively (mean age at diagnosis, 58 years). Information on the characteristics of cases and controls at recruitment into the study, including reproductive history, lifestyle factors, and socioeconomic status (based on deprivation index17), has been published elsewhere for most of the women included in these analyses13 and is shown in eTable 1. As expected, compared with controls, cases were less likely to be parous, were older when their first child was born, had a higher body mass index, consumed more alcohol, and were more likely to have a first-degree relative with breast cancer.
We confirmed significant associations with overall breast cancer risk for 7 of the 14 SNPs examined (in descending order of per-allele odds ratio: FGFR2-rs2981582, TNRC9-rs3803662, 2q35-rs13387042, MAP3K1-rs889312, 8q24-rs13281615, 2p-rs4666451, and 5p-rs981782; eFigure 1). Each of these 7 SNPs showed highly significant, independent associations with breast cancer risk (each P < .003, most P < .001). In addition, there were no clear correlations between the allele frequencies of these SNPs, nor was there any good evidence of interdependence of their effects on risk (21 pairwise interaction tests: 20 with P > .10; and one with P = .04). The 3 SNPs on TNRC9 are all in the same linkage disequilibrium block and, in line with a previous study,1 we found that when the per-allele ORs for each were adjusted for the other 2, only 1 (rs3803662) remained significant; these other 2 SNPs were, therefore, ignored in subsequent analyses with respect to tumor characteristics.
Odds ratios in relation to the remaining 12 SNPs were estimated according to tumor histology, invasiveness, grade, ER status, ERBB2 status, and whether or not the cancer was bilateral. Results for the 8 SNPs with the largest ORs are shown in Figure 1, and for the remaining 4 SNPs in eFigure 2. The risks in relation to the 3 SNPs most strongly related to overall risk (FGFR2, TNRC9, and 2q35) showed clear differences in association according to at least 1 tumor characteristic (Figure 1).
FGFR2 showed a significantly greater per-allele OR (95% CI) for ER-positive than for ER-negative breast cancer (1.27 [1.21-1.34] vs 1.01 [0.92-1.12]; interaction P < .001), and, perhaps because of this, for grade 1/2 than for grade 3 breast cancer (1.23 [1.15-1.31] vs 1.09 [0.97-1.21]; interaction P = .03). Among 2217 cases with information on both characteristics, grade was strongly related to ER status (97%, 93%, and 63% ER-positive in grade 1, 2, and 3, respectively) and the apparent interaction between grade and FGFR2-associated risk was attenuated and nonsignificant after adjustment for ER status (interaction P = .23).
TNRC9 also showed a greater per-allele OR for ER-positive than for ER-negative cancer (1.24 [1.17-1.31] vs 1.07 [0.96-1.19]; interaction P = .01) and (nonsignificantly) for grade 1/2 than for grade 3 cancer. The effects of FGFR2 and TNRC9 did not vary materially with any of the other tumor characteristics examined, including bilaterality.
In contrast, rs13387042 on chromosome 2q35 showed a significantly greater per-allele OR for bilateral than unilateral disease (1.39 [1.21-1.60] vs 1.15 [1.11-1.20]; interaction P = .008), and for lobular than ductal cancer (1.35 [1.23-1.49] vs 1.10 [1.05-1.15]; interaction P < .001). In case-case comparisons, these differences by laterality and histology both remained significant after adjustment for the other (P = .01 and P < .001, respectively).
For 3 further SNPs, there was at least some evidence of an interaction with at least 1 tumor characteristic. First, for 2p-rs4666451, there appeared to be a greater per-allele OR for in situ than for invasive cancer (1.18 [1.08-1.30] vs 1.06 [1.02-1.11]; interaction P = .02). Second, for CASP8-rs1045485, which did not show a significant association with overall risk, there appeared to be a greater per-allele OR for cancers known to be ERBB2-positive than for all other cancers (1.44 [1.08-1.92] vs 1.04 [0.98-1.11]; interaction P = .02). Third, although we did not observe a significant association between ATM-rs1800054 and overall breast cancer risk, there did appear to be a somewhat greater per-allele OR for ER-negative than for ER-positive disease (1.33 [0.93-1.89] and 0.92 [0.74-1.14], respectively; interaction P = .05) (eFigure 2).
Results from analyses of the associations of the SNPs listed in Figure 1 and eFigure 2 according to tumor characteristics were similar when the few women reporting a nonwhite race/ethnicity were excluded (39 cases and 51 controls), and when prevalent breast cancers were excluded (2696 cases).
Three previous studies1,3,4,9,18,19 have examined the associations between FGFR2, TNRC9, or 2q35 and at least 1 of these tumor characteristics. Information on study design and basic characteristics of the study populations for these 3 studies is summarized in eTable 2. Figure 2 presents a meta-analysis of their findings combined with ours.
Three previous studies,3,9,19 in addition to our own, have reported on FGFR2 and breast cancer by ER status, and the combined FGFR2 OR estimates for ER-positive and for ER-negative cancer in Figure 2 are 1.30 (1.26-1.33) and 1.05 (1.01-1.10), respectively (interaction P < .001). Two previous studies3,9 have examined the association between FGFR2 and tumor grade and, taking our results with theirs, the pooled per-allele OR was greater for grade 1 than for grade 3 cancer (1.30 [1.25-1.36] and 1.13 [1.09-1.18], respectively; interaction P < .001).
Two published studies3,9 have examined the association between TNRC9 and breast cancer risk by ER status. Taking our results with theirs, the pooled per-allele OR was again greater for ER-positive than for ER-negative cancer (1.24 [1.21-1.28] and 1.12 [1.07-1.17], respectively; interaction P < .001), and for grade 1 than for grade 3 (1.24 [1.18-1.30] and 1.16 [1.11-1.21], respectively; interaction P < .001).
For 2q35-rs13387042, the combined per-allele OR estimates from our and other available studies3,18 were also somewhat greater for ER-positive than for ER-negative cancer (1.16 [1.13-1.18] and 1.10 [1.06-1.14], respectively; interaction P < .01). The one other study,3 apart from ours, that has reported on the association between 2q35-rs13387042 and risk by histological type found no significant interaction, but when their results were combined with ours, the per-allele OR associated with 2q35-rs13387042 remained significantly greater for lobular than for ductal cancer (1.34 [1.23-1.45] vs 1.13 [1.09-1.17]; interaction P < .001).
For the 7 SNPs that were strongly and significantly associated with overall breast cancer risk (Figure 1) there was no evidence in our data of any substantial correlations between their allele frequencies or of any interactions between their effects on risk, so their joint effects on risk should be adequately described by a simple multiplicative model (see “Methods” section). Figure 3 presents the estimated ORs of breast cancer in groups defined by fifths of this risk score among the 9113 controls with complete information on all 7 SNPs. Compared with women in the bottom fifth for polygenic risk from these 7 SNPs, the OR in the top fifth is about 2.
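Under the log-additive model, the odds ratio between 2 women is the exponential of the difference between their polygenic risk scores. As a rough check, using the mean scores given in the figure legend for the bottom and top fifths of the 9113 controls (−0.33 and 0.35):

$$\mathrm{OR}_{\text{top vs bottom fifth}} \approx \exp\{0.35 - (-0.33)\} = e^{0.68} \approx 1.97,$$

which agrees with the fitted OR of about 2. This is the model-implied ratio at the mean scores of the 2 fifths, not the fitted estimate itself.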
When the same risk score is applied to ER-positive and ER-negative breast cancer separately (Figure 3), the trend in log odds is significantly steeper for ER-positive than for ER-negative cancer (interaction P < .001) largely because the 2 most important SNPs mainly affect ER-positive disease in our data (Figure 1). No other tumor characteristic materially affected the relevance of this polygenic risk score.
Results from polygenic models based on the 4, 7, and 10 SNPs most strongly associated with risk in our data are compared in the Table. A model based on only the top 4 SNPs (all with per-allele ORs greater than 1.1) yielded results essentially similar to those of the model based on the top 7 SNPs, because the additional 3 SNPs had relatively small effects. Likewise, inclusion of a further 3 SNPs in the model made little difference to the ORs comparing the bottom, middle, and top fifths.
Results from all of these polygenic risk models showed a significantly greater trend in OR by quintile of polygenic risk score for ER-positive than ER-negative breast cancer regardless of the number of SNPs that were included (data not shown).
When these analyses were repeated using a polygenic risk score based on per-allele ORs derived from other studies,1,8,18 the findings for the 4-, 7-, or 10-SNP model (shown in eTable 3) were virtually identical to those described in Figure 3.
The cumulative incidence of breast cancer to age 70 years among women from developed countries, based on age-specific incidence rates in 1998-2002, is typically 6.3% (63 per 1000),15 with corresponding estimates of 5.0% for ER-positive disease and 1.3% for ER-negative disease.16
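These ER-specific figures follow from the assumption, stated in the Methods, that 80% of breast cancers diagnosed before age 70 years are ER-positive:

$$0.80 \times 6.3\% = 5.04\% \approx 5.0\% \ (\text{ER-positive}), \qquad 0.20 \times 6.3\% = 1.26\% \approx 1.3\% \ (\text{ER-negative}).$$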
In our data, and those published by others, the greatest per-allele OR for any SNP associated with breast cancer was of the order of 1.2 (FGFR2 and TNRC9). Based on an OR of this magnitude, and a high-risk allele frequency similar to that for the FGFR2 and TNRC9 SNPs in western populations, the cumulative incidence of breast cancer to age 70 years would be 5.4%, 6.5%, and 7.8%, respectively, in women with 0, 1, or 2 copies of the high-risk allele.
Estimates of cumulative breast cancer incidence to age 70 years by polygenic risk score are shown in Figure 4. The estimated cumulative risk (95% CI) of breast cancer to age 70 years among women from developed countries in the top and bottom fifth of polygenic risk score was 8.8% (8.3%-9.4%) and 4.4% (4.2%-4.8%), respectively. This difference was much greater for ER-positive disease (7.4% [6.9%-8.0%] vs 3.4% [3.1%-3.8%]) than for ER-negative disease (1.4% [1.2%-1.6%] vs 1.0% [0.8%-1.2%]).
In this large study including 10 306 women with breast cancer and 10 393 without the disease, we confirm that some of the more important common genetic variants for breast cancer have different effects on different tumor types. Our findings strengthen the existing evidence that FGFR2-rs2981582 and TNRC9-rs3803662 are predominantly associated with ER-positive breast cancer and provide strong evidence to suggest a greater association of 2q35-rs13387042 with bilateral compared with unilateral breast cancer and with lobular compared with ductal breast cancer.
The findings that FGFR2-rs2981582 and TNRC9-rs3803662 largely affect ER-positive disease are confirmed by our meta-analysis of all available data. In the case of FGFR2, only 1 of the 4 studies presented in the meta-analysis9 showed any positive association with ER-negative disease, and given the known potential for misclassification of ER status, the pooled OR estimate is consistent with little or no effect of this SNP on ER-negative disease. Although our meta-analysis also showed a significantly greater effect of both FGFR2 and TNRC9 on relatively low-grade tumors, these differences may merely reflect the known correlation between tumor grade and ER status, since in our data the association between FGFR2 and grade became attenuated and nonsignificant after adjustment for ER status. The only study to report a significant association between TNRC9 and grade9 also found that this association was no longer significant after adjustment for other characteristics. We found no suggestion in our data of a greater OR for either of these SNPs for bilateral compared with unilateral disease.
It has been suggested, on the basis of the results of one study,3 that the association between 2q35-rs13387042 and breast cancer risk may also be modified by ER status20 but there was no significant interaction between this SNP and ER status in our data and the pooled ORs from 3 studies in our meta-analysis did not differ materially by ER status. There are few published data with which to compare our findings on the association between 2q35-rs13387042 and breast cancer by histology and laterality, although when the results of the only other study to examine this relationship by histology3 were combined with ours, there remained a significantly greater association for lobular than for ductal cancer. The relatively strong evidence in our data for a difference in the effect of this SNP by both histology (P < .001) and laterality (P = .008) suggests that these differences are unlikely to be due to chance.
We found some evidence of an interaction with at least 1 tumor characteristic for a further 3 SNPs (2p-rs4666451, CASP8-rs1045485, and ATM-rs1800054). These findings are more tentative than those described above, as the associations have not yet been examined in other studies. As far as we are aware, no other large study has examined these SNPs in relation to ERBB2 status and hence the observed association between CASP8 and ERBB2 status (if real) is novel. Although our data showed weak evidence of a greater association between ATM-rs1800054 and ER-negative compared with ER-positive disease, the only previous study to report on this8 found no such interaction and when its results are combined with ours, no significant evidence of interaction remains (P < .20).
When the effects of the 7 SNPs most strongly associated with overall breast cancer risk in these data were combined using a polygenic risk score, the cumulative incidence of breast cancer to age 70 years among women in the top fifth was twice that in the bottom fifth (8.8% vs 4.4%). Both the relative and, particularly, the absolute differences were much greater for ER-positive disease (7.4% vs 3.4%) than for ER-negative disease (1.4% vs 1.0%).
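Spelled out from the estimates above, the relative (ratio) and absolute (difference) contrasts between the top and bottom fifths are:

$$\text{ER-positive: } \frac{7.4\%}{3.4\%} \approx 2.2, \quad 7.4\% - 3.4\% = 4.0 \text{ percentage points}; \qquad \text{ER-negative: } \frac{1.4\%}{1.0\%} = 1.4, \quad 1.4\% - 1.0\% = 0.4 \text{ percentage points}.$$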
Despite the large number of women involved, there is still limited power to detect differences in breast cancer risk associated with those SNPs with relatively rare high-risk alleles according to some of the tumor characteristics. Also, the large number of tests carried out in analyses by tumor characteristics means that associations that are not highly statistically significant must be interpreted with caution. Although data on invasiveness and histology were virtually complete in these data, information on other tumor characteristics was not. In particular, it was feasible to categorize breast cancers only into those that were known to be ERBB2 positive and all others. The inclusion of some true ERBB2-positive cancers in the “other” category would, however, be expected to dilute only slightly any difference in association by ERBB2 status. No information was available on 5 recently identified breast cancer susceptibility loci5-7 (1p11-rs11249433, 14q24-rs999737, 3p-rs4973768, 17q-rs6504950, and 6q25-rs2046210), most with estimated per-allele ORs of the order of 1.1. However, our findings from polygenic risk models based on various subsets of SNPs suggest that their omission is unlikely to have materially affected our conclusions, especially since most of these loci again appear to be more strongly associated with ER-positive than ER-negative disease.5-7 The estimates of polygenic risk in the Table and Figure 3 are based on a polygenic risk score derived from our own estimates of association for the individual SNPs. However, these findings were not materially altered when analyses were repeated using a polygenic risk score based on estimates taken from other published data for the relevant per-allele ORs.
Certain established risk factors for breast cancer have similar, or even greater, effects on breast cancer incidence than the differences seen here between women in the highest vs the lowest fifth of polygenic risk score. Indeed, our estimate of the cumulative incidence of breast cancer to age 70 years in women in the top fifth for polygenic risk score (8.8%) is similar to that for women in developed countries with one first-degree relative with breast cancer (9.1%),21 and considerably less than that for women with 2 affected first-degree relatives (15.4%).21 Furthermore, no interactions have been found between the effects of the genes investigated here and the other risk factors for breast cancer.13 Hence, as others have suggested,22,23 subdividing women on the basis of such polygenic risk scores is not at this stage a useful tool for advising women about risk or for population-based breast cancer screening programs but may ultimately be useful for understanding disease mechanisms.
Corresponding Author: Gillian K. Reeves, PhD, Cancer Epidemiology Unit, Richard Doll Bldg, Old Road Campus, Oxford OX3 7LF, UK (firstname.lastname@example.org).
Author Contributions: Dr Reeves had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Reeves, Green, Beral, Bell, Lathrop, Peto.
Acquisition of data: Green, Tipper, Baker, Beral, Zelenika, Lathrop.
Analysis and interpretation of data: Reeves, Travis, Bull, Beral, Peto.
Drafting of the manuscript: Reeves, Travis, Bull, Tipper, Beral, Peto.
Critical revision of the manuscript for important intellectual content: Reeves, Travis, Green, Baker, Beral, Peto, Bell, Zelenika, Lathrop.
Statistical analysis: Reeves, Travis, Bull, Peto.
Obtained funding: Reeves, Green, Beral, Bell, Lathrop.
Administrative, technical, or material support: Tipper, Baker, Zelenika, Lathrop.
Study supervision: Beral.
Financial Disclosures: None reported.
Funding/Support: The Million Women Study is supported by the UK Medical Research Council and Cancer Research UK. Genotyping of blood samples was supported by the Institut National du Cancer, France.
Role of the Sponsors: The sponsors did not have any input into the study design or conduct; data collection, management, analysis, or interpretation; nor did they influence the preparation, review, or approval of the manuscript.
Million Women Study Steering Committee: Emily Banks, Valerie Beral, Ruth English, Jane Green, Julietta Patnick, Richard Peto, Gillian Reeves, Martin Vessey, Matthew Wallis.
NHS Breast Screening Centres collaborating in the Million Women Study (in alphabetical order): Avon, Aylesbury, Barnsley, Basingstoke, Bedfordshire & Hertfordshire, Cambridge & Huntingdon, Chelmsford & Colchester, Chester, Cornwall, Crewe, Cumbria, Doncaster, Dorset, East Berkshire, East Cheshire, East Devon, East of Scotland, East Suffolk, East Sussex, Gateshead, Gloucestershire, Great Yarmouth, Hereford & Worcester, Kent (Canterbury, Rochester, Maidstone), Kings Lynn, Leicestershire, Liverpool, Manchester, Milton Keynes, Newcastle, North Birmingham, North East Scotland, North Lancashire, North Middlesex, North Nottingham, North of Scotland, North Tees, North Yorkshire, Nottingham, Oxford, Portsmouth, Rotherham, Sheffield, Shropshire, Somerset, South Birmingham, South East Scotland, South East Staffordshire, South Derbyshire, South Essex, South Lancashire, South West Scotland, Surrey, Warrington Halton St Helens & Knowsley, Warwickshire Solihull & Coventry, West Berkshire, West Devon, West London, West Suffolk, West Sussex, Wiltshire, Winchester, Wirral, and Wycombe.
Million Women Study coordinating center staff: Simon Abbott, Naomi Allen, Miranda Armstrong, Krys Baker, Angela Balkwill, Vicky Benson, Valerie Beral, Judith Black, Anna Brown, Diana Bull, Benjamin Cairns, Andrew Chadwick, James Chivenga, Barbara Crossley, Francesca Crowe, Gabriela Czanner, Dave Ewart, Sarah Ewart, Lee Fletcher, Toral Gathani, Laura Gerrard, Adrian Goodill, Winifred Gray, Jane Green, Joy Hooley, Bryony Horner, Sau Wan Kan, Carol Keene, Nicky Langston, Isobel Lingard, Maria Jose Luque, Kath Moser, Lynn Pank, Kirstin Pirie, Gillian Reeves, Emma Sherman, Evie Shrerry-Starmer, Moya Simmonds, Helena Strange, Sian Sweetland, Owen Tang, Alison Timadjer, Sarah Tipper, Ruth Travis, Lyndsey Trickett, Joanna Watson, Steve Williams, Lucy Wright.
Additional Contributions: We thank the women who participated in the study, general practitioners who took blood samples, and staff from the NHS Breast Screening Centres. We also thank Adrian Goodill (Cancer Epidemiology Unit, Oxford University) for preparation of the figures. No compensation was received.