October 2016

Building and Validating Complex Models of Breast Cancer Risk

Author Affiliations
  • 1 Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee
  • 2 Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee
JAMA Oncol. 2016;2(10):1271-1272. doi:10.1001/jamaoncol.2016.0878

In this issue of JAMA Oncology, Maas et al1 present interesting findings about the degree to which breast cancer risk may depend on modifiable and nonmodifiable risk factors in white women. They estimate that 27% of all breast cancers in the United States could be prevented if women maintained a lean body mass, did not use menopausal hormone therapy, and avoided smoking and alcohol. They report that the reduction in risk is greatest for women who have the highest prevalent risk from nonmodifiable factors. The nonmodifiable risk factors were indices of endogenous hormonal exposure (age at menarche, menopause, and first birth, along with parity), height, family history of breast cancer, and 92 single nucleotide polymorphisms (SNPs) that have been associated with breast cancer in other investigations.

The study has several strengths, most notably that it is based on data from 8 large prospective cohort studies with high levels of follow-up and well-documented breast cancer outcomes.

A potentially controversial aspect of this study is that data on 68 of the 92 SNPs were never collected. Instead, the authors simulated a risk score for the missing 68 SNPs using previously published estimates of associations (frequencies and effect sizes) between these SNPs and case-control status and family history. They first modeled the data from their cohorts and developed a polygenic risk score (PRS) based on the 24 SNPs that were observed in study subjects. In building this model, they undertook an in-depth analysis of possible interactions (synergism) among these SNPs but found no evidence of such interactions. These SNPs were then combined in a multiplicative model together with epidemiologic risk factors (hereafter called the PRS-24 model).
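For readers unfamiliar with multiplicative risk models, the following minimal Python sketch illustrates the general idea of combining a polygenic risk score with epidemiologic risk factors. It is not the model fitted by Maas et al: the function names, the per-allele odds ratios, the genotypes, and the epidemiologic relative risks are all hypothetical values chosen for illustration. The sketch assumes that each SNP contributes independently on the log scale (no interactions) and that the resulting risk is expressed relative to a reference woman who carries no risk alleles and has baseline levels of every other factor.

```python
import math


def polygenic_risk_score(genotypes, log_odds_ratios):
    """Weighted sum of risk-allele counts (0, 1, or 2 per SNP).

    Each weight is a per-allele log odds ratio. Summing on the log scale
    is equivalent to multiplying odds ratios, which is what "no
    interaction" means in a multiplicative model.
    """
    if len(genotypes) != len(log_odds_ratios):
        raise ValueError("one log odds ratio is required per SNP")
    return sum(g * b for g, b in zip(genotypes, log_odds_ratios))


def combined_relative_risk(genotypes, log_odds_ratios, epi_relative_risks):
    """Multiply the SNP-based relative risk by each epidemiologic one."""
    risk = math.exp(polygenic_risk_score(genotypes, log_odds_ratios))
    for rr in epi_relative_risks:
        risk *= rr
    return risk


# Hypothetical inputs (assumptions, not values from the study):
# three SNPs with per-allele odds ratios of 1.10, 1.05, and 0.95.
example_log_ors = [math.log(1.10), math.log(1.05), math.log(0.95)]
example_genotypes = [2, 1, 0]    # risk-allele counts for one woman
example_epi_rrs = [1.2, 1.5]     # e.g., two epidemiologic risk factors

example_rr = combined_relative_risk(
    example_genotypes, example_log_ors, example_epi_rrs
)
print(f"Relative risk versus the reference profile: {example_rr:.3f}")
```

Because the model is multiplicative, the same relative reduction from a modifiable factor translates into a larger absolute reduction for a woman whose nonmodifiable factors already place her at high risk, which is consistent with the pattern the authors report.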