For the 101 single-nucleotide polymorphism (SNP) genetic risk score (GRS), tertile 1 had a mean of 95 (range, 73-99); tertile 2, a mean of 102 (range, 100-105); and tertile 3, a mean of 110 (range, 106-125). For the 12 SNP GRS, tertile 1 had a mean of 9 (range, 4-10); tertile 2, a mean of 11 (range, 11-12); and tertile 3, a mean of 14 (range, 13-19).
The y-axis is the proportion of the group (either with or without a CVD event at 10 years) with a given GRS. The curves were generated with a Gaussian kernel density smoother.
Nina P. Paynter, Daniel I. Chasman, Guillaume Paré, Julie E. Buring, Nancy R. Cook, Joseph P. Miletich, Paul M Ridker. Association Between a Literature-Based Genetic Risk Score and Cardiovascular Events in Women. JAMA. 2010;303(7):631–637. doi:10.1001/jama.2010.119
Author Affiliations: Center for Cardiovascular Disease Prevention and the Divisions of Preventive Medicine and Cardiovascular Diseases, Brigham and Women's Hospital, Boston, Massachusetts (Drs Paynter, Chasman, Paré, Buring, Cook, and Ridker); Departments of Pathology and Molecular Medicine and Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada (Dr Paré); and Amgen Inc, Thousand Oaks, California (Dr Miletich).
Context: While multiple genetic markers associated with cardiovascular disease have been identified by genome-wide association studies, their aggregate effect on risk beyond traditional factors is uncertain, particularly among women.
Objective: To test the predictive ability of a literature-based genetic risk score for cardiovascular disease.
Design, Setting, and Participants: Prospective cohort of 19 313 initially healthy white women in the Women's Genome Health Study followed up over a median of 12.3 years (interquartile range, 11.6-12.8 years). Genetic risk scores were constructed from the National Human Genome Research Institute's catalog of genome-wide association study results published between 2005 and June 2009.
Main Outcome Measure: Incident myocardial infarction, stroke, arterial revascularization, and cardiovascular death.
Results: A total of 101 single nucleotide polymorphisms reported to be associated with cardiovascular disease or at least 1 intermediate cardiovascular disease phenotype at a published P value of less than 10⁻⁷ were identified and risk alleles were added to create a genetic risk score. During follow-up, 777 cardiovascular disease events occurred (199 myocardial infarctions, 203 strokes, 63 cardiovascular deaths, 312 revascularizations). After adjustment for age, the genetic risk score had a hazard ratio (HR) for cardiovascular disease of 1.02 per risk allele (95% confidence interval [CI], 1.00-1.03/risk allele; P = .006). This corresponds to an absolute cardiovascular disease risk of 3% over 10 years in the lowest tertile of genetic risk (73-99 risk alleles) and 3.7% in the highest tertile (106-125 risk alleles). However, after adjustment for traditional factors, the genetic risk score did not improve discrimination or reclassification (change in c index from Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults [ATP III] risk score, 0; net reclassification improvement, 0.5% [P = .24]). The genetic risk score was not associated with cardiovascular disease risk (ATP III–adjusted HR/allele, 1.00; 95% CI, 0.99-1.01). In contrast, self-reported family history remained significantly associated with cardiovascular disease in multivariable models.
Conclusion: After adjustment for traditional cardiovascular risk factors, a genetic risk score comprising 101 single nucleotide polymorphisms was not significantly associated with the incidence of total cardiovascular disease.
Risk prediction is a central part of cardiovascular disease prevention and refining prediction strategies remains important for targeting treatment recommendations. One area of potential improvement has been the discovery of genetic markers for cardiovascular disease as well as intermediate phenotypes such as cholesterol and blood pressure. Recent efforts using genome-wide association studies have greatly expanded the discovery of genetic markers associated with cardiovascular disease.
To date, however, the utility of single genetic markers to improve cardiovascular risk prediction has shown mixed results, even for the most promising marker, located in the 9p21 region.1-3 To combine the relatively small effects of individual genes and to better capture the complex relationship between genetics and cardiovascular disease, the use of a multilocus genetic risk score has been proposed.4 One such score developed by Kathiresan et al5 included 9 genetic markers associated with increased lipid levels but showed no improvement in discrimination and only a slight improvement in reclassification. In large part, however, the predictive abilities of recently discovered genetic markers have not been tested.6 In particular, there has been no evaluation of a literature-based genetic risk score for cardiovascular disease, a possibility that is facilitated by the online catalog maintained by the National Human Genome Research Institute (NHGRI) of all genetic markers identified through genome-wide association studies.7
We constructed 2 genetic risk scores based on a comprehensive literature-based selection of genetic markers known to be associated with either cardiovascular disease or an intermediate phenotype selected from the NHGRI catalog. The scores were then tested to assess their predictive ability in the Women's Genome Health Study. We additionally assessed the predictive ability of genetic information alone, as well as in combination with known cardiovascular risk factors, and compared the genetic information to self-reported family history.
The single-nucleotide polymorphisms (SNPs) that make up the genetic risk scores tested were selected using the online catalog from the NHGRI of genome-wide association studies published between 2005 and June 5, 2009.7 In brief, the catalog is a curated and regularly updated list of all published associations between SNPs and human disease phenotypes with a P value of less than 10⁻⁵ from studies that examined at least 100 000 SNPs. From this list, all SNPs were selected with published associations with either cardiovascular disease (myocardial infarction [MI], stroke, coronary disease, and/or cardiovascular death) or an intermediate phenotype (total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, blood pressure, diabetes, hemoglobin A1c or fasting blood glucose, and high-sensitivity C-reactive protein), in which the P value was less than 10⁻⁷.
The original reports for all identified SNPs were used to confirm the published risk allele (the allele associated with an increased level or probability) for the phenotype. The published risk allele was designated the cardiovascular risk allele for all phenotypes except high-density lipoprotein cholesterol, for which the allele associated with lower levels was designated. To limit our results to independent effects, SNPs in each chromosome were pruned to limit linkage disequilibrium (r² < 0.5) using the pairwise pruning function in Plink (http://pngu.mgh.harvard.edu/purcell/plink/).8
Two genetic risk scores were constructed on an a priori basis. The first genetic risk score was the sum of all cardiovascular risk alleles from all SNPs, both those associated with cardiovascular disease and those associated with risk factors. The SNPs affecting more than 1 phenotype were only included once. The second genetic risk score was created by limiting the list to only SNPs with a published association with cardiovascular disease before pruning and then adding the number of risk alleles. Additive and independent effects for each risk allele were assumed. Simple counts of the total number of risk alleles for both risk scores were used rather than weighting by the effect of each SNP. An unweighted approach was chosen because the current literature was insufficient to provide stable estimates for each effect, all anticipated effects based on the published data were of small magnitude, and using weights from the Women's Genome Health Study data itself would have introduced bias into the results.
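The unweighted count described above can be illustrated with a minimal sketch. It is not the study's code: the SNP identifiers, genotypes, and function names are hypothetical, and it assumes hard genotype calls written as two-letter strings, whereas the study also used estimated allele counts for imputed SNPs. For high-density lipoprotein cholesterol SNPs, the caller is assumed to have already listed the level-lowering allele as the risk allele.

```python
from typing import Mapping


def risk_allele_count(genotype: str, risk_allele: str) -> int:
    """Count copies (0, 1, or 2) of the risk allele in a two-letter genotype."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return genotype.count(risk_allele)


def unweighted_grs(genotypes: Mapping[str, str],
                   risk_alleles: Mapping[str, str]) -> int:
    """Sum risk-allele counts over the SNPs in the score.

    Each SNP appears once in `risk_alleles`, so a SNP linked to several
    phenotypes is only counted once. No per-SNP weights are applied.
    """
    return sum(risk_allele_count(genotypes[snp], allele)
               for snp, allele in risk_alleles.items())


# Hypothetical SNP identifiers and genotypes, for illustration only.
risk_alleles = {"snp_a": "G", "snp_b": "C", "snp_c": "A"}
genotypes = {"snp_a": "AG", "snp_b": "CC", "snp_c": "TT"}

score = unweighted_grs(genotypes, risk_alleles)
print(score)  # 1 + 2 + 0 = 3
```

With 101 SNPs the score can in principle range from 0 to 202; a weighted variant would multiply each count by a per-SNP effect estimate, which the authors deliberately avoided.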
The Women's Genome Health Study9 is an ongoing prospective cohort, which was derived from the Women's Health Study.10 It includes more than 25 000 initially healthy female health professionals who provided a baseline blood sample as well as extensive survey data. For this study, the analyses were limited to participants for whom complete data were available for both the traditional risk factors and for the genetic risk scores. The analyses were further restricted to self-reported white participants to avoid population stratification and because many of the published genetic associations have been explored in white populations only. These restrictions resulted in 19 313 women for the testing of the genetic scores. All participants provided consent for blood-based analyses and long-term follow-up. The study was approved by the institutional review board of the Brigham and Women's Hospital (Boston, Massachusetts).
Information on age, race, smoking status, blood pressure, hypertension treatment, diabetes, and parental history of MI before the age of 60 years was collected by questionnaire at the beginning of the study. Plasma biomarkers for total cholesterol, high-density and low-density lipoprotein cholesterol, triglycerides, hemoglobin A1c, and high-sensitivity C-reactive protein were analyzed in a core laboratory facility, certified by the National Heart, Lung, and Blood Institute and the Centers for Disease Control and Prevention's Lipid Standardization Program.
Genetic information was collected using the HumanHap300 Duo + platform (Illumina Inc, San Diego, California), which contains both a standard panel of approximately 317 000 SNPs for capturing variation among individuals with European ancestry as well as approximately 45 000 SNPs selected specifically for their potential relationship with cardiovascular disease and other diseases. The SNPs defining the APOE alleles were available using an oligonucleotide ligation procedure.11,12 To use published SNPs that were not directly genotyped, the MACH 1.0.16 program (http://www.sph.umich.edu/csg/abecasis/mach/index.html) and data from HapMap13 were used to impute additional genotypes. The MACH program has been shown to have high accuracy14 and only SNPs with an estimated squared correlation between the imputed and true genotype of greater than 0.3 were included, which provides high sensitivity and specificity.15 Of the 101 SNPs selected, 46 were measured directly and 55 were imputed (minimum R² of 0.6). The estimated maximum likelihood number of alleles was used in the risk score.
Participants were followed up for a median of 12.3 years (interquartile range, 11.6-12.8 years) for incident MI, ischemic stroke, coronary revascularization, and cardiovascular deaths, which were combined to calculate total cardiovascular disease. All end points were adjudicated using additional medical records.
Cox proportional hazards models were used to generate estimates of predicted risk using a base model with and without each genetic risk score. The base models examined were age alone, covariates from the Third Report of the National Cholesterol Education Program Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (ATP III) risk score based on the Framingham cohort with the addition of a history of diabetes (noted as a high-risk equivalent),16 and covariates from the Reynolds risk score, which is a previously published model that includes hemoglobin A1c and C-reactive protein and data on family history.17 The estimated predicted risks were then compared using the Harrell c index18 to examine discrimination, as defined by whether a prediction method ranks cases higher than noncases. The Hosmer-Lemeshow goodness-of-fit test19 was used to examine calibration, as defined by how well the predicted number of events match up with the observed number of events.
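Discrimination as defined above (whether cases are ranked higher than noncases) can be made concrete with a small sketch of a concordance index for right-censored data. This is an illustration under simplifying assumptions, not the study's implementation: the function name and toy data are hypothetical, pairs with tied follow-up times are skipped, and tied predicted risks count as half concordant.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      risks: Sequence[float]) -> float:
    """Fraction of usable pairs in which the earlier event has the higher risk.

    A pair (i, j) is usable when participant i had an event and participant j
    was still under observation at a strictly later time.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored participant cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data: follow-up years, event indicator, predicted risk.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
risks = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, risks)
print(c_index)  # 4 concordant out of 5 usable pairs = 0.8
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the scale on which the c index values in this report should be read.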
Reclassification was assessed by comparing the predicted 10-year risk for each pair of models (base model alone vs base model plus genetic score) across the categories of less than 5% risk, 5% to less than 10% risk, 10% to less than 20% risk, and 20% or higher risk. From the resulting reclassification table, the reclassification calibration statistic20 was used to assess the match between predicted and observed event rates for each model in each division of the table, with lower values and higher P values suggesting better fit. Reclassification calibration statistics cannot be directly compared across different models, but large differences between models can suggest differences in fit. The net reclassification improvement21 also was computed for the women with complete 10-year follow-up. This statistic examines whether the addition of the genetic risk score moves cases to higher risk categories more often than lower risk categories and controls to lower risk categories more often than higher risk categories. The null value is 0%, corresponding to equal movement in both directions.
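The net reclassification improvement described above can be sketched as follows, using the four 10-year risk categories named in the text. The function names and toy risks are hypothetical, the sketch assumes every participant has known 10-year event status, and it omits the significance test.

```python
from bisect import bisect_right
from typing import Sequence

# Category boundaries from the text: <5%, 5% to <10%, 10% to <20%, >=20%.
CUTOFFS = (0.05, 0.10, 0.20)


def risk_category(predicted_risk: float) -> int:
    """Return 0-3 for the 10-year risk category containing the prediction."""
    return bisect_right(CUTOFFS, predicted_risk)


def net_reclassification_improvement(base_risk: Sequence[float],
                                     new_risk: Sequence[float],
                                     is_case: Sequence[bool]) -> float:
    """(up - down) among cases plus (down - up) among controls, as proportions."""
    case_net = ctrl_net = 0
    n_case = n_ctrl = 0
    for base, new, case in zip(base_risk, new_risk, is_case):
        move = risk_category(new) - risk_category(base)
        direction = (move > 0) - (move < 0)  # +1 up, -1 down, 0 unchanged
        if case:
            n_case += 1
            case_net += direction   # moving up is correct for cases
        else:
            n_ctrl += 1
            ctrl_net -= direction   # moving down is correct for controls
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    return case_net / n_case + ctrl_net / n_ctrl


# Toy predictions from a base model and a base-plus-score model.
base = [0.04, 0.08, 0.12, 0.03, 0.07, 0.15]
new = [0.06, 0.08, 0.13, 0.02, 0.04, 0.16]
case = [True, True, True, False, False, False]

nri = net_reclassification_improvement(base, new, case)
print(round(nri, 3))  # one case moves up, one control moves down: 1/3 + 1/3
```

Equal movement in both directions within cases and within controls gives the null value of 0.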
Statistical significance was considered to be met with a P value of less than .05 and all testing was 2-sided. All statistical analyses were performed using R version 2.6 (R Foundation for Statistical Computing, Vienna, Austria). Using the distribution of the 101 SNP genetic risk score in the data analyses, there was 90% power to detect a 10-year odds ratio per allele as low as 1.0124.
Using the NHGRI catalog, 157 SNPs were identified with a published risk allele and a P value of less than or equal to 10⁻⁷ for the association with cardiovascular disease or an intermediate phenotype; these were matched with the genotyped or imputed data. Five SNPs were not matched (rs17465637 in MIA3 gene region, rs28927680 in the APOA1/C3/A4/A5 region, rs3812316 and rs326 in the MLXIPL gene region, and rs4712524 in the KCNQ1 gene region).7 After pruning to eliminate correlated SNPs in high linkage disequilibrium, 101 SNPs were used in the construction of the primary genetic risk score. The second score, limited to SNPs with a published association with incident cardiovascular disease, included 12 SNPs after pruning.
The resulting genetic scores were evaluated in the 19 313 white participants from the Women's Genome Health Study. At baseline, the participants had a median age of 52.8 years (25th-75th percentile, 48.9-58.9 years), a median systolic blood pressure of 125 mm Hg (25th-75th percentile, 115-135 mm Hg), a median total cholesterol level of 208 mg/dL (25th-75th percentile, 184-235 mg/dL [to convert to mmol/L, multiply by 0.0259]), a median high-density lipoprotein cholesterol level of 52 mg/dL (25th-75th percentile, 43.3-62.5 mg/dL [to convert to mmol/L, multiply by 0.0259]), and a median high-sensitivity C-reactive protein level of 2 mg/dL (25th-75th percentile, 0.8-4.3 mg/dL [to convert to nmol/L, multiply by 9.524]). Also at baseline, 2248 women were current smokers (12%) and 479 had been diagnosed with diabetes (2%). In the individuals with diabetes, the median hemoglobin A1c level was 6.9% (25th-75th percentile, 5.9%-8.3%). Thirteen percent of the women (n = 2499) reported a parental history of MI before the age of 60 years. Over the follow-up period (median, 12.3 years; interquartile range, 11.6-12.8 years), 777 incident cardiovascular events (199 MIs, 203 strokes, 63 cardiovascular deaths, 312 revascularizations) were reported by the study participants and confirmed by the end points committee (634 in the first 10 years).
The 101 SNPs used in the genetic risk score are shown in eTable 1 arranged by the category of the phenotype for the published association. The 12 SNPs used for the score based only on the SNPs known to be associated with cardiovascular disease are listed in the phenotype category of cardiovascular disease. Each SNP was tested for associations with the previously published phenotype and with incident cardiovascular disease in the Women's Genome Health Study. These results, along with the candidate gene, the published cardiovascular risk allele, and the frequency of the risk allele in the Women's Genome Health Study, are included in eTable 1. Of the 101 SNPs, 72 replicated the published phenotype association in the Women's Genome Health Study with a P value of less than .05 and 5 were significantly associated with incident cardiovascular disease (rs17249754 in the ATP2B1 gene region, rs1333049 in the chromosome 9p21.3 region, rs10830963 in the MTNR1B gene region, rs4607103 in the ADAMTS9 gene region, and rs1883025 in the ABCA1 gene region). Only rs1333049 in the chromosome 9p21.3 region has a previously published genome-wide association with cardiovascular disease.
Among the 19 313 participants in the Women's Genome Health Study, the mean (SD) score (or number of risk alleles) using the 101 SNPs was 102.1 (6.4) with a range from 73 to 125. The mean (SD) score using the 12 SNPs was 10.7 (1.9) with a range from 4 to 19. As anticipated, the 101 SNP genetic risk score was positively correlated with total cholesterol, systolic blood pressure, and C-reactive protein, and negatively associated with high-density lipoprotein cholesterol (eTable 2). The 12 SNP genetic risk score also was positively correlated with total cholesterol, but the relationship was sharply attenuated when the 1 SNP with a published association with cholesterol levels (rs599839 in the CELSR2/PSRC1/SORT1 region) was removed. The odds of a family history of premature MI also increased with increasing scores, with an odds ratio of 1.01 per allele for the 101 SNP score and 1.04 per allele for the 12 SNP score (both with P<.001).
Figure 1 shows the unadjusted survival curves by tertile for the 101 SNP and 12 SNP genetic risk scores and for family history of MI. Figure 2 shows the distribution of risk alleles by event status at 10 years of follow-up for the 101 SNP and 12 SNP genetic risk scores. While there is a trend toward increasing risk with greater number of risk alleles for both scores, only the highest tertile of the 101 SNP score had a significant hazard ratio (HR) of 1.22 (95% confidence interval [CI], 1.02-1.45; P = .03) for comparison with the lowest risk group. This corresponds to an absolute cardiovascular disease risk of 3% over 10 years in the lowest tertile of genetic risk (73-99 risk alleles) and 3.7% in the highest tertile (106-125 risk alleles). As suggested by the overlap in the distributions by event status, neither genetic risk score alone had discriminatory capabilities for cardiovascular disease risk (c index, 0.523 for the 101 SNP genetic risk score and 0.517 for the 12 SNP genetic risk score).
Both the 101 SNP and 12 SNP genetic risk scores were associated with increased risk of cardiovascular disease after adjusting for age (Table 1). Specifically, the age-adjusted HR for cardiovascular disease per allele for the 101 SNP genetic risk score was 1.02 per risk allele (95% CI, 1.00-1.03/risk allele; P = .006) and 1.05 per risk allele (95% CI, 1.01-1.09/risk allele; P = .01) for the 12 SNP genetic risk score. Neither genetic risk score remained independently associated once the ATP III or Reynolds covariates were adjusted for in the analyses. The ATP III–adjusted HR per allele was 1.00 (95% CI, 0.99-1.01) for the 101 SNP genetic risk score and 1.04 (95% CI, 1.00-1.08) for the 12 SNP genetic risk score. In contrast, family history of premature MI remained an independent risk factor for incident cardiovascular disease even after adjustment (HR, 1.57; 95% CI, 1.31-1.89). The effects of the standard risk factors were not affected by the addition of the genetic markers (models shown in eTable 3 and eTable 4).
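Because the score enters the Cox model as a single linear term, a per-allele HR compounds multiplicatively across a difference in allele count. The sketch below is illustrative arithmetic only, not a result reported by the study: the function name is hypothetical, and the input is the published HR rounded to 2 decimal places, so the output is approximate. The 6.4-allele difference is the reported SD of the 101 SNP score.

```python
import math


def hazard_ratio_for_difference(hr_per_allele: float,
                                allele_difference: float) -> float:
    """HR implied by a log-linear model for a given difference in risk alleles."""
    return math.exp(allele_difference * math.log(hr_per_allele))


# Age-adjusted HR of 1.02 per allele, applied to a 1-SD difference (6.4 alleles).
hr_one_sd = hazard_ratio_for_difference(1.02, 6.4)
print(round(hr_one_sd, 2))
```

The same function applied to the ATP III–adjusted HR of 1.00 per allele returns 1.0 for any difference, consistent with the loss of association after adjustment.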
All of the models were calibrated with and without the addition of the genetic risk scores or family history. Neither genetic risk score improved prediction when added to the ATP III or Reynolds covariates (Table 2). Adding the 101 SNP genetic risk score to the ATP III covariates resulted in a change of 0 in the c index and a net reclassification improvement of 0.5% (P = .24), whereas adding the 12 SNP genetic risk score resulted in a change of 0.001 (P = .12) in the c index and a net reclassification improvement of 0.5% (P = .59). The 12 SNP genetic risk score and family history of premature MI did show some improvement in prediction beyond age alone. When the reclassification calibration was examined (Table 3), only family history of premature MI showed an improvement in fit when added to the base models.
Neither repeating the analyses with only the directly genotyped SNPs, nor excluding the SNPs associated only with C-reactive protein, hemoglobin A1c, or triglycerides had an appreciable effect on the results.
In this analysis, we constructed 2 literature-based genetic risk scores for cardiovascular disease and tested their relationship to incident cardiovascular events and their potential to improve prediction in a prospective cohort of 19 313 initially healthy white women from the Women's Genome Health Study. The risk score based on genetic markers for both cardiovascular disease and intermediate phenotypes (101 SNP score) and the risk score based only on genetic markers for cardiovascular disease (12 SNP score) were associated with increased risk after adjustment for age, but the ability of either score alone to discriminate between women at risk for cardiovascular events and those not at risk was minimal with a c index of 0.52 for both scores. Furthermore, neither genetic risk score remained associated with incident cardiovascular disease after adjustment for traditional risk factors, nor had any significant impact on discrimination or reclassification. In contrast, self-reported family history remained associated with incident cardiovascular disease after adjustment for other risk factors and had a substantive effect on reclassification fit.
Previous studies using genetic risk scores for cardiovascular disease have found some evidence of increased prediction.5,6 However, these studies have used only genetic markers that replicated in the same population used to test the score rather than a strictly literature-based approach, a method that runs the risk of overfitting and consequently yielding overly optimistic results. To avoid this potential bias, we chose to use all genes reported in the literature to be associated with cardiovascular disease or an intermediate phenotype with genome-wide significance. To the extent that the published associations identify useful genetic risk factors, our approach may more accurately reflect the potential of current genetic markers to improve risk prediction on a population basis.
We believe these data have clinical relevance for several reasons. First, genome-wide testing is increasingly available and marketed to the general public. Our study finds no clinical utility in a multilocus panel of SNPs for cardiovascular risk based on the best available literature. Second, our data confirm the utility of intermediate phenotypes such as total cholesterol, high-density lipoprotein cholesterol, and blood pressure inasmuch as genetic risk scores were no longer significant after adjustment for these phenotypes. This utility most likely reflects the integration of both genetic and environmental factors into measured biomarker levels and into cardiovascular outcomes. Third, our findings confirm the importance of family history of cardiovascular disease, which integrates shared genetics, shared behaviors, and environmental factors. At the same time, we believe that our data suggest areas for further biomarker research, which may improve prediction. Given the continued utility of intermediate phenotypes, the ongoing explorations in metabolomics and proteomics could add significantly to the ability to predict risk.
Limitations of our study merit consideration. As suggested by the strong effect of family history on cardiovascular disease risk, there is a substantial risk component due to genes and shared environment, which may be elucidated by future genetic research. While the NHGRI catalog is based on all available published genome-wide studies, these have focused to date only on common SNPs and, thus, we also were unable to assess the potential contributions of rare alleles. However, if only discovered through a major increase in sample size, it is possible that unidentified variants will have increasingly small effects.22 It also may be possible in the future to obtain stable estimates of the exact effect or HR for use in a weighted score and to find interactions between genes or within genes and other markers, both of which may improve predictive ability.
In conclusion, in this large-scale, prospective cohort of white women, a comprehensive literature-based genetic risk score (although associated with cardiovascular events after adjustment for age) did not improve cardiovascular risk prediction. This was true whether the component genetic effects were extended to include polymorphisms acting on intermediate phenotypes or restricted only to those directly associated with cardiovascular disease outcomes. While the importance of genetic data in understanding biology and etiology is unchallenged, we did not find evidence in this study of more than 19 000 women to support incorporating the current body of known genetic markers into formal clinical tools for cardiovascular risk assessment.
Corresponding Author: Nina P. Paynter, PhD, Division of Preventive Medicine, Brigham and Women's Hospital, 900 Commonwealth Ave E, Boston, MA 02215 (firstname.lastname@example.org).
Author Contributions: Dr Paynter had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Paynter, Chasman, Paré, Buring, Cook, Ridker.
Acquisition of data: Paynter, Chasman, Paré, Buring, Cook, Miletich, Ridker.
Analysis and interpretation of data: Paynter, Chasman, Paré, Buring, Cook, Ridker.
Drafting of the manuscript: Paynter, Chasman, Cook, Ridker.
Critical revision of the manuscript for important intellectual content: Paynter, Chasman, Paré, Buring, Cook, Miletich, Ridker.
Statistical analysis: Paynter, Cook.
Obtained funding: Buring, Ridker.
Administrative, technical, or material support: Paré, Buring, Cook, Miletich, Ridker.
Study supervision: Chasman, Buring, Cook, Ridker.
Financial Disclosures: Drs Buring and Ridker reported receiving investigator-initiated funding from the National Heart, Lung, and Blood Institute, National Cancer Institute, the Donald W. Reynolds Foundation, the Leducq Foundation, Roche Diagnostics, and Amgen Inc. Dr Miletich reported both employment by and stock ownership in Amgen Inc. Dr Ridker reported receiving grant support from AstraZeneca, Novartis, Merck, Abbott, and Sanofi-Aventis; consulting fees from AstraZeneca, Novartis, Merck–Schering-Plough, Sanofi-Aventis, Isis, Siemens, and Vascular Biogenics; and being listed as a coinventor on patents held by Brigham and Women's Hospital that relate to the use of inflammatory biomarkers in cardiovascular disease, including the use of high-sensitivity C-reactive protein in the evaluation of patients' risk of cardiovascular disease. These patents have been licensed to Siemens and AstraZeneca. None of the other authors reported financial disclosures.
Funding/Support: The Women's Genome Health Study is supported by funds from the National Heart, Lung, and Blood Institute (grants HL 043851 and HL 080467), the National Cancer Institute (grant CA 047988), the Donald W. Reynolds Foundation, and the Leducq Foundation. Genotyping was performed by Amgen Inc. Additional support for DNA extraction, reagents, and data analysis was provided by Roche Diagnostics and Amgen Inc.
Role of the Sponsors: Amgen collaboratively performed genotyping. The other funding agencies had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.