Percentages shown in ovals indicate the proportion of women distributed to risk categories based on Adult Treatment Panel III (top) and the Reynolds Risk Score (bottom). Reclassification using the Reynolds Risk Score is based on data shown in Table 5, Model B. CVD indicates cardiovascular disease.
Ridker PM, Buring JE, Rifai N, Cook NR. Development and Validation of Improved Algorithms for the Assessment of Global Cardiovascular Risk in Women: The Reynolds Risk Score. JAMA. 2007;297(6):611–619. doi:10.1001/jama.297.6.611
Context Despite improved understanding of atherothrombosis, cardiovascular prediction algorithms for women have largely relied on traditional risk factors.
Objective To develop and validate cardiovascular risk algorithms for women based on a large panel of traditional and novel risk factors.
Design, Setting, and Participants Thirty-five factors were assessed among 24 558 initially healthy US women 45 years or older who were followed up for a median of 10.2 years (through March 2004) for incident cardiovascular events (an adjudicated composite of myocardial infarction, ischemic stroke, coronary revascularization, and cardiovascular death). We used data among a random two thirds (derivation cohort, n = 16 400) to develop new risk algorithms that were then tested to compare observed and predicted outcomes in the remaining one third of women (validation cohort, n = 8158).
Main Outcome Measure Minimization of the Bayes Information Criterion was used in the derivation cohort to develop the best-fitting parsimonious prediction models. In the validation cohort, we compared predicted vs actual 10-year cardiovascular event rates for the new algorithms and for models based on covariates included in the Adult Treatment Panel III risk score.
Results In the derivation cohort, a best-fitting model (model A) and a clinically simplified model (model B, the Reynolds Risk Score) had lower Bayes Information Criterion scores than models based on covariates used in Adult Treatment Panel III. In the validation cohort, all measures of fit, discrimination, and calibration were improved when either model A or B was used. For example, among participants without diabetes with estimated 10-year risks according to the Adult Treatment Panel III of 5% to less than 10% (n = 603) or 10% to less than 20% (n = 156), model A reclassified 379 (50%) into higher- or lower-risk categories that in each instance more accurately matched actual event rates. Similar effects were achieved for clinically simplified model B limited to age, systolic blood pressure, hemoglobin A1c if diabetic, smoking, total and high-density lipoprotein cholesterol, high-sensitivity C-reactive protein, and parental history of myocardial infarction before age 60 years. Neither new algorithm provided substantive information about women at very low risk based on the published Adult Treatment Panel III score.
Conclusion We developed, validated, and demonstrated highly improved accuracy of 2 clinical algorithms for global cardiovascular risk prediction that reclassified 40% to 50% of women at intermediate risk into higher- or lower-risk categories.
In the decade between 1956 and 1966, investigators in Framingham, Mass, defined age, hypertension, smoking, diabetes, and hyperlipidemia as major determinants of coronary heart disease and coined the term coronary risk factors.1-5 Over time, these markers were codified into global risk scores for assessment of cardiovascular risk.6-8 However, for women, up to 20% of all coronary events occur in the absence of these major risk factors,9 whereas many women with traditional risk factors do not experience coronary events.10 Furthermore, over the past half-century, understanding of the biological processes underlying atherothrombosis has markedly shifted to encompass the complex biology of hemostasis, thrombosis, inflammation, endothelial dysfunction, and plaque instability.11,12
Despite this changing view of pathophysiology, variables included in current risk algorithms for women are largely unchanged from those recommended 40 years ago. Additional risk markers that have been proposed include alternative lipid measures, such as apolipoproteins A-I and B-100, non–high-density lipoprotein cholesterol (HDL-C), and lipoprotein(a); inflammatory biomarkers such as high-sensitivity C-reactive protein (hsCRP), soluble intercellular adhesion molecule 1 (sICAM-1), and fibrinogen; markers of glycemic control such as glycated hemoglobin A1c; and plasma creatinine and homocysteine levels.13 However, data are scant evaluating whether improved risk prediction algorithms can be developed that use these markers.14-16
We assayed all of these novel biomarkers as well as a large number of traditional risk determinants at baseline in a cohort of 24 558 initially healthy US women who were prospectively followed up for a median 10.2 years for incident myocardial infarction, stroke, coronary revascularization, or cardiovascular death. In a random subset comprising two thirds of these women (model derivation cohort, n = 16 400), we developed 2 novel algorithms for global risk prediction. We then tested the effectiveness of these new prediction models in the remaining one third of the women (test validation cohort, n = 8158).
Study participants were derived from the Women's Health Study (WHS), a nationwide cohort of US women 45 years and older free of cardiovascular disease and cancer at study entry initiated in September 1992.17 Women eligible for the current analysis were those who provided an adequate baseline plasma sample (n=27 939) and had complete ascertainment of all blood covariates of interest (n=24 558). Exposure data were collected for age, race/ethnicity, diabetes, blood pressure, blood pressure treatment, smoking status, cholesterol treatment, menopausal status, postmenopausal hormone therapy use, height, weight, alcohol use, exercise frequency, parental history of myocardial infarction before age 60 years, and current multivitamin use. All participants self-reported race/ethnicity as white, black, Hispanic American, Asian American, or other. All women were followed up through March 2004 for a median period of 10.2 years (interquartile range, 9.7-10.6 years) for incident myocardial infarction, ischemic stroke, coronary revascularization, and cardiovascular deaths; these were adjudicated by an end-points committee after medical record review. All study participants provided written informed consent. The study protocol was approved by the institutional review board of Brigham and Women's Hospital (Boston, Mass).
All women provided baseline plasma samples, 76% of which were fasting samples. The plasma samples were assayed in a core laboratory facility for total cholesterol, HDL-C, low-density lipoprotein cholesterol (LDL-C), lipoprotein(a), apolipoproteins A-I and B-100, hsCRP, sICAM-1, fibrinogen, creatinine, hemoglobin A1c, and plasma homocysteine concentration. The core laboratory is certified by the National Heart, Lung, and Blood Institute/Centers for Disease Control and Prevention Lipid Standardization Program. Assay characteristics and coefficients of variation are available upon request.
Two thirds of the study participants (n = 16 400) were randomly assigned to a model derivation data set and one third (n = 8158) were reserved as an independent validation data set.
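The random allocation described above can be sketched as follows. This is a minimal illustration, not the procedure the investigators used: the integer participant IDs, the seed, and the function name `split_cohort` are placeholders, and only the two cohort sizes come from the text.

```python
import random

def split_cohort(participant_ids, n_derivation, seed=0):
    """Randomly partition participant IDs into derivation and validation sets."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    return ids[:n_derivation], ids[n_derivation:]

# Cohort sizes as reported in the text; the IDs and seed are hypothetical.
derivation, validation = split_cohort(range(24558), n_derivation=16400)
print(len(derivation), len(validation))
```

Fixing the allocation once, before any modeling, is what allows the validation set to serve as an independent test of the derived algorithms.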
Among women allocated to the model derivation set, the best overall prediction algorithm (model A) was fit using Cox proportional hazards models. All available exposure variables and all blood biomarkers were considered for this initial model, as were all potential transformations and interactions between them. Both stepwise selection procedures and multiple additive regression trees18 were used for variable selection, assessment for interactions, and model development. Partial dependence plots were examined for evidence of interaction, even in the absence of main effects. These interaction terms were then further tested in the Cox models.
The final criterion for inclusion in model A was minimization of the Bayes Information Criterion (BIC).19 The BIC is a likelihood-based measure in which lower values indicate better fit and in which a penalty is paid for increasing the number of variables. Thus, the variables selected for inclusion should provide not only the best fit but also a parsimonious prediction model. Because the BIC already accounts for the number of covariates through this penalty, models with different numbers of variables can be directly compared.
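As a rough illustration of how the penalty works, the BIC in its usual textbook form is −2 × log-likelihood + k × ln(n), where k is the number of fitted parameters. The sketch below assumes that form; the text does not state which sample-size term was used (for Cox models, n is sometimes taken as the number of events), and the log-likelihoods and parameter counts shown are invented for illustration only.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayes Information Criterion in its usual form; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical values: the larger model fits slightly better (higher
# log-likelihood) but pays a bigger penalty for its 3 extra parameters.
small_model = bic(log_likelihood=-4480.0, n_params=9, n_obs=504)
large_model = bic(log_likelihood=-4478.0, n_params=12, n_obs=504)
print(small_model < large_model)
```

In this made-up case the smaller model is preferred, because the gain in fit from the extra variables does not offset their penalty.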
Once variables were selected for model A, we created a second model (model B) that was simplified for the purpose of clinical application and efficiency. For example, in these data non–HDL-C (total cholesterol − HDL-C) is highly correlated with apolipoprotein B-100 (r = 0.87), and HDL-C is highly correlated with apolipoprotein A-I (r = 0.80).20 Thus, model B substituted total cholesterol and HDL-C for apolipoproteins B-100 and A-I. Simplified model B also eliminated lipoprotein(a) because prior work in this cohort has found the predictive utility of lipoprotein(a) to be limited to those with extremely high values (>90th percentile) and concomitant hyperlipidemia.21
To allow for direct comparison, the BIC was calculated using data from the derivation cohort for models A and B, as well as for models based exclusively on covariates used in the current Adult Treatment Panel III (ATP-III) risk prediction algorithm7 or in the Framingham Risk Score,6 but with coefficients reestimated in the WHS data.
Once determined in final form, models A and B were prospectively tested in the validation data set of 8158 women. In this validation stage, 3 global measures were used to evaluate each prediction model: Entropy (a likelihood-based function for dichotomous outcomes for which smaller values indicate better fit); the Yates slope (the difference in predicted risk between cases and noncases for which larger values indicate better fit); and the Brier score (which computes the sum of squared differences between the observed outcome and fitted probabilities and for which smaller values indicate better concordance between predicted and observed outcomes).22,23 Because all women were followed up for at least 8 years, observed status and predicted risk were evaluated and compared as of 8 years of follow-up for all measures.
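The three global measures can be sketched in a few lines. This is an illustrative reading of the definitions above, not the authors' code: the Brier score and entropy are averaged over participants here (the text describes a sum, which differs only by a factor of n), and the function names and toy data are assumptions.

```python
import math

def brier_score(y, p):
    """Mean squared difference between outcome (0/1) and predicted probability."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def yates_slope(y, p):
    """Mean predicted risk among cases minus mean predicted risk among noncases."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    noncases = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(cases) / len(cases) - sum(noncases) / len(noncases)

def entropy(y, p):
    """Mean negative Bernoulli log-likelihood; requires 0 < p < 1."""
    total = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p))
    return -total / len(y)

# Toy data: observed status at 8 years and predicted 8-year risk.
observed = [1, 0, 1, 0]
predicted = [0.8, 0.2, 0.6, 0.4]
print(brier_score(observed, predicted),
      yates_slope(observed, predicted),
      entropy(observed, predicted))
```

Smaller Brier and entropy values and a larger Yates slope all point in the same direction: predictions that sit closer to what was actually observed.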
In addition to these global measures, we assessed the predictive accuracy of each derived model by looking at 2 components of accuracy: discrimination and calibration. Discrimination was evaluated using the C statistic that represents the area under the receiver operating characteristic curve (for which larger values indicate better discrimination). To assess model calibration (or how closely the predicted probabilities reflect actual risk), the Hosmer-Lemeshow calibration statistic comparing observed and predicted risk was computed based on categories defined by 2% increments in predicted risk.
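Discrimination and calibration as described above can be sketched as follows. The pairwise form of the C statistic and the grouped chi-square form of the Hosmer-Lemeshow statistic are the standard textbook versions and are assumed here; the function names, the handling of ties, and the toy data are illustrative, and the sketch does not compute a P value.

```python
from collections import defaultdict

def c_statistic(y, p):
    """Share of case/noncase pairs in which the case has the higher predicted risk."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    noncases = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pc in cases:
        for pn in noncases:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(noncases))

def hosmer_lemeshow(y, p, width=0.02):
    """Chi-square comparing observed and expected events within bins of
    predicted risk (default 2% wide). Assumes each bin's mean predicted
    risk is strictly between 0 and 1."""
    groups = defaultdict(list)
    for yi, pi in zip(y, p):
        groups[int(pi / width)].append((yi, pi))
    chi2 = 0.0
    for members in groups.values():
        n = len(members)
        observed = sum(yi for yi, _ in members)
        expected = sum(pi for _, pi in members)
        mean_p = expected / n
        chi2 += (observed - expected) ** 2 / (n * mean_p * (1 - mean_p))
    return chi2

print(c_statistic([1, 0, 1, 0], [0.8, 0.2, 0.6, 0.7]))
print(hosmer_lemeshow([0, 0, 0, 1], [0.01, 0.01, 0.03, 0.03]))
```

A model can discriminate well yet be poorly calibrated (and vice versa), which is why the two components are assessed separately.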
To compare the performance of models A and B to current risk prediction algorithms, we also computed each of these summary statistics in the test cohort using models limited exclusively to covariates defined in the current ATP-III or Framingham Risk Scores, but with coefficients reestimated in the WHS cohort. We additionally computed each of these summary statistics for predicted outcomes based on formal application of the published ATP-III and Framingham Risk scoring systems as estimated from Framingham data.6,7
For ease of interpretation and to address the critical clinical issues of reclassification and risk stratification, we divided all participants in the test cohort into the 10-year risk groups of less than 5%, 5% to less than 10%, 10% to less than 20%, and 20% or higher using covariates currently included in the ATP-III risk prediction model. We then calculated the proportion of participants in the test cohort who were reclassified into either higher- or lower-risk categories using models A or B rather than the covariates in the ATP-III model and then compared observed to predicted events during the follow-up period.
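The reclassification calculation can be sketched as below, using the four 10-year risk groups defined above. The function names and the toy risk values are hypothetical; the sketch only counts category changes and does not perform the comparison of observed with predicted event rates.

```python
def risk_category(risk_pct):
    """Map a 10-year risk (%) to category 0 (<5), 1 (5 to <10), 2 (10 to <20), 3 (>=20)."""
    if risk_pct < 5:
        return 0
    if risk_pct < 10:
        return 1
    if risk_pct < 20:
        return 2
    return 3

def reclassified_fraction(old_risks, new_risks, categories=(1, 2)):
    """Among people whose old category is in `categories`, the fraction
    assigned a different category by the new model."""
    pairs = [(risk_category(o), risk_category(n))
             for o, n in zip(old_risks, new_risks)]
    eligible = [(o, n) for o, n in pairs if o in categories]
    moved = sum(1 for o, n in eligible if o != n)
    return moved / len(eligible)

# Toy data: risks (%) from a reference model and from a new model.
reference = [3.0, 6.0, 8.0, 12.0, 15.0, 25.0]
new_model = [2.0, 4.0, 7.0, 22.0, 11.0, 30.0]
print(reclassified_fraction(reference, new_model))
```

The default `categories=(1, 2)` restricts the count to the two intermediate-risk groups, where a change of category is most likely to alter treatment decisions.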
Finally, to mimic clinical practice, we repeated these latter analyses using the published ATP-III risk prediction score to determine 10-year risk groups rather than the refitted model using the ATP-III covariates; because diabetes is considered a coronary risk equivalent in current ATP-III guidelines, this final analysis was restricted to nondiabetic study participants.
Analyses were conducted using SAS version 9.1 (SAS Institute Inc, Cary, NC), SPlus version 7.0 (Insightful Corp, Seattle, Wash), and Treenet version 2.0 (Salford Systems, San Diego, Calif).
Table 1 shows baseline characteristics and biomarker levels for women in the derivation and validation cohorts. During follow-up, 504 cardiovascular events occurred in the derivation cohort and 262 in the validation cohort.
In the model derivation cohort, 35 potential variables (and all possible interactions between them) were evaluated for model inclusion. Of these, only 9 were included in model A, the best-fitting predictive model with the smallest BIC value: age, systolic blood pressure, current smoking, apolipoprotein B-100, hsCRP, apolipoprotein A-I, parental history of myocardial infarction before age 60 years, and 2 interaction terms, hemoglobin A1c if diabetes was present and lipoprotein(a) level if apolipoprotein B-100 was 100 mg/dL or higher. The β coefficients, standard errors, and P values for each of these covariates in best-fitting model A are shown in Table 2.
Given selection of these 9 variables, some markers, such as homocysteine and sICAM-1, appeared to predict risk, but did not satisfy the BIC criterion for model inclusion. Other notable variables that did not further minimize the BIC once the above variables were taken into account included body mass index, alcohol use, exercise frequency, menopausal status, hormone therapy, fibrinogen, and creatinine.
Table 2 also presents β coefficients, standard errors, and P values for simplified model B, which was otherwise identical to model A, but substituted total cholesterol and HDL-C for apolipoproteins B-100 and A-I, and eliminated the interaction term requiring measurement of lipoprotein(a) if apolipoprotein B-100 was 100 mg/dL or higher.
In the derivation data set, the BIC value for model B (BIC = 9067.5) was not as small as that of the best-fitting model A (BIC = 9039.4), suggesting some loss of predictive ability with clinical simplification. However, model B nevertheless was associated with smaller BIC values than were models based on covariates used in the ATP-III prediction model (BIC = 9098.5) or those based on covariates used in the Framingham Risk Score (BIC = 9161.2). Thus, in the model derivation set, both model A and model B appeared to improve risk prediction over that achieved with currently measured covariates (Box).
Model A, the best-fitting model
$$\text{10-year cardiovascular disease risk (\%)} = \left[1 - 0.98756^{\exp(A - 19.848)}\right] \times 100\%$$
where
$$A = 0.0785 \times \text{age} + 3.271 \times \ln(\text{systolic blood pressure}) + 0.202 \times \ln(\text{high-sensitivity C-reactive protein}) + 0.00820 \times \text{apolipoprotein B-100} - 0.00769 \times \text{apolipoprotein A-I} + 0.134 \times \text{hemoglobin A1c (\%)}\ \text{(if diabetic)} + 0.825\ \text{(if current smoker)} + 0.427\ \text{(if family history of premature myocardial infarction)} + 0.00742 \times (\text{lipoprotein(a)} - 10)\ \text{(if lipoprotein(a)} > 10 \text{ and apolipoprotein B-100} \geq 100)$$
Model B, the Reynolds Risk Score
$$\text{10-year cardiovascular disease risk (\%)} = \left[1 - 0.98634^{\exp(B - 22.325)}\right] \times 100\%$$
where
$$B = 0.0799 \times \text{age} + 3.137 \times \ln(\text{systolic blood pressure}) + 0.180 \times \ln(\text{high-sensitivity C-reactive protein}) + 1.382 \times \ln(\text{total cholesterol}) - 1.172 \times \ln(\text{high-density lipoprotein cholesterol}) + 0.134 \times \text{hemoglobin A1c (\%)}\ \text{(if diabetic)} + 0.818\ \text{(if current smoker)} + 0.438\ \text{(if family history of premature myocardial infarction)}$$
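The model B formula in the Box translates directly into a short function. The sketch below uses the coefficients exactly as printed, reading the expression in parentheses after the baseline value 0.98634 as an exponent. The function and parameter names are hypothetical, and the units (mm Hg for systolic blood pressure, mg/L for hsCRP, mg/dL for cholesterol) are assumptions not stated in the Box. It is an illustration only, not a substitute for the published calculator.

```python
import math

def reynolds_risk_score(age, sbp, hscrp, total_chol, hdl,
                        hba1c=None, smoker=False, family_history=False):
    """10-year cardiovascular disease risk (%) from model B.

    Pass hba1c (in %) only for women with diabetes; leave it as None otherwise.
    Units are assumed: sbp in mm Hg, hscrp in mg/L, cholesterol in mg/dL.
    """
    b = (0.0799 * age
         + 3.137 * math.log(sbp)
         + 0.180 * math.log(hscrp)
         + 1.382 * math.log(total_chol)
         - 1.172 * math.log(hdl))
    if hba1c is not None:      # hemoglobin A1c term applies only if diabetic
        b += 0.134 * hba1c
    if smoker:                 # current smoking
        b += 0.818
    if family_history:         # parental myocardial infarction before age 60
        b += 0.438
    return (1.0 - 0.98634 ** math.exp(b - 22.325)) * 100.0

# Hypothetical patient: 50 years old, no diabetes, nonsmoker, no family history.
baseline = reynolds_risk_score(age=50, sbp=120, hscrp=1.0, total_chol=200, hdl=50)
with_smoking = reynolds_risk_score(age=50, sbp=120, hscrp=1.0, total_chol=200,
                                   hdl=50, smoker=True)
print(round(baseline, 2), round(with_smoking, 2))
```

Because every term enters through a single linear predictor, each risk factor acts multiplicatively on the hazard, so adding smoking or a parental history raises the estimate for any given combination of the other inputs.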
Table 3 presents summary statistics regarding the performance of models A and B in terms of predicting risk among the 8158 women reserved in the prospective validation data set. For each prespecified global summary statistic (Entropy, Yates Slope, Brier Score, and C statistic), models A and B provided improvement over prediction models based on covariates used in the ATP-III or Framingham models or when the published ATP-III or Framingham Scores were directly applied. With regard to comparisons of predicted and observed risk, P values for the Hosmer-Lemeshow statistics for models A and B indicated good calibration. Although calibration was suboptimal for the 2 published score models, part of this effect was due to a difference in end-point definition.
Although formal statistical testing provides a method of evaluating model superiority, we believe the critical issue for clinical application is the proportion of patients reclassified using a new risk algorithm and whether the magnitude of this reclassification is large enough to alter physician behavior with regard to prevention.24
To address this issue, Table 4 presents the proportion of women in the validation cohort initially classified as having a 10-year risk of less than 5%, 5% to less than 10%, 10% to less than 20%, and 20% or higher based on ATP-III covariates (with coefficients reestimated in the WHS data) who would be reclassified to higher- or lower-risk categories by model A and model B. As shown for model A, the proportion of women reclassified was small for those with a 10-year risk of less than 5% (2.5%). However, 43% of all women estimated to be at 5% to less than 10% risk or at 10% to less than 20% risk using ATP-III covariates were reclassified to higher or lower clinical risk categories when model A was used instead. Table 4 also shows that actual event rates for model A matched well with predicted rates in nearly all groups; of the 681 participants reclassified by model A, all but 93 were placed into more accurate risk categories.
Table 5 presents similar analyses for women who did not have diabetes with direct application of the published ATP-III risk score. As shown, about 50% of all women with an estimated 10-year risk for coronary heart disease of 5% to less than 10% or 10% to less than 20% according to ATP-III were reclassified to higher or lower risk categories when model A was used instead. Again, there was excellent matching of actual and predicted rates for model A; of the 722 participants without diabetes who were reclassified by model A, all but 2 were placed into more accurate risk categories.
As also presented in Table 4 and Table 5, similar effects were achieved for clinically simplified model B limited to age, systolic blood pressure, hemoglobin A1c if diabetic, current smoking, total and HDL-C, hsCRP, and parental history of myocardial infarction before age 60 years. Although the proportion of individuals at intermediate-risk reclassified by model B (30%-45%) was smaller than that of model A (43%-50%), there was still excellent matching of actual to predicted event rates in nearly all groups. For example, of the 647 participants without diabetes in Table 5 who were reclassified by model B, all but 6 were placed into more accurate risk categories. Neither new algorithm added substantive information for women at very low initial risk (<5% 10-year risk based on published ATP-III risk scores).
As a practical example, Table 6 provides estimated 10-year risks based on variables in our most parsimonious model (model B, the Reynolds Risk Score) for a 50-year-old woman smoker without diabetes with an ATP-III estimated risk of 11.5%. As shown, 10-year risk estimates based on model B range from a low of 4.9% to a high of 18.4% for this hypothetical patient.
With regard to reclassification, as shown in the Figure for a representative population of 100 000 US women without diabetes at intermediate risk (80 000 at 5% to less than 10% and 20 000 at 10% to less than 20% 10-year risk by ATP-III), use of the clinically simplified Reynolds Risk Score would place 13 500 of these women at low risk, 48 500 at low to moderate risk, 32 500 at moderate to high risk, and 5400 at high risk.
In this study of 24 558 initially healthy US women followed up for a median of 10.2 years, we developed and validated risk prediction algorithms that reclassified 40% to 50% of women currently predicted to be at intermediate risk into higher- or lower-risk categories and did so with greatly improved accuracy when compared with models based on current ATP-III prediction scores. This effect was present not only for our best-fitting model (model A) but also for a simplified clinical model limited to age, systolic blood pressure, hemoglobin A1c if diabetic, current smoking, total and HDL-C, hsCRP, and parental history of myocardial infarction before age 60 years (model B, the Reynolds Risk Score).
In addition to providing opportunity for improved risk stratification, we believe these data have clinical implications for the targeting of preventive therapies. In these analyses, large proportions of women with 10-year risk estimates of 5% to less than 10% or of 10% to less than 20% based on current ATP-III risk scores were reclassified at either higher or lower risk of total cardiovascular disease when either of the new algorithms was used. In current US treatment guidelines that take into account the benefits, risk, and cost of lipid-lowering therapy, statins are considered an option for those with 10-year risk estimates of 10% or greater25; a more conservative approach taken in Europe typically limits statin therapy to those with 10-year risks of 20% or more.8 In both settings, application of the models described herein should allow more accurate targeting of statin prescriptions to those patients with the most appropriate level of risk so as to minimize toxicity and maximize benefit and cost efficacy.
We also believe these data provide optimism regarding novel cardiovascular risk factors. In our best-fitting model, hemoglobin A1c, hsCRP, lipoprotein(a), apolipoproteins A-I and B-100, and parental history were included because each contributed to minimization of the BIC. However, homocysteine, fibrinogen, sICAM-1, and creatinine were not included in our parsimonious models despite univariate risk associations. Similarly, neither body mass index nor exercise frequency added further prognostic information on overall global risk.26,27 By contrast, we observed that glucose control as evaluated by hemoglobin A1c was an effective biomarker in these women that modified the risk associated with diabetes.
Our findings might appear to conflict with a recent report from the Framingham Heart Study in which only marginal utility for novel risk factors was described.16 However, instead of seeking evidence of reclassification, that analysis relied solely on the C statistic, a technique known to have limited utility for evaluating prediction models for which the task is to assess future risk in a currently healthy population.28 Equally important, that analysis relied on data from 1712 women who experienced only 68 vascular events, many of which were coded as heart failure or coronary insufficiency. By contrast, the risk algorithms described herein rely on data from 24 558 women who experienced 766 hard cardiovascular end points. We also note that in a separate Framingham Heart Study analysis addressing the additive value of hsCRP, use of this biomarker alone reclassified 25% of those with ATP-III risks between 5% and 20%, data fully consistent with those presented herein.29
Despite advantages of sample size and power, limitations of our analysis merit discussion. First, because our data are limited to women and our cohort is largely white with a relatively narrow socioeconomic range, care should be taken before generalizing to other populations. We note, however, that all components of models A and B have previously been found to predict cardiovascular risk in men30-34 and that both hsCRP and parental history of vascular disease have previously been shown to predict risk within the Framingham cohort itself.29,35,36
Second, our data on blood pressure, obesity, and family history were based on self-report. However, the WHS is composed of female health professionals who are known to provide accurate reports of lifestyle factors and health status, including blood pressure and weight.37,38 In addition, self-reported blood pressure, body mass index, and family history have previously been shown in the WHS to be strong predictors of cardiovascular risk, with odds ratios consistent in magnitude with those observed in other major studies.39-41 Regarding parental history, we used a conservative cut point of age younger than 60 years to be consistent with prior findings in this cohort and in recent analyses from Framingham.41,42 The inclusion of family history in these algorithms underscores the importance of genetic influences on risk among women; in a recent study of women with low Framingham risk who had premature coronary disease in a first-degree relative, nearly a third had significant subclinical atherosclerosis and 17% had atherosclerotic burden exceeding the 90th percentile.43
Third, following recent recommendations,44 we elected in our analysis to use a combined end point of myocardial infarction, ischemic stroke, coronary revascularization, and cardiovascular mortality. We believe this is an appropriate choice because this end point has typically been used in major cardiovascular clinical trials evaluating interventions for primary prevention, including recent trials of aspirin and statin therapy.
Finally, we limited our analysis to blood-based biomarkers and traditional epidemiological risk factors, in part to ensure a cost-effective approach for primary prevention that could be directly compared with the ATP-III algorithm. These data thus do not examine the potential for atherosclerotic imaging tests to serve as an alternative method for evaluating risk. However, we believe the methods developed herein—variable selection in a derivation data set to minimize the BIC followed by prospective testing in a second validation cohort—should provide a structure for the formal evaluation of emerging risk predictors, including potential imaging tests.
As 8 to 10 million US women have an ATP-III estimated 10-year risk between 5% and 20%, application of these data could have an immediate effect on cardiovascular prevention.45 A user-friendly calculator for the Reynolds Risk Score can be freely accessed at http://www.reynoldsriskscore.org.
Corresponding Author: Paul M Ridker, MD, MPH, Center for Cardiovascular Disease Prevention, Brigham and Women's Hospital, 900 Commonwealth Ave E, Boston, MA 02215 (firstname.lastname@example.org).
Author Contributions: Drs Ridker and Cook had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Ridker, Cook.
Acquisition of data: Ridker, Buring, Rifai, Cook.
Analysis and interpretation of data: Ridker, Buring, Rifai, Cook.
Drafting of the manuscript: Ridker, Cook.
Critical revision of the manuscript for important intellectual content: Ridker, Buring, Rifai, Cook.
Statistical analysis: Cook.
Obtained funding: Ridker.
Administrative, technical, or material support: Rifai.
Financial Disclosures: Dr Ridker reports that he currently or in the past 5 years has received research funding support from multiple not-for-profit entities including the National Heart, Lung, and Blood Institute, the National Cancer Institute, the American Heart Association, the Doris Duke Charitable Foundation, the Leducq Foundation, the Donald W. Reynolds Foundation, and the James and Polly Annenberg La Vea Charitable Trusts. Dr Ridker also reports that currently or in the past 5 years he has received investigator-initiated research support from multiple for-profit entities including AstraZeneca, Bayer, Bristol-Myers Squibb, Dade-Behring, Novartis, Pharmacia, Roche, Sanofi-Aventis, and Variagenics. Dr Ridker reports being listed as a coinventor on patents held by the Brigham and Women's Hospital that relate to the use of inflammatory biomarkers in cardiovascular disease and has served as a consultant to Schering-Plough, Sanofi/Aventis, AstraZeneca, Isis Pharmaceutical, Dade-Behring, and Vascular-Biogenics. Dr Buring reports that she currently or in the past 5 years has received investigator-initiated research funding and support from the National Heart, Lung, and Blood Institute, the National Cancer Institute, the National Institute on Aging, and Dow Corning Corp; research support for pills and/or packaging from Bayer Health Care and the Natural Source Vitamin E Association; and honoraria from Bayer for speaking engagements. Dr Rifai reports receiving research grant support from Merck Research Laboratories, serving as a consultant to Merck Research Laboratories and Sanofi/Aventis, and receiving honoraria for speaking from Merck Research Laboratories, Dade Behring, Abbott Laboratories, Ortho Diagnostics, Denka Seiken, and Roche Diagnostics. Dr Cook reports having received funding from the National Heart, Lung, and Blood Institute, the National Cancer Institute, and Roche Diagnostics, and has served as a consultant to Bayer Health Care.
Funding/Support: The Reynolds Risk Score Project was supported by investigator-initiated research grants from the Donald W. Reynolds Foundation (Las Vegas, Nev) with additional support from the Doris Duke Charitable Foundation (New York, NY), and the Leducq Foundation (Paris, France). The Women's Health Study cohort is supported by grants from the National Heart, Lung, and Blood Institute and the National Cancer Institute (Bethesda, Md).
Role of the Sponsor: The funding agencies had no involvement in the design and conduct of the study, the collection, management, analysis, and interpretation of the data, or in the drafting of the manuscript.
This article was corrected for error in data on 2/16/07, prior to publication of the correction in print.