Figure 1. X-axes refer to decile of predicted risk based on the Framingham Heart Study function. ARIC indicates Atherosclerosis Risk in Communities Study. Hard CHD events were coronary death or myocardial infarction.
Figure 2. X-axes refer to decile of predicted risk based on the Framingham Heart Study function. HHP indicates Honolulu Heart Program; PR, Puerto Rico Heart Health Program; and SHS, Strong Heart Study. Hard CHD events were coronary death or myocardial infarction.
Ralph B. D'Agostino, Scott Grundy, Lisa M. Sullivan, Peter Wilson. Validation of the Framingham Coronary Heart Disease Prediction Scores: Results of a Multiple Ethnic Groups Investigation. JAMA. 2001;286(2):180–187. doi:10.1001/jama.286.2.180
Context
The Framingham Heart Study produced sex-specific coronary heart disease (CHD) prediction functions for assessing risk of developing incident CHD in a white middle-class population. Concern exists regarding whether these functions can be generalized to other populations.
Objective
To test the validity and transportability of the Framingham CHD prediction functions per a National Heart, Lung, and Blood Institute workshop organized for this purpose.
Design, Setting, and Subjects
Sex-specific CHD functions were derived from Framingham data for prediction of coronary death and myocardial infarction. These functions were applied to 6 prospectively studied, ethnically diverse cohorts (n = 23 424), including whites, blacks, Native Americans, Japanese American men, and Hispanic men: the Atherosclerosis Risk in Communities Study (1987-1988), Physicians' Health Study (1982), Honolulu Heart Program (1980-1982), Puerto Rico Heart Health Program (1965-1968), Strong Heart Study (1989-1991), and Cardiovascular Health Study (1989-1990).
Main Outcome Measures
The performance, or ability to accurately predict CHD risk, of the Framingham functions compared with the performance of risk functions developed specifically from the individual cohorts' data. Comparisons included evaluation of the equality of relative risks for standard CHD risk factors, discrimination, and calibration.
Results
For white men and women and for black men and women the Framingham functions performed reasonably well for prediction of CHD events within 5 years of follow-up. Among Japanese American and Hispanic men and Native American women, the Framingham functions systematically overestimated the risk of 5-year CHD events. After recalibration, taking into account different prevalences of risk factors and underlying rates of developing CHD, the Framingham functions worked well in these populations.
Conclusions
The sex-specific Framingham CHD prediction functions perform well among whites and blacks in different settings and can be applied to other ethnic groups after recalibration for differing prevalences of risk factors and underlying rates of CHD events.
The Framingham Heart Study has developed mathematical functions for predicting risk of clinical coronary heart disease (CHD) events.1-5 These are derived multivariable mathematical functions that assign weights to major CHD risk factors such as sex, age, blood pressure, total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), smoking behavior, and diabetes status. For a person free of cardiovascular disease, his/her CHD risk factors are entered into the function to produce a probability estimate of developing CHD within a certain time period (eg, the next 5 years). Recently, Framingham investigators developed a simplified model that incorporates blood pressure and cholesterol categories proposed by the Fifth Joint National Committee on Hypertension (JNC-V) and the National Cholesterol Education Program, Adult Treatment Panel II (NCEP-ATP II).5-8
The Framingham functions were developed to assess the relative importance of CHD risk factors and to quantify the absolute level of CHD risk for individual patients. The report of the third adult treatment panel (NCEP-ATP III) endorses knowledge of absolute CHD risk as a means of identifying those patients who are likely to benefit from aggressive primary prevention strategies and as a tool motivating patients to comply with them.9
The Framingham Heart Study consists of white middle-class individuals. Concern exists as to the generalizability of its CHD risk function to populations such as other whites, blacks, Asian Americans, Hispanics, and Native Americans. In January 1999 the National Heart, Lung, and Blood Institute convened a CHD Prediction Workshop to evaluate the performance of Framingham functions in non-Framingham populations.10
Sex-specific Framingham CHD risk functions were derived from 2439 men and 2812 women, 30 to 74 years of age, who were free of cardiovascular disease (CVD) at the time of their Framingham Heart Study examinations in 1971 to 1974. Participants attended either the 11th examination of the original Framingham Cohort4,5 or the initial examination of the Framingham Offspring Study.11 Coronary heart disease risk factors were routinely and systematically evaluated during these examinations as described in detail elsewhere.5 Twelve-year follow-up was obtained for the development of "hard" CHD events, defined as coronary death or myocardial infarction. Sex-specific Cox proportional hazards regression functions were computed that relate JNC-V blood pressure and NCEP-ATP II cholesterol categories, along with age, current smoking, and presence of diabetes to the occurrence of hard CHD events.
Six non-Framingham cohorts were identified for evaluation.12-17 Criteria for selection were similar age range, systematic measurement of CHD risk factors, and adequate length of follow-up for hard CHD events. The selected cohorts were participants in the Atherosclerosis Risk in Communities Study (ARIC, 1987-1988), the Physicians' Health Study (PHS, 1982), the Honolulu Heart Program (HHP, 1980-1982), the Puerto Rico Heart Health Program (PR, 1965-1968), the Strong Heart Study (SHS, 1989-1991), and the Cardiovascular Health Study (CHS, 1989-1990). The PHS is a prospective, nested case-control study with 1-to-4 matching of cases to controls for age and smoking.
For each cohort, sex-specific Cox regression functions were derived using the same variables as in the Framingham functions but using data from the individual cohorts. We call these the cohorts' "own" functions. They represent the best possible Cox prediction functions for each cohort based on specific prevalences of risk factors and CHD event rates. For the ARIC study, which includes white and black subjects, the Cox regression functions were sex- and race-specific. The CHS cohort included subjects 65 to 88 years old. We used only CHS subjects aged 65 to 74 years for the CHS' own functions.
All analyses were sex- and race-specific. The performance of the Framingham prediction functions among the non-Framingham cohorts was assessed according to 3 evaluations: equality of regression coefficients (relative risk [RR] comparison), discrimination, and calibration.
For each risk factor, Cox proportional hazards modeling yielded regression coefficients for the Framingham and non-Framingham cohorts. To compare these coefficients, a test statistic z was calculated as z = (b[F] − b[O])/SE, where b[F] and b[O] are, respectively, the regression coefficients of the Framingham and the other cohort's model, and SE is the standard error of the difference in the coefficients, computed as the square root of the sum of the squares of the SEs for the 2 coefficients. Because the RR of a variable is computed by exponentiating its regression coefficient, the z statistic tests the equality of RRs between Framingham and non-Framingham cohorts.
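The coefficient comparison described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' SAS code: the function names and the numeric inputs are hypothetical, and the two-sided P value assumes z is referred to a standard normal distribution.

```python
import math

def coefficient_z(b_framingham, se_framingham, b_other, se_other):
    """z statistic for the difference between two independent Cox coefficients."""
    # SE of the difference: square root of the sum of the squared SEs.
    se_diff = math.sqrt(se_framingham ** 2 + se_other ** 2)
    return (b_framingham - b_other) / se_diff

def two_sided_p(z):
    """Two-sided P value, assuming z follows a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical log relative risks and SEs, for illustration only.
z = coefficient_z(0.50, 0.15, 0.20, 0.20)
print(round(z, 2), round(two_sided_p(z), 3))
```

Because the coefficients are logs of the RRs, a z value near 0 corresponds to similar RRs in the two cohorts.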
Discrimination is the ability of a prediction model to separate those who experience hard CHD events from those who do not. We quantified this by calculating the c statistic, analogous to the area under a receiver operating characteristic (ROC) curve18-20; this value represents an estimate of the probability that a model assigns a higher risk to those who develop CHD within a 5-year follow-up than to those who do not.18,19 For each non-Framingham cohort 2 c statistics were computed, one applying the Framingham function to the cohort and the other from the cohort's own prediction function. These were compared using a test developed by Nam.20
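As a rough illustration of what the c statistic measures, the sketch below computes it for a simple binary outcome (event within 5 years, yes or no). It is a simplification under stated assumptions: it ignores censoring, whereas the analyses reported here used a version suited to survival data, and the function name and inputs are hypothetical.

```python
def c_statistic(risks, events):
    """Share of (event, non-event) pairs in which the subject with the event
    has the higher predicted risk; tied risks count as one half."""
    cases = [r for r, e in zip(risks, events) if e]
    controls = [r for r, e in zip(risks, events) if not e]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for risk_case in cases:
        for risk_control in controls:
            if risk_case > risk_control:
                concordant += 1.0
            elif risk_case == risk_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Hypothetical predicted risks and outcomes (1 = hard CHD event).
print(c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of those with and without events.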
Calibration measures how closely predicted outcomes agree with actual outcomes. For this we used a version of the Hosmer-Lemeshow χ² statistic.19,20 For each non-Framingham cohort, the Framingham function's predicted risks were used to divide subjects into deciles of predicted risk for experiencing a hard CHD event within 5 years. Plots were constructed showing predicted and actual event rates for each decile. A χ² statistic was calculated to compare the differences between predicted and actual event rates; small values indicate good calibration, and values exceeding 20 indicate significant lack of calibration (P<.01). For further evaluation of calibration, we compared this χ² statistic with one derived from each cohort's own prediction function. All statistical analyses were performed in SAS version 6.12 (SAS Institute, Cary, NC). Because the PHS is a case-control study it is not suitable for calibration comparisons.
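The decile-based comparison of predicted and actual events can be sketched as follows. This is a minimal illustration assuming a binary 5-year outcome with no censoring and the usual Hosmer-Lemeshow form of the statistic; the exact variant used in the analyses may differ, and the function name and data are hypothetical.

```python
def hosmer_lemeshow_chi2(risks, events, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * p * (1 - p)),
    where p is the mean predicted risk in the group."""
    pairs = sorted(zip(risks, events))  # order subjects by predicted risk
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        mean_p = sum(p for p, _ in chunk) / n_g
        observed = sum(e for _, e in chunk)
        expected = n_g * mean_p
        # Assumes 0 < mean_p < 1 so the variance term is nonzero.
        chi2 += (observed - expected) ** 2 / (expected * (1.0 - mean_p))
    return chi2

# Hypothetical data: risk is overestimated in the lower-risk half.
risks = [0.5] * 4 + [0.75] * 4
events = [0, 0, 0, 1, 0, 1, 1, 1]
print(hosmer_lemeshow_chi2(risks, events, groups=2))
```

Systematic overestimation, as in the hypothetical lower-risk group here, inflates the statistic because expected events exceed observed events.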
When there is a systematic overestimation or underestimation of risk, transporting a prediction function from one setting to another requires a process known as recalibration. The Framingham Cox regression models have the form $S_0(t)^{\exp(f(x,M))}$, where $f(x,M) = b_1(x_1 - M_1) + \dots + b_p(x_p - M_p)$. Here $b_1, \dots, b_p$ are the regression coefficients (logs of the RRs), $x_1, \dots, x_p$ represent an individual's risk factors, and $M_1, \dots, M_p$ are the means of the risk factors of the Framingham cohort. $S_0(t)$ is the Framingham average incidence rate at time $t$ or, more precisely, the survival rate at the mean values of the risk factors. With recalibration, the Framingham mean values of the risk factors ($M_1, \dots, M_p$) are replaced by the mean values of the risk factors from a non-Framingham cohort, while the Framingham average incidence rate $S_0(t)$ is replaced by the cohort's own average incidence rate. We used Kaplan-Meier estimates to determine average incidence rates.20,21 It is important to note that recalibration does not affect RR comparisons or discrimination evaluations.
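A minimal sketch of recalibration follows, reading the model in the standard Cox survival form, so that predicted risk by time t is 1 minus the baseline survival raised to the power exp(f(x,M)). All coefficients, means, and baseline survival values below are hypothetical placeholders, not values from Table 1 or Table 2.

```python
import math

def predicted_risk(x, coefficients, means, baseline_survival):
    """Risk by time t: 1 - S0(t) ** exp(sum of b_i * (x_i - M_i))."""
    linear = sum(b * (xi - m) for b, xi, m in zip(coefficients, x, means))
    return 1.0 - baseline_survival ** math.exp(linear)

# Hypothetical 2-variable model: age (per year) and current smoking (0/1).
b = [0.05, 0.60]                  # coefficients are kept unchanged
framingham_means = [50.0, 0.40]   # hypothetical Framingham means
framingham_s0 = 0.96              # hypothetical 5-year survival at the means
cohort_means = [55.0, 0.30]       # hypothetical means in the new cohort
cohort_s0 = 0.98                  # hypothetical Kaplan-Meier 5-year survival

person = [55.0, 0.0]
original = predicted_risk(person, b, framingham_means, framingham_s0)
recalibrated = predicted_risk(person, b, cohort_means, cohort_s0)
print(round(original, 4), round(recalibrated, 4))
```

Only the means and the baseline survival change. Because the coefficients are untouched, the ordering of subjects by predicted risk is preserved, which is why recalibration leaves RR comparisons and discrimination unaffected.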
The Framingham Study cohort consisted of 2439 men and 2812 women free of cardiovascular disease. The 5- and 10-year hard CHD event rates were 3.7% and 8.0% for men and 1.4% and 2.8% for women.
Table 1 shows the Cox regression coefficients for the sex-specific Framingham regression models. Table 2 contains the racial compositions, sample sizes, age ranges, mean ages, risk factor distributions, and 5-year incidence rates for the Framingham and non-Framingham cohorts.
Table 3 and Table 4 contain the RRs of the CHD risk factors for each cohort's sex- and race-specific regression model. First, we considered each function separately. Among men most risk factors had statistically significant coefficients, whereas among women a number of coefficients were not statistically significant, presumably because of low event rates (eg, for ARIC black women there were only 38 events among 2333 subjects). Nonetheless, within risk factor categories, trends were significant. For example, except for the SHS Native Americans, for all cohorts the risk for hard CHD events increased as blood pressure went from optimal to stage II-IV hypertension, as TC increased, and as HDL-C decreased (P<.01 for all). In the SHS there were some unexpected but not statistically significant elevated risks in high HDL-C groups.
Among men, there were no significant differences between Framingham RRs and those of ARIC white and black men. There were differences in RRs for smoking and age in the PHS cohort, which may reflect the matching scheme that was used in that study. Among HHP Japanese American men and PR Hispanic men, RRs were lower for optimal blood pressure. Smoking was associated with a lower RR among HHP Japanese American men, while diabetes and TC 280 mg/dL (7.25 mmol/L) or higher were associated with much higher RRs among SHS Native American men. Also, in this cohort HDL-C in the range of 50 to 59 mg/dL (1.30-1.53 mmol/L) had an unexpected elevated risk and stage I hypertension had an unexpected low risk, both resulting in significant differences from the Framingham function. Among the more elderly CHS white men, cholesterol abnormalities were associated with a lower RR.
Among women, there were no differences between Framingham RRs and those of white ARIC or CHS women. Black women in the ARIC cohort had higher RRs for high normal blood pressure and stage II-IV hypertension. In the SHS there were significant differences for diabetes and smoking. Also, HDL-C greater than or equal to 60 mg/dL (1.55 mmol/L) carried an elevated risk in the SHS, resulting in a significantly different RR than that of the Framingham function.
Table 5 contains the c statistics for both men and women. The "FHS" row refers to the discrimination achieved by applying the Framingham prediction functions to the non-Framingham cohorts, while the "Best Cox" row contains the c statistics resulting from the cohort's own Cox regression function. Since they are based on the same cohorts for which the scores were developed, the latter c statistics are overestimates. For non-Framingham white men (ARIC, PHS, and CHS) and women (ARIC and CHS), the Framingham functions discriminated well, almost always achieving the same discrimination as best Cox functions. Overall, within sampling fluctuations the Framingham functions discriminated nearly as well as the best Cox functions of the non-Framingham cohorts, with the exception of the SHS Native Americans.
Table 5 also contains the χ² statistics for evaluation of the calibration of the Framingham prediction functions applied to non-Framingham cohorts. For white men and women, including the more elderly subjects in the CHS cohort, both the Framingham functions ("Unadjusted" row) and the individual cohort's own functions ("Best Cox" row) showed a statistically acceptable calibration. Figure 1 contains calibration plots for white and black men and women from the ARIC study. In general, actual CHD event rates were similar to event rates predicted by Framingham functions among white and black men and women.
For the HHP Japanese American men and the PR Hispanic men, the calibration χ² statistics of 66.0 and 142.0, respectively, indicate poor calibration (Table 5, "Unadjusted" row and Figure 2A and B, "Unadjusted" panels). The Framingham prediction function systematically overestimated risk in both cohorts, in which the overall CHD event rates were substantially lower. Model recalibration using the non-Framingham cohorts' mean values for risk factors and CHD incidence rates substantially improved the performance of the Framingham prediction functions (Figure 2, "Adjusted" panels, and Table 5). In the SHS, calibration was good for men (χ² = 10.6) but poorer for women (χ² = 22.7). Recalibration resulted in improved performance of the Framingham functions (Figure 2 and Table 5).
The Framingham CHD prediction functions were developed to help clinicians estimate the absolute risk of any individual patient developing clinically manifest disease. We sought to demonstrate the external validity of the Framingham functions by examining their performance in 6 different well-described population-based cohorts that reflect a wide range of ethnic diversity.
As shown in Table 3 and Table 4, RRs for major CHD risk factors were remarkably similar to those derived from the Framingham Heart Study cohort among white men and women and black men in the ARIC cohort. Among black women in the ARIC cohort, RRs for elevated blood pressure were somewhat higher. In the cohorts that were made up of other ethnic groups, however, we did note some differences in RRs. For example, smoking was associated with a much lower RR in HHP Japanese American men for reasons that are not clear. In the CHS cohort, cholesterol abnormalities and smoking had lower RRs, possibly due to age interactions. In the SHS Native American cohort, there were RR differences for cholesterol abnormalities and diabetes. Some cholesterol differences are unexplained, with high HDL-C levels carrying an increased risk in the SHS cohort. Because the prevalence of diabetes among Native Americans is quite high, it is possible that the different RRs we observed may be due to interactions with other risk factors and with other factors unique to diabetes, such as albuminuria, that were not considered in our analyses.
The ability of the Framingham prediction functions to discriminate between subjects who developed clinical CHD and those who did not was reasonably good for most of the non-Framingham cohorts (Table 5). Among ARIC black women, the Framingham c statistic was numerically, but not significantly, lower than that derived from the model based on that same cohort's data. Since this ARIC c statistic is based on the same data with which its function was developed, it is an overestimate and the difference may relate to this. It may also be due to the small number of CHD events. The Framingham c statistic for CHS men was also numerically, but not significantly, lower than the c statistic of the CHS cohort's own function. The difference may relate to the overestimate of the CHS cohort's own function c statistic. That both the Framingham and the CHS cohort's own function c statistics are low may be a consequence of the relatively small sample size and the narrow age distribution. The c statistics were appreciably decreased for SHS Native Americans. Why discrimination was worse for Native Americans than for white and black men and women of the ARIC cohort, PR Hispanic men, and HHP Japanese American men is not clear. It is possible that the markedly different RR estimates for cholesterol and diabetes among the SHS Native Americans may have adversely affected the ability of the Framingham prediction function to discriminate CHD risk.
In our model calibration analyses, we found reasonably good agreement between predicted and actual CHD event rates for all of the non-Framingham cohorts studied (Figure 1 and Figure 2, and Table 5) except for HHP Japanese American men, PR Hispanic men, and SHS Native American women. In these groups, the Framingham prediction functions systematically overestimated CHD risk (Figure 2A, B, and D). This overestimation was corrected by using a process of recalibration. In order to apply this to other such populations, it would be necessary to obtain cross-sectional data on risk factor prevalences as well as population data on CHD event rates over time.
Authors of treatment guidelines have recognized the need to have an accurate and reliable multivariable-based estimate of absolute CHD risk in order to best identify those most in need of aggressive preventive treatment.7-9 Thus, the recently released NCEP-ATP III guidelines specifically recommend that the level of treatment should relate to the level of CHD risk.9 They specifically incorporate Framingham prediction functions to aid clinicians and patients in determining optimal strategies. Simple charts can be used to aid in this activity.5,9
In order for multivariable risk assessment and treatment guidelines to have optimal use and acceptability, clinicians need to be confident that absolute risk prediction functions can be transported to other settings beyond where they were originally developed. We have demonstrated that the FHS prediction functions work reasonably well among white and black men and women. When applied to Japanese American and Hispanic men and Native American women, recalibration was needed. Future work is needed to devise practical schemes by which clinicians can confidently apply the FHS prediction functions in these groups.
Corresponding Author and Reprints: Ralph B. D'Agostino, Sr, PhD, Department of Mathematics and Statistics, Boston University, 111 Cummington St, Boston, MA 02215 (e-mail: firstname.lastname@example.org).
Author Contributions: Study concept and design: D'Agostino, Grundy, Wilson.
Acquisition of data: D'Agostino, Grundy, Wilson.
Analysis and interpretation of data: D'Agostino, Grundy, Sullivan, Wilson.
Drafting of the manuscript: D'Agostino, Grundy.
Critical revision of the manuscript for important intellectual content: D'Agostino, Grundy, Sullivan, Wilson.
Statistical expertise: D'Agostino, Sullivan, Wilson.
Obtained funding: D'Agostino, Grundy.
Administrative, technical, or material support: D'Agostino, Grundy, Sullivan.
Study supervision: D'Agostino, Wilson.
Members of the CHD Risk Prediction Group: Boston University and the Framingham Study: Philip A. Wolf, MD, Daniel Levy, MD, Joseph Massaro, PhD, Byung-Ho Nam, PhD; CHD Risk Prediction Planning Committee (NIH): National Heart, Lung, and Blood Institute: James L. Cleeman, MD, Jeffrey A. Cutler, MD, Lawrence Friedman, MD, Edward Rocella, MD; CHD Risk Prediction Planning Committee (non-NIH): Scott M. Grundy, MD (chair), Ralph B. D'Agostino, Sr, PhD, Gregory Burke, MD, Lori Mosca, MD, Daniel Rader, MD, Peter W. F. Wilson, MD; Atherosclerosis Risk in Communities Study: Lloyd E. Chambless, PhD, David J. Couper, PhD; Physicians' Health Study: Meir J. Stampfer, MD, Jing Ma, PhD; Honolulu Heart Program: David Curb, MD, B. Rodrigues, MD, Robert Abbott, PhD; Puerto Rico Heart Health Program: Mario R. Gacia-Palmieri, MD, Paul Sorlie, PhD, Sean Coady, PhD; Strong Heart Study: Elisa Lee, PhD, Barbara Howard, MD; Cardiovascular Health Study: Richard Kronmal, PhD, Thomas Lumney, PhD.
Funding/Support: The Framingham Study work and analyses for the workshop carried out at Boston University were supported by a contract (N01-HC-380380) and funds from the National Heart, Lung, and Blood Institute of the National Institutes of Health.
Acknowledgment: We especially thank Claude Lenfant, MD, Director of the National Heart, Lung, and Blood Institute, for his support and organization of the Workshop on CHD Risk Assessment, January 1999, and the cardiovascular studies (Atherosclerosis Risk in Communities Study, Physicians' Health Study, Honolulu Heart Program, Puerto Rico Heart Health Program, Strong Heart Study, and the Cardiovascular Health Study) that supported the workshop and this investigation by contributing their data and performing numerous analyses.