D'Agostino RB Sr, Grundy S, Sullivan LM, Wilson P, for the CHD Risk Prediction Group. Validation of the Framingham Coronary Heart Disease Prediction Scores: Results of a Multiple Ethnic Groups Investigation. JAMA. 2001;286(2):180-187. doi:10.1001/jama.286.2.180
Author Affiliations: Departments of Mathematics and Statistics (Drs D'Agostino and Sullivan) and Medicine (Dr Wilson), Boston University, Boston, Mass; the Framingham Study, Framingham, Mass (Drs D'Agostino, Sullivan, and Wilson); and Center for Human Nutrition, University of Texas Southwestern Medical Center, Dallas (Dr Grundy).
Context The Framingham Heart Study produced sex-specific coronary heart disease
(CHD) prediction functions for assessing risk of developing incident CHD in
a white middle-class population. Concern exists regarding whether these functions
can be generalized to other populations.
Objective To test the validity and transportability of the Framingham CHD prediction
functions per a National Heart, Lung, and Blood Institute workshop organized
for this purpose.
Design, Setting, and Subjects Sex-specific CHD functions were derived from Framingham data for prediction
of coronary death and myocardial infarction. These functions were applied
to 6 prospectively studied, ethnically diverse cohorts (n = 23 424),
including whites, blacks, Native Americans, Japanese American men, and Hispanic
men: the Atherosclerosis Risk in Communities Study (1987-1988), Physicians'
Health Study (1982), Honolulu Heart Program (1980-1982), Puerto Rico Heart
Health Program (1965-1968), Strong Heart Study (1989-1991), and Cardiovascular
Health Study (1989-1990).
Main Outcome Measures The performance, or ability to accurately predict CHD risk, of the Framingham
functions compared with the performance of risk functions developed specifically
from the individual cohorts' data. Comparisons included evaluation of the
equality of relative risks for standard CHD risk factors, discrimination, and calibration.
Results For white men and women and for black men and women the Framingham functions
performed reasonably well for prediction of CHD events within 5 years of follow-up.
Among Japanese American and Hispanic men and Native American women, the Framingham
functions systematically overestimated the risk of 5-year CHD events. After
recalibration, taking into account different prevalences of risk factors and
underlying rates of developing CHD, the Framingham functions worked well in
Conclusions The sex-specific Framingham CHD prediction functions perform well among
whites and blacks in different settings and can be applied to other ethnic
groups after recalibration for differing prevalences of risk factors and underlying
rates of CHD events.
The Framingham Heart Study has developed mathematical functions for
predicting risk of clinical coronary heart disease (CHD) events.1-5
These are derived multivariable mathematical functions that assign weights
to major CHD risk factors such as sex, age, blood pressure, total cholesterol
(TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein
cholesterol (HDL-C), smoking behavior, and diabetes status. The CHD risk factors
of a person free of cardiovascular disease are entered into the function
to produce an estimate of the probability of developing CHD within a certain time
period (eg, the next 5 years). Recently, Framingham investigators developed
a simplified model that incorporates blood pressure and cholesterol categories
proposed by the Fifth Joint National Committee on Hypertension (JNC-V) and
the National Cholesterol Education Program, Adult Treatment Panel II (NCEP-ATP II).
The Framingham functions were developed to assess the relative importance
of CHD risk factors and to quantify the absolute level of CHD risk for individual
patients. The report of the third adult treatment panel (NCEP-ATP III) endorses
knowledge of absolute CHD risk as a means of identifying those patients who
are likely to benefit from aggressive primary prevention strategies and as
a tool motivating patients to comply with them.9
The Framingham Heart Study consists of white middle-class individuals.
Concern exists as to the generalizability of its CHD risk function to populations
such as other whites, blacks, Asian Americans, Hispanics, and Native Americans.
In January 1999 the National Heart, Lung, and Blood Institute convened a CHD
Prediction Workshop to evaluate the performance of Framingham functions in
Sex-specific Framingham CHD risk functions were derived from 2439 men
and 2812 women, 30 to 74 years of age, who were free of cardiovascular disease
(CVD) at the time of their Framingham Heart Study examinations in 1971 to
1974. Participants attended either the 11th examination of the original Framingham
Cohort4,5 or the initial examination
of the Framingham Offspring Study.11 Coronary
heart disease risk factors were routinely and systematically evaluated during
these examinations as described in detail elsewhere.5
Twelve-year follow-up was obtained for the development of "hard" CHD events,
defined as coronary death or myocardial infarction. Sex-specific Cox proportional
hazards regression functions were computed that relate JNC-V blood pressure
and NCEP-ATP II cholesterol categories, along with age, current smoking, and
presence of diabetes to the occurrence of hard CHD events.
Six non-Framingham cohorts were identified for evaluation.12-17
Criteria for selection were similar age range, systematic measurement of CHD
risk factors, and adequate length of follow-up for hard CHD events. The selected
cohorts were participants in the Atherosclerosis Risk in Communities Study
(ARIC, 1987-1988), the Physicians' Health Study (PHS, 1982), the Honolulu
Heart Program (HHP, 1980-1982), the Puerto Rico Heart Health Program (PR,
1965-1968), the Strong Heart Study (SHS, 1989-1991), and the Cardiovascular
Health Study (CHS, 1989-1990). The PHS is a prospective, nested case-control
study with 1-to-4 matching of cases to controls for age and smoking.
For each cohort, sex-specific Cox regression functions were derived
using the same variables as in the Framingham functions but using data from
the individual cohorts. We call these the cohorts' "own" functions. They represent
the best possible Cox prediction functions for each cohort based on specific
prevalences of risk factors and CHD event rates. For the ARIC study, which
includes white and black subjects, the Cox regression functions were sex-
and race-specific. The CHS cohort included subjects 65 to 88 years old. We
used only CHS subjects aged 65 to 74 years for the CHS' own functions.
All analyses were sex- and race-specific. The performance of the Framingham
prediction functions among the non-Framingham cohorts was assessed according
to 3 evaluations: equality of regression coefficients (relative risk [RR]
comparison), discrimination, and calibration.
For each risk factor, Cox proportional hazards modeling yielded regression
coefficients for the Framingham and non-Framingham cohorts. To compare these
coefficients, a test statistic z was calculated as $z = (b_F - b_O)/\mathrm{SE}$, where $b_F$ and $b_O$ are,
respectively, the regression coefficients of the Framingham and the other
cohort's model, while SE is the standard error of the difference in the coefficients.
This was computed as the square root of the sum of the squares of the SEs
for the 2 coefficients. Because the RR of a variable is computed by exponentiating
its regression coefficient, the z statistic tests
the equality of RRs between Framingham and non-Framingham cohorts.
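
The coefficient comparison described above can be sketched in Python. This is a minimal illustration, not the authors' SAS code: the function names and the numeric inputs are hypothetical, and referring z to a two-sided standard normal distribution is an assumption that the text does not state explicitly.

```python
import math

def coefficient_z(b_f, se_f, b_o, se_o):
    """z statistic for the difference between two independently estimated coefficients."""
    # SE of the difference: square root of the sum of the squared SEs.
    se_diff = math.sqrt(se_f ** 2 + se_o ** 2)
    return (b_f - b_o) / se_diff

def two_sided_p(z):
    """Two-sided P value, assuming z follows a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical coefficients and SEs, not taken from the paper's tables.
z = coefficient_z(b_f=0.50, se_f=0.15, b_o=0.10, se_o=0.20)
print(round(z, 2), round(two_sided_p(z), 3))
```

Because the RR is the exponentiated coefficient, a large absolute z here would suggest that the two cohorts' RRs for that risk factor differ.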
Discrimination is the ability of a prediction model to separate those
who experience hard CHD events from those who do not. We quantified this by
calculating the c statistic, analogous to the area
under a receiver operating characteristic (ROC) curve18-20;
this value represents an estimate of the probability that a model assigns
a higher risk to those who develop CHD within a 5-year follow-up than to those
who do not.18,19 For each non-Framingham
cohort 2 c statistics were computed, one applying
the Framingham function to the cohort and the other from the cohort's own
prediction function. These were compared using a test developed by Nam.20
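
As a rough illustration of the c statistic, the sketch below computes the proportion of event/non-event pairs in which the subject with the event received the higher predicted risk. It assumes a simple binary 5-year outcome with no censoring, counts ties as one half, and uses a hypothetical function name; the survival-data version cited in the text may differ, and the Nam comparison test is not shown.

```python
def c_statistic(risks, events):
    """Concordance between predicted risks and binary outcomes (1 = event)."""
    cases = [r for r, e in zip(risks, events) if e]
    controls = [r for r, e in zip(risks, events) if not e]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0   # event subject ranked higher
            elif case_risk == control_risk:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))

# Hypothetical predicted risks and outcomes for four subjects.
print(c_statistic([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 1]))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of those with and without events.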
Calibration measures how closely predicted outcomes agree with actual
outcomes. For this we used a version of the Hosmer-Lemeshow χ²
statistic.19,20 For each non-Framingham
cohort, the Framingham function's predicted risks were used to divide subjects
into deciles of predicted risk for experiencing a hard CHD event within 5
years. Plots were constructed showing predicted and actual event rates for
each decile. A χ² statistic was calculated to compare the differences
between predicted and actual event rates; small values indicated good calibration.
Values exceeding 20 indicated significant lack of calibration (P<.01). For further evaluation of calibration, we compared this χ² statistic with one derived from each cohort's own prediction function.
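
The decile-based calibration check can be sketched as follows. This is a simplified Hosmer-Lemeshow-type statistic under stated assumptions: outcomes are treated as binary with no censoring, observed rates are plain proportions rather than the survival-based estimates a cohort analysis would use, and the function name is hypothetical. It is not necessarily the exact version used in the paper.

```python
def hosmer_lemeshow_chi2(risks, events, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * p * (1 - p))."""
    # Order subjects by predicted risk and split into equal-sized groups.
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    chi2 = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        expected = sum(risks[i] for i in idx)    # predicted number of events
        observed = sum(events[i] for i in idx)   # actual number of events
        mean_risk = expected / len(idx)
        variance = len(idx) * mean_risk * (1.0 - mean_risk)
        if variance > 0:
            chi2 += (observed - expected) ** 2 / variance
    return chi2

# Hypothetical cohort: everyone predicted at 10% risk, 1 event per 10 subjects.
risks = [0.1] * 100
events = [1 if i % 10 == 0 else 0 for i in range(100)]
print(hosmer_lemeshow_chi2(risks, events))
```

Systematic overestimation inflates every group's contribution, which is how a function that ranks subjects well can still show a large χ² value.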
All statistical analyses were performed in SAS version 6.12 (SAS Institute,
Cary, NC). Because the PHS is a case-control study it is not suitable for
When there is a systematic overestimation or underestimation of risk,
transporting a prediction function from one setting to another requires a
process known as recalibration. The Framingham Cox regression models have
the form $S_0(t)^{\exp(f(x,M))}$, where $f(x,M) = b_1(x_1 - M_1) + \cdots + b_p(x_p - M_p)$.
Here $b_1, \ldots, b_p$ are the regression
coefficients (logs of the RRs), $x_1, \ldots, x_p$ represent
an individual's risk factors, and $M_1, \ldots, M_p$ are
the means of the risk factors of the Framingham cohort. $S_0(t)$ is
the Framingham average incidence rate at time $t$ or,
more precisely, the survival rate at the mean values of the risk factors.
With recalibration, the Framingham mean values of the risk factors ($M_1, \ldots, M_p$) are replaced by the mean values of the risk
factors from a non-Framingham cohort, while the Framingham average incidence
rate $S_0(t)$ is replaced by the cohort's own average incidence rate.
We used Kaplan-Meier estimates to determine average incidence rates.20,21 Because recalibration leaves the regression
coefficients unchanged, it does not affect RR comparisons or discrimination evaluations.
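
The recalibration step can be sketched in Python, reading the model as the baseline survival S0(t) raised to the power exp(f(x,M)), so that predicted risk is 1 minus that quantity. All coefficients, means, and survival values below are hypothetical placeholders rather than values from the paper, and the function name is illustrative.

```python
import math

def predicted_risk(x, coefs, means, baseline_survival):
    """Event probability 1 - S0(t) ** exp(f), with f = sum of b_i * (x_i - M_i)."""
    f = sum(b * (xi - m) for b, xi, m in zip(coefs, x, means))
    return 1.0 - baseline_survival ** math.exp(f)

# Hypothetical two-variable model: age (per year) and current smoking (0/1).
coefs = [0.05, 0.60]           # regression coefficients, unchanged by recalibration
source_means = [50.0, 0.40]    # risk factor means in the source cohort
source_s0 = 0.96               # source-cohort 5-year survival at the means
target_means = [55.0, 0.30]    # risk factor means in the target cohort
target_s0 = 0.98               # target-cohort 5-year survival (eg, Kaplan-Meier)

person = [60.0, 1.0]
unadjusted = predicted_risk(person, coefs, source_means, source_s0)
recalibrated = predicted_risk(person, coefs, target_means, target_s0)
print(round(unadjusted, 4), round(recalibrated, 4))
```

Only the means and the baseline survival are swapped; the coefficients stay fixed, so the ordering of subjects by predicted risk within the target cohort is the same before and after recalibration.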
The Framingham Study cohort consisted of 2439 men and 2812 women free
of cardiovascular disease. The 5- and 10-year hard CHD event rates were 3.7%
and 8.0% for men and 1.4% and 2.8% for women.
Table 1 shows the Cox regression
coefficients for the sex-specific Framingham regression models. Table 2 contains the racial compositions, sample sizes, age ranges,
mean ages, risk factor distributions, and 5-year incidence rates for the Framingham
and non-Framingham cohorts.
Table 3 and Table 4 contain the RRs of the CHD risk factors for each cohort's
sex- and race-specific regression model. First, we considered each function
separately. Among men most risk factors had statistically significant coefficients,
whereas among women a number of coefficients were not statistically significant,
presumably because of low event rates (eg, for ARIC black women there were
only 38 events among 2333 subjects). Nonetheless, within risk factor categories,
trends were significant. For example, except for the SHS Native Americans,
for all cohorts the risk for hard CHD events increased as blood pressure went
from optimal to stage II-IV hypertension, as TC increased, and as HDL-C decreased
(P<.01 for all). In the SHS there were some unexpected
but not statistically significant elevated risks in high HDL-C groups.
Among men, there were no significant differences between Framingham
RRs and those of ARIC white and black men. There were differences in RRs for
smoking and age in the PHS cohort, which may reflect the matching scheme that
was used in that study. Among HHP Japanese American men and PR Hispanic men,
RRs were lower for optimal blood pressure. Smoking was associated with a lower
RR among HHP Japanese American men, while diabetes and TC 280 mg/dL (7.25
mmol/L) or higher were associated with much higher RRs among SHS Native American
men. Also, in this cohort HDL-C in the range of 50 to 59 mg/dL (1.30-1.53
mmol/L) had an unexpected elevated risk and stage I hypertension had an unexpected
low risk, both resulting in significant differences from the Framingham function.
Among the more elderly CHS white men, cholesterol abnormalities were associated
with a lower RR.
Among women, there were no differences between Framingham RRs and those
of white ARIC or CHS women. Black women in the ARIC cohort had higher RRs
for high normal blood pressure and stage II-IV hypertension. In the SHS there
were significant differences for diabetes and smoking. Also, HDL-C greater
than or equal to 60 mg/dL (1.55 mmol/L) carried an elevated risk in the SHS,
resulting in a significantly different RR than that of the Framingham function.
Table 5 contains the c statistics for both men and women. The "FHS" row refers
to the discrimination achieved by applying the Framingham prediction functions
to the non-Framingham cohorts, while the "Best Cox" row contains the c statistics resulting from the cohort's own Cox regression
function. Since they are based on the same cohorts for which the scores were
developed, the latter c statistics are overestimates.
For non-Framingham white men (ARIC, PHS, and CHS) and women (ARIC and CHS),
the Framingham functions discriminated well, almost always achieving the same
discrimination as best Cox functions. Overall, within sampling fluctuations
the Framingham functions discriminated nearly as well as the best Cox functions
of the non-Framingham cohorts, with the exception of the SHS Native Americans.
Table 5 also contains the χ² statistics for evaluation of the calibration of the Framingham prediction
functions applied to non-Framingham cohorts. For white men and women, including
the more elderly subjects in the CHS cohort, both the Framingham functions
("Unadjusted" row) and the individual cohort's own functions ("Best Cox" row)
showed a statistically acceptable calibration. Figure 1 contains calibration plots for white and black men and
women from the ARIC study. In general, actual CHD event rates were similar
to event rates predicted by Framingham functions among white and black men
For the HHP Japanese American men and the PR Hispanic men, the calibration χ² statistics of 66.0 and 142.0, respectively, indicate poor calibration
(Table 5, "Unadjusted" row and Figure 2A and B, "Unadjusted" panels). The
Framingham prediction function systematically overestimated risk in both cohorts,
in which the overall CHD event rates were substantially lower. Model recalibration
using the non-Framingham cohorts' mean values for risk factors and CHD incidence
rates substantially improved the performance of the Framingham prediction
functions (Figure 2, "Adjusted"
panels, and Table 5). In the SHS,
calibration was good for men (χ² = 10.6), but less good for
women (χ² = 22.7). Recalibration resulted in improved performance
of the Framingham functions (Figure 2
and Table 5).
The Framingham CHD prediction functions were developed to help clinicians
estimate the absolute risk of any individual patient developing clinically
manifest disease. We sought to demonstrate the external validity of the Framingham
functions by examining their performance in 6 different well-described population-based
cohorts that reflect a wide range of ethnic diversity.
As shown in Table 3 and Table 4, RRs for major CHD risk factors
were remarkably similar to those derived from the Framingham Heart Study cohort
among white men and women and black men in the ARIC cohort. Among black women
in the ARIC cohort, RRs for elevated blood pressure were somewhat higher.
In the cohorts that were made up of other ethnic groups, however, we did note
some differences in RRs. For example, smoking was associated with a much lower
RR in HHP Japanese American men for reasons that are not clear. In the CHS
cohort, cholesterol abnormalities and smoking had lower RRs, possibly due
to age interactions. In the SHS Native American cohort, there were RR differences
for cholesterol abnormalities and diabetes. Some cholesterol differences are
unexplained, with high HDL-C levels carrying an increased risk in the SHS
cohort. Because the prevalence of diabetes among Native Americans is quite
high, it is possible that the different RRs we observed may be due to interactions
with other risk factors and with other factors unique to diabetes, such as
albuminuria, that were not considered in our analyses.
The ability of the Framingham prediction functions to discriminate between
subjects who developed clinical CHD and those who did not was reasonably good
for most of the non-Framingham cohorts (Table 5). Among ARIC black women, the Framingham c statistic was numerically, but not significantly, lower than that
derived from the model based on that same cohort's data. Since this ARIC c statistic is based on the same data with which its function
was developed, it is an overestimate and the difference may relate to this.
It may also be due to the small number of CHD events. The Framingham c statistic for CHS men was also numerically, but not significantly,
lower than the CHS' own function c statistic. The
difference may relate to the overestimate of the CHS cohort's own function c statistic. That both the Framingham and the CHS
cohort's own function c statistics are low may be
a consequence of the relatively small sample size and the narrow age distribution.
The c statistics were appreciably decreased for SHS
Native Americans. Why discrimination was worse for Native Americans compared
with that for white and black men and women of the ARIC cohort, PR Hispanic
men, and HHP Japanese American men is not clear. It is possible that the markedly
different RR estimates for cholesterol and diabetes among the SHS Native Americans
may have adversely affected the ability of the Framingham prediction function
to discriminate CHD risk.
In our model calibration analyses, we found reasonably good agreement
between predicted and actual CHD event rates for all of the non-Framingham
cohorts studied (Figure 1 and Figure 2, and Table 5) except for HHP Japanese American men, PR Hispanic men,
and SHS Native American women. In these groups, the Framingham prediction
functions systematically overestimated CHD risk (Figure 2A, B, and D). This overestimation was corrected by using
a process of recalibration. In order to apply this to other such populations,
it would be necessary to obtain cross-sectional data on risk factor prevalences
as well as population data on CHD event rates over time.
Authors of treatment guidelines have recognized the need to have an
accurate and reliable multivariable-based estimate of absolute CHD risk in
order to best identify those most in need of aggressive preventive treatment.7-9 Thus, the recently released
NCEP-ATP III guidelines specifically recommend that the level of treatment
should relate to the level of CHD risk.9 They
specifically incorporate Framingham prediction functions to aid clinicians
and patients in determining optimal strategies. Simple charts can be used
to aid in this activity.5,9
In order for multivariable risk assessment and treatment guidelines
to have optimal use and acceptability, clinicians need to be confident that
absolute risk prediction functions can be transported to other settings beyond
where they were originally developed. We have demonstrated that the FHS prediction
functions work reasonably well among white and black men and women. When applied
to Japanese American and Hispanic men and Native American women, recalibration
was needed. Future work is needed to devise practical schemes by which clinicians
can confidently apply the FHS prediction functions in these groups.