The CCF cohort included 24 292 women and 35 585 men. Data points indicate median; error bars, 95% CI. Duke Treadmill Scores are stratified as low, intermediate, and high risk. The sex-specific risk scores are stratified into tertiles.
In the FIT cohort (23 386 women, 25 892 men), the sex-specific risk score was slightly altered to exclude abnormal heart rate recovery and to define end-stage renal disease as a glomerular filtration rate of less than 15 mL/min/1.73 m².
eMethods. Model Calibration and Creation of Risk Scores
eTable 1. Baseline Clinical and Exercise Data for the FIT Cohort
eTable 2. Univariable Associations With All-Cause Mortality
eTable 3. Multivariable Cox Proportional Hazards Models for Prediction of Mortality in Females and Males
eTable 4. Estimated 10-Year Mortality for Women in the CCF Cohort (n = 24 292) According to Sex-Specific Risk Scores
eTable 5. Estimated 10-Year Mortality for Men in the CCF Cohort (n = 35 585) According to Sex-Specific Risk Scores
eFigure 1. Predicted Survival at 10 Years in Men and Women in the Derivation Cohort
eFigure 2. ROC Curves in Men and Women in the CCF Validation Cohort
Cremer PC, Wu Y, Ahmed HM, et al. Use of Sex-Specific Clinical and Exercise Risk Scores to Identify Patients at Increased Risk for All-Cause Mortality. JAMA Cardiol. 2017;2(1):15–22. doi:10.1001/jamacardio.2016.3720
Copyright 2017 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Do sex-specific risk scores better estimate all-cause mortality for patients undergoing exercise treadmill testing?
In a retrospective cohort study of nearly 110 000 patients, sex-specific risk scores better estimated mortality. Exercise capacity had the greatest effect on prognosis in both sexes, and all risk factors had a differential effect on prognosis in women compared with men.
Risk stratification is improved with sex-specific risk scores, and in particular, patients at the highest risk are more readily identified.
Risk assessment tools for exercise treadmill testing may have limited external validity. Cardiovascular mortality has decreased in recent decades, and women have been underrepresented in prior cohorts.
To determine whether exercise and clinical variables are associated with differential mortality outcomes in men and women and to assess whether sex-specific risk scores better estimate all-cause mortality.
Design, Setting, and Participants
This retrospective cohort study included 59 877 patients seen at the Cleveland Clinic Foundation (CCF cohort) from January 1, 2000, through December 31, 2010, and 49 278 patients seen at the Henry Ford Hospital (FIT cohort) from January 1, 1991, through December 31, 2009. All patients were 18 years or older and underwent exercise treadmill testing. Data were analyzed from January 1, 2000, to October 27, 2011, in the CCF cohort and from January 1, 1991, to April 1, 2013, in the FIT cohort.
Main Outcomes and Measurements
The CCF cohort was divided randomly into derivation and validation samples, and separate risk scores were developed for men and women. Net reclassification improvement, C statistics, and integrated discrimination improvement were used to compare the sex-specific risk scores with other tools that have all-cause mortality as the outcome. Discrimination and calibration of these sex-specific risk scores were also evaluated in the FIT cohort.
The CCF cohort included 59 877 patients (59.4% men; 40.5% women) with a median (interquartile range [IQR]) age of 54 (45-63) years and 2521 deaths (4.2%) during a median (IQR) follow-up of 7 (4.1-9.6) years. The FIT cohort included 49 278 patients (52.5% men; 47.4% women) with a median (IQR) age of 54 (46-64) years and 6643 deaths (13.5%) during a median (IQR) follow-up of 10.2 (7-13.4) years. C statistics for the sex-specific risk scores in the CCF validation sample were higher (0.79 in women and 0.81 in men) than C statistics using other tools in women (0.70 for Duke Treadmill Score; 0.74 for Lauer nomogram) and men (0.72 for Duke Treadmill Score; 0.75 for Lauer nomogram). Net reclassification improvement and integrated discrimination improvement were superior with the sex-specific risk scores, mostly owing to correct reclassification of events. The sex-specific risk scores in the FIT cohort demonstrated similar discrimination (C statistic, 0.78 for women and 0.79 for men), and calibration was reasonable.
Conclusions and Relevance
Sex-specific risk scores better estimate mortality in patients undergoing exercise treadmill testing. In particular, these sex-specific risk scores help to identify patients at the highest residual risk in the present era.
Exercise testing is recommended to assess prognosis in patients with known or suspected coronary artery disease (CAD) who present with symptoms suggestive of worsening ischemic heart disease.1 For nearly 3 decades, the Duke Treadmill Score (DTS) has been the standard to assess prognosis in these patients.2 However, during the past few decades, advances in therapy have dramatically reduced cardiovascular mortality,3 and the validity of this score in a contemporary population is unclear. The DTS was also developed in a predominantly male population, and a paucity of data regarding risk stratification with this score exists in women.4 Finally, the DTS incorporates only exercise duration, ST-segment depression, and exercise-induced chest pain.2 Other exercise variables are associated with prognosis,5-7 and many of these patients have cardiovascular comorbidities that also affect their overall risk.
To address this latter concern, a nomogram was created by Lauer and colleagues8 that is superior to the DTS at predicting all-cause mortality. The broader clinical use of this nomogram is unclear and in part has been limited by exclusion of certain patient populations, including those with known CAD, valvular heart disease, heart failure, and end-stage renal disease (ESRD). Furthermore, sex-related differences in the prognostic impact of exercise test variables and cardiovascular comorbidities should be accounted for.9,10 We therefore aimed to develop comprehensive sex-specific risk scores to estimate all-cause mortality in a more inclusive and contemporary population. We then determined whether these sex-specific risk scores better estimated mortality when compared with the DTS and Lauer nomogram. Finally, we validated these sex-specific risk scores by assessing discrimination and calibration in an external cohort.
From a cohort of 60 895 consecutive patients undergoing symptom-limited treadmill testing at the Cleveland Clinic Foundation (CCF) from January 1, 2000, to December 31, 2010, 1018 were excluded owing to atrial fibrillation, a resting electrocardiogram that precluded interpretation of the ST segment, digoxin use, being younger than 18 years, or having no Social Security number available. The final CCF cohort included 59 877 patients.
At the time of stress testing, patient demographic characteristics, comorbidities, and medications were prospectively entered into a stress database. Known CAD was defined as a previous myocardial infarction, previous percutaneous coronary intervention, or a history of coronary artery bypass grafting. Heart failure was defined by self-reported history and review of the medical record. Hypertension was defined as self-reported history or use of antihypertensives. Hyperlipidemia was defined as an abnormal fasting lipid panel according to Adult Treatment Panel III guidelines, self-reported history, or use of medications to lower lipid levels. Diabetes was defined as a fasting blood glucose level of at least 126 mg/dL (to convert to millimoles per liter, multiply by 0.0555), self-reported history, or use of medication to lower glucose levels. Patients who were actively smoking cigarettes or who had smoked within the past year were considered current smokers, and patients who had smoked less recently were considered to have a history of smoking. We defined ESRD as receiving dialysis.
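The diabetes and smoking definitions above are rule based and can be expressed as a minimal sketch. The record fields (`fasting_glucose_mg_dl`, `years_since_last_smoked`, and so on) are hypothetical, because the structure of the CCF stress database is not described here. Treating a quit date exactly 1 year ago as "within the past year" is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

MG_DL_TO_MMOL_L = 0.0555  # conversion factor given in the text


@dataclass
class PatientRecord:
    # Hypothetical fields; the real stress-database schema is not described.
    fasting_glucose_mg_dl: Optional[float]
    reports_diabetes: bool
    on_glucose_lowering_med: bool
    smokes_now: bool
    years_since_last_smoked: Optional[float]  # None if the patient never smoked


def glucose_mmol_l(mg_dl: float) -> float:
    """Convert a glucose value from mg/dL to mmol/L."""
    return mg_dl * MG_DL_TO_MMOL_L


def has_diabetes(p: PatientRecord) -> bool:
    """Fasting glucose >= 126 mg/dL, self-reported history, or glucose-lowering drugs."""
    high_glucose = (p.fasting_glucose_mg_dl is not None
                    and p.fasting_glucose_mg_dl >= 126)
    return high_glucose or p.reports_diabetes or p.on_glucose_lowering_med


def smoking_status(p: PatientRecord) -> str:
    """'current' if smoking now or within the past year, 'former' if earlier, else 'never'."""
    if p.smokes_now:
        return "current"
    if p.years_since_last_smoked is None:
        return "never"
    # Assumption: quitting exactly 1 year ago still counts as "within the past year".
    return "current" if p.years_since_last_smoked <= 1 else "former"
```

Keeping each definition as a separate function mirrors the way the text lists criteria joined by "or", so any one criterion is sufficient.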
Patients underwent symptom-limited treadmill testing using a protocol based on a pretest estimation of exercise capacity and designed to have the patient reach maximal exertion within 8 to 12 minutes, as suggested by exercise testing guidelines.1 Standard exercise protocols were used, and most patients performed a Bruce protocol (61%). Other protocols included Cornell, Naughton, modified Naughton, and modified Bruce.1 Heart rate targets were not used as an end point or to judge the adequacy of the test. The ST segment was measured 80 milliseconds after the J point, and the magnitude of ST depression was recorded as the greatest horizontal or down-sloping ST-segment depression in any lead except aVR during the test or in recovery.
Blood pressure was measured during every stage of the test. Heart rate was recorded from an electrocardiogram printed every minute during the test. Peak estimated metabolic equivalents of task (METs) were calculated from treadmill speed and grade at peak exercise. Chest discomfort during the test was recorded as none, nonlimiting chest pain, or test-limiting chest pain. Rate-pressure product (RPP) was calculated as the product of heart rate and systolic blood pressure. A ΔRPP was calculated as RPP at peak exercise minus RPP at rest. Heart rate recovery (HRR) was calculated as peak exercise heart rate minus heart rate at 1 minute after exercise.
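The derived hemodynamic variables defined above are simple differences and products. The sketch below restates them; the function names and the illustrative input values are hypothetical and are not drawn from either cohort.

```python
def rate_pressure_product(heart_rate_bpm: float, systolic_bp_mm_hg: float) -> float:
    """RPP = heart rate x systolic blood pressure."""
    return heart_rate_bpm * systolic_bp_mm_hg


def delta_rpp(rest_hr: float, rest_sbp: float, peak_hr: float, peak_sbp: float) -> float:
    """Change in RPP from rest to peak exercise."""
    return rate_pressure_product(peak_hr, peak_sbp) - rate_pressure_product(rest_hr, rest_sbp)


def heart_rate_recovery(peak_hr: float, hr_1_min_after: float) -> float:
    """HRR = peak exercise heart rate minus heart rate 1 minute after exercise."""
    return peak_hr - hr_1_min_after


# Illustrative values only (not patient data).
print(rate_pressure_product(160, 180))   # 28800
print(delta_rpp(70, 120, 160, 180))      # 20400
print(heart_rate_recovery(160, 140))     # 20
```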
Patients were given a standard walking recovery for tests involving electrocardiography only, technetium imaging, or metabolic stress testing. For patients undergoing stress echocardiography, a supine recovery immediately after exercise was used. Therefore, HRR at 1 minute was classified as abnormal if 12 or fewer beats/min for patients undergoing upright recovery and abnormal if 18 or fewer beats/min in patients undergoing stress echocardiography.5,11,12 Chronotropic reserve index was calculated as (Peak heart rate – resting heart rate)/[(220 – age) – resting heart rate] and was considered abnormal if no greater than 0.8 for patients not taking a β-blocker and abnormal if no greater than 0.62 for patients taking a β-blocker. In patients who did not undergo a Bruce protocol, the estimated METs achieved by each patient were converted to the equivalent Bruce protocol exercise time before calculation of the DTS. The DTS was calculated as Exercise time (minutes) – (5 × maximum ST-segment depression [mm]) – (4 × treadmill chest pain index). Treadmill chest pain was scored from 0 to 2, with 0 representing no chest pain; 1, nonlimiting chest pain; and 2, chest pain for which the exercise test was terminated.2
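The HRR and chronotropic reserve thresholds and the DTS formula above can be sketched as follows. The sketch treats a CRI of 0.62 or less as abnormal in β-blocker users, the conventional direction for this cutoff. The DTS risk bands (low, +5 or higher; intermediate, between -11 and +5; high, -11 or lower) are the conventional cutoffs from the original DTS literature and are assumed here, since this article does not restate them. The METs-to-Bruce-minutes conversion is not specified in the text and is left out.

```python
ANGINA_INDEX = {"none": 0, "nonlimiting": 1, "limiting": 2}


def hrr_abnormal(hrr_bpm: float, supine_recovery: bool) -> bool:
    """<=18 beats/min after supine recovery (stress echo); <=12 after upright recovery."""
    return hrr_bpm <= (18 if supine_recovery else 12)


def chronotropic_reserve_index(peak_hr: float, rest_hr: float, age_years: float) -> float:
    """(peak HR - resting HR) / ((220 - age) - resting HR)."""
    return (peak_hr - rest_hr) / ((220 - age_years) - rest_hr)


def cri_abnormal(cri: float, on_beta_blocker: bool) -> bool:
    """CRI <= 0.8 without a beta-blocker, or <= 0.62 with one, is treated as abnormal."""
    return cri <= (0.62 if on_beta_blocker else 0.8)


def duke_treadmill_score(bruce_minutes: float, max_st_depression_mm: float,
                         chest_pain: str) -> float:
    """Exercise time - 5 x ST depression - 4 x treadmill chest pain index."""
    return bruce_minutes - 5 * max_st_depression_mm - 4 * ANGINA_INDEX[chest_pain]


def dts_risk_category(dts: float) -> str:
    # Assumed conventional cutoffs: low >= +5, intermediate in (-11, +5), high <= -11.
    if dts >= 5:
        return "low"
    if dts > -11:
        return "intermediate"
    return "high"
```

For example, 9 minutes on the Bruce protocol with 1 mm of ST depression and nonlimiting chest pain gives a DTS of 9 - 5 - 4 = 0, which falls in the intermediate band.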
The Henry Ford Exercise Testing (FIT) cohort is from a registry of 69 885 consecutive patients who had physician-referred exercise treadmill tests at the Henry Ford Health System from January 1, 1991, through December 31, 2009. Methodologic details have been reported previously.13 In brief, patients older than 18 years who underwent exercise treadmill tests were included. All testing used the standard Bruce protocol. Exercise test, medical history, and medication data were collected at the time of testing, and supporting clinical data were derived from the electronic medical record and administrative databases. For external validation purposes, the 3880 patients without recorded weight and the 16 727 without glomerular filtration rate data were excluded. A final sample size of 49 278 patients was included for external validation.
All data in both cohorts were deidentified. The institutional review boards at CCF, Henry Ford Health System, and Johns Hopkins Hospital approved this study with an exemption for individual patient consent.
The primary outcome was all-cause mortality, determined from the Social Security Death Index Master File. Previous work14 demonstrated that the Social Security Death Index correctly identifies more than 95% of patients who have died. The final censoring date was October 27, 2011, in the CCF cohort, and April 1, 2013, in the FIT cohort.
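With a single death registry and fixed cohort-specific censoring dates, each patient's survival record reduces to a follow-up time and a death indicator. This is a minimal sketch under stated assumptions: the function name is hypothetical, deaths recorded after the censoring date are treated as censored, and follow-up is expressed in years of 365.25 days.

```python
from datetime import date
from typing import Optional, Tuple

# Final censoring dates stated in the text.
CENSOR_DATES = {"CCF": date(2011, 10, 27), "FIT": date(2013, 4, 1)}


def survival_record(test_date: date, death_date: Optional[date],
                    cohort: str) -> Tuple[float, int]:
    """Return (follow-up in years, death indicator) for one patient."""
    censor = CENSOR_DATES[cohort]
    if death_date is not None and death_date <= censor:
        end, died = death_date, 1
    else:
        # No registry death, or a death after the censoring date: censored.
        end, died = censor, 0
    return (end - test_date).days / 365.25, died
```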
Data were analyzed from January 1, 2000, to October 27, 2011, in the CCF cohort and from January 1, 1991, to April 1, 2013, in the FIT cohort. For men and women, the CCF sample was divided randomly, with 50% of patients in the derivation cohort and 50% in the validation cohort. All data analysis to develop risk scores was performed in the derivation cohorts. Data are summarized as median and interquartile range (IQR) for continuous data and number (percentage) of nonmissing data for categorical variables. Comparisons across age categories and survival status used 2-tailed unpaired t tests for continuous variables and χ2 tests for categorical variables. Cox proportional hazards regression models were used to create separate multivariable models for men and women to determine independent risk factors for all-cause mortality.
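The 50/50 random split within each sex can be sketched as below. The function name and the fixed seed are hypothetical; the text does not say how randomization was implemented, only that half of each sex went to derivation and half to validation.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_within_sex(patient_ids: Sequence[str], sexes: Sequence[str],
                     seed: int = 1) -> Dict[str, Tuple[List[str], List[str]]]:
    """Randomly assign 50% of each sex to derivation and the rest to validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_sex: Dict[str, List[str]] = {}
    for pid, sex in zip(patient_ids, sexes):
        by_sex.setdefault(sex, []).append(pid)
    splits = {}
    for sex, ids in by_sex.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        splits[sex] = (shuffled[:half], shuffled[half:])  # (derivation, validation)
    return splits
```

Splitting within each sex, rather than across the whole cohort, guarantees that both sex-specific derivation samples contain close to half of that sex.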
All variables that were significantly associated with all-cause mortality on univariable analysis (P < .05) were considered for multivariable adjustment. Bootstrapping methods were used to identify variables for inclusion in the final models. Two hundred bootstrapped models were generated for men and women; variables that were entered into the models at least 50% of the time were then entered into a backward stepwise selection modeling process to create separate Cox models for men and women. These Cox models were validated for calibration accuracy to estimate overall survival (eMethods in the Supplement).
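The bootstrap step can be illustrated by counting how often each candidate variable is selected across 200 resamples and keeping those chosen at least 50% of the time. This is a sketch of the inclusion-frequency idea only. Fitting a stepwise Cox model is beyond a short example, so `toy_selector` is a hypothetical stand-in that keeps variables whose mean differs between decedents and survivors by at least 0.5 SD; any real selection routine could be passed in its place.

```python
import random
from collections import Counter
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Set, Tuple

Row = Dict[str, float]
Selector = Callable[[List[Row]], Set[str]]


def bootstrap_inclusion(rows: Sequence[Row], select: Selector, n_boot: int = 200,
                        threshold: float = 0.5,
                        seed: int = 0) -> Tuple[List[str], Dict[str, float]]:
    """Run `select` on n_boot bootstrap resamples; keep variables chosen >= threshold."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    n = len(rows)
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]  # with replacement
        counts.update(select(resample))
    freq = {var: c / n_boot for var, c in counts.items()}
    kept = sorted(var for var, f in freq.items() if f >= threshold)
    return kept, freq


def toy_selector(variables: Sequence[str], outcome: str = "died",
                 min_std_diff: float = 0.5) -> Selector:
    """Hypothetical stand-in for stepwise Cox selection."""
    def select(sample: List[Row]) -> Set[str]:
        died = [r for r in sample if r[outcome] == 1]
        alive = [r for r in sample if r[outcome] == 0]
        if not died or not alive:
            return set()
        chosen = set()
        for var in variables:
            sd = pstdev([r[var] for r in sample])
            diff = mean([r[var] for r in died]) - mean([r[var] for r in alive])
            if sd > 0 and abs(diff) / sd >= min_std_diff:
                chosen.add(var)
        return chosen
    return select
```

The returned frequencies make the 50% rule explicit: a variable that is only occasionally selected, because of chance features of one resample, is dropped before the backward stepwise step.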
The Cox models from the derivation cohort were then used to develop sex-specific risk scores for estimating mortality. To assign value to each variable in creating a risk score, categories were created for the continuous variables in the model. Creation of these categories was based on the distribution of each variable. Linearity was tested with restricted cubic splines. Continuous variables were divided into quartiles with the exception of age for women and weight for men. The β coefficients across quartiles were similar for these variables; thus, age for women was divided into older than 65 years or 65 years or younger, and weight for men was divided into more than 80 kg or 80 kg or less. The β coefficients of each covariate in these categories were then used to assign points for every risk factor (eMethods in the Supplement). The points were then added together to obtain a total score. Overall, 7% of data were missing. To reduce bias in estimates and uncertainty related to the imputation model, multiple imputation of missing variables was performed with a regression-based method. In the CCF validation cohorts, discrimination was assessed with receiver operating characteristic curves and Harrell C statistics, category-free net reclassification improvement, and integrated discrimination improvement.15
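The exact point-assignment rule is given in the eMethods and is not reproduced here. A common approach, assumed in this sketch, is to divide every category's β coefficient by one shared constant and round to the nearest integer; the β values and the constant in the usage example are hypothetical and are not the coefficients from Table 2. Python's built-in `round()` rounds halves to even, so the sketch rounds half up explicitly.

```python
import math
from statistics import quantiles
from typing import Dict, Iterable, List


def quartile_cutpoints(values: Iterable[float]) -> List[float]:
    """Three cut points splitting a continuous variable into quartiles."""
    return quantiles(values, n=4)


def quartile_index(x: float, cuts: List[float]) -> int:
    """0 for the lowest quartile up to 3 for the highest."""
    return sum(x > c for c in cuts)


def beta_to_points(betas: Dict[str, float], points_unit: float) -> Dict[str, int]:
    """Scale each category's beta by a shared constant and round half up."""
    return {name: math.floor(b / points_unit + 0.5) for name, b in betas.items()}


def total_score(present: Iterable[str], points: Dict[str, int]) -> int:
    """Sum the points for every risk-factor category a patient has."""
    return sum(points[name] for name in present)


# Hypothetical coefficients, for illustration only.
pts = beta_to_points({"a": 0.92, "b": 0.47, "c": 1.38}, points_unit=0.3)
print(pts)                          # {'a': 3, 'b': 2, 'c': 5}
print(total_score(["a", "c"], pts))  # 8
```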
The FIT cohort served as an external validation cohort. The risk scores were modified because certain data were not available in both cohorts. Heart rate recovery was not available in the FIT database and was excluded from the models using this cohort. A history of smoking was not available in the FIT cohort and was replaced with current smoking. Finally, ESRD was not categorized in the FIT cohort and was replaced by glomerular filtration rate of less than 15 mL/min/1.73 m². Discrimination with these modifications to the risk scores was assessed with receiver operating characteristic curves and Harrell C statistics. Calibration was assessed by dividing the risk scores into deciles for men and women and plotting observed vs predicted mortality.
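Harrell's C statistic and decile-based calibration can both be computed from follow-up times, death indicators, and a risk score. The sketch below is a simplified illustration rather than the authors' code: pairs tied on follow-up time are ignored in the C statistic, the pairwise loop is quadratic, and "observed" mortality in each decile is assumed to be the Kaplan-Meier estimate at a 10-year horizon.

```python
from typing import List, Sequence, Tuple


def harrell_c(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Share of usable pairs in which the patient who died first had the higher score."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is a death
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / usable


def km_mortality(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """1 minus the Kaplan-Meier survival estimate at `horizon`."""
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1 - deaths / at_risk
    return 1 - surv


def decile_calibration(pred: Sequence[float], times: Sequence[float],
                       events: Sequence[int], horizon: float = 10.0,
                       groups: int = 10) -> List[Tuple[float, float]]:
    """(mean predicted, observed) mortality per group of patients ranked by predicted risk."""
    order = sorted(range(len(pred)), key=lambda k: pred[k])
    rows = []
    for g in range(groups):
        idx = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        if idx:
            mean_pred = sum(pred[k] for k in idx) / len(idx)
            observed = km_mortality([times[k] for k in idx],
                                    [events[k] for k in idx], horizon)
            rows.append((mean_pred, observed))
    return rows
```

Plotting the second element of each pair against the first gives the observed vs predicted calibration plot; points close to the identity line indicate good calibration.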
All analyses were performed using SAS (version 9.2; SAS Institute Inc), R (R Core Team 2015 [http://www.R-project.org/]), and Stata (version 14.0; StataCorp) statistical software. Two-tailed P < .05 was considered statistically significant.
Owing to known differences in the prevalence and impact of comorbidities and exercise variables between men and women,10 the 59 877 patients in the CCF cohort were divided by sex (59.4% men; 40.5% women) for all analyses. Overall, the median age was 54 (IQR, 45-63) years, and 66.4% of the population was white. Cardiovascular comorbidities were more common in men, especially a history of CAD (24.2% vs 9.2%). Exercise capacity was generally preserved, and men had higher exercise capacity (10 [IQR, 8.3-11.5] vs 8 [IQR, 6.6-10] METs) and DTSs (8.5 [IQR, 5.5-10.2] vs 6.5 [IQR, 4-8.2]) compared with women (Table 1). In the FIT cohort (52.5% men; 47.4% women), the median (IQR) age was also 54 (46-64) years, 63.9% were white, and cardiovascular comorbidities were common (eTable 1 in the Supplement).
In the CCF cohort, during a median follow-up of 7 (IQR, 4.1-9.6) years, 2521 deaths occurred (4.2% mortality), with 742 deaths in women (3.1% mortality) and 1779 deaths in men (5.0% mortality). In both sexes, death was associated with increased age, lower body weight, diabetes, hypertension, hyperlipidemia, current or former smoking, CAD, myocardial infarction, percutaneous coronary intervention, coronary artery bypass grafting, chronic obstructive pulmonary disease, stroke or transient ischemic attack, heart failure, ESRD, and peripheral arterial disease. In men, death was also associated with a lower body mass index and a family history of coronary disease. With regard to exercise variables, death was associated with ST-segment depression, lower maximal heart rate, lower maximal RPP and ΔRPP, lower peak METs, an abnormal HRR, and an abnormal chronotropic reserve index. A lower DTS was also associated with increased mortality, but no association was found between nonlimiting or limiting chest pain and mortality (eTable 2 in the Supplement).
In multivariable Cox models, lower peak estimated METs, abnormal HRR, increasing age, lower body weight, current or former smoking, and ESRD were all associated with mortality in men and women (eTable 3 in the Supplement). In addition, a history of diabetes was associated with mortality in women, whereas a history of heart failure and hypertension were associated with mortality in men. These Cox models showed good calibration at predicting mortality at 10 years (eFigure 1 in the Supplement). The β coefficients from these Cox proportional hazards regression models were then used to assign points for each covariate (Table 2). The final risk scores for women and men showed high discrimination in estimating mortality in the derivation cohorts (C statistic for women, 0.82; C statistic for men, 0.81).
Although the DTS was associated with mortality when assessed as a continuous variable, differentiation of risk was limited when assessed according to the typical DTS categories. In particular, few patients had high-risk DTSs, which resulted in wide and overlapping 95% CIs. Similar results were obtained in men and women (Figure 1A and C). Of note, only 78 women (0.3%) and 221 men (0.6%) had high-risk DTSs. Conversely, survival curves using the sex-specific risk scores effectively identified patients at highest risk for all-cause mortality (Figure 1B and D), as is also evident in the estimate of 10-year mortality risk according to the sex-specific risk scores (eTables 4 and 5 in the Supplement).
In the CCF validation cohorts, C statistics for the sex-specific risk scores were similar to those in the derivation cohorts (0.79 for women and 0.81 for men). These sex-specific risk scores also performed better at estimating mortality when compared with the other models in women (C statistic for DTS, 0.70; C statistic for Lauer nomogram, 0.74) (eFigure 2A in the Supplement) and in men (C statistic for DTS, 0.72; C statistic for Lauer nomogram, 0.75) (eFigure 2B in the Supplement). Category-free net reclassification improvement and integrated discrimination improvement were also significantly improved with the sex-specific risk scores when compared with the DTS and the Lauer nomogram (Table 3). This improved discrimination primarily corresponded to the correct reclassification of patients who died.
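Category-free net reclassification improvement and integrated discrimination improvement compare two sets of predicted risks for the same patients. The sketch below uses the standard binary-outcome definitions and, as a simplification, treats death during follow-up as a yes/no outcome without handling censoring; `p_old` and `p_new` are hypothetical predicted probabilities from an existing tool and a new score.

```python
from typing import Sequence, Tuple


def category_free_nri(p_old: Sequence[float], p_new: Sequence[float],
                      died: Sequence[int]) -> Tuple[float, float, float]:
    """Return (event NRI, nonevent NRI, total) for any upward or downward change in risk."""
    events = [(o, n) for o, n, d in zip(p_old, p_new, died) if d]
    nonevents = [(o, n) for o, n, d in zip(p_old, p_new, died) if not d]

    def share_up(pairs):
        return sum(n > o for o, n in pairs) / len(pairs)

    def share_down(pairs):
        return sum(n < o for o, n in pairs) / len(pairs)

    nri_events = share_up(events) - share_down(events)           # decedents should move up
    nri_nonevents = share_down(nonevents) - share_up(nonevents)  # survivors should move down
    return nri_events, nri_nonevents, nri_events + nri_nonevents


def idi(p_old: Sequence[float], p_new: Sequence[float], died: Sequence[int]) -> float:
    """Gain in mean predicted risk among events minus the gain among nonevents."""
    gain_ev = [n - o for o, n, d in zip(p_old, p_new, died) if d]
    gain_ne = [n - o for o, n, d in zip(p_old, p_new, died) if not d]
    return sum(gain_ev) / len(gain_ev) - sum(gain_ne) / len(gain_ne)
```

Reporting the event and nonevent components separately is what allows a statement such as the one above, that the improvement came mainly from correct reclassification of patients who died.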
Finally, sex-specific risk scores were calculated for patients in the FIT cohort to assess external validation. At a median follow-up of 10.2 years, there were 6643 deaths in a population of 49 278 (13.5%). The C statistics for the sex-specific risk scores were similar in women (0.78) and men (0.79). Regarding calibration, good tracking of observed vs predicted mortality was found (Figure 2).
The present study is, to our knowledge, the largest to date to develop sex-specific prognostic risk scores using treadmill testing data. We have demonstrated excellent discrimination and calibration for estimating mortality in the CCF derivation and validation cohorts. Moreover, in the CCF validation cohort, C statistics, net reclassification improvement, and integrated discrimination improvement with these new sex-specific risk scores were improved compared with the DTS and Lauer nomogram. Finally, discrimination and calibration were also reasonable when the sex-specific risk scores were tested externally in the FIT cohort.
A few important observations from our study should be highlighted. First, in patients undergoing treadmill testing, our data support separate risk scores according to sex. Certain variables are present or absent in the models for men vs women, and the hazard ratios for risk factors common to both models differ according to sex. Therefore, rather than simply adjusting for sex, we argue that a sex-specific approach should be considered when assessing the prognosis for patients who undergo exercise testing. In addition, although many variables are associated with mortality and refine risk stratification, decreased exercise capacity is the most important risk factor for men and women. Finally, as shown in our category-free net reclassification improvement, the major advantage of our risk scores is the identification of patients who are likely to have a fatal event. As cardiovascular mortality continues to decline,3 identification of patients with the highest residual risk is increasingly important.
The DTS remains the most common method to assess prognosis in patients with exercise testing, although this score was developed in higher-risk patients—predominantly middle-aged men—who all had chest pain and invasive coronary angiography.2 In a lower-risk and more diverse patient population, our study demonstrates that the prognostic value of the DTS is related solely to the importance of exercise capacity. Chest pain and ST depression with exertion were not associated with mortality in multivariable models. The Lauer nomogram improved on the DTS by incorporating other exercise variables and comorbidities.8 However, notable patient populations were excluded. Our objective was to create more comprehensive risk scores with a more inclusive patient population. This approach facilitates a broader clinical use for our sex-specific risk scores.
Our study has several notable limitations. First, we assessed all-cause instead of cardiac death, although all-cause mortality may be preferred because it is an unbiased end point. Second, imaging data were not included in our analysis and have been shown to have prognostic importance.16,17 However, the focus of this study was to develop risk scores based on clinical and exercise variables alone. Third, all patients in the study underwent evaluation at large referral centers, and generalizability to smaller hospitals may be limited. Finally, because of differences in data collection, the risk scores tested in the FIT cohort were similar, but not identical, to the scores used in the CCF cohort. Discrimination was good in both cohorts, but this observation should not lead to the exclusion of certain variables in risk assessment, especially because HRR has emerged as a risk factor in several studies.7,11,12,18 In fact, in a well-developed model, little change may occur in the C statistic when an additional variable is added, even if that variable improves risk stratification.19
In a large cohort of patients who underwent treadmill testing, we have demonstrated a differential effect of exercise variables and clinical risk factors on overall mortality according to sex. The sex-specific risk scores outperform previous risk stratification tools and help to identify patients at the highest risk for death. To facilitate clinical use of these sex-specific risk scores, we have developed an online calculator to estimate 10-year mortality (http://www.clevelandclinic.org/lp/hvi-tools/10YearMortality.html). Even when accounting for multiple comorbidities, exercise capacity was still the predominant risk factor in men and women. This online calculator can be used by physicians and patients to not only assess prognosis but also emphasize the importance of exercise, even in the presence of other cardiovascular risk factors.
Corresponding Author: Leslie Cho, MD, Heart and Vascular Institute, Cleveland Clinic, 9500 Euclid Ave, Ste JB-1, Cleveland, OH 44124 (firstname.lastname@example.org).
Accepted for Publication: August 11, 2016.
Published Online: October 26, 2016. doi:10.1001/jamacardio.2016.3720
Author Contributions: Drs Cremer and Cho had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Cremer, Ahmed, Pierson, Al-Mallah, Cho.
Acquisition, analysis, or interpretation of data: Cremer, Wu, Ahmed, Pierson, Brennan, Al-Mallah, Brawner, Ehrman, Keteyian, Blumenthal, Blaha.
Drafting of the manuscript: Cremer, Ahmed, Pierson, Keteyian, Cho.
Critical revision of the manuscript for important intellectual content: Cremer, Wu, Ahmed, Pierson, Brennan, Al-Mallah, Brawner, Ehrman, Blumenthal, Blaha.
Statistical analysis: Cremer, Wu, Ahmed, Brennan, Cho.
Administrative, technical, or material support: Al-Mallah, Keteyian, Blaha.
Study supervision: Al-Mallah, Blumenthal, Blaha.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.
Funding/Support: This study was supported by the Women’s Cardiovascular Center, Cleveland Clinic, and Karo’s Chair for Women’s Cardiovascular Research.
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.