Tree-based prediction rule for osteoporotic fracture for women with T scores of −2.5 to −1.0.
Miller PD, Barlas S, Brenneman SK, Abbott TA, Chen Y, Barrett-Connor E, Siris ES. An Approach to Identifying Osteopenic Women at Increased Short-term Risk of Fracture. Arch Intern Med. 2004;164(10):1113-1120. doi:10.1001/archinte.164.10.1113
Copyright 2004 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Identification and management of women to reduce fractures is often limited to T scores less than −2.5, although many fractures occur with higher T scores. We developed a classification algorithm that identifies women with osteopenia (T scores of −2.5 to −1.0) who are at increased risk of fracture within 12 months of peripheral bone density testing.
A total of 57 421 postmenopausal white women with baseline peripheral T scores of −2.5 to −1.0 and 1-year information on new fractures were included. Thirty-two risk factors for fracture were entered into a classification and regression tree analysis to build an algorithm that best predicted future fracture events.
A total of 1130 women had new fractures in 1 year. Previous fracture, T score at a peripheral site of −1.8 or less, self-rated poor health status, and poor mobility were identified as the most important determinants of short-term fracture. Fifty-five percent of the women were identified as being at increased fracture risk. Women with previous fracture, regardless of T score, had a risk of 4.1%, followed by 2.2% in women with T scores of −1.8 or less or with poor health status, and 1.9% for women with poor mobility. The algorithm correctly classified 74% of the women who experienced a fracture.
This classification tool accurately identified postmenopausal women with peripheral T scores of −2.5 to −1.0 who are at increased risk of fracture within 12 months. It can be used in clinical practice to guide assessment and treatment decisions.
The association between bone mineral density (BMD) and fracture risk is continuous, with an approximate doubling of fracture risk for each standard deviation decline in BMD T score.1 Despite this continuous relationship, efforts at fracture risk reduction are often limited to women whose central BMD measurement has been classified as "osteoporotic," that is, T scores of −2.5 or less based on the 1994 World Health Organization (WHO)2 diagnostic classification. Postmenopausal women with a diagnosis of "osteopenia," that is, T scores between −2.5 and −1.0 based on the WHO classification, may also be at risk of fracture.3,4
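The doubling relationship can be read as a relative risk of roughly 2 raised to the number of standard deviations of decline. The following Python sketch is illustrative only: it assumes the doubling holds exactly across the whole T score range, and the function name and the reference T score of 0 are our own choices rather than anything specified in the cited work.

```python
def relative_fracture_risk(t_score: float, reference_t_score: float = 0.0) -> float:
    """Approximate fracture risk relative to a reference T score.

    Assumes risk doubles for every 1.0 SD decline in T score, which is
    only an approximation of the continuous relationship described above.
    """
    decline_in_sd = reference_t_score - t_score
    return 2.0 ** decline_in_sd


if __name__ == "__main__":
    for t in (-1.0, -1.8, -2.5):
        ratio = relative_fracture_risk(t)
        print(f"T score {t:+.1f}: about {ratio:.1f} times the risk at T = 0")
```

Under this assumption there is no step change in risk at −2.5, which is why a single diagnostic threshold leaves many at-risk women unidentified.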
In practice, the physician is often faced with making treatment decisions for women with T scores of −2.5 to −1.0 but is provided with little evidence-based guidance regarding who is at highest risk of fracture. Using a T score threshold of −2.5 for diagnostic and treatment decisions has limitations in effectively managing postmenopausal women with low BMD because more than 50% of fractures occur in women whose BMD levels are in the osteopenic range.1,5-8 Therefore, effective strategies to reduce patients' risk of fracture must include identification and management of individuals who are osteopenic and at high risk of near-term and lifetime fractures.2 The ability to identify osteopenic women at greatest risk of fracture could maximize effective use of therapies and minimize treatment of women who are at lower fracture risk.
A few published guidelines9-11 attempt to provide strategies incorporating BMD measures and additional risk factors for identification of women at increased risk of fracture with T scores greater than −2.5. However, the selection of risk factors for these guidelines does not indicate their relative importance, and the T score cutoff value for intervention is somewhat arbitrary. Other studies12-21 have identified demographic and clinical factors, in addition to BMD, that play important roles as predictors of fracture in postmenopausal women primarily older than 65 years or have attempted to provide guidance for clarification of salient risk factors. The ability to use the results of the studies is hampered because more risk factors have been identified than would be practical to assess, and they do not specifically address fracture risk within the T score range of −2.5 to −1.0 or in women younger than 65 years, where guidance for clinical decision making is often needed.
To address these issues, we attempted to identify osteopenic women at high risk of fracture within 12 months of peripheral BMD testing. Although the WHO diagnostic criteria for osteoporosis and osteopenia are primarily based on central dual-energy x-ray absorptiometry (DXA) measurements, we use the terminology based on peripheral measurements in this article for purposes of convenience.22 The large cohort of postmenopausal women in the National Osteoporosis Risk Assessment (NORA) provides a unique opportunity to assess short-term fracture risk in women with a diagnosis of osteopenia; 39% of the NORA women had peripheral T scores of −2.5 to −1.0 and experienced approximately 50% of the osteoporotic fractures.23 Using a tree-based approach with noninvasive and easily ascertained data, we aimed to develop a simple algorithm that allows clinicians to classify women with a peripheral BMD-based diagnosis of osteopenia into varying levels of risk for appropriate management.
We used data from a longitudinal observational study of osteoporosis among postmenopausal women in the United States that began in 1997.3,24 In brief, postmenopausal women who were 50 years and older, without an osteoporosis diagnosis, and who had not had a BMD measurement within the preceding 12 months were eligible for participation. Women currently being treated with a bisphosphonate, calcitonin, or raloxifene hydrochloride were ineligible for participation, as were women who were participants in any other clinical trial related to osteoporosis. At baseline, each participant completed a common core questionnaire and 4 of 8 supplemental questionnaires. The supplemental questionnaires were randomly assigned to each participant. Measurement of BMD at 1 of 3 skeletal sites (heel, hand, or forearm) was performed at the physician's office. Approximately 12 months after enrollment, follow-up questionnaires inquired about fractures that had occurred since enrollment in NORA. All study protocols and consent documents were approved by a national institutional review board (Essex Institutional Review Board Inc, Lebanon, NJ).
This analysis is restricted to 57 421 white women with baseline peripheral BMD T scores of −2.5 to −1.0 who participated in the 1-year follow-up and for whom fracture incidence data were available. Women with T scores less than −2.5 were excluded from the analysis because treatments are considered essential in these women. Women with T scores greater than −1.0 were not included because these values are not considered clinically relevant for treatment decisions.
The risk factors used as potential predictors were compiled from the literature and self-reported in the baseline questionnaires, including the main survey and 5 supplements, as indicated in Table 1.
Each participant received BMD testing at one of the following peripheral sites: forearm using peripheral DXA (pDEXA; Norland Medical Systems Inc, White Plains, NY), finger using peripheral DXA (AccuDEXA; Schick Technologies Inc, Long Island City, NY), or heel using either single-energy x-ray absorptiometry (OsteoAnalyzer; Norland Medical Systems Inc) or ultrasound (Sahara; Hologic Inc, Bedford, Mass). Bone mineral density measures from the 4 devices were pooled in this analysis because they have been shown to predict fracture equally well.4
The outcome variable was incident osteoporotic fractures self-reported at 1-year follow-up. Osteoporotic fractures were defined as clinical fractures of the hip, wrist or forearm, rib, and vertebrae. These fractures have been shown to be associated with low bone mass.25-28
Multivariate analysis was performed using classification trees introduced by Breiman et al29; CART (Classification and Regression Trees) 4.0 software (Salford Systems, San Diego, Calif) was used for the analysis. Briefly, trees provide a hierarchical classification process that is represented by a series of yes-no decision points similar to the way clinicians make prognostic and diagnostic decisions. The goal is to place each patient into a class in which the incidence of the outcome is either very high or very low.
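To illustrate how a single yes-no decision point can be chosen for a continuous predictor such as the T score, the sketch below searches for the cut that minimizes weighted Gini impurity. Gini impurity is a splitting criterion commonly used with classification trees; we assume it here for illustration because the criterion used in this analysis is not stated. The function names and the toy data are hypothetical and are not NORA data.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 outcome labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values: List[float], labels: List[int]) -> Tuple[Optional[float], float]:
    """Return the cut 'value <= threshold' with the lowest weighted Gini impurity."""
    n = len(values)
    best_cut, best_impurity = None, gini(labels)
    distinct = sorted(set(values))
    # Candidate cuts lie midway between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2.0
        left = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_cut, best_impurity = cut, impurity
    return best_cut, best_impurity


if __name__ == "__main__":
    # Hypothetical toy data: T scores and outcomes (1 = fracture, 0 = none).
    t_scores = [-2.4, -2.2, -2.0, -1.9, -1.6, -1.4, -1.2, -1.1]
    fractured = [1, 1, 0, 1, 0, 0, 0, 0]
    print(best_threshold(t_scores, fractured))
```

A full tree repeats this search over every candidate predictor at every node, then applies the same procedure to each resulting subgroup.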
To ensure high sensitivity of the prediction rule, the loss due to incorrectly classifying a fracture event was set higher than the loss due to misclassification of a patient without fracture. Assumption of equal prior probabilities of fracture was not appropriate owing to the low fracture rate (2%); therefore, this asymmetry was incorporated into the tree-building process. Terminal subgroups resulting from any given split were required to have at least 10 patients.
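The effect of asymmetric losses can be sketched as an expected-loss comparison at each terminal subgroup: the subgroup is flagged when the expected loss of calling it low risk exceeds that of calling it high risk. The Python sketch below is illustrative only. The loss values used in this analysis are not reported, so the 65:1 ratio is a hypothetical number, and the function names are our own.

```python
def risk_threshold(cost_missed_fracture: float, cost_false_alarm: float) -> float:
    """Fracture probability above which flagging a subgroup minimizes expected loss."""
    return cost_false_alarm / (cost_false_alarm + cost_missed_fracture)


def label_node(fracture_rate: float, cost_missed_fracture: float,
               cost_false_alarm: float = 1.0) -> str:
    """Label a terminal subgroup by comparing the expected losses of the two labels."""
    loss_if_labelled_low = cost_missed_fracture * fracture_rate
    loss_if_labelled_high = cost_false_alarm * (1.0 - fracture_rate)
    if loss_if_labelled_low > loss_if_labelled_high:
        return "increased risk"
    return "not increased risk"


if __name__ == "__main__":
    # With equal losses, a subgroup needs a fracture rate above 50% to be
    # flagged, which cannot be expected when the overall rate is about 2%.
    print(risk_threshold(1.0, 1.0))
    # Hypothetical 65:1 loss ratio: the threshold falls to about 1.5%.
    print(risk_threshold(65.0, 1.0))
    print(label_node(0.019, cost_missed_fracture=65.0))
    print(label_node(0.011, cost_missed_fracture=65.0))
```

Raising the loss attached to a missed fracture lowers the threshold, trading specificity for sensitivity.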
To obtain reliable estimates of the independent predictive accuracy of the tree, we used 10-fold cross-validation. After the maximal tree was built on the entire sample, the sample was divided into 10 approximately equal parts, each containing a similar distribution of the outcome variable. The first 9 parts of the data were used to construct the largest possible tree, and the remaining part was used to obtain initial estimates. The process was repeated on another 9 of the 10 parts while using a different part as the test sample until each part of the data had been held in reserve 1 time as a test sample. The results of the 10 mini–test samples were then combined and applied to the tree based on the entire sample.
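The partitioning step can be sketched as follows. This is a minimal illustration of stratified 10-fold splitting in plain Python, not the routine implemented in the CART software; the function names, the random seed, and the toy outcome vector are assumptions made for the example.

```python
import random
from typing import Iterator, List, Tuple


def stratified_folds(labels: List[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Assign record indices to k folds with a similar share of each outcome."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for outcome in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == outcome]
        rng.shuffle(indices)
        # Deal indices round-robin so that, within each outcome class,
        # fold counts differ by at most one.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, test indices), holding each fold out exactly once."""
    for held_out, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical outcome vector with a 2% event rate (20 events in 1000 records).
    outcomes = [1] * 20 + [0] * 980
    folds = stratified_folds(outcomes)
    for train, test in train_test_splits(folds):
        print(len(train), len(test), sum(outcomes[i] for i in test))
```

Stratifying on the outcome matters here because, with a fracture rate of about 2%, an unstratified split could leave some test samples with very few events.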
To assess the importance of variables that were not incorporated into the final tree, we examined the surrogate and competitor splits at each node of the tree. A surrogate split uses another predictor but results in similar classification of cases. Competitor splitters are variables that can be used instead of primary splitters, resulting in a tree with performance similar to the optimal tree in terms of error rates but possibly with less predictive accuracy.
We confirmed the accuracy of the classification tree by fitting a logistic regression model and a hybrid of CART and a logistic regression model to eliminate the possibility of overrepresentation of simple structures in the data. In the logistic regression analysis, all risk factors were included as main effects in a forward stepwise fashion without higher-order terms or interactions. In the hybrid of CART and logistic regression analysis, the CART terminal nodes were entered as a single, categorical predictor (cartnode) into logistic regression along with all 32 predictors. All logistic regression analyses were performed using SAS version 6.12 (SAS Institute Inc, Cary, NC).
Of the 57 421 NORA women included in the present analysis, 1130 (2.0%) reported an incident clinical osteoporotic fracture within 12 months of BMD testing, including 196 hip fractures, 126 vertebral fractures, 319 rib fractures, and 535 wrist or forearm fractures. The participant age range was 50 to 99 years (mean [SD] age, 66.8 [8.8] years), and the mean (SD) T score was −1.62 (0.40). Almost 15% of participants reported a history of fracture before entering NORA, and almost 86% reported good to excellent health status. Table 1 gives the baseline characteristics of the study population used as predictors in the model, overall and according to 1-year fracture status.
As shown in order of importance in Figure 1, 4 of the 32 predictors were identified as the most important determinants for short-term fracture prediction. Previous fracture was the strongest predictor of fractures within 1 year. The BMD T score cutoff point that best differentiated women at increased risk of fracture among those without previous fracture was found to be −1.8. Self-reported fair or poor general health status and poor mobility were also associated with increased risk of short-term fractures. Previous fracture information was obtained from responses to the question, "Since the age of 45, have you broken any of the following bones: hip, rib, wrist, or spine (backbone)?" Self-reported health status was obtained from responses to the question, "In general, would you say your health is excellent, very good, good, fair, or poor?" Mobility was determined by the average response to 4 questions related to physical functioning in the 12-Item Short-Form Health Survey: (1) Does your health limit you in moderate activities, such as pushing a vacuum cleaner, bowling, or playing golf? (2) Does your health limit you in climbing several flights of stairs? (3) As a result of your physical health, have you accomplished less than you would like? (4) As a result of your physical health, are you limited in the kind of work you do or other activities?30 Poor mobility was defined as 2 or more positive responses to these 4 questions.
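The two questionnaire-derived predictors can be expressed as simple rules. The sketch below is illustrative only: it assumes that each of the 4 physical functioning answers has already been reduced to yes or no, and the function names are hypothetical.

```python
from typing import Sequence

HEALTH_RATINGS = ("excellent", "very good", "good", "fair", "poor")


def has_poor_health(rating: str) -> bool:
    """True when self-rated general health is fair or poor."""
    rating = rating.strip().lower()
    if rating not in HEALTH_RATINGS:
        raise ValueError(f"unknown health rating: {rating!r}")
    return rating in ("fair", "poor")


def has_poor_mobility(limited: Sequence[bool]) -> bool:
    """True when 2 or more of the 4 physical functioning answers are positive."""
    if len(limited) != 4:
        raise ValueError("expected answers to exactly 4 questions")
    return sum(bool(answer) for answer in limited) >= 2


if __name__ == "__main__":
    print(has_poor_health("Fair"))
    print(has_poor_mobility([True, False, False, True]))
    print(has_poor_mobility([False, False, True, False]))
```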
The algorithm identified 55% of the women as being at increased risk, with an overall 1-year risk of fracture of 2.4% (range, 1.9%-4.1%). Women with previous fracture, regardless of T score, had a 1-year fracture risk of 4.1%, followed by 2.2% in women without previous fracture with T scores of −1.8 or less or with poor health. Women with none of these conditions but with poor mobility had a fracture risk of 1.9%. Among 57 421 women in this analysis, 26 037 (45.3%) were identified as not being at increased risk, with 1.1% experiencing a fracture. The algorithm correctly classified 74.1% of the women in this cohort who experienced an incident fracture within 1 year.
As noted in Figure 1, knowledge of previous fracture alone identified 339 (30.0%) of 1130 fractures, suggesting that the simplest tree, containing only previous fracture, can be used to identify one third of the women who subsequently had a fracture. Combining previous fracture and BMD T score, 58.8% of women with new fractures (n = 665) occurring within 1 year can be identified. Table 2 summarizes the classification rules indicated by the decision tree in Figure 1 into 4 simple steps.
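The decision points described above can be sketched as a short function. This is an illustration only, not a validated clinical tool: the class and function names are hypothetical, and the risk attached to each group is the observed 1-year fracture risk reported in the text for this cohort.

```python
from dataclasses import dataclass

T_SCORE_CUTOFF = -1.8  # peripheral T score threshold reported in the text


@dataclass(frozen=True)
class RiskGroup:
    increased_risk: bool
    reason: str
    reported_risk_pct: float  # observed 1-year fracture risk for the group


def classify(previous_fracture: bool, t_score: float,
             fair_or_poor_health: bool, poor_mobility: bool) -> RiskGroup:
    """Walk the yes-no decision points in the order described in the text."""
    if not -2.5 <= t_score <= -1.0:
        raise ValueError("the rule was derived only for T scores of -2.5 to -1.0")
    if previous_fracture:
        return RiskGroup(True, "previous fracture", 4.1)
    if t_score <= T_SCORE_CUTOFF or fair_or_poor_health:
        return RiskGroup(True, "T score of -1.8 or less, or fair or poor health", 2.2)
    if poor_mobility:
        return RiskGroup(True, "poor mobility", 1.9)
    return RiskGroup(False, "none of the 4 risk factors", 1.1)


if __name__ == "__main__":
    print(classify(previous_fracture=False, t_score=-1.4,
                   fair_or_poor_health=False, poor_mobility=True))
```

The order of the checks matters: a woman with a previous fracture is assigned to the highest-risk group regardless of her T score, health status, or mobility.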
In general, the algorithm retained the ability to identify women at various ages who were at increased fracture risk (Table 3). In particular, women identified in age groups 50 to 59, 60 to 69, and 70 to 79 years had overall fracture risks of 2.6%, 2.2%, and 2.8%, respectively. Women 80 years and older identified by the algorithm had a greater risk of 3.9%. Likewise, the absolute risk for fracture within each age group according to individual predictors was similar to that of the overall cohort that was identified as being at increased risk.
No surrogate splits were found by CART at any of the 4 steps of the algorithm, indicating that none of the remaining variables would result in similar classification at the node. Evaluation of competitor splits, however, showed several risk factors that were important in their relation to fractures. Age (split at 71.5 years) and years since menopause (split at 3.5 years) were competitors at every split. Age, although not as strong as the primary splitters, was the next important predictor. In cases in which the primary predictor is not available or is difficult to obtain, the competitor variable, age, can be used for identification. The use of sedatives was a competitor for the T score and health status primary splits. Low body weight (<127 lb) was a competitor split for the health status and mobility primary splits. Inclusion of additional variables did not add substantial improvement to the performance of the tree beyond the 4 selected variables shown in Figure 1.
A logistic regression model entering all 32 risk factors in a forward stepwise selection resulted in selection of the 4 variables used in the CART classification tree as the first variables in the final parsimonious model. In the hybrid of CART and logistic regression analysis, the CART terminal node variable (cartnode) was the first variable selected in a forward stepwise procedure. Additional variables selected were years since menopause and health status. Logistic regression and the hybrid of CART and logistic regression results confirm the accuracy of the simple classification tree derived from CART in identifying individuals at increased risk of fractures.
Using prospective data from a large cohort of postmenopausal osteopenic women, we showed that a simple algorithm derived from easily assessed risk factors predicted the risk of osteoporotic fracture within 1 year of BMD testing. Among 57 421 osteopenic women, the proposed algorithm of previous fracture, T score of −1.8 or less, poor self-reported health status, and poor self-reported mobility identified 55% as being at increased risk and correctly identified 74.1% of the women who had a fracture within 1 year. Osteopenic women with a previous fracture were found to have a risk similar to that of women in the NORA cohort with T scores of −2.5 or less (4.1% and 4.3%, respectively). On the other hand, osteopenic women not identified as being at increased risk had a fracture risk of 1.1%, similar to that of women in the NORA cohort with T scores greater than −1.0. If the goal of osteoporosis-specific therapy is to reduce a woman's risk of fracture, an algorithm should be used that correctly identifies the most women who have the highest risk of fractures by the most practical means. The NORA-based algorithm provides the clinician with a valuable and practical tool to accomplish this risk assessment and to design appropriate management strategies.
To our knowledge, this is the only study to date to develop a risk prediction tool for 1-year osteoporotic fracture specifically for postmenopausal women of all ages (50-99 years) with osteopenia (T score of −2.5 to −1.0). The FRACTURE Index developed by Black and colleagues18 included 7782 women 65 years and older, with no restriction on BMD T scores, to predict 5-year fracture risk. McGrother et al21 developed a risk score to predict 3-year hip fracture risk in elderly women based on a sample of 1864 women older than 70 years with no limitation on BMD T scores. The Rotterdam study developed a risk score for prediction of 4-year hip fracture risk in 5208 men and women 55 years and older at all BMD levels.17
Focusing the analysis on women with osteopenia provides relevant clinical information for women with what is considered to be low bone mass and for whom guidance regarding additional risk factors for further classification of fracture risk is lacking. The National Osteoporosis Foundation recommendations9 provide widely accepted guidance for the management of women with T scores higher than −2.5. These guidelines recommend that pharmacologic treatment should be initiated in women who have T scores less than −2.0 or less than −1.5 with at least 1 of 15 risk factors. Clinicians' ability to use the National Osteoporosis Foundation guidelines is somewhat limited since the guidelines do not indicate the relative importance of the risk factors, leaving the clinician to rely on judgment or to revert to using only T score values. The NORA-based risk algorithm is consistent with the National Osteoporosis Foundation treatment recommendations but is further enhanced with an empirically derived T score threshold and a well-defined small subset of risk factors that can be easily implemented in clinical practice.
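For comparison, the National Osteoporosis Foundation treatment rule as summarized in this paragraph can be written as a single condition. The sketch below encodes only the thresholds stated above; the function name is hypothetical, and the 15 individual risk factors are not enumerated here, so the caller supplies a count.

```python
def nof_treatment_indicated(t_score: float, risk_factor_count: int) -> bool:
    """Apply the rule as summarized in the text.

    Treatment is indicated for a T score less than -2.0, or for a T score
    less than -1.5 together with at least 1 risk factor.
    """
    if risk_factor_count < 0:
        raise ValueError("risk_factor_count cannot be negative")
    if t_score < -2.0:
        return True
    return t_score < -1.5 and risk_factor_count >= 1


if __name__ == "__main__":
    print(nof_treatment_indicated(-2.2, 0))
    print(nof_treatment_indicated(-1.7, 1))
    print(nof_treatment_indicated(-1.7, 0))
```

Because the rule only counts risk factors, every factor carries the same weight, which is the limitation noted above.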
A concern may be that peripheral devices were used to determine BMD T scores and to define this cohort. However, the WHO diagnostic criteria were established based on central (hip and spine) and peripheral (wrist) BMD measurement devices.2 T scores obtained using peripheral devices may not always be as low as T scores determined using central DXA devices, resulting in a prevalence of WHO osteoporosis using peripheral devices–specific databases of 3% to 14% compared with a prevalence based on hip measurements for white women of 16% to 20%.31,32 The discrepancies among T score calculations across various BMD devices are well recognized and exist among different central DXA skeletal sites and devices as well.33-37 Thus, even among the 3 available central DXA devices, the same patient could be classified as being either osteopenic or osteoporotic by WHO criteria at the spine depending on the DXA manufacturer.38 It is well established that wrist and heel measurements are powerful predictors of hip and nonhip fractures.39,40 Although Miller et al4 found prevalence differences among peripheral devices, if the T score was less than −1.0, women were at increased risk of fracture, and the magnitude of risks was comparable to that measured by a central device. Despite the discrepancies, a low BMD T score (less than −1.0) in postmenopausal women, assessed by any BMD device at any site, whether central or peripheral, is associated with an increased fracture risk.39 Even so, clinicians should be aware that direct extrapolation of these results to women who have had central BMD measurements is not possible.
The risk factors used in the NORA-based algorithm are well established for their relationship with osteoporosis or fractures, especially history of previous fracture and low BMD.12,15-17,19,20 Poor health status12,15 and poor mobility12,15,17,19,41 have also been shown to affect fracture risk, and they may function as surrogates for propensity to fall. Although age, height, weight, and maternal fracture were identified to be important predictors in risk models based on women 65 years and older or with T scores less than −2.5,17,18 our analysis did not find that these factors added incremental value to the predictive ability of the final model in this cohort that included women younger than 65 years and was restricted to T scores between −2.5 and −1.0. However, as a competitor variable, a cutoff age of 71.5 years can be used as a predictor if any of the primary predictors are unavailable or are difficult to obtain.
The NORA-based algorithm can assist clinicians in identifying a group of women with osteopenia who are at increased short-term risk of new osteoporotic fractures, women for whom interventions to reduce risk should be considered. The strongest risk factor in our algorithm was previous fracture, a known powerful predictor of future fracture.42 In fact, before BMD measurements became widely available, low-traumatic fractures were diagnostic of osteoporosis.2 In NORA women with T scores between −2.5 and −1.0, the fracture risk associated with a previous fracture was similar to that associated with a BMD diagnosis of osteoporosis. Pharmacologic intervention has been shown to reduce subsequent fracture in women with previous vertebral fracture.43-48
In women without previous fracture, a peripheral T score of −1.8 or lower indicated increased risk of short-term fracture. Further testing or treatment needs to be considered in these women. In women who do not have either previous fracture or low T scores, poor reported health status and poor mobility provide additional guidance as predictors of future fracture. Each of the latter factors is potentially modifiable through nonpharmacologic interventions, such as diet and exercise.
The predictive relationships found in this study are further supported by the ability to apply the algorithm to specific age groups and obtain similar risk profiles for fracture prediction. Of particular interest are younger women aged 50 to 59 years who had an overall 1-year risk of fracture similar to that of women aged 60 to 69 years (1.6% and 1.7%, respectively). Women aged 50 to 59 years who were identified by the algorithm to be at increased risk had an absolute fracture risk of 2.6%, similar to that of the total identified cohort (2.2%). In addition, women of this age with a history of fracture after age 45 years also had a 4.5% fracture risk, similar to that of women in NORA with T scores less than −2.5 (4.3%).
This study has several limitations. First, women who responded to the follow-up survey may differ from the 18% of nonresponders in that women who had a fracture may have been more (or less) likely to respond to follow-up than women who did not. Second, fracture information was collected by self-report. However, other researchers49-51 have found self-report of fractures to be generally reliable. Because most of the spine fractures are asymptomatic or at least unrecognized, NORA cannot address the value of risk factors or peripheral BMD to predict nonclinical spine fractures. In the long term, clinical and subclinical vertebral fractures are associated with increased morbidity and mortality.52,53 Third, information on some risk factors for fracture, such as muscle strength and propensity to fall, was not available; these factors may exert an effect on fracture risk through health status or mobility status. Finally, the algorithm was derived based on 1-year fracture data. Its utility for long-term fracture prediction is unknown. However, unless the level of a given short-term predictor is unstable over time relative to other predictors, it should be a risk indicator of longer-term fracture risk.
In conclusion, we developed a simple classification tool to identify postmenopausal women with osteopenia (T scores of −2.5 to −1.0) at the highest risk of fracture within 12 months. This classification tool identified more than 70% of those who experience a fracture based on information that can easily be obtained during a routine clinic visit. These results are based on the NORA study population. Although validation in a separate cohort is necessary to further substantiate the algorithm, our intent is to present an option for managing women with T scores greater than −2.5 who are at increased risk for fracture. This NORA-based algorithm can be useful in clinical practice to guide further assessment and management decisions in a large group of women.
Corresponding author and reprints: Paul D. Miller, MD, Colorado Center for Bone Research, 3190 S Wadsworth Blvd, Suite 250, Lakewood, CO 80227 (e-mail: email@example.com).
Accepted for publication June 30, 2003.
The NORA project was funded and managed by Merck & Co Inc in collaboration with the International Society for Clinical Densitometry, West Hartford, Conn.
Presented in part as posters at the annual meeting of the American Society for Bone and Mineral Research; September 21, 2002; San Antonio, Tex.
We thank Kenneth Faulkner, PhD, for his direction in study design, data collection, and data analysis and our colleagues at Merck & Co Inc, Parexel International (Waltham, Mass), and Abt Associates Inc (Cambridge, Mass), who were involved in the implementation and data collection efforts undertaken on behalf of NORA.