Figure. Crude fracture rates per 1000 person-years according to spine and total hip bone density measurements. Bone density T scores for the lumbar spine and total hip are divided into quartiles (1 indicates lowest; 4, highest).
Leslie WD, Lix LM, Tsang JF, Caetano PA, Manitoba Bone Density Program. Single-Site vs Multisite Bone Density Measurement for Fracture Prediction. Arch Intern Med. 2007;167(15):1641-1647. doi:10.1001/archinte.167.15.1641
Bone density measurement with dual-energy x-ray absorptiometry is widely used for fracture risk assessment. Discordance between measurement sites is common, but it is unclear how this affects fracture prediction.
We performed a historical cohort study among 16 505 women 50 years or older at the time of baseline dual-energy x-ray absorptiometry of the spine and hip (mean ± SD observation period, 3.2 ± 1.5 years). The study population was drawn from a database that contains all clinical dual-energy x-ray absorptiometry test results for the province of Manitoba, Canada. Each subject's longitudinal health service record was assessed for the presence of fracture codes after bone density testing. The likelihood ratio test was used to assess the improvement in fracture prediction from Cox proportional hazards models using bone density covariates from a single site or from combined sites.
Age-adjusted hazard ratios (HRs) per standard deviation for osteoporotic fracture ranged from 1.61 (95% confidence interval [CI], 1.39-1.87) for the lumbar spine to 1.85 (95% CI, 1.70-2.01) for the total hip, with intermediate values for the femur neck (HR, 1.76 [95% CI, 1.62-1.92]) and trochanter (HR, 1.77 [95% CI, 1.63-1.92]). For fracture prediction, use of the minimum bone density measurement was no better than use of a hip measurement alone. When the total hip measurement was included in a fracture prediction model for the overall population, none of the other measurements added substantial information. The spine was the most useful site for the prediction of spine fractures alone.
Proximal femur bone density measurements consistently outperformed lumbar spine measurements for global fracture prediction. In this cohort, the total hip was the best site for overall fracture assessment.
Osteoporosis and its clinical expression, fragility fractures, have large public health implications. Worldwide, the number of persons with fracture in 2000 was estimated at 56 million, with approximately 9 million new osteoporotic fractures each year.1 This is projected to result in a loss of 5.8 million disability-adjusted life-years. Osteoporosis costs $13.8 billion annually in the United States alone.2 The global burden of osteoporosis is projected to increase markedly during the next few decades as the number of older individuals increases.3 Therefore, the ability to accurately gauge fracture risk is critical in identifying cost-effective thresholds for intervention.4,5
The measurement of bone mineral density (BMD) using dual-energy x-ray absorptiometry (DXA) is widely regarded as an important tool for osteoporosis prevention and treatment because it permits a diagnosis to be made before fracture occurrence.6 The US Preventive Services Task Force recommends that all women 65 years and older be screened routinely for osteoporosis (or beginning at age 60 years for women with additional risk factors).7
Typically, DXA measurements of the lumbar spine and proximal femur are performed, but discordance between measurement sites is common.8 It is unclear how to interpret discordant BMD measurements in terms of fracture prediction. Some expert groups recommend using the minimum measurement, while others favor using a standardized site.9 This has large health service implications because a minimum site approach will classify more women as needing intervention than a single-site approach. This study was undertaken to assess the incremental benefit in the use of multisite BMD measurement vs the use of a single site for purposes of osteoporotic fracture prediction.
In the province of Manitoba, Canada, health services are provided to virtually all residents through a single public health care system. Bone density testing with DXA has been managed as an integrated program since 1997 and uses targeted case finding rather than population screening.10 Criteria for testing emphasize female sex, age of 65 years or older, premature ovarian failure, prior fragility fracture, x-ray evidence of osteopenia, prolonged corticosteroid use, and other clinical risk factors. Dual-energy x-ray absorptiometry testing rates for this program have been published.11 The program maintains a database of all DXA results, which can be linked with other population-based computerized health databases through an anonymous personal identifier.12 The DXA database has been previously described as having completeness and accuracy in excess of 99%.12 Fracture outcomes can be assessed through a combination of hospital discharge abstracts (diagnoses and procedures coded using the International Classification of Diseases, Ninth Revision, Clinical Modification system) and physician billing claims (inpatient, outpatient, and office based).13
The study population consisted of all women 50 years or older at the time of baseline DXA. Women were required to have results for the lumbar spine (L1-L4) and proximal femur (total hip, femur neck, and trochanter sites) before October 31, 2002, and medical coverage from Manitoba Health during the observation period ending March 31, 2004. Because software versions before May 1998 did not provide total hip measurements, records from those versions were not used in the analysis. For women with more than 1 eligible set of measurements, only the first record was included. The final study population consisted of 16 505 women. The study was approved by the Research Ethics Board for the University of Manitoba, Winnipeg, and the Health Information Privacy Committee of Manitoba Health.
Dual-energy x-ray absorptiometry scans were performed and analyzed in accord with recommendations of the manufacturer. Lumbar spine T scores (number of standard deviations above or below the young adult mean BMD) and z scores (number of standard deviations above or below the age-matched mean BMD) were calculated using the manufacturer's reference values for US women of white race/ethnicity based on the revised National Health and Nutrition Examination Survey III14 reference data (Prodigy software version 8.8; GE Lunar, Madison, Wisconsin). Vertebral levels affected by artifact were excluded by experienced physicians using conventional criteria.15 Before 2000, DXA measurements were performed using a pencil-beam instrument (Lunar DPX; GE Lunar), and after this date a fan-beam instrument was used (Lunar Prodigy; GE Lunar). Instruments were cross-calibrated using 59 volunteers and anthropomorphic phantoms. No clinically significant differences were identified (T score differences, < 0.2). Therefore, all analyses are based on the unadjusted numerical results provided by the instrument. Densitometers showed stable long-term performance (coefficient of variation, < 0.5%) and satisfactory in vivo precision (coefficient of variation, 1.7% for L1-L4 and 1.1% for total hip).16
Each subject's longitudinal health service record was assessed from the date of bone density measurement to March 31, 2004, for the presence of International Classification of Diseases, Ninth Revision, Clinical Modification fracture codes that were unassociated with trauma codes (codes E800-E879 and E890-E999).17 Specific fracture sites of interest were the hip (codes 820-821), clinical spine (code 805), forearm (code 813), and proximal humerus (code 812) because they are the basis for the 10-year absolute fracture risk estimates published by Kanis et al.18 In addition, we required that hip fractures and forearm fractures be accompanied by a site-specific fracture reduction, fixation, or casting code. Hip, spine, forearm, and proximal humerus fractures defined in this way were collectively designated as osteoporotic fractures. Using these fracture definitions, we previously showed that BMD measurements predict fractures in our clinical cohort as well as they do in the cohorts reported in large meta-analyses.17 Subgroup analyses were conducted separately for hip and spine fractures because these have a greater effect on health-related quality of life than fractures of the forearm or humerus.19 Separate analyses were also performed for subgroups that showed discordant BMD measurements because these are the individuals who present the greatest clinical conundrum. A T score of −2.5 or lower was considered to be osteoporotic based on the World Health Organization (WHO) classification. The first subgroup, referred to as WHO discordant, consisted of women in whom one measurement site was osteoporotic (T score, ≤ −2.5) and another site was nonosteoporotic (T score, > −2.5). Another subgroup, referred to as range discordant, was identified in which the range of T scores (maximum minus minimum) exceeded 2.
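The two discordance definitions can be expressed as simple rules. The following is a minimal sketch, not the study's actual code: the function names, the dictionary layout, and the example patient are hypothetical, and only the thresholds (T score ≤ −2.5; range > 2) come from the definitions above.

```python
from typing import Mapping

WHO_OSTEOPOROSIS_THRESHOLD = -2.5  # T score at or below this is osteoporotic (WHO)
RANGE_DISCORDANCE_CUTOFF = 2.0     # max minus min T score must exceed this


def is_who_discordant(t_scores: Mapping[str, float]) -> bool:
    """True when at least one site is osteoporotic and at least one is not."""
    flags = [t <= WHO_OSTEOPOROSIS_THRESHOLD for t in t_scores.values()]
    return any(flags) and not all(flags)


def is_range_discordant(t_scores: Mapping[str, float]) -> bool:
    """True when the spread of T scores across sites exceeds the cutoff."""
    values = list(t_scores.values())
    return max(values) - min(values) > RANGE_DISCORDANCE_CUTOFF


# Hypothetical patient (illustrative values only, not study data).
patient = {
    "lumbar_spine": -2.7,
    "total_hip": -1.4,
    "femur_neck": -1.9,
    "trochanter": -1.2,
}

print(is_who_discordant(patient))    # spine osteoporotic, hip sites not
print(is_range_discordant(patient))  # spread is 1.5, which does not exceed 2
```

A woman can belong to either subgroup, both, or neither, because the first rule depends on where the T scores fall relative to −2.5 and the second only on how far apart they are.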
Cox proportional hazards models were used to model the time to first incident fracture. The base models each included age (in years) and a bone density measurement (T score) as covariates. A second set of models was then constructed that included age and pairs of BMD measurements from different sites (eg, L1-L4 and total hip). Initial models compared combining the lumbar spine with any of the 3 proximal femur measurements in the overall population and in the 2 discordant subgroups. Subsequent models compared combinations of 2 proximal femur measurements.
A likelihood ratio test for the single-site model vs the multisite model was used to assess the incremental value of combined site bone density measurement.20 The likelihood ratio χ² from the Cox proportional hazards model provides a global measure of model fit, and the difference between χ² values provides a test of the model improvement. P < .05 indicates a statistically significant improvement in fracture prediction using the combined assessment, whereas P ≥ .05 indicates that there was no incremental benefit in the combined assessment. The degree of fracture stratification was also compared using receiver operating characteristic (ROC) curves in which data were dichotomized as any osteoporotic fracture vs no fracture. Receiver operating characteristic curves were compared using the nonparametric method by DeLong et al,21 which allows for efficient comparison of the correlated curves originating from a common population (AccuROC 2.5; Accumetric Corp, Montreal, Quebec, Canada). All other analyses were performed using commercially available software (Statistica version 6.1; StatSoft Inc, Tulsa, Oklahoma).
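The nested-model comparison can be written compactly. The notation below (h_0 for the baseline hazard, T_a and T_b for T scores at two sites, ℓ for the maximized partial log-likelihood) is introduced here for illustration and is not taken from the article.

```latex
% Single-site model: age plus one T score
h(t \mid \text{age}, T_a) = h_0(t)\,\exp\!\left(\beta_0\,\text{age} + \beta_a T_a\right)

% Multisite model: age plus T scores from two sites
h(t \mid \text{age}, T_a, T_b) = h_0(t)\,\exp\!\left(\beta_0\,\text{age} + \beta_a T_a + \beta_b T_b\right)

% Likelihood ratio test of H_0 : \beta_b = 0 (1 degree of freedom)
\Delta\chi^2 = \chi^2_{\text{multi}} - \chi^2_{\text{single}}
             = 2\left(\ell_{\text{multi}} - \ell_{\text{single}}\right)
             \;\sim\; \chi^2_{1} \quad \text{under } H_0
```

Because each model's likelihood ratio χ² is measured against the same null model, the null log-likelihood cancels in the difference, leaving twice the gain in log-likelihood from the added site.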
The characteristics of the overall study population are given in Table 1. The mean ± SD age was 65 ± 9 years, and 98.2% (n = 16 210) were of white race/ethnicity. The mean T scores for each of the measurement sites fell within the WHO osteopenic (low bone mass) category. The mean z scores were close to 0, implying that our clinical population had bone density measurements comparable to those from the manufacturer's reference population. Measurement sites differed in the proportion of women categorized as having osteoporosis, with the highest prevalence at the lumbar spine (26.0%) and the lowest prevalence at the total hip (11.2%).
The subgroup showing discordance in the WHO categorization consisted of 4354 women. By definition, every woman had at least 1 osteoporotic measurement. When discordance was based on a wide range of T scores, 2102 women were identified. Except for measurements taken at the femur neck, the mean T scores were within the WHO normal category.
Fracture events after bone density testing were identified during a mean ± SD observation period of 3.2 ± 1.5 years. At least 1 incident osteoporotic fracture was identified in 765 members (4.6%) of the cohort, including 189 hip fractures (1.1%), 209 spine fractures (1.3%), 230 forearm fractures (1.4%), and 191 proximal humerus fractures (1.2%). For women in the WHO discordant group, 297 sustained an incident fracture, while 89 women in the range discordant group sustained an incident fracture.
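As a rough illustration of how a crude rate per 1000 person-years is formed, the sketch below approximates total person-time as cohort size multiplied by mean observation period. That approximation is an assumption made here; the article does not report total person-years, and the function name is hypothetical.

```python
def crude_rate_per_1000_person_years(events: int, person_years: float) -> float:
    """Crude incidence rate: events per 1000 person-years of observation."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years


# Figures reported in the text.
n_women = 16505
mean_followup_years = 3.2
osteoporotic_fractures = 765

# ASSUMPTION: person-time approximated as n x mean follow-up.
approx_person_years = n_women * mean_followup_years

overall_rate = crude_rate_per_1000_person_years(
    osteoporotic_fractures, approx_person_years
)
print(f"Approximate crude rate: {overall_rate:.1f} per 1000 person-years")
```

Under this approximation the overall crude rate comes out at roughly 14 to 15 osteoporotic fractures per 1000 person-years; quartile-specific rates would be computed the same way from each quartile's events and person-time.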
All bone density measurement sites were statistically significantly associated with osteoporotic fractures (P < .001). In age-adjusted models, the hazard ratio (HR) per standard deviation for fracture ranged from 1.61 (95% confidence interval [CI], 1.39-1.87) for the lumbar spine to 1.85 (95% CI, 1.70-2.01) for the total hip, with intermediate values for the femur neck (HR, 1.76 [95% CI, 1.62-1.92]) and trochanter (HR, 1.77 [95% CI, 1.63-1.92]). The minimum T score was also associated with incident fractures (HR, 1.64 [95% CI, 1.52-1.77]; P < .001), but it was no better than the individual hip measurements for fracture prediction. The area under the ROC curve was greatest for the total hip site, lowest for the lumbar spine, and intermediate for the femur neck and trochanter (Table 2). For fracture prediction, use of the minimum T score resulted in ROC areas that were no better than any hip site alone and inferior to the total hip site for the overall population (P < .005).
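To make the per-standard-deviation HRs concrete, the sketch below scales them to a larger difference in bone density. It assumes the HRs are expressed per SD decrease in BMD and that the log-hazard is linear in the T score, as in a standard Cox model; the function name is hypothetical.

```python
import math


def hazard_ratio_for_sd_change(hr_per_sd: float, sd_units: float) -> float:
    """Scale a per-SD hazard ratio to a difference of `sd_units` SDs.

    Assumes a log-linear effect: HR(k SD) = HR(1 SD) ** k.
    """
    if hr_per_sd <= 0:
        raise ValueError("hr_per_sd must be positive")
    return math.exp(sd_units * math.log(hr_per_sd))


# Age-adjusted HRs per SD reported in the text.
reported_hr_per_sd = {"lumbar_spine": 1.61, "total_hip": 1.85}

for site, hr in reported_hr_per_sd.items():
    hr_2sd = hazard_ratio_for_sd_change(hr, 2.0)
    print(f"{site}: HR for a 2-SD difference = {hr_2sd:.2f}")
```

Under these assumptions, a 2-SD difference corresponds to an HR of about 2.6 at the lumbar spine and about 3.4 at the total hip, which shows how a modest gap in per-SD HRs widens across the range of T scores.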
Table 3 summarizes the 9 analyses that were derived from the 3 study subpopulations (overall, WHO discordant, and range discordant), combining the lumbar spine measurement with any of 3 hip measurements (total hip, femur neck, and trochanter sites). The addition of any hip measurement (total hip, femur neck, or trochanter site) to a model that contained a spine measurement resulted in a statistically significant improvement in model prediction as measured by the likelihood ratio test (P < .001). However, there was no statistically significant increase when the spine measurement was added to a model that contained a total hip or trochanter measurement. This indicates that the lumbar spine did not provide any improvement in prediction once the total hip or trochanter bone density was already considered. Adding the lumbar spine measurement to a model that contained the femur neck measurement resulted in a small increase in fracture prediction for the overall population (P = .02) but not for either of the discordant subgroups. Crude fracture rates according to the spine and total hip bone density quartile are shown in the Figure. Fracture risk increases as bone density moves from the highest quartile to the lowest quartile, with a stronger and more consistent risk gradient evident for the total hip measurement than for the spine.
Because of the possibility that lumbar spine measurements may have greater value in women younger than 65 years, an age-stratified subgroup analysis was performed on the overall population. There were 8768 women aged 50 to 64 years, of whom 245 sustained an incident osteoporotic fracture. Once again, the results confirmed a statistically significant increase when a hip measurement was added to a model with a spine measurement (P < .001). Adding the lumbar spine measurement to a model that contained the femur neck measurement resulted in a small increase in fracture prediction in women aged 50 to 64 years (P = .01) but not in older women. For all other analyses, there was no benefit to adding a spine measurement to a model that contained a hip measurement (Table 4).
Site-specific analyses were conducted in the overall population for spine and hip fractures. There was a statistically significant improvement in spine fracture prediction with the combined total hip and lumbar spine measurements (χ² = 149.5) vs the total hip alone (χ² = 128.9) (P < .001). There was a much smaller increase in spine fracture prediction when the total hip was added to a model based on the lumbar spine alone (χ² = 145.9), and this did not reach the threshold for statistical significance (P = .06). Conversely, the total hip BMD optimized the prediction of hip fractures alone, without any incremental gain in combining this with the lumbar spine measurement (data not shown). Similarly, the total hip BMD optimized the prediction of spine or hip fractures together, without any incremental gain in combining this with the lumbar spine measurement (data not shown).
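The P values quoted for spine fracture prediction can be checked from the reported model χ² statistics. This is a sketch under the assumption that each comparison adds one covariate (1 degree of freedom); the helper name is hypothetical, and the closed form for the 1-df χ² tail avoids any external library.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Likelihood ratio chi-square values reported for spine fracture models.
chi2_hip_only = 128.9
chi2_spine_only = 145.9
chi2_combined = 149.5

# Adding the lumbar spine to a total hip model.
p_add_spine = chi2_sf_1df(chi2_combined - chi2_hip_only)
# Adding the total hip to a lumbar spine model.
p_add_hip = chi2_sf_1df(chi2_combined - chi2_spine_only)

print(f"Spine added to hip: delta = {chi2_combined - chi2_hip_only:.1f}, P = {p_add_spine:.2g}")
print(f"Hip added to spine: delta = {chi2_combined - chi2_spine_only:.1f}, P = {p_add_hip:.2g}")
```

The χ² differences are 20.6 and 3.6; under the 1-df assumption the first gives P well below .001 and the second P of about .06, in line with the values reported in the text.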
The effect of combining hip measurements was much smaller than the effect of combining spine and hip measurements, which is consistent with the higher degree of correlation (Table 5). The femur neck did not provide any incremental fracture information once the total hip bone density was considered. In the overall population, there was an improvement in model fit by combining the femur neck and trochanter measurements, but in the discordant subgroup analyses only the trochanter measurement contributed to fracture prediction. There were inconsistent results for combining the total hip and trochanter measurements. In the overall population, there was no improvement in fracture prediction with the combination vs use of the total hip alone, whereas use of the trochanter alone maximized fracture prediction in the WHO discordant subgroup.
For prediction of osteoporotic fractures, we found that BMD measured at the proximal femur sites outperformed lumbar spine measurements. There was little incremental benefit in combining lumbar spine measurements with a hip measurement except for the prediction of spine fractures. The lumbar spine measurement was useful in the prediction of spine fractures alone but not when nonvertebral fractures were also considered. Differences between hip measurement sites were much less striking, but there was consistent evidence that the total hip and trochanter sites outperformed the femur neck site and that the total hip site was preferred for the overall population. There was no evidence that use of the minimum T score enhanced fracture prediction over use of a single site alone. Our findings were confirmed in subgroups that are likely to present the greatest challenge in clinical practice, namely, patients with discordance in the WHO osteoporotic category and individuals with a wide range in T scores.
Our findings may seem counterintuitive because the total hip site demonstrated the lowest prevalence of measurements in the osteoporotic range, suggesting that it may be less sensitive in detecting individuals at increased risk of fracture. However, the lumbar spine is prone to artifacts that can degrade its value as a measurement site.22,23 The prevalence of degenerative change on spine radiographs has been noted in up to 61% of women.24 In theory, use of the minimum measurement might be preferable. However, simulation findings by Blake et al25 show that there would be little benefit from using a combined site approach, and this was confirmed in a subsequent meta-analysis.26 Our study extends these findings by considering 4 measurement sites, including the total hip, which only became available as a measurement site in the late 1990s. Previously, the femur neck was considered by some to be the preferred site for DXA evaluation.5,27 Theoretical advantages of the total hip over the femur neck are that it evaluates a much larger amount of bone, enhancing measurement precision, and includes a larger amount of trabecular bone, which may be more responsive to change than cortical bone that predominates in the femur neck.
Our study has major implications for individuals involved in the care of osteoporosis. If fracture risk assessment is optimized by use of the total hip site, then what is the value of measuring other sites? Current diagnostic criteria28 do not distinguish between the various central skeletal sites; therefore, a menopausal woman with a lumbar spine T score of less than −2.5 would still be considered osteoporotic even if the total hip BMD were not in the osteoporotic range. Clinical trials demonstrating benefit from antiresorptive medications have usually relied on lumbar spine or femur neck BMD measurements as entry criteria, and it is unclear how these would apply to the total hip, with its lower diagnostic sensitivity. Finally, monitoring of therapy relates to site responsiveness and measurement precision. There is some evidence that the hip has been undervalued as a monitoring site, although this remains an area of controversy.22,29
The strengths of DXA for fracture prediction are well known and relate to its ease of use, low radiation dose, and confirmed ability to predict fractures.30,31 At the same time, there are substantial limitations because most women who sustain an osteoporotic fracture do not have osteoporotic BMD measurements.32,33 This has led to recognition of the importance of other risk factors for fracture. An international initiative is developing a fracture risk assessment tool that integrates densitometric and clinical risk factors to derive absolute 10-year fracture risk.5 The ability to standardize fracture risk assessment on a single BMD site would simplify the construction of this model.
The size of the data set and its population-based coverage are the greatest strengths of our study. Compared with traditional prospective cohort studies, loss to follow-up is infrequent. On the other hand, fracture ascertainment from administrative health data has known limitations. Fractures that produce few symptoms may not lead to physician interaction and to documentation in administrative health databases. This is particularly likely to occur with vertebral compression fractures because most are not clinically diagnosed.34 Our study population included few women of nonwhite race/ethnicity, and results may not be applicable to other racial/ethnic populations. Vitamin D insufficiency is prevalent in northern Canadian latitudes and could contribute to the prevalence of osteoporosis.35
In summary, for purposes of overall osteoporotic fracture risk prediction, hip BMD sites outperform the lumbar spine. In this cohort, the total hip BMD measurement alone maximized overall osteoporotic fracture prediction. The lumbar spine was the most useful site for the assessment of spine fractures alone but was no longer contributory when nonvertebral fractures were also considered.
Correspondence: William D. Leslie, MD, MSc, Faculty of Medicine, University of Manitoba, Room C5121, 409 Tache Ave, Winnipeg, MB R2H 2A6, Canada (firstname.lastname@example.org).
Accepted for Publication: April 9, 2007.
Author Contributions: Study concept and design: Leslie. Acquisition of data: Leslie, Lix, Tsang, and Caetano. Analysis and interpretation of data: Leslie, Lix, and Caetano. Drafting of the manuscript: Leslie and Tsang. Critical revision of the manuscript for important intellectual content: Leslie, Lix, and Caetano. Statistical analysis: Leslie, Lix, and Caetano. Obtained funding: Leslie. Administrative, technical, and material support: Leslie and Tsang. Study supervision: Leslie.
Group Information: This article was reviewed and approved by the Manitoba Bone Density Program Committee: William D. Leslie, MD, MSc (chair); Mark Atthey, PhD; Corrie Baillie, MD; Heather Frame, MD; Julieta Hernandez, RM, MHS; Jeanette Jackson, RN; Brent Kvern, MD; Isabelle Lafontaine, MD; Blake McClarty, MD; Colleen J. Metge, PhD; and Elizabeth A. Salamon, MD.
Financial Disclosure: Dr Leslie has received research support and honoraria for lectures from Merck Frosst Canada and an unrestricted educational grant from Procter & Gamble Pharmaceuticals.
Funding/Support: This study was funded in part by an unrestricted educational grant from the CHAR/GE Healthcare Development Awards Programme.
Disclaimer: The results and conclusions are those of the authors, and no official endorsement by Manitoba Health is intended or should be inferred.
Additional Contributions: Manitoba Health provided additional study data.