Calcif Tissue Int (2000) 66:317–319
© 2000 Springer-Verlag New York Inc.
Editorial
Controversies in Bone Mineral Density Diagnostic Classifications
P. D. Miller
University of Colorado Health Sciences Center, Colorado Center for Bone Research, Lakewood, Colorado 80227, USA
Bone mass measurement would be simple if only one device were used to measure one bone and only one result were obtained. In practice, many types of devices measuring many different bones produce many numbers that may be used to interpret bone mass. As Segal’s Law so aptly says: a man with one watch knows exactly what time it is; a man with two watches is never quite sure. New devices for measuring bone mineral density (BMD) have evolved because of the need to screen large numbers of people, which is not feasible with central dual energy X-ray absorptiometry (DXA) densitometers or quantitative computerized tomography (QCT) devices. In the United States alone, there will be 80,000,000 postmenopausal women by the year 2040, all of whom will be at risk for osteoporosis-related fractures. Unless each of these women receives estrogen replacement therapy for the remainder of her life, most will lose bone mass and be at increased fracture risk as their skeleton ages.
Bone mass testing at a single skeletal site has value as well as limitations. It is valuable because many of the peripheral devices are portable and therefore more accessible to the people affected. Peripheral technology is also valuable because it is very accurate, as demonstrated by accuracy errors on the order of 4% when compared with the dry-ashed weight of bone from cadavers. In contrast, the accuracy errors of the central skeletal measuring devices are 6–8%. The more accurate the device, the more confident a physician can be that the machine is actually measuring what it was designed to measure. Devices with low accuracy errors have little impact on the diagnostic classifications of low bone mass established by the World Health Organization (WHO), which are based on the number of standard deviations (SD) a bone mass measurement lies above or below the mean of a reference population. However, devices with higher accuracy errors (e.g., 10%) have a profound impact on diagnostic classifications of low bone mass.
Poor accuracy may cause up to a 2.0 SD difference in repeat measurements obtained on the same patient; these different measurements may lead to different diagnostic classifications. Since the accuracy of peripheral devices is excellent, a low bone mass measurement using one of these techniques will lead to a diagnostic classification of low bone mass.
Peripheral technology is limited by discordance of bone mass at various skeletal sites. If bone mass were homogeneous at all skeletal sites, a low bone mass at the finger would reflect low bone mass at the hip or the spine. However, this is not the case. Peak adult BMD may be very different in different people and may be discordant at different skeletal sites in the same person. Also, the rate of
bone loss from the various bone compartments is different when age-dependent bone loss occurs. Thus, depending upon the age of the patient and the skeletal site measured, different bone mass measurements may lead to different diagnostic classifications in the same patient. Diagnostic classification is particularly important at the menopause to assist women who are undecided about initiating hormone replacement therapy (HRT) or other strategies for the preservation of skeletal mass. Classification is difficult in this early menopausal group when one skeletal site is normal and another site is low (e.g., normal measurement at the wrist and low measurement at the hip). There is great concern over this group. Based on a normal peripheral measurement, they might be diagnosed as having normal bone mass and therefore as having no need for HRT, a bisphosphonate, or a selective estrogen receptor modulator (SERM), when a low central skeletal bone mass was simply not detected. They might be at increased risk of fracture as they age without knowing it. In specific cases they should have additional bone mass testing to confirm their diagnostic classification. Guidelines have recently been published that provide clinicians with recommendations regarding which patients with normal single-site bone mass measurements should have additional testing [1]. These guidelines decrease the number of women classified as normal at a single skeletal site who may nonetheless need preventive therapy for postmenopausal bone loss because of low bone mass at a second skeletal site. From a strictly scientific standpoint, postmenopausal women should have both a spine and a hip bone mass measurement. However, limited access and higher costs of central DXA testing will prevent millions of early postmenopausal women from receiving any bone mass testing. Peripheral skeletal measurements assist with this problem. Fracture risk for early menopausal patients is unknown. 
P. D. Miller: BMD Diagnostic Classifications
Fig. 1. Age-related decline in mean Caucasian female T-scores for different BMD technologies based on manufacturer reference ranges. The hip DXA reference data are from the NHANES study (6,7) as implemented on all DXA devices from all manufacturers. The DXA normative data for the PA spine (L1–L4), lateral spine (L2–L4), and forearm (one-third region) were obtained from the Hologic QDR-4500 densitometer. Heel normative data were taken from the estimated BMD for the Hologic Sahara ultrasound unit. Spinal QCTs are those used by the Image Analysis reference system. (-▲-), heel; (-◇-), total hip; (-□-), PA spine; (-●-), forearm; (-◆-), lateral spine; (-○-), QCT spine. Reprinted with permission [4].
Prospective fracture trials demonstrate that measuring low bone mass at any skeletal site is predictive of increased fracture risk at any other site, with a relative risk of 1.5 per SD reduction in bone mass. It must be emphasized, however, that all of these trials were performed using elderly women with a mean age of >65 years. In this older age group, there is a global reduction in bone mass throughout the skeleton. Because of this reduction in the elderly, fracture can be predicted with nearly equal power regardless of the skeletal site measured. In contrast, early menopausal women display greater discordance in bone mass, as discussed earlier, and fracture prediction is more difficult. Studies have examined the effects of this discordance by comparing the number of women classified as having low bone mass or osteoporosis using the WHO criteria versus different T-score cutoff points across the postmenopausal age ranges. In the early menopausal age group (50–59 years old), if one skeletal site has a T-score <−2.5 SD, there is a 10% chance that another skeletal site will be normal [2, 3]. The studies do not determine how many skeletal sites might be normal if one site is low (e.g., T-score −1.2 SD). The cutoff levels for lesser degrees of low bone mass are important to distinguish for the initiation of preventive interventions. At present, the T-score intervention threshold ranges from −1.0 SD, enclosed as a package insert for two of the pharmacological therapies, to −1.5 SD, as recommended by the National Osteoporosis Foundation. In addition, the studies mentioned begin their data analysis based on the discovery of a low bone mass at some skeletal site. In fact, in early menopausal women, a single skeletal site measurement could be normal and the patient would have been excluded. Single-site bone density screening allows a patient to be classified as low bone mass or osteoporotic; however, if normal, it does not rule out low bone density at another skeletal site. Therefore, a finding of low bone mass at a peripheral site could be used by a clinician to convince an undecided patient to initiate HRT. The initiation of other preventive pharmacological agents in this scenario is not as clear. If other significant clinical risk factors are present, other preventive therapy may be reasonable. Otherwise, a central skeletal bone mass measurement, to confirm the patient’s classification, would be the ideal way to determine if alternative pharmacological interventions are required in early postmenopausal women who refuse HRT. In summary, bone mass is more concordant in the elderly population, and a low peripheral measurement can be used for diagnosis and the initiation of therapy. 
In the elderly population, the AP spine measured by DXA may be misleading due to arthritic changes, and may be artifactually high, leading to underdiagnosis. Bone mass is more discordant in the early menopausal group and a peripheral measurement may be normal, leading to questions regarding further testing and appropriate treatment interventions.
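The relative risk of 1.5 per SD reduction in bone mass cited above from the prospective fracture trials can be illustrated with a little arithmetic. The sketch below is only an illustration: it assumes that the risk compounds multiplicatively for each additional SD below the young normal mean (a common reading of a "per SD" relative risk, not something the trials cited here state explicitly), and the function name `relative_fracture_risk` is hypothetical. Because the trials enrolled elderly women, the numbers should not be read as applying to early menopausal women.

```python
def relative_fracture_risk(t_score: float, rr_per_sd: float = 1.5) -> float:
    """Relative fracture risk versus a person at the young normal mean (T = 0).

    Assumption: risk multiplies by rr_per_sd for every SD the measurement
    lies below the reference mean.
    """
    sd_below_mean = -t_score
    return rr_per_sd ** sd_below_mean


if __name__ == "__main__":
    # T-scores chosen to match thresholds mentioned in the text.
    for t in (0.0, -1.0, -1.5, -2.5):
        risk = relative_fracture_risk(t)
        print(f"T-score {t:+.1f}: relative risk about {risk:.2f}")
```

Under this assumption, a T-score of −2.5 corresponds to a relative risk between 2 and 3, which gives a sense of the scale involved when one site reads normal and another reads low.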
This issue of skeletal discordance is very important because of its implications regarding patient care, the credibility of bone mass measurements, and insurance payment for drug therapy. Figure 1 shows how data obtained from different devices may lead to different diagnostic classifications in the same patient [4]. A patient may be classified as osteoporotic using QCT at age 50, and the same patient would not be classified as osteoporotic using heel ultrasound until age 107. These discrepancies are related to (1) different rates of age-dependent bone loss at different skeletal sites, (2) different accuracy errors for different techniques, and (3) different young normal reference populations from which the T-scores are derived. There is evidence that the different young normal databases used by the different manufacturers are major factors contributing to the discrepancies observed in T-scores. The T-score is calculated as follows: T-score = [BMD (patient) − mean BMD (young reference population)] / SD (young reference population). It is clear that if either the mean BMD or the SD of the BMD of the young normal reference population differs between manufacturers’ databases, the calculated T-score will differ as well. Prior to the incorporation of the NHANES III database as the common hip database for the three table DXA machines, a patient would have different hip T-scores on the different machines. This common NHANES III database eliminated these classification discrepancies [5]. Presently, there is no universal standardized database for AP-spine DXA or any of the peripheral devices. Therefore, the T-score may differ at the same skeletal site, in the same patient, due to dissimilar young reference populations [6]. Greenspan et al. [7] have shown that using a common young normal reference database nearly eliminates the differences in T-scores at the same or different skeletal sites observed with different manufacturers’ devices. 
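To make the dependence on the reference population concrete, the following minimal sketch computes a T-score as (patient BMD minus young reference mean) divided by the young reference SD, and then applies the WHO cutoffs (T-score at or below −2.5 SD for osteoporosis, between −1.0 and −2.5 SD for low bone mass). The two reference databases and the patient value are invented purely for illustration and do not correspond to any manufacturer's data; the function names are likewise hypothetical.

```python
def t_score(bmd: float, ref_mean: float, ref_sd: float) -> float:
    """T-score: SDs above or below the young normal reference mean."""
    return (bmd - ref_mean) / ref_sd


def who_category(t: float) -> str:
    """WHO diagnostic category for a given T-score."""
    if t <= -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "low bone mass"
    return "normal"


# Hypothetical young normal reference databases (g/cm^2), for illustration only.
REFERENCE_DATABASES = {
    "database A": {"mean": 0.90, "sd": 0.10},
    "database B": {"mean": 0.86, "sd": 0.12},
}

if __name__ == "__main__":
    patient_bmd = 0.64  # hypothetical measurement at one skeletal site
    for name, ref in REFERENCE_DATABASES.items():
        t = t_score(patient_bmd, ref["mean"], ref["sd"])
        print(f"{name}: T-score {t:.2f} -> {who_category(t)}")
```

With these made-up numbers the same measurement falls in the osteoporosis range against one database and in the low bone mass range against the other, which is the kind of discrepancy a common reference database is meant to remove.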
T-scores were very similar even when measuring heel by ultrasound, heel by DXA, wrist or hip by DXA when the value was derived from the same (n = 55) young population database. A universal database for all devices has been proposed and endorsed by the International Bone Densitometry Standards Committee. During this interim period, prior to the
completion of this common database, short-term solutions (listed below) to assist clinicians with patient misclassification are being considered.
1. Establish a T-score equivalence for each device based on equal prevalence or equal risk defined by the device that corresponds to the 50% prevalence described by the NHANES III femoral neck database for 60–69-year-old Caucasian women.
2. Have each manufacturer calculate a distinct T-score for their device based on prevalent fracture data established using their device.
3. Eliminate the T-score for patient diagnostic and intervention classification and move to a pure fracture risk (absolute or lifetime risk) relationship as it relates to the absolute BMD of that manufacturer’s device.
The first solution is easiest because all the devices have prevalence data for 60–69-year-old Caucasian women. The femoral neck −1.5 equivalence is appealing since it will capture approximately 70% of the women who will suffer hip fractures, it corresponds to the known global fracture risk predictive value of all devices (RR 1.5/SD), and it corresponds to the National Osteoporosis Foundation’s treatment threshold for postmenopausal women with 1 or more risk factors. The second solution retains the T-score, thus requiring a universal database for all ethnic groups. All future devices would be required to duplicate the universal database. The third solution has intellectual appeal since low bone mass would be used for its original intent of fracture prediction. Much of the data required is already available from the large fracture trials, though it might be limited by the requirement that future devices also establish their own prospective fracture data.
In conclusion, the clinician can be confident that bone mass measuring devices provide objective accurate values. The relationship between low bone mass and fracture risk is the outcome predictor in current medicine. Low bone mass allows clinicians to advise their patients regarding preventive or treatment interventions based on their bone mass and other risk factors. The challenges lie in the discordance of bone at various skeletal sites and the current diagnostic decisions that are based on T-scores derived from dissimilar reference populations. These challenges have been compounded by the emergence of peripheral devices which are more accessible and less expensive. Until a universal, young, normal database is completed or T-scores are eliminated and BMC/BMD absolute values are used for fracture prediction, T-score equivalence for the various devices should be used and additional bone mass testing should be provided for patients whose diagnostic classification is inconclusive.
References
1. Miller PD, Bonnick SL, Johnston CC Jr, et al. (1998) The challenges of peripheral bone density testing. Which patients need additional central density skeletal measurements? J Clin Densitom 1:211–217
2. Arlot ME, Sornay-Rendu E, Garnero P, et al. (1997) Apparent pre- and postmenopausal bone loss evaluated by DXA at different skeletal sites in women: the OFELY cohort. J Bone Miner Res 2:683–690
3. Nelson DA, Molley R, Kleerekoper M (1997) Prevalence of osteoporosis in women referred for bone density testing. J Clin Densitom 1:5–11
4. Faulkner KG, VonStetten E, Miller P (1999) Discordance in patient classification using T scores. J Clin Densitom 2:343–350
5. Faulkner KG, Roberts LA, McClung MR (1996) Discrepancies in normative data between Lunar and Hologic DXA system. Osteoporosis Int 6:432–436
6. Ahmed AIH, Blake GM, Rymer JM, Fogelman I (1997) Screening for osteopenia and osteoporosis: Do the accepted normal ranges lead to overdiagnosis? Osteoporosis Int 7:432–438
7. Greenspan SL, Maitland-Ramsey L, Myers E (1996) Classification of osteoporosis in the elderly is dependent on site-specific analysis. Calcif Tissue Int 58:409–414