Receiver operating characteristic curves for detecting undiagnosed diabetes in the Cooperative Health Research in the Region of Augsburg (KORA) Survey 2000 population, in Augsburg, Germany (age group, 55-74 years), using different diabetes risk questionnaires (Griffin et al,5 Lindström and Tuomilehto,6 and Baan et al13), a prediction model (Stern et al14), and fasting glucose level alone.
Rathmann W, Martin S, Haastert B, Icks A, Holle R, Löwel H, Giani G. Performance of Screening Questionnaires and Risk Scores for Undiagnosed Diabetes: The KORA Survey 2000. Arch Intern Med. 2005;165(4):436-441. doi:10.1001/archinte.165.4.436
Copyright 2005 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
Validation of published screening questionnaires and risk scores for undiagnosed diabetes has typically not been performed in independent population samples.
Oral glucose tolerance tests were performed in 1353 participants (aged 55-74 years) without known diabetes in the Cooperative Health Research in the Region of Augsburg (KORA) Survey 2000, Augsburg, Germany. Sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC) for undiagnosed diabetes were calculated for various screening questionnaires.
Four screening tests (Rotterdam Diabetes Study, Cambridge Risk Score, San Antonio Heart Study, and Finnish Diabetes Risk Score) were applied to the KORA data. The AUCs were 61% (95% confidence interval [CI], 56%-66%) for the Rotterdam Diabetes Study, 65% (95% CI, 60%-69%) for the Finnish Diabetes Risk Score (P=.10 vs Rotterdam), and 67% (95% CI, 62%-72%) for the Cambridge Risk Score (P<.001 vs Rotterdam). A predictive model including fasting glucose level (San Antonio Heart Study) yielded an AUC of 90% (P<.01 vs all 3 questionnaires); however, this was not significantly different from fasting glucose level alone (AUC, 89%; P=.46). The sensitivities, specificities, and predictive values of questionnaires were substantially lower than originally described, which was mainly due to population variation of risk factors compared with the KORA sample (age, body mass index, antihypertensive medication, and smoking).
Currently proposed questionnaires yielded low validity when applied to a new population, most likely due to differences in population characteristics. Performance of diabetes risk questionnaires or scores must be assessed in the target population where they will be applied.
Type 2 diabetes mellitus is often asymptomatic at its onset and can remain undiagnosed for several years.1 The effects of type 2 diabetes screening on morbidity and mortality and its psychosocial, ethical, and economical consequences are largely unknown.1,2 Therefore, the US Preventive Services Task Force suggested type 2 diabetes screening among adults with hypertension and hyperlipidemia only, whereas there was insufficient evidence to recommend screening programs in the general population.2 Thus, evaluation for type 2 diabetes within the health care setting is currently recommended.1
Screening for type 2 diabetes in general practice by measuring fasting glucose levels is feasible; however, it may have a low yield on the basis of age older than 45 years as sole risk factor.3 Screening could be more efficient if targeted at patients with multiple risk factors for type 2 diabetes.3 Questionnaires based on diabetes-related symptoms and risk factors have been used as a first step, followed by measurements of glucose levels in diabetes detection programs in general practice.4
During the past decade, a number of questionnaires for diabetes screening have been developed based on major risk factors. Diabetes risk scores derived from cross-sectional data routinely collected in general practice have also been proposed.5 Furthermore, multivariate risk factor models for predicting incident diabetes have been fitted, which have been used for detecting undiagnosed diabetes.6 The aim of all of these strategies was to limit the proportion of the population that needs to undergo diagnostic glucose measurements as a second step. However, before widespread use of these instruments, there is a need to validate the questionnaires and risk scores in other populations, which has rarely been done.7
Thus, the aim of the present study was to evaluate and compare the performance of several published diabetes screening questionnaires and multivariate risk scores in a population-based sample (Cooperative Health Research in the Region of Augsburg [KORA] Survey 2000) in Augsburg, Germany.
The KORA Survey 2000 is a population-based study in southern Germany using the same region and methods as the former World Health Organization MONICA (Multinational Monitoring of Trends and Determinants in Cardiovascular Disease) project.8 The study was approved by the local ethical committee, and all subjects gave written informed consent. Using 2-stage cluster sampling, 6640 subjects were invited to participate from the city of Augsburg and the surrounding districts, with 2656 aged 55 to 74 years. From October 25, 1999, through April 28, 2001, fasting oral glucose tolerance tests (OGTTs) were performed under standardized conditions in all participants 55 years or older without known diabetes. Newly diagnosed diabetes (fasting glucose level, ≥126 mg/dL [≥7.0 mmol/L]; or 2-hour postload glucose level, ≥200 mg/dL [≥11.1 mmol/L]) was defined according to the 1999 World Health Organization criteria.9
Body weight was measured in light clothing to the nearest 0.1 kg and height to the nearest 0.5 cm. Waist circumference was measured at the minimum abdominal girth to the nearest 0.1 cm. Blood pressure was measured in a sitting position 3 times in the right arm after 15 minutes of rest with the use of an automatic device (HEM 705CP; Omron Healthcare, Bannockburn, Ill). Means of the second and third measurements were used in the analysis. In a structured interview, medical history was obtained. Subjects were asked to report the frequency and average duration of regular moderate and vigorous physical activity during leisure time in winter and summer. Alcohol intake and smoking were also assessed.8 A short, qualitative food frequency list was used that included the same question used in the Lindström Diabetes Risk Score6 on daily consumption of vegetables, fruits, or berries. Medication use was assessed in detail (ie, drug groups, dosage, and duration of therapy), and subjects receiving antihypertensives or steroids were identified.
Blood glucose level was measured using a hexokinase method (Gluco-quant; Roche Diagnostics, Mannheim, Germany). High-density lipoprotein cholesterol level was assessed using the phosphotungstic acid method (Boehringer-Mannheim, Mannheim, Germany). Triglyceride levels were measured with the GPO-PAP assay (Boehringer-Mannheim).8
We performed a MEDLINE search using key words (diabetes and risk, questionnaire, or screening) in October 2003. Two authors (W.R. and S.M.) independently reviewed abstracts to find publications on diabetes screening questionnaires and risk scores. Furthermore, references of detected articles were searched for additional publications. Then, we listed all items used in the various questionnaires and scores and judged the list for its suitability within the framework of the KORA data. Only screening instruments that could be applied to the KORA Survey were included in the analyses.
The literature review yielded 8 published questionnaires or scores.5,6,10-15 After careful review of the publications, 4 appeared suitable for application using the KORA data set5,6,13,14 (Table 1). The excluded screening tests included items or questions that have not been explicitly covered in the KORA Survey. An incidence-based prediction model proposed for identifying persons with high risk for type 2 diabetes from the United States was included in the present analysis.14 Another similar risk model from Finland yielded good performance when applied in detecting subjects with diabetes in a cross-sectional setting.6 With respect to a study from the Netherlands, only the basic predictive model was used, which included information routinely collected by general practitioners.13 The Diabetes Risk Score from Cambridge, England, consisted of 2 studies of newly diagnosed diabetes in general practices.5 All variables of these scores were available for the KORA sample except for family history of diabetes in siblings.5,14
We assessed differences between subjects with newly diagnosed diabetes and nondiabetic participants using unpaired, 2-tailed t test and Fisher exact test. We assessed the accuracy of the screening tests in discriminating subjects with and without undiagnosed diabetes using receiver operating characteristic curves, which plot the sensitivity (true-positive rate) to the false-positive rate (1 − specificity). The OGTT was considered the gold standard. Differences between the receiver operating characteristic curves were assessed by comparing the area under the curves.16 Sensitivity, specificity, positive and negative predictive values, and the likelihood ratio (LR) for a positive or negative test result were calculated for the cutoffs given in the original publications. For 1 incidence-based model, no cutoff was provided.14 Because fasting glucose level was included as 1 item in the model, a threshold was chosen that had the same specificity (and higher sensitivity) as fasting glucose level (≥110 mg/dL [≥6.1 mmol/L]).
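The construction of a receiver operating characteristic curve and its area can be sketched in a few lines. The following is a minimal illustration on made-up scores, not the KORA data or the authors' Stata/SAS code; the function names and the toy values are assumptions, and the statistical test used to compare areas (reference 16) is not reproduced here.

```python
from typing import List, Tuple

def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, sensitivity) pairs, one per distinct cutoff.

    A subject screens positive when score >= cutoff; labels are 1 for
    diabetes on the reference test (here the OGTT) and 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical risk scores for 6 subjects (3 with, 3 without diabetes).
toy_scores = [3, 5, 6, 7, 8, 9]
toy_labels = [0, 0, 1, 0, 1, 1]
toy_auc = auc_trapezoid(roc_points(toy_scores, toy_labels))
print(f"AUC = {toy_auc:.3f}")
```

An area of 0.5 corresponds to a test that discriminates no better than chance, and 1.0 to perfect discrimination.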
The LR positive (LR+) is the ratio of sensitivity to probability of false-positive error and determines how likely it is that a positive result is true rather than false at a given cutoff point.17 The LR negative (LR−) is the ratio of probability of false-negative error to the specificity. A cutoff point is performing well when the LR+ is high (ie, it is much more likely that a test with a positive finding is true than false) and the LR− is low (ie, it is much less likely that a test with a negative finding is false than true).17 Exact (Pearson-Clopper) confidence intervals (CIs) for sensitivities and specificities were calculated. Confidence intervals for likelihood ratios were determined using logit estimators. The logistic regression models on which the different risk scores were based were reanalyzed using the KORA data. We performed analyses using Stata Statistical Software (Release 7.0; StataCorp, College Station, Tex) and SAS (version 8.2 TS2M0; SAS Institute, Cary, NC).
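As a hedged illustration of these definitions, the sketch below computes LR+ and LR− from a 2 × 2 table and an exact binomial (Pearson-Clopper) interval for a proportion by bisection. The counts are hypothetical, the function names are assumptions, and the logit-based intervals for the likelihood ratios used in the analysis are not reproduced.

```python
from math import comb
from typing import Callable, Tuple

def likelihood_ratios(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def exact_ci(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided interval for the proportion k/n."""
    def bisect(f: Callable[[float], float], increasing: bool) -> float:
        # Find p in [0, 1] with f(p) = alpha / 2 for a monotone f.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2.0
            if (f(mid) < alpha / 2.0) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower limit: P(X >= k | p) = alpha/2; upper limit: P(X <= k | p) = alpha/2.
    lower = 0.0 if k == 0 else bisect(lambda p: 1.0 - binom_cdf(k - 1, n, p), True)
    upper = 1.0 if k == n else bisect(lambda p: binom_cdf(k, n, p), False)
    return lower, upper

# Hypothetical screening result: 100 subjects with and 1000 without diabetes.
lr_pos, lr_neg = likelihood_ratios(tp=80, fp=300, fn=20, tn=700)
sens_low, sens_high = exact_ci(80, 100)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"sensitivity 0.80, exact 95% CI {sens_low:.2f}-{sens_high:.2f}")
```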
Overall, 1653 (62%) of 2656 subjects aged 55 to 74 years participated. After excluding 131 participants with known diabetes and further dropouts, 1353 subjects had an OGTT. The characteristics of KORA Survey participants with and without newly diagnosed diabetes regarding the risk factors used in the questionnaires and models are shown in Table 2. The crude OGTT-based prevalence of undiagnosed diabetes among subjects without previously known diabetes (aged 55-74 years) was 9.0%. As expected, participants with newly diagnosed diabetes were older and more obese and had a more severe cardiovascular risk factor profile.
The performance of the 3 questionnaires and the prediction model in detecting undiagnosed diabetes is shown as receiver operating characteristic curves in the Figure. For illustrative purposes, we also included the receiver operating characteristic curve for fasting glucose level as the screening test recommended by the American Diabetes Association. The prediction model of Stern et al14 was significantly better in identifying undiagnosed diabetes than the 3 risk questionnaires (P<.01). The corresponding areas under the curves were 0.90 (95% CI, 0.87-0.93) (Stern et al14), 0.67 (95% CI, 0.62-0.72) (Griffin et al5), 0.65 (95% CI, 0.60-0.69) (Lindström and Tuomilehto6), and 0.61 (95% CI, 0.56-0.66) (Baan et al13). The prediction model (Stern et al14), which included a number of continuous risk factors, was almost identical to fasting glucose level alone for prediction of newly diagnosed diabetes (area under the curve, 0.89; 95% CI, 0.85-0.93) (P=.46). Between the 3 risk questionnaires, significant differences were found only for the Cambridge Diabetes Risk Score (Griffin et al5), which showed a better discrimination than the Rotterdam Diabetes Study questionnaire (Baan et al13) (P<.001), but not better than the Finnish Diabetes Risk Score (Lindström and Tuomilehto6) (P=.10).
Sensitivity, specificity, positive and negative predictive values, and likelihood ratios for the various screening tests are shown in Table 3. Using the risk thresholds given in the original publications, sensitivity and specificity of the 3 questionnaires were generally lower than originally described (Table 1 and Table 3). None of the 3 questionnaires showed sufficient sensitivity and specificity when applied to the KORA data. Only the Finnish Risk Score showed a reasonably high sensitivity; however, this was at the expense of a low specificity (43%). Because no cutoff was given for the prediction model of Stern et al,14 a threshold was chosen with similar specificity to fasting glucose level (≥110 mg/dL [≥6.1 mmol/L]) in the KORA sample. However, although we included additional variables, the sensitivity of the prediction model was no different from that of fasting glucose level alone (Table 3).
The predictive value of a positive test result, which is the probability that someone really has undiagnosed diabetes given a positive screening test result, was 2- to 3-fold higher for the prediction model of Stern et al14 and fasting glucose level compared with the risk questionnaires (Table 3). The predictive values of a negative test result were high for all screening tests, ranging from 94% to 98%, but these were not substantially higher than the prior probability of not having diabetes.
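The dependence of predictive values on prevalence follows from Bayes' theorem and can be sketched as below. The prevalence (9.0%) and the specificity of 43% are taken from the text; the sensitivity of 80% is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
from typing import Tuple

def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Positive and negative predictive values by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

prevalence = 0.09            # crude prevalence of undiagnosed diabetes (from the text)
ppv, npv = predictive_values(sensitivity=0.80,   # hypothetical value
                             specificity=0.43,   # from the text
                             prevalence=prevalence)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}, prior P(no diabetes) = {1 - prevalence:.2f}")
```

With a prevalence of 9.0%, the prior probability of not having diabetes is already 91%, so a negative predictive value in the mid-90% range adds only a few percentage points of certainty.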
Overall, the LR+ values were low and the LR− were relatively high for the 3 risk questionnaires, which indicated that they were not useful to increase the posttest probability of having undiagnosed diabetes in the KORA population (Table 3). Again, the performances of the prediction model of Stern et al14 and the fasting glucose level were comparable.
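How a likelihood ratio moves the pretest probability can be illustrated with the odds form of Bayes' theorem. This is a sketch under stated assumptions: the pretest probability is the 9.0% prevalence reported above, whereas the likelihood ratios and the function name are hypothetical.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

pretest = 0.09                      # prevalence of undiagnosed diabetes (from the text)
for lr in (1.4, 5.0, 10.0):         # hypothetical LR+ values
    post = posttest_probability(pretest, lr)
    print(f"LR+ = {lr:>4}: posttest probability = {post:.2f}")
```

An LR+ close to 1 leaves the posttest probability near the prevalence, which is why low LR+ values indicate a test of little use for ruling in disease.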
In general, the population characteristics of the original study samples were different from those of the KORA sample; ie, the subjects were younger and less obese and had lower prevalence of antihypertensive drug treatment (Table 1). There were also substantial differences between the logistic regression coefficients of the models given in the 4 original data sets and the KORA sample (Table 4). Whereas, in the original publications, all of these risk factors were significantly associated with undiagnosed diabetes, only a few factors were related to diabetes in the KORA participants, including (abdominal) obesity, diabetes family history, systolic blood pressure, smoking, and physical activity (Table 4). In particular, large differences in coefficients were found for age, body mass index, antihypertensive drugs, and smoking for the 3 risk questionnaires compared with the KORA data set. The published coefficients of the prediction model (Stern et al14) were mostly similar to those of the KORA sample; however, a large difference was found for fasting glucose level (coefficient for the San Antonio Heart Study,14 +0.07; coefficient for the KORA sample, +0.15).
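To see what such a difference in a logistic regression coefficient implies, the coefficients can be converted to odds ratios. The sketch below uses the two fasting glucose coefficients quoted above and assumes, because the text does not state the unit, that they are expressed per 1 mg/dL; the function name and the 10-mg/dL increment are illustrative choices.

```python
from math import exp

def odds_ratio(coefficient: float, increment: float = 1.0) -> float:
    """Odds ratio for a given increase in a predictor of a logistic model."""
    return exp(coefficient * increment)

# Coefficients from the text; the per-mg/dL unit is an assumption.
coefficients = {"San Antonio Heart Study": 0.07, "KORA sample": 0.15}
for study, beta in coefficients.items():
    or_10 = odds_ratio(beta, increment=10.0)
    print(f"{study}: odds ratio per 10 mg/dL = {or_10:.2f}")
```

Under this assumption, the same 10-mg/dL rise in fasting glucose level roughly doubles the odds with the published coefficient and more than quadruples them with the KORA coefficient, so a score built on one set of coefficients weights the risk factor very differently from the other.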
A number of questionnaires and scores for diabetes screening have been developed that are based on major risk factors. The proposed screening questionnaires and the prediction model evaluated in the present investigation yielded a lower validity than originally described when applied to another population-based sample. Lack of external validation is a likely reason for this result.
The application of a questionnaire in a different setting may show a much lower usefulness, because different population characteristics may influence the accuracy of the screening instrument. Therefore, the Expert Committee on the Diagnosis and Classification of Diabetes Mellitus recommended that diabetes prediction scores or models should be tested across other populations to demonstrate sufficient utility before their widespread use.18 Obviously, some scores are not applicable to all populations (eg, reluctance to use a bicycle presented in a questionnaire from the Netherlands may not be a risk factor in a society that does not use bicycles as a common mode of transportation19). The inclusion of specific medications (eg, antihypertensives, steroids, drugs to lower lipid levels) and smoking may also be problematic, because prescription drug use and smoking show large variations in different regions and over time.5 Finally, the diabetes prevalence in the screening population has an impact on the predictive values, which needs to be taken into account.17
The use of a scoring system derived from logistic regression to produce a risk estimate for a different population may be more misleading than advantageous. This has previously been shown for risk factor models designed to predict coronary heart disease.20 Although there was consistent evidence of the specific risk factor–outcome associations, there was a great variation in the magnitude of the risk estimates (odds ratios) derived from predictive models.20 This observation is in line with the present results on identifying undiagnosed diabetes, where the relative size of some risk factor coefficients showed large variations compared with the original models.
The rationale for using risk factor–based screening questionnaires, eg, in primary medical care, is that they are less labor intensive and more acceptable to patients than biochemical screening tests such as measurement of fasting glucose level or glycosylated hemoglobin level.5 Pencil-and-paper risk tests may also be useful for health education and raising public awareness. Furthermore, it has been proposed that automatic calculation of risk scores by the practice computer system be performed before a consultation for efficient case finding of undiagnosed diabetes.5 However, the present assessment of various questionnaires showed that their performance is obviously based on the characteristics of the original population from which they are derived. The diabetes screening questionnaires showed poor performances as stand-alone tests. However, they may be helpful as noninvasive tests to rule out undiagnosed diabetes with high probability (negative predictive values, >90%).19
An efficient screening strategy could be first to rule out the disease with high certainty to minimize unnecessary tests. Therefore, risk questionnaires may be useful as a first stage to determine whether a sequence of testing should be performed.
Biochemical testing of fasting glucose level showed lower population variation in screening of undiagnosed diabetes.21 The application of the prediction model from Stern et al,14 which included a number of diabetes risk factors and biochemical measures such as fasting glucose level, did not perform significantly better than measurement of glucose level alone in detecting undiagnosed diabetes in the KORA sample. On the other hand, the evaluation of the model of Stern et al compared with the other questionnaires suffered from incorporation bias, because fasting glucose level is a substantial component of the gold-standard OGTT.22
Our findings have several limitations. First, we were not able to evaluate all questionnaires identified in the literature search. However, most of the 4 excluded questionnaires had some apparent limitations for widespread use, at least in European countries. The Dutch risk score by Ruige et al12 included “reluctance to use a bicycle for transportation” as an item, which is very specific to the Netherlands. Other risk equations were derived from samples of middle-aged Egyptian subjects,15 or included more than 50% Hispanic subjects among the diabetic cases.11 Finally, the screening test by Herman et al10 mostly used risk factors that were also included in the questionnaires investigated in the present study.
Furthermore, there was only 1 fasting glucose test as part of the OGTT. This meant that we could not evaluate a 2-step screening (fasting glucose test followed by OGTT). However, previous studies indicated that measurements of fasting plasma glucose level are relatively stable for the short term (intraindividual coefficient of variations of 6%-11%).21 Furthermore, we have analyzed risk questionnaires only as stand-alone tests.1 Three-step screening programs based on questionnaires followed by random measurement of plasma glucose level and final OGTT may be useful, in particular, in the setting of a general practice.4 However, there is a need for further research, in particular, to estimate the proportion of false-negative cases using such complex screening strategies with risk questionnaires as the first step.4
These data suggest that performance of diabetes risk questionnaires or scores needs to be assessed in the target population where they will ultimately be used. Currently proposed questionnaires yielded low validity when applied to a new population, which was mainly due to differences in population characteristics. Thus, the use of scoring systems for diabetes screening in a different population can be misleading. Nonetheless, all of the screening instruments had a high negative predictive value, and thus these instruments may be most useful when the findings are negative rather than positive.
Correspondence: Wolfgang Rathmann, MD, MSPH, Institute of Biometrics and Epidemiology, German Diabetes Center, Auf’m Hennekamp 65, D-40225 Düsseldorf, Germany (firstname.lastname@example.org).
Accepted for Publication: July 31, 2004.
Financial Disclosure: None.
KORA Study Group: In addition to the authors, the KORA Study Group consists of H.-Erich Wichmann, MD, PhD (speaker), Christa Meisinger, MD, Thomas Illig, PhD, Jürgen John, PhD, and their coworkers, of the GSF–National Research Center for Environment and Health, Neuherberg, Germany, who are responsible for the design and conduct of the KORA studies.
Funding/Support: The OGTT study was supported in part by the German Federal Ministry of Health, Berlin; the Ministry of School, Science and Research of the State of North-Rhine-Westfalia, Düsseldorf; and the Anna Wunderlich-Ernst Jühling Foundation, Düsseldorf (Drs Rathmann and Giani). The KORA Survey 2000 was financed by the GSF–National Research Center for Environment and Health, which is funded by the German Federal Ministry of Education, Science, Research and Technology, Berlin, and the State of Bavaria.
Acknowledgment: We thank K. Papke (head of the KORA Study Center) and B. Schwertner (survey organization) and their coworkers for organizing and conducting the data collection; H.-E. Wichmann, MD, PhD, and W. van Eimeren, MD (GSF, Neuherberg), for initiating the KORA Survey; and all participants of the OGTT study.