Figure 1. Kaplan-Meier curves for mortality by tertile of index in both training and validation samples. A, Training sample, specific index (log rank P < .001); B, validation sample, specific index (log rank P = .52); C, training sample, composite index (log rank P < .001); D, validation sample, composite index (log rank P = .63).
Figure 2. Kaplan-Meier curves for hospitalization by tertile of index in both training and validation samples. A, Training sample, specific index; B, validation sample, specific index; C, training sample, composite index; D, validation sample, composite index. For all comparisons, log rank P < .001.
Figure 3. Calculating the risk score for the chronic obstructive pulmonary disease prognostic index (CPI) based on the full data set. BMI indicates body mass index (calculated as weight in kilograms divided by height in meters squared); CRQ, Chronic Respiratory (Disease) Questionnaire; CVD, cardiovascular disease; ED, emergency department; FEV1%pred, forced expiratory volume in 1 second as percentage predicted; SGRQ, St George's Respiratory Questionnaire.
Briggs A, Spencer M, Wang H, Mannino D, Sin DD. Development and Validation of a Prognostic Index for Health Outcomes in Chronic Obstructive Pulmonary Disease. Arch Intern Med. 2008;168(1):71-79. doi:10.1001/archinternmed.2007.37
Chronic obstructive pulmonary disease (COPD) is a debilitating and progressive disease. The severity of the condition has typically been characterized by a single physiological measurement: forced expiratory volume in 1 second, which has been shown to be prognostic for mortality.
To develop a prognostic tool for COPD that is sensitive not only to mortality but also to other important drivers of health status and cost, data were obtained from a pooled analysis of 12 randomized controlled trials, and 3 main outcomes were chosen: mortality, hospitalization, and number of exacerbations. Cox models were employed for the time-to-event data (death or hospitalization), and a negative binomial model was used for the count data (exacerbations). From these models, 3 specific indexes were developed on a 100-point scale, and 1 composite index was obtained as a mean of the specific indexes. One-third of the data was reserved for validation purposes.
All indexes provided good discrimination among tertiles in the training and validation samples. The composite index had a performance very similar to that of the specific index in both the training and validation samples: the overall C statistic was estimated as 0.71 for both mortality and hospitalization. Each 10-point change in the composite index corresponds to an increase of 54% in the hazard ratio of death, 57% in the hazard ratio of hospitalization, and 21% in the incidence rate of exacerbations.
A composite index for COPD prognosis (the COPD Prognostic Index) has been validated in data not used in its development and is capable of predicting not only mortality, but also hospitalizations and exacerbations. All factors included in the index are straightforward to obtain, which should make the index suitable for use in primary as well as secondary care settings.
Chronic obstructive pulmonary disease (COPD) is a debilitating and progressive disease that can be fatal. Patients typically experience a prolonged functional decline punctuated by acute exacerbations that often require hospital treatment.1 The incidence of COPD is increasing worldwide, making the illness one of the greatest disease burdens in many countries.2 A recent qualitative study3 highlighted uncertainty in prognosis as one of the reasons why general practitioners fail to have a full discussion concerning the implications of COPD with their patients.
Once COPD is diagnosed, reliable prognostic markers provide useful information to patients, their caregivers, and health care professionals.4 Until recently, a single physiological measurement, forced expiratory volume in 1 second (FEV1), was considered to be the most accurate predictor of mortality.5 More recently, a new prognostic tool has been proposed that includes body mass index (calculated as weight in kilograms divided by height in meters squared) (B), degree of airflow obstruction (O), functional dyspnea (D), and exercise capacity (E). This multivariable “BODE” index was found to be an improvement over FEV1 alone because it was better able to predict the risk of death (both respiratory and all-cause mortality) among patients with COPD.6
Despite those improvements, the BODE index was developed using mortality as the sole outcome in a relatively small sample of patients. Exacerbations and hospital admissions are prominent outcomes that have important implications both for patients' quality of life (QOL) and for health care expenditures. Furthermore, the exercise capacity aspect of the instrument, which requires patients to undergo a 6-minute walk test, makes the BODE index less practical for use in a primary care setting.
We developed a new prognostic index for COPD (the COPD Prognostic Index [CPI]) that is capable of predicting not only mortality but also COPD exacerbations and hospital episodes, yet is simple enough to facilitate its use in primary care.
Data were provided by GlaxoSmithKline (Greenford, England) and comprised 12 randomized controlled trials of various treatments for COPD undertaken over a period of approximately 10 years, involving a total of 8802 patients and amounting to over 6000 patient-years of information. Eight of these studies (7 published studies7-13 and unpublished data from GlaxoSmithKline [“Randomised, 24-week, double-blind, placebo-controlled, parallel-group study followed by a 2-week, randomised, double-blind, run-out phase to evaluate the efficacy, safety, tolerability, and discontinuation of SB 207499 (15 mg twice daily) in patients with chronic obstructive pulmonary disease (COPD),” J. Christal et al, July 24, 2002]) followed patients for 6 months, 3 studies (2 published studies14,15 and unpublished data from GlaxoSmithKline [“A double-blind, placebo-controlled parallel group study to evaluate the efficacy, safety, and tolerability of oral cilomilast (15 mg bd) when given as maintenance treatment for 12 months to subjects with chronic obstructive pulmonary disease,” 2004]) followed patients for 1 year, and 1 followed patients for 3 years16; summary information for the individual trials is reproduced in Table 1. All trials recorded deaths, hospitalizations, and exacerbations of COPD (similarly defined as an acute respiratory episode requiring treatment with antibiotics and/or oral corticosteroids) but also included a wide variety of other measures at baseline offering potential prognostic factors. The data were pooled to predict exacerbations, hospitalizations, and death, employing prognostic factors that were available in each of the data sets. All of the studies included either the St George's Respiratory Questionnaire (SGRQ) or the Chronic Respiratory (Disease) Questionnaire (CRQ), but never both. These were assumed to be complements, and a single QOL variable was constructed from the standardized scores of the appropriate index (reversing the SGRQ scores so that a reduced standardized score represented a reduced QOL for both instruments). Where single observations were missing some of the prognostic factors, these were imputed using best subsets regression (the default imputation technique in Stata statistical software [version 8; StataCorp, College Station, Texas], the package used to undertake all analyses) to give a full rectangular data set.
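The construction of the single QOL variable can be illustrated with a minimal Python sketch. It assumes that standardization means subtracting the sample mean and dividing by the SD, and it uses the full-sample mean (SD) values quoted elsewhere in this article (47 [17] for the SGRQ and 86 [18] for the CRQ). The function and dictionary names are hypothetical and are not taken from the original Stata analysis.

```python
# Hypothetical sketch of the standardized QOL score; names are illustrative only.
# Means and SDs are the full-sample values reported in this article.
QOL_REFERENCE = {
    # Higher SGRQ scores indicate worse health, so the sign is reversed.
    "SGRQ": {"mean": 47.0, "sd": 17.0, "higher_is_better": False},
    # Higher CRQ scores indicate better health, so the sign is kept.
    "CRQ": {"mean": 86.0, "sd": 18.0, "higher_is_better": True},
}


def standardized_qol(instrument: str, score: float) -> float:
    """Return a z score in which lower values mean lower QOL for either instrument."""
    ref = QOL_REFERENCE[instrument]
    z = (score - ref["mean"]) / ref["sd"]
    return z if ref["higher_is_better"] else -z


# A patient 1 SD worse than average on the SGRQ and a patient 1 SD worse than
# average on the CRQ receive the same standardized score.
sgrq_patient = standardized_qol("SGRQ", 64.0)
crq_patient = standardized_qol("CRQ", 68.0)
```

Under these assumptions, either questionnaire yields a value on the same scale, which is what allows the pooled models to use one QOL covariate.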
Table 2 summarizes these prognostic factors and in addition shows the outcome variables (deaths, hospitalizations, and exacerbations) with the data split into 2 groups: a subset of approximately two-thirds of the full sample, which was to be used for training or fitting of the prognostic models, and the remaining one-third, which was not used to fit the models but was instead retained for validating the predictions of the resulting prognostic indexes. Rather than choosing the one-third validation sample at random, a temporal split based on date of randomization was performed with the last one-third of each of the 12 studies reserved for the validation sample. Introducing a systematic rather than a random split of the data made the validation harder because imbalances between the training and validation samples were more likely.17 The temporal nature of the split led to some small, but statistically significant, differences between the fitting and validation samples (see Table 2 for P values).
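A temporal split of this kind can be sketched as follows. This is an illustration only: the record layout (patient identifier, study identifier, randomization date), the function name, and the rounding rule for the size of the validation sample are assumptions, not details reported for the original analysis.

```python
from collections import defaultdict
from datetime import date


def temporal_split(patients, validation_fraction=1 / 3):
    """Reserve the last-randomized fraction of each study for validation.

    patients: iterable of (patient_id, study_id, randomization_date) tuples.
    Returns (training_ids, validation_ids).
    """
    by_study = defaultdict(list)
    for patient_id, study_id, randomized_on in patients:
        by_study[study_id].append((randomized_on, patient_id))

    training, validation = [], []
    for rows in by_study.values():
        rows.sort()  # earliest randomization first within each study
        n_validation = round(len(rows) * validation_fraction)
        cut = len(rows) - n_validation
        training.extend(pid for _, pid in rows[:cut])
        validation.extend(pid for _, pid in rows[cut:])
    return training, validation


# Toy data: two hypothetical studies with 6 and 3 patients.
example = [
    ("a1", "A", date(2000, 1, 1)), ("a2", "A", date(2000, 2, 1)),
    ("a3", "A", date(2000, 3, 1)), ("a4", "A", date(2000, 4, 1)),
    ("a5", "A", date(2000, 5, 1)), ("a6", "A", date(2000, 6, 1)),
    ("b1", "B", date(2001, 1, 1)), ("b2", "B", date(2001, 3, 1)),
    ("b3", "B", date(2001, 2, 1)),
]
train_ids, valid_ids = temporal_split(example)
```

Splitting within each study, rather than across the pooled data, keeps every trial represented in both samples.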
Statistical models were employed for each of the 3 outcome variables of mortality, time to hospitalization, and number of exacerbations. The standard Cox semiparametric proportional hazards model was employed to model the time-to-event data.18 For the number of exacerbations, Poisson and negative binomial count data models were employed,19 and the choice between these functional forms was informed through the use of the Akaike Information Criterion (AIC).20 Overall C, a measure of discrimination for survival analysis,21,22 analogous to the C statistic measure of area under the receiver operating characteristic curve in diagnostic studies,23 was employed to assess the performance of the time-to-event models.
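To make the idea of discrimination for censored time-to-event data concrete, the following Python sketch computes a simple pairwise concordance statistic of the Harrell type. It is a simplified stand-in and not necessarily the exact overall C estimator cited above. A pair is treated as usable when the subject with the shorter time had an observed event, and tied times are ignored; both choices are assumptions made for illustration.

```python
def concordance_index(times, events, scores):
    """Pairwise concordance between risk scores and observed survival.

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if censored
    scores: predicted risk (higher means the event is expected sooner)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking, which is the scale on which the reported value of 0.71 should be read.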
Initially, all candidate prognostic factors (Table 2) were considered. The final choice of prognostic factors to include was based on a combination of statistical significance and clinical relevance. Statistically significant predictors of outcome were detected by the use of backward stepwise procedures with a threshold of P < .05 as the limit for inclusion, and these variables were then reviewed for relevance by 2 of us (D.M. and D.D.S.) (without regard to statistical significance). Important factors for any of the outcomes were retained in all models.
To control for treatment effects, indicator variables were used to represent different treatment arms in the clinical trials. All results were reported with the indicator variables set to zero, which corresponds to using all of the data to estimate the risks of the corresponding outcome in the absence of treatment (placebo). Potential study effects were handled by running the models with the study as an additional stratification variable.
For each of 3 outcome-specific models, a specific index was constructed following a standard approach based on a simple points system used as an approximation of the linear predictor of the respective model.24 Each of the 3 specific outcome indexes was standardized onto a 100-point scale. A composite index was created by taking a simple arithmetic mean of the points of each of the specific outcome indexes.
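One way such a points system might be built is sketched below. The exact rounding and scaling rules of the original analysis are not given here, so this sketch assumes that each category's contribution to the linear predictor is measured relative to the lowest-risk category of its factor and scaled so that the maximum attainable total is approximately 100. All coefficients shown are invented for illustration and are not the values in Table 3.

```python
def to_points(contributions):
    """Scale per-category linear-predictor contributions to a 100-point index."""
    shifted = {
        factor: {cat: value - min(cats.values()) for cat, value in cats.items()}
        for factor, cats in contributions.items()
    }
    max_total = sum(max(cats.values()) for cats in shifted.values())
    if max_total == 0:
        raise ValueError("all contributions are identical")
    return {
        factor: {cat: round(100.0 * value / max_total) for cat, value in cats.items()}
        for factor, cats in shifted.items()
    }


def composite_points(specific_indexes):
    """Arithmetic mean of the specific indexes, rounded to the nearest integer."""
    first = specific_indexes[0]
    count = len(specific_indexes)
    return {
        factor: {
            cat: round(sum(index[factor][cat] for index in specific_indexes) / count)
            for cat in cats
        }
        for factor, cats in first.items()
    }


# Invented coefficients for two factors and three outcomes (illustration only).
mortality = to_points({"age": {"<65": 0.0, ">=65": 0.6}, "bmi": {">=20": 0.0, "<20": 0.4}})
hospitalization = to_points({"age": {"<65": 0.0, ">=65": 0.2}, "bmi": {">=20": 0.0, "<20": 0.3}})
exacerbations = to_points({"age": {"<65": 0.0, ">=65": 0.1}, "bmi": {">=20": 0.0, "<20": 0.1}})
composite = composite_points([mortality, hospitalization, exacerbations])
```

Because every specific index shares the same 100-point scale, averaging the points is meaningful even though the underlying models are of different types.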
Predictions of the risk of death, risk of hospitalization, and number of exacerbations from the composite index were achieved by calibrating the prediction equations to the training sample. This involved adjusting the baseline risk in each equation until the overall total of the predicted events in the training sample matched the observed number of events.
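The calibration step for a time-to-event outcome can be sketched as a one-dimensional search. The sketch assumes a proportional hazards form in which each patient's predicted risk over a common horizon is 1 - S0^exp(lp), where S0 is the baseline survival and lp is the linear predictor; differing follow-up times are ignored for simplicity. The function name and the use of bisection are illustrative choices, not a description of the original Stata procedure.

```python
import math


def calibrate_baseline_survival(linear_predictors, observed_events, tol=1e-10):
    """Find the baseline survival S0 whose predicted event total matches the observed total."""
    n = len(linear_predictors)
    if not 0 < observed_events < n:
        raise ValueError("observed_events must lie strictly between 0 and n")

    def expected_events(s0):
        # Sum of individual risks 1 - S0 ** exp(lp); decreases as S0 increases.
        return sum(1.0 - s0 ** math.exp(lp) for lp in linear_predictors)

    lo, hi = 0.0, 1.0  # expected_events(0) = n and expected_events(1) = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_events(mid) > observed_events:
            lo = mid  # too many predicted events: raise baseline survival
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy example: 5 hypothetical patients, 2 observed events.
toy_lp = [-0.5, 0.0, 0.2, 0.7, 1.1]
s0 = calibrate_baseline_survival(toy_lp, observed_events=2)
predicted_total = sum(1.0 - s0 ** math.exp(lp) for lp in toy_lp)
```

For a count outcome such as exacerbations, the analogous adjustment under a log-linear model would simply rescale the baseline rate by the ratio of observed to predicted events.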
To examine the validity of the prognostic index development, the initial model fitting occurred only on the first two-thirds of the available data (training sample) with the final one-third retained for validation purposes (validation sample). Validation was conducted in 2 ways. First, for the time-to-event data (death and hospitalization) Kaplan-Meier analyses were presented by tertile of the indexes for both the training and validation samples. Second, predicted events across a series of univariate splits of the data were assessed against the observed events using a χ2 test. Following validation testing, the final proposed index was estimated using the full sample,17 based on the prognostic factors identified from the training sample.
Table 3 shows the results of the 3 outcome-specific regression models (mortality, hospitalization, and exacerbations) fitted to the training sample with the same set of candidate variables, controlling for treatment arms in the original trials (coefficients not shown) and stratified by study. The AIC indicated that the negative binomial model fitted the number of exacerbations better than the standard Poisson model; hence, the negative binomial model is reported in Table 3.
Table 3 shows that QOL, history of exacerbations, and female sex were predictive of time to hospitalization and of exacerbations but not of mortality (based on the conventional significance level of 5%). By contrast, body mass index lower than 20 and age were predictive of mortality and hospitalizations but not exacerbations. The FEV1 percentage predicted (the observed FEV1 as a percentage of the predicted FEV1 score for a person of that age, sex, and height) and history of cardiovascular disease (CVD) were predictive of all 3 outcomes. A history of exacerbations was not included in the model for mortality because its coefficient suggested that a history of exacerbations reduced mortality risk and lacked statistical significance (P = .31). The overall C statistic was estimated as 0.71 for both mortality and hospitalization.
The results of the specific indexes generated from the models reported in Table 3 are presented in Table 4, standardized to a 100-point scale. The composite score is also shown in Table 4 and reflects a simple mean of the 3 specific indexes, rounded to the nearest integer value.
For the mortality and hospitalization outcomes, the data were split into tertiles based on the relevant prediction. For the mortality outcome, the Kaplan-Meier plots are presented in Figure 1. Figure 1A and B show the specific index, and Figure 1C and D show the composite index, whereas Figure 1A and C show the observed events for the training sample and Figure 1B and D, the observed events for the validation sample. Exactly the same format of plot is shown in Figure 2 for the time to hospitalization. Three important patterns are apparent from these plots. First, the differentiation between the tertiles is clearly achieved in the training samples. Second, that differentiation is less pronounced for the validation sample. Finally, it is clear that the composite index performs well in comparison with the specific indexes; indeed, there is some suggestion that it performs better for the hospitalization validation in Figure 2 based on differentiation between the curves at 3 years.
Further validation of the composite index is provided in Table 5, which shows the predicted number of events across a number of univariate splits of the validation sample and across the tertiles of the composite index, for each of the 3 outcomes. Although for a small number of categories a significant difference exists between the observed and expected numbers of events (see Table 5 for P values), there is no obvious pattern to these results, and across most categories the composite index gives good predictions compared with the observed events.
Given the promising results for the composite index, in terms of the validation sample, the final index was estimated using the full data set (training and validation samples together) to give the best estimate of a composite prognostic index from these data. The models fitted on the full data set were very similar to those reported for the training sample. This final index is presented in Figure 3 in a format that allows the number of points in the prognostic system to be easily determined. The mean (SD) of the index in the data set was 48 (15), with a median of 48 and an interquartile range of 37 to 59. In Table 6, the relationships between the points of the prognostic index and the expected 3-year risk of death or first hospitalization and the expected numbers of exacerbations are presented. There are quite clearly large differences between the risks of all 3 types of events across the index. These absolute predictions correspond to relative increases in the hazard ratio of death and hospitalization of 54% and 57%, respectively, for every 10 points of the index (corresponding to two-thirds of the standard deviation across the observed sample). The increase in the incidence rate corresponding to 10 points of the index is estimated as 21%.
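The per-10-point figures can be extended to other differences in index score with a short calculation. The sketch below assumes that the index enters each model log-linearly, so that the ratios compound multiplicatively across points; the function and constant names are hypothetical.

```python
import math

# Ratios per 10 points of the composite index, as reported in this article.
HR_DEATH_PER_10 = 1.54
HR_HOSPITALIZATION_PER_10 = 1.57
RATE_RATIO_EXACERBATION_PER_10 = 1.21


def relative_change(points_difference, ratio_per_10_points):
    """Hazard or rate ratio implied by a given difference in index points."""
    return ratio_per_10_points ** (points_difference / 10.0)


# Implied log hazard ratio of death per single index point.
log_hr_death_per_point = math.log(HR_DEATH_PER_10) / 10.0

# Example: a patient scoring 20 points higher than another.
hr_death_20_points = relative_change(20, HR_DEATH_PER_10)
```

Under this assumption, a 20-point difference corresponds to the 10-point ratio squared, not doubled.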
The aim of this analysis was to use an existing archive of trials for potential treatments for COPD to see if it was possible to develop a prognostic index that predicted not just mortality but also other important events, such as exacerbations and hospitalizations. It is important that such an index be simple to use and applicable to both primary and secondary care settings. A further aim was to be able to validate any such index against data not used in the process of developing the index. The results reported herein are encouraging. The composite index, developed from a simple arithmetic mean of the specific indexes predicting each of the 3 outcomes in turn, performed well in comparison with the more specific indexes. The use of a composite index to predict multiple outcomes has the advantage of simplicity relative to using separate indexes for each outcome; it protects against possible dangers of overfitting a specific index, and it is likely that such an index will better predict composite outcomes, such as those used in health economic evaluation, that combine effects on morbidity, mortality, and cost.
Nevertheless, there are a number of limitations of the study. Because the data are from clinical trials, there might be expected to be a lower level of comorbidities among patients represented herein than those in a routine clinical setting, although this effect may be partially offset by the calculation of underlying risk in the absence of any treatment. Although the candidate prognostic factors for the index were those that were collected in all trials, there were some missing observations that were imputed to avoid having to drop observations from the analysis. It should be noted that, although multiple imputation is commonly advocated for handling missing data,25 single imputation was acceptable in this case because the purpose of the analysis was to make predictions rather than to make inferences. Furthermore, the percentage of total data points that were imputed overall was only 6.5%, such that the results are unlikely to be sensitive to the missing data.
The validation process was an important part of assessing the likely performance of the index. We chose to split the data into two-thirds for training and one-third for validation. The split was chosen temporally rather than randomly because we thought this would give a more realistic validation sample in which there might be unobserved confounders in time. This initial validation was promising, with the composite index showing (1) good differentiation between tertiles of the index score in terms of the Kaplan-Meier plots on the validation sample and (2) good predictions of numbers of events across both univariate splits of the data and across the tertiles of the index. Nevertheless, further validation of the index should be undertaken, and we would suggest that appropriate data sources would include both future trials and existing observational data sets. Given this need for further validation, we chose to report in Figure 3 a composite index developed on the full data sample because this represents our best estimate of the index given current information available to us at this time.
The validation of the index involved a calibration step that ensured that the predicted numbers of events matched the mean of the observed events in the validation phase. Such a step is necessary only when the index is used to predict absolute rather than relative event rates and information on absolute event rates in the population of interest is available. If potential users have such information, then calibration to their specific population is recommended. Similar (re)calibration has been used to demonstrate the ability of the Framingham risk equation to predict for other population groups.26
The analysis reported herein shows that patient-reported outcome measures could be an important prognostic indicator. Indeed, for the models produced with the full data set, QOL was a notable predictor of all 3 outcomes (for mortality, P = .02; for hospitalization, P<.001; and for exacerbations, P<.001) (compare the training models of Table 3, in which the coefficient on mortality was not statistically significant [P = .25]). The particular measures used in the analysis reported herein are the SGRQ and CRQ instruments. Although the stated aim was to provide an easy-to-use prognostic index, the use of a QOL measure poses a number of challenges. Neither the SGRQ nor the CRQ is routinely measured in primary or secondary settings. Both instruments, although reasonably short, can take several minutes to complete, and the weighting system of the SGRQ to develop the final score may render it impractical to use in a busy clinical setting. The only way to use the values generated by these scores is in conjunction with the mean (SD) values from this analysis to calculate the standardized score; this was 47 (17) for the SGRQ and 86 (18) for the CRQ (Figure 3).
The use of a standardized score to allow the use of either the SGRQ or CRQ raises the question of whether other instruments might be used. For example, the generic Short Form-36 (SF-36) instrument was also collected at baseline in 5 of the 12 trials. The correlation between the individual dimensions of the SF-36 and the standardized score in those 5 trials ranged from 0.3 to 0.7. Substituting the standardized physical functioning domain of the SF-36, which correlates most highly with the standardized measures of SGRQ and CRQ (Figure 1), for the 5 trials in which the SF-36 was available, resulted in overall C statistics for the models that were almost identical to those reported in the “Results” section. This also suggests that substitution of other measures might be possible, although this assertion should be tested in future validation work.
Subject to satisfactory testing of the CPI proposed herein in further studies, there are a number of ways in which this new prognostic index might be used. First, it might be used in future clinical trials to predict which screened patients might be at highest risk of a serious adverse event, thereby improving the power (or reducing the sample size) of the planned investigation. Second, the index might be used in an economic appraisal to identify different subgroups of patients who might have differing cost-effectiveness of treatment. Third, the index might be used by the health service for predicting patterns of care required for patients. Finally, the index might be used by physicians in discussions with their patients. Nevertheless, with respect to this last endeavor, care must be taken to appropriately reflect the general uncertainty that exists when attempting individual-level predictions.27
In conclusion, this study proposes a composite index for COPD prognosis that has been validated in data not used in its development and that is capable of predicting not only mortality but also hospitalizations and exacerbations. All factors included in the index are straightforward to obtain, which should make the index suitable for use in primary as well as secondary care settings and for planning future studies.
Correspondence: Andrew Briggs, BA, DPhil, Section of Public Health and Health Policy, University of Glasgow, 1 Lilybank Gardens, Glasgow G12 8RZ, Scotland (firstname.lastname@example.org).
Accepted for Publication: August 6, 2007.
Author Contributions: Drs Briggs, Mannino, and Sin, Mr Spencer, and Ms Wang take responsibility for the final methods, reporting, and any remaining errors. Study concept and design: Briggs, Spencer, and Sin. Acquisition of data: Briggs. Analysis and interpretation of data: Briggs, Spencer, Wang, Mannino, and Sin. Drafting of the manuscript: Briggs and Sin. Critical revision of the manuscript for important intellectual content: Spencer, Mannino, and Sin. Statistical analysis: Briggs, Wang, and Sin. Obtained funding: Briggs and Spencer. Administrative, technical, and material support: Briggs and Spencer. Study supervision: Briggs.
Financial Disclosure: Dr Briggs has received research support, honoraria, and consultancy fees from the study sponsor, GlaxoSmithKline. Mr Spencer was an employee of GlaxoSmithKline at the time the study was undertaken. He has been a consultant to GlaxoSmithKline and owns shares of stock in that company. Dr Mannino serves on advisory boards for Boehringer Ingelheim, Pfizer, GlaxoSmithKline, Novartis, and Ortho Biotech; is on the speakers' bureaus for Boehringer Ingelheim, Pfizer, GlaxoSmithKline, and Deyl; and has received research grants from GlaxoSmithKline, Novartis, and Pfizer. He is also an expert witness in cases involving exposure to environmental tobacco smoke. Dr Sin has received research funding and honoraria from GlaxoSmithKline and AstraZeneca.
Additional Information: The authors and GlaxoSmithKline agreed that the intellectual property associated with this study would be signed over to the Global Initiative for Obstructive Lung Disease Working Group, with the intention that researchers would be freely able to use the index described in this article. We have set up a Web site (www.copdpi.com) that allows physicians to enter patient characteristics and receive the prognostic index back as an output.
Additional Contributions: Henry Glick, MD, and Louise Wilson, MD, provided comments on a previous version of the manuscript.