The predicted and observed event probability estimates represent the mean predicted probability from the Cox proportional hazards regression model and the mean observed probability from the population (Kaplan-Meier estimate) divided into quintiles of predicted probability. Predicted risk categories for quintiles 1 through 5 correspond with 0% to 4.3%, 4.4% to 8.1%, 8.2% to 12.9%, 13.0% to 24.5%, and 24.6% to 53.9%, respectively, for model 2; 0% to 1.6%, 1.7% to 5.3%, 5.4% to 11.0%, 11.1% to 23.1%, 23.2% to 61.7%, respectively, for model 3; and 0% to 1.4%, 1.4% to 4.8%, 4.9% to 10.7%, 10.8% to 24.0%, 24.1% to 61.6%, respectively, for model 6. Nam and D’Agostino χ2 statistic is 37, 32, and 19 for models 2, 3, and 6, respectively.
Tangri N, Stevens LA, Griffith J, Tighiouart H, Djurdjev O, Naimark D, Levin A, Levey AS. A Predictive Model for Progression of Chronic Kidney Disease to Kidney Failure. JAMA. 2011;305(15):1553–1559. doi:10.1001/jama.2011.451
Author Affiliations: Department of Medicine (Drs Tangri, Stevens, and Levey) and Biostatistics Research Center, Tufts Clinical and Translational Science Institute (Dr Griffith and Mr Tighiouart), Tufts Medical Center, Boston, Massachusetts; Department of Medicine, University of British Columbia, and British Columbia Provincial Renal Agency, Vancouver, British Columbia, Canada (Dr Levin and Ms Djurdjev); and Department of Medicine, Sunnybrook Hospital, University of Toronto, Toronto, Ontario, Canada (Dr Naimark).
Context Chronic kidney disease (CKD) is common. Kidney disease severity can be classified by estimated glomerular filtration rate (GFR) and albuminuria, but more accurate information regarding risk for progression to kidney failure is required for clinical decisions about testing, treatment, and referral.
Objective To develop and validate predictive models for progression of CKD.
Design, Setting, and Participants Development and validation of prediction models using demographic, clinical, and laboratory data from 2 independent Canadian cohorts of patients with CKD stages 3 to 5 (estimated GFR, 10-59 mL/min/1.73 m2) who were referred to nephrologists between April 1, 2001, and December 31, 2008. Models were developed using Cox proportional hazards regression methods and evaluated using C statistics and integrated discrimination improvement for discrimination, calibration plots and Akaike Information Criterion for goodness of fit, and net reclassification improvement (NRI) at 1, 3, and 5 years.
Main Outcome Measure Kidney failure, defined as need for dialysis or preemptive kidney transplantation.
Results The development and validation cohorts included 3449 patients (386 with kidney failure [11%]) and 4942 patients (1177 with kidney failure [24%]), respectively. The most accurate model included age, sex, estimated GFR, albuminuria, serum calcium, serum phosphate, serum bicarbonate, and serum albumin (C statistic, 0.917; 95% confidence interval [CI], 0.901-0.933 in the development cohort and 0.841; 95% CI, 0.825-0.857 in the validation cohort). In the validation cohort, this model was more accurate than a simpler model that included age, sex, estimated GFR, and albuminuria (integrated discrimination improvement, 3.2%; 95% CI, 2.4%-4.2%; calibration [Nam and D’Agostino χ2 statistic, 19 vs 32]; and reclassification for CKD stage 3 [NRI, 8.0%; 95% CI, 2.1%-13.9%] and for CKD stage 4 [NRI, 4.1%; 95% CI, −0.5% to 8.8%]).
Conclusion A model using routinely obtained laboratory tests can accurately predict progression to kidney failure in patients with CKD stages 3 to 5.
An estimated 23 million people in the United States (11.5% of the adult population) have chronic kidney disease (CKD) and are at increased risk for cardiovascular events and progression to kidney failure.1-5 Similar estimates of burden of disease have been reported around the world.6 Although there are proven therapies to improve outcomes in patients with progressive kidney disease, these therapies may also cause harm and add cost. Clinical decision making for CKD is challenging due to the heterogeneity of kidney diseases, variability in rates of disease progression, and the competing risk of cardiovascular mortality.7,8 Accurate prediction of risk could facilitate individualized decision making, enabling early and appropriate patient care.9,10
Currently, there are no widely accepted predictive instruments for CKD progression; therefore, physicians must make ad hoc decisions about which patients to treat, risking delays in treatment in those who ultimately progress to kidney failure, or unnecessary treatment in those who do not progress. The severity of CKD has been recommended to guide treatment-related decisions.11 Severity is commonly staged according to the level of glomerular filtration rate (GFR) estimated from serum creatinine. Reporting estimated GFR when serum creatinine is measured has increased awareness of CKD and referrals to nephrologists, but estimated GFR is not sufficient for clinical decision making.12,13
Recent studies have shown that albuminuria provides additional prognostic information for progression to kidney failure.14 Some studies have examined the use of estimated GFR and albuminuria in prediction models, with additional clinical and laboratory data, but these models are either specific to a particular type of kidney disease or not externally validated.7,15-18 The ideal model to predict progression would be accurate, easy to implement, and highly generalizable across a spectrum of patients with CKD in independent populations.
Using data from 2 separate CKD cohorts, the goal of our study was to develop and externally validate an accurate but simple prediction model for progression of CKD. The goal was also to use variables routinely measured in patients with CKD to create a model to predict progression to kidney failure that could be easily implemented in clinical practice. We were especially interested in models that rely solely on information available to a clinical laboratory, enabling reporting the risk of kidney failure with laboratory test results.
Development Cohort. The development cohort was derived from the nephrology clinic electronic health record (EHR) at Sunnybrook Hospital, a part of the University of Toronto Health Network, Toronto, Ontario, Canada. Patients with CKD stages 3 to 5 (estimated GFR, <60 mL/min/1.73 m2) at the time of initial nephrology referral were included and were followed up between April 1, 2001, and December 31, 2008 (eFigure 1). Outcomes were ascertained by reviewing clinic records as well as through a matching algorithm with the Toronto Regional Dialysis Registry.
Validation Cohort. The validation cohort was derived from the British Columbia CKD Registry (Patient Registration and Outcome Management Information System), which captures clinical and laboratory data on all patients referred to nephrologists in the province. Patients with CKD stages 3 to 5 at the time of initial nephrology referral between January 1, 2001, and December 31, 2009, were included. Outcomes such as dialysis, death, and transplantation are all captured in the database, which matches all kidney failure outcomes with provincial and national registry.17
The study was reviewed and approved by the institutional review boards at Sunnybrook Hospital, University of Toronto, Toronto, Ontario, Canada; University of British Columbia, Vancouver, British Columbia, Canada; and Tufts Medical Center, Boston, Massachusetts. Informed consent was waived at all participating institutions because of the presence of de-identified data and lack of feasibility of obtaining informed consent from all participants in the cohorts.
Candidate Independent Variables. Candidate independent variables were selected by face validity and included demographic variables, including age and sex; physical examination variables, including blood pressure and weight; comorbid conditions, including diabetes, hypertension, and etiology of kidney disease; and laboratory variables from serum and urine collected at the initial nephrology visit (eTable 1). All predictor variables were obtained at baseline from the nephrology clinic EHR in the development data set. The baseline laboratory value was defined as the first test result within 365 days of the initial estimated GFR. Baseline values were restricted to ±90 days in a sensitivity analysis. Comorbid conditions were categorized as present or absent at the time of initial nephrology visit.
Using a formula derived from the Irbesartan in Diabetic Nephropathy Trial study,19 24-hour urinary protein excretion was transformed to an albumin-to-creatinine ratio. The albumin-to-creatinine ratio was log transformed due to its skewed distribution. The remainder of the laboratory variables and physical examination characteristics were evaluated as linear predictors. The presence of collinearity was examined using a correlation matrix, followed by evaluation of variance inflation factors and magnitude of standard errors. Variables with more than 30% missing values were not included in the analysis. All other missing data were imputed using the multiple imputation technique with 5 imputations based on PROC MI in SAS version 9.1 (SAS Institute Inc, Cary, North Carolina). A complete case analysis was examined as a sensitivity analysis.
Dependent Variable. The outcome of interest was kidney failure, which was defined by initiation of dialysis or kidney transplantation and censored for mortality before kidney failure. Although predicting mortality before kidney failure is also important for patients with CKD, predicting the risk of kidney failure alone is important for many decisions made by patients, physicians, and health care systems. The time horizons for risk prediction were 1, 3, and 5 years (eEquation). Risk categories for CKD progression have not been previously defined. Based on input from clinicians, we defined stage-specific CKD risk categories that would be useful for management decisions. In CKD stage 3, risk categories of 0% to 4.9%, 5.0% to 14.9%, and 15.0% or more risk of kidney failure over 5 years were considered as low, intermediate, and high risk, respectively. In CKD stage 4, risk categories of 0% to 9.9%, 10.0% to 19.9%, and 20% or more risk of kidney failure over 2 years were considered as low, intermediate, and high risk, respectively.
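The stage-specific risk categories described above can be expressed as a small lookup. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it assumes predicted risk is supplied as a fraction between 0 and 1 for the time horizon that the text attaches to each stage.

```python
from bisect import bisect_right

# Cut points taken from the text, as fractions of predicted risk.
# Stage 3: <5% low, 5% to <15% intermediate, >=15% high.
# Stage 4: <10% low, 10% to <20% intermediate, >=20% high.
THRESHOLDS = {3: (0.05, 0.15), 4: (0.10, 0.20)}
LABELS = ("low", "intermediate", "high")


def risk_category(ckd_stage: int, predicted_risk: float) -> str:
    """Map a predicted kidney failure risk to a stage-specific category."""
    if ckd_stage not in THRESHOLDS:
        raise ValueError("risk categories are defined only for CKD stages 3 and 4")
    if not 0.0 <= predicted_risk <= 1.0:
        raise ValueError("predicted_risk must be a fraction between 0 and 1")
    # bisect_right counts how many cut points are <= the risk, so a value
    # exactly on a cut point falls into the higher category.
    return LABELS[bisect_right(THRESHOLDS[ckd_stage], predicted_risk)]
```

For example, a predicted risk of 0.12 would be labeled intermediate in either stage, whereas 0.07 would be intermediate in stage 3 but low in stage 4.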
Model Development. We developed a sequential series of models and compared those with more variables (ie, greater complexity) to simpler ones. We used a combination of clinical guidance and forward selection to select variables. In univariate Cox proportional hazards regression models, variables not associated with kidney failure (P > .10) were excluded from further analyses. Improvement in model performance through addition of new candidate variables in multivariate Cox proportional hazards regression models was tested using metrics for discrimination and goodness of fit. Models 1 through 3 were developed using age and sex, estimated GFR, and albuminuria, successively, and compared with each other. Models 4 through 7 were developed by adding either clinical variables (diabetes and hypertension), physical examination variables (systolic and diastolic blood pressure and weight), laboratory variables of CKD severity (which were associated with the outcome in multivariate forward selection), or all of the above and compared with model 3.
Prediction Model Validation. Models 2 and 3, the most accurate model among models 4 through 6, and model 7 were evaluated in the external validation data set. The baseline hazard function and β coefficients from the developed model were fixed and applied to the validation data set.
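Applying a fixed baseline hazard and fixed β coefficients to new patients amounts to evaluating the usual Cox survival expression. The Python sketch below assumes the common form risk(t) = 1 − S0(t)^exp(Σ β(x − mean)), with covariates centered at development-cohort means; the centering, the function name, and every number shown are illustrative assumptions, not the published equation or coefficients (those are in the eEquation and eTable 5).

```python
import math


def predicted_risk(baseline_survival, betas, covariates, means):
    """Predicted event probability at one time horizon from a fitted Cox model.

    baseline_survival: S0(t) at the horizon, fixed from the development data.
    betas, covariates, means: dicts keyed by variable name.
    """
    linear_predictor = sum(
        beta * (covariates[name] - means[name]) for name, beta in betas.items()
    )
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical values for illustration only; NOT the published coefficients.
example_betas = {"x": math.log(2.0)}  # hazard ratio of 2 per unit of x
example_risk = predicted_risk(0.90, example_betas, {"x": 1.0}, {"x": 0.0})
```

With a baseline survival of 0.90 and a hazard ratio of 2, the sketch returns 1 − 0.90² = 0.19; a patient whose covariates equal the means receives the baseline risk of 0.10.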
Prediction Model Performance. We used a series of methods to evaluate the performance of the models in the development and the validation data sets. Metrics were assessed in the overall population and in relevant subgroups (age dichotomized at 65 years, sex, CKD stage 3 vs 4, urine albumin-to-creatinine ratio dichotomized at 300 mg/g, and diabetes status).
Discrimination. Discrimination refers to the ability of a model to correctly distinguish between 2 classes of outcomes (kidney failure vs no kidney failure). Concordance statistics (C statistics) and integrated discrimination improvement were computed as measures of discrimination.20-23
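As an illustration of what a concordance statistic measures with censored follow-up, the Python sketch below computes a Harrell-type C statistic. The text does not specify which estimator was used, so this choice, along with the function name and the handling of ties, is an assumption.

```python
def harrell_c(times, events, risks):
    """Harrell-type concordance for right-censored data.

    times: follow-up time per patient.
    events: 1 if kidney failure was observed, 0 if censored.
    risks: predicted risk per patient (higher means earlier failure expected).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking of patients by time to kidney failure.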
Calibration. Calibration describes how closely the predicted probabilities agree numerically with the observed outcomes. We compared the observed vs predicted risk of kidney failure for each quintile of predicted risk and determined the magnitude of the deviation using the Nam and D’Agostino χ2 statistic.24
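The quintile comparison can be sketched as follows. The Python code groups patients by predicted risk, estimates the observed probability in each group with a Kaplan-Meier calculation, and sums a χ2-type deviation of the form n(observed − predicted)² / [predicted(1 − predicted)]. That expression is a commonly quoted form of the Nam and D'Agostino statistic and is assumed here, because the text does not give the formula; all function names are hypothetical.

```python
def km_event_probability(times, events, horizon):
    """1 minus the Kaplan-Meier survival estimate at `horizon`."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - failed / at_risk
    return 1.0 - survival


def calibration_by_quantile(times, events, predicted, horizon, groups=5):
    """Return [(mean predicted, observed)] per risk group and the chi-square sum."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows, chi2 = [], 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = km_event_probability(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        # Assumed form of the statistic; requires 0 < mean_pred < 1.
        chi2 += len(idx) * (observed - mean_pred) ** 2 / (mean_pred * (1.0 - mean_pred))
        rows.append((mean_pred, observed))
    return rows, chi2
```

Smaller values of the summed statistic indicate closer agreement between predicted and observed probabilities across risk groups.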
Goodness of Fit. Overall model fit for sequential models was compared using the Akaike Information Criterion (AIC), which takes into account both the statistical goodness of fit and the number of parameters required to achieve this particular degree of fit, by imposing a penalty for increasing the number of parameters.22
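The penalty described above follows from the standard definition AIC = 2k − 2 ln L, where k is the number of parameters and L is the maximized likelihood (the partial likelihood in a Cox model). A minimal Python sketch, with hypothetical log-likelihood values that are not taken from the study:

```python
def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion; lower values indicate better fit."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


# Hypothetical comparison: two extra parameters improve the log-likelihood
# by only 0.5, so the penalty outweighs the gain and AIC increases.
simpler = aic(-100.0, 3)
larger = aic(-99.5, 5)
```

In this example the simpler model would be preferred, because the added parameters do not improve fit enough to offset the penalty.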
Reclassification. Reclassification refers to movement of patients from one class to another based on changes to assignment to risk categories. Reclassification improvement was quantified using the net reclassification improvement (NRI) statistic.21
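The categorical NRI can be illustrated with a short Python sketch. It uses the usual definition, the net proportion of patients with events moved to a higher category plus the net proportion of patients without events moved to a lower category. For simplicity it assumes that event status at the time horizon is known for every patient, so it ignores censoring; an analysis of survival data would need to account for it. Names are hypothetical.

```python
def net_reclassification_improvement(old_category, new_category, event):
    """Categorical NRI for a new model vs an old model.

    old_category, new_category: ordinal risk categories (e.g. 0, 1, 2).
    event: 1 if the patient had the outcome by the horizon, otherwise 0.
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for old, new, had_event in zip(old_category, new_category, event):
        if had_event:
            n_event += 1
            up_event += new > old
            down_event += new < old
        else:
            n_nonevent += 1
            up_nonevent += new > old
            down_nonevent += new < old
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one nonevent")
    return (up_event - down_event) / n_event + (down_nonevent - up_nonevent) / n_nonevent
```

A positive value means that, on balance, the new model moves patients with events toward higher categories and patients without events toward lower ones.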
Sensitivity Analysis. To evaluate the effect of definition of risk categories on reclassification, we calculated NRI for alternative risk thresholds in CKD stages 3 and 4 and using an alternative method that does not require categories.25 To evaluate the effect of the competing risk of mortality before kidney failure on risk prediction, we compared the results of our models with competing risk models. The goal of the competing risk models was to estimate the cumulative incidence function, which is defined by the probability of reaching kidney failure expressed in terms of the cause-specific hazards in the following manner:
$$I_{\text{Kidney Failure}} = \sum_{i} \lambda_{\text{Kidney Failure}}(t_i) \times S(t_{i-1})$$
The quantities under the summation denote the instantaneous hazard of kidney failure at event time $t_i$ and survival rate from kidney failure or death past event time $t_{i-1}$.26
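A nonparametric version of this summation can be sketched in Python as follows. At each distinct event time, the cause-specific hazard of kidney failure is estimated as the number of kidney failures divided by the number still at risk, and it is weighted by the probability of having survived free of both kidney failure and death up to the previous event time. The cause coding and function name are assumptions made for illustration.

```python
def cumulative_incidence(times, causes, horizon):
    """Cumulative incidence of kidney failure by `horizon` with death as a competing risk.

    causes: 0 = censored, 1 = kidney failure, 2 = death before kidney failure.
    """
    overall_survival = 1.0  # S(t_{i-1}): free of both kidney failure and death
    incidence = 0.0
    event_times = sorted({t for t, c in zip(times, causes) if c != 0 and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        kidney_failures = sum(1 for u, c in zip(times, causes) if u == t and c == 1)
        any_events = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        # Cause-specific hazard at t_i times survival just before t_i.
        incidence += (kidney_failures / at_risk) * overall_survival
        overall_survival *= 1.0 - any_events / at_risk
    return incidence
```

For 4 patients with times 1, 2, 3, and 4 and causes kidney failure, death, kidney failure, and censored, the sketch gives a cumulative incidence of 0.5, whereas a Kaplan-Meier estimate that treats the death as censoring would give 0.625. This is the kind of difference the sensitivity analysis was designed to detect.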
All statistical analyses were performed using SAS version 9.2. Two-sided P < .05 was considered statistically significant.
The development and validation cohorts included 3449 and 4942 patients, respectively (Table 1 and eFigure 1). Patients in the development and validation cohorts were similar in age and sex and had similar prevalence of diabetes and smoking. Kidney disease severity was greater (lower mean estimated GFR and higher mean albuminuria) and median follow-up was longer in the validation cohort than in the development cohort. The proportion of events was greater in the validation cohort compared with the development cohort (1177 patients with kidney failure [24%] vs 386 patients with kidney failure [11%]).
The hazard ratios for the variables and statistics for discrimination and goodness of fit for successive models in the development data set are shown in Table 2. Model 1, including age and sex only, performed poorly (C statistic, 0.561; 95% confidence interval [CI], 0.529-0.593). The C statistic improved with the inclusion of estimated GFR in model 2 (0.892; 95% CI, 0.874-0.910; P < .001) and albuminuria in model 3 (0.910; 95% CI, 0.894-0.926; P < .001), did not improve with the addition of diabetes and hypertension in model 4 (0.909; 95% CI, 0.893-0.925; P = .40), and did improve with the inclusion of blood pressure and body weight in model 5 (0.915; 95% CI, 0.899-0.931; P < .001) and laboratory values in model 6 (0.917; 95% CI, 0.901-0.933; P < .001). Despite a similar C statistic, the AIC was lower for model 6 than for model 5 (4432 vs 4463, respectively). The inclusion of all variables in model 7 improved both the C statistic and the AIC compared with model 3 (0.921 [95% CI, 0.905-0.937] vs 0.910 [95% CI, 0.894-0.926] and 4378 vs 4520, respectively). Given these results, models 1, 4, and 5 were not considered in further evaluation steps.
eTable 2 compares discrimination of models 2, 3, 6, and 7 overall and in subgroups in the development and validation cohorts. In both cohorts, the C statistic was higher for model 6 compared with models 2 and 3 in the entire population, and in subgroups (P < .001 for all comparisons). In the validation cohort, no further improvement was observed with the additional nonlaboratory variables (0.835; 95% CI, 0.819-0.851 vs 0.841; 95% CI, 0.825-0.857; P = .90 for model 7 vs model 6). eTable 3 shows discrimination at years 1, 3, and 5. At all times, both the C statistic and integrated discrimination improvement were greater for model 6 compared with models 2 and 3 (P < .001 for all comparisons). Model 6 was more accurate than model 3 (integrated discrimination improvement, 3.2%; 95% CI, 2.4%-4.2%).
The Figure shows observed vs predicted probability of kidney failure at 3 years for models 2, 3, and 6 in the validation cohort. The mean absolute difference between the observed and predicted probabilities over quintiles of risk was lower with model 6 compared with models 2 and 3 (1.9% vs 2.7% and 3.1%, respectively), and the Nam and D’Agostino χ2 statistic also indicated improved fit with model 6 compared with models 2 and 3 (χ2 statistic, 19 vs 37 and 32, respectively).
The NRI for CKD stages 3 and 4 was analyzed overall (Table 3) and by relevant subgroups (eTable 4) in the validation cohort. Overall, model 6 outperformed models 2 and 3 with an NRI of 50.4% (95% CI, 42.7%-58.1%) and 8.0% (95% CI, 2.1%-13.9%), respectively, for CKD stage 3, and 26.7% (95% CI, 20.1%-33.3%) and 4.1% (95% CI, −0.5% to 8.8%), respectively, for CKD stage 4. The improvement in reclassification was similar across relevant subgroups.
Reclassification analysis examining the effect of different thresholds for definitions of risk categories for CKD stages 3 and 4 showed consistent improvement with model 6 over models 2 and 3. Similarly, a category-less NRI showed a 30% (95% CI, 16%-44%) improvement in the overall reclassification for model 6 compared with model 3. Survival analysis using the competing risk approach showed no significant differences between the observed Kaplan-Meier method estimate and the cumulative incidence function, or between the predicted probabilities of kidney failure from the Cox proportional hazards regression model and the competing risk model (eFigure 2 and eFigure 3).
We have developed and validated a set of risk prediction models for progression to kidney failure among patients with moderate to severe CKD. Accuracy was robust across relevant subgroups. Our models use laboratory data that are obtained routinely in patients with CKD and could be easily integrated into a laboratory information system or a clinic EHR. The incremental improvement in NRI and integrated discrimination improvement using our best model (model 6) compared with simpler models (models 2 and 3) is greater than the improvements observed with recent refinements to prediction tools used in other chronic diseases, such as the addition of C-reactive protein to traditional cardiovascular risk factors and the addition of serum sodium concentration to the Model for End-Stage Liver Disease score.27,28
Among patients with CKD, there can be considerable heterogeneity in risk for progression to kidney failure. For example, Table 4 shows clinical and laboratory findings in 2 hypothetical patients with the same estimated GFR and kidney failure risk predictions from models 2, 3, and 6. Predictions based on estimated GFR alone are not sufficient for risk prediction in these 2 patients. The addition of the laboratory variables in models 3 and 6 provides a substantially different risk prediction compared with model 2. Compared with model 3, model 6 increases the predicted risk by 9.7% for patient A and decreases the risk by 2.9% for patient B.
Risk prediction has gained increasing attention over the last 2 decades, with emerging literature suggesting improved patient outcomes with individualized risk prediction and with advances in information technology that allow for easy implementation of risk prediction models as components of EHRs.29-33 The availability of these risk prediction tools and their inclusion in clinical practice guidelines have led to better adherence to treatment guidelines and have encouraged individual decision making.31-33 Despite these benefits, the lack of easily applicable and externally validated models has delayed the widespread integration of risk prediction in all fields of medicine.34,35
Our models rely on demographic data and laboratory markers of CKD severity to predict the risk of future kidney failure. Similar to previous investigators from Kaiser Permanente and the RENAAL study group,15,16 we find that a lower estimated GFR, higher albuminuria, younger age, and male sex predict faster progression to kidney failure. In addition, a lower serum albumin, calcium, and bicarbonate, and a higher serum phosphate also predict a higher risk of kidney failure and add to the predictive ability of estimated GFR and albuminuria. These markers may enable a better estimate of measured GFR or they may reflect disorders of tubular function or underlying processes of inflammation or malnutrition.36,37
Although these laboratory markers have also previously been associated with progression of CKD, our work integrates them all into a single risk equation (risk calculator and eTable 5, and smartphone app, available at http://www.qxmd.com/Kidney-Failure-Risk-Equation). In addition, we demonstrate no improvement in model performance with the addition of variables obtained from the history (diabetes and hypertension status) and the physical examination (systolic blood pressure, diastolic blood pressure, and body weight). Although these other variables are clearly important for diagnosis and management of CKD, the lack of improvement in model performance may reflect the high prevalence of these conditions in CKD and imprecision with respect to disease severity after having already accounted for estimated GFR and albuminuria.
Our risk prediction models have important implications for clinical practice, research, and public health policy. For example, in CKD stage 3, the relative contribution of the nephrologist vs the primary care physician to CKD care is uncertain.38 Using our models, lower-risk patients could be managed by the primary care physician without additional testing or treatment of CKD complications, whereas higher-risk patients could receive more intensive testing, intervention, and early nephrology care. Similarly, in CKD stage 4, the timing of appropriate predialysis interventions remains uncertain.39 Using our models, different risk thresholds could be used to triage patients for decisions regarding dialysis modality education, vascular access creation, and preemptive transplantation. Furthermore, our models could be used to select higher-risk patients for enrollment in clinical trials and for evaluation of risk-treatment interactions.40,41 In addition, our models may be useful for identifying high-risk patients with CKD stage 3 for public health interventions, thereby improving the cost-effectiveness of CKD care.42
The strengths of our analysis are the development and validation of highly accurate prediction tools. The models are practical because all the variables included are collected routinely in clinical care for CKD. The models can be easily integrated into laboratory information systems and clinic EHRs, enabling integration of a risk prediction tool as a clinical decision support aid in outpatient care. In addition, the models were developed in a single-payer health care system, so disparities in referral patterns and CKD care were less likely to be present.43 Finally, we have proposed several possible applications for our risk prediction tools in clinical decision making for CKD stages 3 and 4; these can be tested in pragmatic cluster randomized trials and formal cost-effectiveness analyses that could help translate our models to the bedside.
Our analysis has a few limitations. First, our study population consisted of 2 distinct nephrology-referred CKD populations; therefore, the generalizability of our findings to a nonreferred CKD population remains to be determined. Second, although we included patients with a wide spectrum of CKD and ranges of estimated GFR, certain ethnic minorities with increased risk for kidney failure, such as black and Native American individuals, are underrepresented. Third, we did not explicitly model the risk of all-cause mortality in our CKD population. Although patients with CKD are at higher risk for mortality, knowing the probability of kidney failure among the survivors has clinical use. In addition, we did not observe a systematic overprediction of kidney failure risk in our models, and using a competing-risk approach did not significantly change our predicted probabilities.
In conclusion, we have developed and validated highly accurate predictive models for progression of CKD to kidney failure. Our best model uses routinely available laboratory data and can predict the short-term risk of kidney failure with accuracy and could be easily implemented in a laboratory information system or an EHR. External validation in multiple diverse CKD cohorts and evaluation in clinical trials are needed.
Corresponding Author: Navdeep Tangri, MD, FRCPC, Department of Medicine, Tufts Medical Center, 800 Washington St, PO Box 391, Boston, MA 02111 (email@example.com).
Published Online: April 11, 2011. doi:10.1001/jama.2011.451
Author Contributions: Dr Tangri had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Drs Naimark, Levin, and Levey contributed equally to this article.
Study concept and design: Tangri, Djurdjev, Naimark, Levin, Levey.
Acquisition of data: Tangri, Djurdjev, Naimark, Levin, Levey.
Analysis and interpretation of data: Tangri, Stevens, Griffith, Tighiouart, Naimark, Levin, Levey.
Drafting of the manuscript: Tangri, Levey.
Critical revision of the manuscript for important intellectual content: Tangri, Stevens, Griffith, Tighiouart, Djurdjev, Naimark, Levin, Levey.
Statistical analysis: Tangri, Griffith, Tighiouart, Djurdjev.
Obtained funding: Tangri, Levey.
Administrative, technical, or material support: Tangri, Djurdjev, Naimark, Levin, Levey.
Study supervision: Stevens, Griffith, Naimark, Levin, Levey.
Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.
Funding/Support: This work was supported by the KRESCENT postdoctoral fellowship award and William B. Schwartz Nephrology Fund. The KRESCENT award is a joint initiative of the Kidney Foundation of Canada, the Canadian Institute of Health Research, and the Canadian Society of Nephrology. Dr Naimark was supported by an operating grant from the Change Foundation of Ontario.
Role of the Sponsors: The funding organizations had no role in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review, or approval of the manuscript.
Additional Contributions: Daniel Schwartz and Chan Kruse (both from QxMD) provided in-kind assistance with smartphone application development. No compensation was received.