Figure 1. Integer risk score and corresponding predicted probability of 30-day mortality.
Figure 2. Integer risk score and corresponding predicted probability of 90-day mortality.
Figure 3. Calibration plot comparing observed and predicted probabilities of 30-day mortality (area under the receiver operating characteristic curve, 0.74; Hosmer-Lemeshow, P = .62) in the validation cohort.
Figure 4. Calibration plot comparing observed and predicted probabilities of 90-day mortality (area under the receiver operating characteristic curve, 0.73; Hosmer-Lemeshow, P = .87) in the validation cohort.
Venkat R, Puhan MA, Schulick RD, Cameron JL, Eckhauser FE, Choti MA, Makary MA, Pawlik TM, Ahuja N, Edil BH, Wolfgang CL. Predicting the Risk of Perioperative Mortality in Patients Undergoing Pancreaticoduodenectomy: A Novel Scoring System. Arch Surg. 2011;146(11):1277-1284. doi:10.1001/archsurg.2011.294
Author Affiliations: Department of Surgery, The Johns Hopkins University School of Medicine (Drs Venkat, Schulick, Cameron, Eckhauser, Choti, Makary, Pawlik, Ahuja, Edil, and Wolfgang), and the Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health (Drs Venkat and Puhan), Baltimore, Maryland.
Objective: To develop and validate a risk score to predict the 30- and 90-day mortality after a pancreaticoduodenectomy or total pancreatectomy on the basis of preoperative risk factors in a high-volume program.
Design: Data from a prospectively maintained institutional database were collected. In a random subset of 70% of patients (training cohort), multivariate logistic regression was used to develop a simple integer score, which was then validated in the remaining 30% of patients (validation cohort). Discrimination and calibration of the score were evaluated using area under the receiver operating characteristic curve and Hosmer-Lemeshow test, respectively.
Setting: Tertiary referral center.
Patients: The study comprised 1976 patients in a prospectively maintained institutional database who underwent pancreaticoduodenectomy or total pancreatectomy between 1998 and 2009.
Main Outcome Measures: The 30- and 90-day mortality.
Results: In the training cohort, age, male sex, preoperative serum albumin level, tumor size, total pancreatectomy, and a high Charlson index predicted 90-day mortality (area under the curve, 0.78; 95% CI, 0.71-0.85), whereas all these factors except Charlson index also predicted 30-day mortality (0.79; 0.68-0.89). On validation, the predicted and observed risks were not significantly different for 30-day (1.4% vs 1.0%; P = .62) and 90-day (3.8% vs 3.4%; P = .87) mortality. Both scores maintained good discrimination (for 30-day mortality, area under the curve, 0.74; 95% CI, 0.54-0.95; and for 90-day mortality, 0.73; 0.62-0.84).
Conclusions: The risk scores accurately predicted 30- and 90-day mortality after pancreatectomy. They may help identify and counsel high-risk patients, support and calculate net benefits of therapeutic decisions, and control for selection bias in observational studies as propensity scores.
The use of pancreaticoduodenectomy for the resection of periampullary tumors and proximal pancreatic neoplasms has become routine in high-volume centers. Although the perioperative mortality of this procedure has markedly decreased during the past 3 decades, it remains significant in relation to other operations of the gastrointestinal tract.1 With improvements in surgical techniques and critical care, studies2,3 from our institution have reported a substantial decrease in perioperative mortality from 30% in the 1970s to as low as 1% in the 2000s, although recent population-based studies4 have reported higher mortality rates in lower-volume centers, ranging from 3.5% to 8.3%. Despite this improvement in outcome, a relatively wide range of mortality rates has been reported for this operation. It is likely that patient factors, such as significant comorbidities, account for at least a portion of this variability.
Risk scores may serve various purposes. Pancreatic cancer, which is the most common diagnosis that requires a pancreatectomy, is a disease of the elderly, who may also be experiencing multiple comorbidities. The risks and benefits must be carefully assessed before subjecting these subsets of patients, as well as patients with a higher risk of perioperative mortality, to such major operations. Knowledge of the risk of morbidity and mortality after the procedure that is based on preoperative risk factors in each individual patient is critical in helping the patient to make his or her decision, as well as in satisfying the adequacy of an informed consent. Also, additional therapeutic options, such as salvage procedures, prolonged monitoring in the intensive care unit, or withholding surgery, may be considered in selected patients who may have a high risk of perioperative mortality, provided this information is available at the time of decision making. The net benefits of these therapeutic interventions, in terms of reduction of the risk of perioperative mortality, can also be calculated for individual patients using risk scores. They may also be used in clinical trials to identify certain risk groups that may be considered for stratification or exclusion from the trial to minimize confounding or for covariate adjustment and subgroup analysis in the trials. In observational studies, risk scores may serve the purpose of propensity scores to minimize selection bias.
There are studies5-10 that have built predictive models to calculate the risk of perioperative mortality and/or long-term survival; however, some of these models5,7,8 use operative data to predict the outcome. This makes the score of limited use in preoperative patient counseling and risk stratification. Other models6,9 that are based on national population-based data may be inappropriate to use in the patient population of a tertiary referral institution if the range of the patient profile differs from one institution to another. Also, few studies have incorporated preoperative laboratory parameters, such as albumin and creatinine levels, or characteristics of the tumor, such as tumor size or histologic diagnosis, which may play an important role in predicting the perioperative mortality with greater accuracy and make the score more specific to patient or tumor characteristics.
Our objective was to develop and validate a simple and easily applicable score based on readily available preoperative parameters to predict perioperative mortality in patients scheduled for pancreatectomy.
The study population (n = 1976) consisted of adult patients (≥18 years old) admitted to The Johns Hopkins Hospital and who subsequently underwent total pancreatectomy or pancreaticoduodenectomy (classic or pylorus-preserving) from January 1, 1998, through June 30, 2009. We included patients from 1998 to 2009 to ensure a contemporary patient population and to account for a potential decreasing trend in perioperative mortality for pancreatectomy over time.11 We excluded patients who underwent distal pancreatectomy because there was only 1 death within 90 days, out of a total of 209 cases, for this procedure (0.48% mortality). In patients who underwent subsequent procedures, only data from their primary surgery were included. We restricted our analysis to patients with a histologic diagnosis of benign cystic lesions, periampullary tumors, or neuroendocrine tumors (Table 1). All data were obtained from an institutional review board–approved pancreatectomy database that comprises prospectively collected patient data from January 1970 to June 2009 at The Johns Hopkins Hospital. The primary outcomes of 30- and 90-day mortality, defined as all-cause mortality occurring within 30 and 90 days of the date of surgery, respectively, were assessed from this institutional database and confirmed using the National Social Security Death Index. Information on patient comorbidities was available from the Hospital billing data, and we converted the International Classification of Diseases, Ninth Revision (ICD-9) codes obtained from this database for each patient admission to a Charlson comorbidity index.12
The predictors of mortality were selected a priori on the basis of clinical usefulness and biological plausibility. To facilitate the use of the model in clinical practice and to ensure building stable models, we chose to restrict the maximum number of predictive parameters to 10 and included only those factors that were readily available preoperatively. Variables included were age, sex, race, histologic diagnosis, type of surgery, preoperative serum albumin and serum creatinine levels, tumor size, and Charlson index. The covariates age, Charlson index, and albumin level were modeled as continuous variables, whereas sex, tumor size, creatinine level, histologic diagnosis, and type of surgery were categorical. Data on tumor size were obtained from surgical pathological evaluation; although it was obtained postoperatively, we included it as a predictor because previous studies13,14 have shown a close agreement between tumor size at computed tomography examination and pathological evaluation.
Standard approaches to developing and validating prediction scores were followed.15-17 We used split-sample cross-validation in which a random sample of 70% of the study population was used to develop the multivariate model (training data set) and the remaining 30% of the study population was used for validating the model (validation data set). In the training data set, we fitted separate multivariable logistic regression models using backward stepwise selection for 30- and 90-day perioperative mortality. We entered all variables that were associated with the outcomes in univariate analyses (using the χ² test for categorical variables and the t test for continuous variables) with P ≤ .25. Only variables that remained significant predictors of the outcome in the multivariate model at a significance level of P ≤ .25 on the basis of the likelihood ratio test18 or that changed the model's discrimination by at least 1% were included in the final model. Clinically related variables were examined for interaction. All predictor variables were assessed for correlations.
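The 70%/30% random split described above can be sketched in a few lines. This is only an illustration in Python (the study itself used Stata); the function name `split_sample`, the fixed seed, and the use of integer positions as stand-ins for patient records are assumptions made for the example, not details reported by the authors.

```python
import random


def split_sample(patient_ids, train_fraction=0.70, seed=0):
    """Randomly partition patient IDs into training and validation subsets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed only so the split is reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical stand-in for the 1976 study patients: integer IDs 0..1975.
train_ids, validation_ids = split_sample(range(1976))
print(len(train_ids), len(validation_ids))  # prints "1383 593"
```

With 1976 records, a 70% training fraction gives the cohort sizes reported in the study (1383 and 593).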
Development and validation of the prediction model involved assessing the discrimination and calibration of the model. Discrimination is the ability to distinguish between those who survive and those who die and was estimated using the concordance index, also known as the C statistic or the area under the receiver operating characteristic curve (AUC). The AUCs of the different models were evaluated in the training and the validation data sets. The final model that was subsequently converted into the prediction score was thus developed based on a combination of AUC and likelihood ratio test. To reduce the variability of the error estimate and to control for overfitting, we used bootstrapping methods with 1000 resamples.19 Additional internal validation of the final model was done using the jackknifing method,19 in which we systematically estimated the outcome by leaving out 1 observation at a time from the total patient sample. The average 30- and 90-day mortality in these subsets was then compared with that from the entire sample to estimate the bias of the latter. Discrimination and goodness of fit of the model were then assessed using this method.
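As an illustration of the discrimination measure, the C statistic can be computed directly from its definition: the share of (death, survivor) pairs in which the patient who died had the higher predicted risk. The sketch below also shows a bootstrap percentile interval, which is a simpler use of resampling than the overfitting correction the authors describe; the function names and the toy data are hypothetical.

```python
import random


def c_statistic(risks, died):
    """Share of (death, survivor) pairs in which the death had the higher
    predicted risk; tied predictions count as half."""
    dead = [r for r, d in zip(risks, died) if d]
    alive = [r for r, d in zip(risks, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in dead for y in alive)
    return wins / (len(dead) * len(alive))


def bootstrap_interval(risks, died, n_resamples=1000, seed=0):
    """Percentile 95% interval for the C statistic from bootstrap resamples."""
    rng = random.Random(seed)
    n = len(risks)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_died = [died[i] for i in idx]
        if 0 < sum(sample_died) < n:  # skip resamples lacking one outcome class
            stats.append(c_statistic([risks[i] for i in idx], sample_died))
    stats.sort()
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples) - 1]


# Toy data (hypothetical): predicted risks and outcomes (1 = died) for 4 patients.
toy_risks = [0.10, 0.40, 0.35, 0.80]
toy_died = [0, 0, 1, 1]
print(c_statistic(toy_risks, toy_died))  # 3 of 4 pairs concordant -> 0.75
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of this size; rank-based formulas give the same value more efficiently.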
We used shrinkage to limit the risks of the logistic regression model overestimating the associations of the predictors with the outcomes of 30- and 90-day mortality and to improve the accuracy of the model in other populations. The shrinkage factor represents a constant with which the regression coefficients of the predictor variables are multiplied. The magnitude of the factor depends on the number of predictors and the goodness of fit of the model and reflects the overoptimism of the multivariate model.20
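The article does not state which shrinkage estimator was used. One widely used heuristic that depends on the number of predictors and the model fit, as described above, is (model χ² − number of parameters) / model χ². The sketch below assumes that formula; the function names and the example numbers are hypothetical and are not taken from the study.

```python
def heuristic_shrinkage(model_chi2, n_parameters):
    """Heuristic uniform shrinkage factor: (chi2 - df) / chi2.

    model_chi2 is the likelihood ratio chi-square of the fitted model and
    n_parameters the number of estimated predictor coefficients.
    """
    if model_chi2 <= n_parameters:
        raise ValueError("model chi-square must exceed the number of parameters")
    return (model_chi2 - n_parameters) / model_chi2


def shrink(coefficients, factor):
    """Multiply every predictor coefficient by the shrinkage factor."""
    return {name: factor * beta for name, beta in coefficients.items()}


# Hypothetical example: a 6-predictor model with a likelihood ratio chi-square of 40.
factor = heuristic_shrinkage(40.0, 6)
print(round(factor, 2))  # 0.85
```

After the coefficients are shrunk, the intercept is usually re-estimated so that the mean predicted risk still matches the observed event rate.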
It is critical for prognostic prediction models to be validated in a separate set of patients before being used in clinical practice.15-17 This ensures that the predictions of the outcome based on the risk factors are reliable and accurate and thus can guide treatment decisions. To assess the calibration of our model, we used the Hosmer-Lemeshow goodness-of-fit test.21 Using this test, we compared the predicted and observed risks for 30- and 90-day mortality across all risk classes and assessed whether they differed significantly from one another.
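To make the calibration test concrete, the sketch below computes the Hosmer-Lemeshow statistic by sorting patients by predicted risk, cutting them into equal-sized groups, and comparing observed with expected deaths in each group. The number of groups, the function names, and the toy data are assumptions for illustration; the closed-form tail probability shown applies only when the degrees of freedom are even (for example, 8 with the conventional 10 groups).

```python
import math


def hosmer_lemeshow(risks, died, n_groups=10):
    """Return (statistic, degrees of freedom) across equal-sized risk groups.

    Assumes at least n_groups patients and that no group has a mean
    predicted risk of exactly 0 or 1.
    """
    pairs = sorted(zip(risks, died))
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        size = len(group)
        expected = sum(r for r, _ in group)  # expected deaths in the group
        observed = sum(d for _, d in group)  # observed deaths in the group
        mean_risk = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_risk * (1 - mean_risk))
    return statistic, n_groups - 2


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) in closed form; valid only for even, positive df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

A large P value, as reported for both scores, means the test found no significant gap between observed and predicted mortality; it does not by itself prove that calibration is good, particularly when events are few.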
Finally, to ensure its utility in clinical practice, the underlying regression equations were converted into a simplified integer score following an established approach used for the Framingham and the ADO (Age, Dyspnea, and airflow Obstruction) risk score.22,23 This method involved transformation of the regression coefficients into points that reflect their strengths of associations in which reference categories were assigned a value of zero. The theoretical range of points was calculated by summing up the points for each predictor. For each total score within the range of points, a specific risk for 30- and 90-day mortality, depending on the predictor variables for each outcome, was calculated using the regression equation. All the statistical analysis was performed using Intercooled Stata, version 11.0 (StataCorp, College Station, Texas).
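The conversion from regression coefficients to integer points can be sketched as follows, in the spirit of the Framingham-style approach: each coefficient is divided by a constant number of log-odds units per point and rounded, and the risk for a total score is read back through the logistic function. The coefficients, intercept, and constant below are hypothetical and are NOT the values fitted in this study; the function names are likewise illustrative.

```python
import math


def to_points(coefficients, units_per_point):
    """Convert log-odds coefficients to integer points.

    Reference categories have a coefficient of 0 and so receive 0 points.
    Python's round() sends exact halves to the nearest even integer.
    """
    return {name: round(beta / units_per_point)
            for name, beta in coefficients.items()}


def risk_for_score(total_points, intercept, units_per_point):
    """Predicted probability for a total score via the logistic function."""
    linear_predictor = intercept + units_per_point * total_points
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients, for illustration only (not from the paper's tables).
example_coefficients = {
    "male sex": 0.50,
    "total pancreatectomy": 1.00,
    "age, per decade above reference": 0.25,
}
points = to_points(example_coefficients, units_per_point=0.25)
print(points)  # male sex 2, total pancreatectomy 4, age per decade 1
```

Because every point corresponds to the same increase in log odds, the predicted risk rises monotonically with the total score, which is what allows a single lookup from score to probability.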
Most of the patients had multiple comorbidities, with 75.0% (n = 1482) of the patient population having a Charlson index of 3 or higher (Table 2). A randomly selected sample of 70% of the total study cohort were a part of the training data set (n = 1383), and the remaining 30% were a part of the validation data set (n = 593). The training and the validation cohorts of patients were seen to be comparable in terms of demographic parameters (Table 3).
Of the 1383 patients randomly selected for building the prediction model, 20 patients (1.5%) and 53 patients (3.8%) had died within 30 and 90 days of surgery, respectively. Age, sex, tumor size, type of surgery, and preoperative serum albumin levels were predictors of 30-day mortality and age, sex, tumor size, Charlson index, type of surgery, and preoperative serum albumin levels were predictors of 90-day mortality (Table 4). Histologic diagnosis and creatinine levels were not included in the final model because neither were they significantly associated with the outcomes nor did they improve the predictive accuracy of the model. The Charlson index, although a predictor of 90-day mortality, failed to predict the 30-day perioperative mortality per our criteria. The AUCs of the 2 models in the training cohort were 0.79 (95% CI, 0.68-0.89) and 0.78 (0.71-0.85) for 30- and 90-day mortality, respectively, which indicated a good discrimination of patients who survived and died within the 30- and 90-day postoperative period. The point score derived from the regression model ranged from 0 to 15 for 30-day mortality, and from 0 to 23 for 90-day mortality, with a score of 0 corresponding to the lowest risk of both outcomes. A score of 15 corresponded to the highest risk of 30-day mortality (2.9%) and a score of 23 to the highest risk for 90-day mortality (26.7%) (Figure 1 and Figure 2 and Table 5).
The validation population comprised the remaining 593 patients. In this cohort, the predicted risk for 30-day mortality was 1.4%, whereas the observed risk was 1.0%. The predicted and observed 90-day mortality was 3.8% and 3.4%, respectively. The Hosmer-Lemeshow test confirmed that there were no statistically significant differences between observed and expected 30-day (P = .62) and 90-day (P = .87) mortality across risk groups. The AUCs in the validation cohort were 0.74 (95% CI, 0.54-0.95) and 0.73 (0.62-0.84) for 30- and 90-day mortality, respectively, indicating a good discrimination in the validation cohort as well (Figure 3 and Figure 4). We used the final model and validated it using the jackknifing method. The model showed a good predictive accuracy, consistent with the split-sample method of validation for 30-day mortality (AUC, 0.72; Hosmer-Lemeshow test, P = .36) and 90-day mortality (AUC, 0.75; Hosmer-Lemeshow test, P = .09).
We have developed and validated 2 integer scores to predict the perioperative mortality within 30 and 90 days after pancreatectomy for patients with benign and malignant tumors. The scores, based on 5 parameters for 30-day mortality and 6 parameters for 90-day mortality, showed good discrimination and calibration.
Based on these scores, the perioperative risk can be assessed. For instance, an 82-year-old male patient with a 3.5-cm tumor, a serum albumin level of 2.7 g/dL (to convert serum albumin to grams per liter, multiply by 10), and a Charlson index of 5 scheduled for a total pancreatectomy would have a predicted risk of 30-day postoperative mortality of 2.2% (risk score of 14; 95% CI, 2.01-2.49) irrespective of his Charlson index. On the other hand, his risk of 90-day perioperative mortality would be 8.3% (risk score of 18; 95% CI, 7.78-9.23). This may also suggest that the 90-day mortality may be a better benchmark to judge the outcome of the procedure than the 30-day mortality, as observed by the differences in the risk of perioperative mortality. For the same patient, the risk of death in the 90-day perioperative period is more than 3-fold that in the 30-day period (8.3% vs 2.2%). Thus, for a complete risk assessment, more emphasis should be given to the 90-day mortality rate rather than the 30-day mortality rate.
Risk scores have the utility of combining information available at a stage of clinical evaluation where critical decisions may have to be made. The main goal of our scores is the preoperative evaluation of postoperative mortality. Each patient's risk for perioperative mortality can be calculated before surgery so that interventions, such as improvements in patient care, may be directed in high-risk patients during the entire perioperative period. In addition, patients who have a higher risk of mortality on the basis of the score may be provided this information, thus helping them better understand the risks of going through their surgery as well as satisfying the adequacy of an informed consent. This makes our score more useful in preoperative risk stratification and patient counseling compared with the surgical Apgar and the POSSUM (Physiologic and Operative Severity Score for the enUmeration of Mortality and Morbidity) scores,5,7 which also include intraoperative factors. Although intraoperative parameters, such as blood loss or vein resection, may still be considered to update the preoperative risk prediction and to further support the decision making for the postoperative management, such a score would again require development and validation. We have also incorporated important independent predictors of mortality, such as preoperative serum albumin level and tumor size, that may enhance the predictive accuracy of our score compared with similar studies in the past6,9 (Table 6).
Strengths of this study include a large sample size that allowed us to develop and validate the prediction score. We have used advanced statistical techniques, such as bootstrapping and shrinkage, which may increase the applicability and validity of the score in other related settings. We have used parameters of accuracy, such as discrimination and calibration. Assessment of these parameters is indispensable before the use of a risk score in clinical practice. We have also used 2 separate validation techniques, split-sample cross-validation and jackknifing methods, to assess the performance of the model. Our study uses institutional data rather than administrative data and hence would not be associated with the potential risks and limitations of using claims and registry data.24
A potential limitation of our study may be that the patient cohort of a single center was used for the development of the model. Because it was a high-volume tertiary referral hospital, the characteristics of the patients may be different from those observed in lower-volume centers. The results of the score should thus be interpreted with caution in centers that may not receive a similar patient population. However, this does not mean that our scores would not perform adequately in other populations, but they may need some simple recalibration of, for example, the risk of mortality. We have used the established correlations between postoperative pathologic findings and preoperative imaging techniques13,14 to obtain data on tumor size and used it as a proxy for assessing the strength of the association between tumor size and perioperative mortality. Further validation with data on tumor size using preoperative computed tomography may be warranted to test the performance of the scores and to update them, if required.
Our scores also need to be validated and updated in other populations before they are widely used. The range of the patient profiles and the incidence of perioperative mortality may be different in our patient population compared with other institutions. This has been the case for the widely used Framingham risk score and the ADO risk score, which have been updated for populations differing in the incidence of the outcome.22,25 A multi-institutional validation would further strengthen the external validity of the score. It would also be interesting to study the impact of other preoperative, intraoperative, and postoperative risk factors on the risk of mortality as well as morbidity and to update future prediction models to incorporate these parameters. This would help target specific interventions based on the risk stratification to prevent these outcomes as well as to adequately counsel the patients at each stage of therapy.
In conclusion, we have developed and validated a risk score to accurately predict the 30- and 90-day perioperative mortality in patients undergoing pancreatectomy. The risk score will be useful in identifying patients at high risk for perioperative mortality on the basis of simple and easily obtained preoperative risk factors. This information will help in supporting important therapeutic decisions as well as assist in the realization of interventions to improve patient care in high-risk individuals and to calculate their net benefits. The risk scores may be used for risk stratification in clinical trials and, as propensity scores, may be useful in controlling for selection bias in observational studies. The score will also be beneficial in making individual patients better understand the risks of surgery based on their preoperative characteristics and augment the adequacy of an informed consent.
Correspondence: Christopher L. Wolfgang, MD, PhD, Department of Surgery, Johns Hopkins Hospital, 600 N Wolfe St, Osler 624, Baltimore, MD 21287 (firstname.lastname@example.org).
Accepted for Publication: May 16, 2011.
Author Contributions: Study concept and design: Venkat, Puhan, Schulick, Cameron, Eckhauser, Makary, Ahuja, Edil, and Wolfgang. Acquisition of data: Venkat, Choti, Pawlik, and Wolfgang. Analysis and interpretation of data: Venkat, Puhan, Pawlik, and Wolfgang. Drafting of the manuscript: Venkat, Puhan, Schulick, Eckhauser, Makary, Ahuja, Edil, and Wolfgang. Critical revision of the manuscript for important intellectual content: Venkat, Puhan, Cameron, Eckhauser, Choti, Pawlik, Ahuja, Edil, and Wolfgang. Statistical analysis: Venkat, Puhan, Pawlik, and Wolfgang. Obtained funding: Wolfgang. Administrative, technical, and material support: Venkat, Schulick, Makary, and Wolfgang. Study supervision: Venkat, Puhan, Schulick, Cameron, Eckhauser, Choti, Makary, Pawlik, Ahuja, Edil, and Wolfgang.
Financial Disclosure: None reported.