Figure 1. Stepwise methods for creating a 6-variable model based on area under the receiver operator characteristic curve (AUROC) values.
Figure 2. Diminishing returns of additional variables on area under the receiver operator characteristic curve (AUROC). The AUROC values for the top 5 ranked models within each stage are shown.
Anderson JE, Lassiter R, Bickler SW, Talamini MA, Chang DC. Brief Tool to Measure Risk-Adjusted Surgical Outcomes in Resource-Limited Hospitals. Arch Surg. 2012;147(9):798-803. doi:10.1001/archsurg.2012.699
Author Affiliations: Department of Surgery, University of California, San Diego.
Objectives To develop and validate a risk-adjusted tool with fewer than 10 variables to measure surgical outcomes in resource-limited hospitals.
Design All National Surgical Quality Improvement Program (NSQIP) preoperative variables were used to develop models to predict inpatient mortality. The models were built by sequential addition of variables selected based on their area under the receiver operator characteristic curve (AUROC) and externally validated using data based on medical record reviews at 1 hospital outside the data set.
Setting Model development was based on data from the NSQIP from 2005 to 2009. Validation was based on data from 1 nonurban hospital in the United States from 2009 to 2010.
Patients A total of 631 449 patients in NSQIP and 239 patients from the validation hospital.
Main Outcome Measures The AUROC value for each model.
Results The AUROC values reached higher than 90% after only 3 variables (American Society of Anesthesiologists class, functional status at time of surgery, and age). The AUROC values increased to 91% with 4 variables but did not increase significantly with additional variables. On validation, the model with the highest AUROC was the same 3-variable model (0.9398).
Conclusions Fewer than 6 variables may be necessary to develop a risk-adjusted tool to predict inpatient mortality, reducing the cost of collecting variables by 95%. These variables should be easily collectable in resource-poor settings, including low- and middle-income countries, thus creating the first standardized tool to measure surgical outcomes globally. Research is needed to determine which of these limited-variable models is most appropriate in a variety of clinical settings.
Many efforts have been made to define, measure, and evaluate quality surgical care, but these programs tend to focus on hospitals in urban areas, missing many suburban or rural hospitals and completely overlooking low- and middle-income countries (LMICs). In the United States, the best-known programs include the American College of Surgeons National Surgical Quality Improvement Program (NSQIP),1 the Surgical Care Improvement Project,2 and the Leapfrog Group's surgical care standards.3 Many of these programs draw their data from urban and large suburban hospitals and target their efforts toward these hospitals. For example, NSQIP collects data on more than 130 variables and includes a 30-day patient follow-up. The cost of participation in this quality improvement program is prohibitive for many small, rural medical centers. The NSQIP recently launched its small and rural program for hospitals that are designated rural by zip code or have fewer than 1680 “NSQIP eligible cases,” but this may miss many medium-sized hospitals in nonurban areas that are too large for this program yet too small to feasibly participate in the original NSQIP.1
In addition, surgical quality improvement programs have largely been isolated in developed countries. To improve global surgery, quality measurement tools must be developed to be broadly and internationally applicable. Allowing hospitals in resource-limited countries to participate in surgical quality improvement efforts through the development of a simplified tool to measure surgical outcomes is the next critical step to improving surgical outcomes globally.
This research seeks to develop and validate a risk-adjusted tool with a limited number of variables to expand risk-adjustment outcomes research to all of the world's surgical settings. This approach will provide the first step to compare risk-adjusted outcomes over time within a given nonurban hospital, between nonurban hospitals, and between urban and nonurban hospitals at a much lower cost. This research creates an important new model for quality improvement and will help establish a system to benchmark surgical outcomes in nonurban hospitals. Ultimately, this research seeks to create pathways to raise standards of health care of all hospitals to the next level.
Patient data from NSQIP from 2005 to 2009 were used to build a tool with a limited number of variables to predict inpatient mortality. This nationally validated program measures more than 130 variables on each patient and includes a 30-day patient follow-up.4 This data set was chosen for its breadth of variables available for each patient, both preoperatively and postoperatively.
A 6-variable tool was built using a list of all preoperative variables included in the NSQIP database, a total of 66 variables, to predict inpatient mortality. All continuous variables were kept as such except for age, which was grouped into 10-year categories.
We performed a 6-stage process to add each additional variable sequentially (Figure 1). At each stage, logistic regression was performed to predict inpatient death, and the area under the receiver operator characteristic curve (AUROC) was then calculated for each model. The AUROC value is a discriminative measure of how well a model separates 2 groups (ie, survivors vs nonsurvivors). An AUROC value of 0.5 indicates that the model separates the 2 groups no better than chance, whereas an AUROC value of 1.0 indicates that the model completely separates the 2 groups. The AUROC statistic equals the proportion of randomly selected survivor-nonsurvivor pairs that are correctly ordered by the model. Thus, the AUROC value allows us to see which model more accurately discriminates between the 2 groups of interest.5-8
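The pairwise interpretation of the AUROC can be made concrete with a short sketch: the AUROC equals the fraction of (nonsurvivor, survivor) pairs in which the nonsurvivor receives the higher predicted risk, with ties counted as half. The risk values below are illustrative, not from the study.

```python
from itertools import product

def pairwise_auroc(scores_pos, scores_neg):
    """AUROC computed directly as the fraction of (positive, negative)
    pairs ranked correctly by the model; ties count as half a pair."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted mortality risks for 3 nonsurvivors and 4 survivors
died = [0.90, 0.60, 0.40]
survived = [0.30, 0.20, 0.50, 0.10]
print(pairwise_auroc(died, survived))  # 11 of 12 pairs correct, ~0.917
```

An AUROC near 0.92, for example, means the model ranks a randomly chosen nonsurvivor above a randomly chosen survivor about 92% of the time.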
In stage 1, simple logistic regression was performed with each variable to predict inpatient death. The variable with the highest AUROC value was chosen from this first stage and used as the basis for stage 2. In stage 2, each remaining variable was added in turn to the top variable chosen from stage 1. Multivariate logistic regression with inpatient death as the outcome was performed for each variation of this 2-variable model, and AUROC values were calculated. The models with the top 5 AUROC values were chosen and used as the basis for stage 3. Stages 3 through 6 followed the same method as stage 2: each additional variable was added to the 5 models chosen from the previous stage, multivariate logistic regression was performed to predict inpatient death, and the AUROC value was calculated. The 5 models with the highest AUROC values became the basis for the next stage. This process was repeated until 6-variable models were created.
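The staged procedure above is a beam search over variable subsets: at each stage, every remaining variable is appended to each surviving model, every candidate is scored (in the study, by the AUROC of a logistic regression predicting inpatient death), and the 5 best models seed the next stage. A minimal sketch follows; the score function and per-variable weights are hypothetical stand-ins so the skeleton runs on its own, not the study's regressions.

```python
def stepwise_select(variables, score, n_stages=6, beam_width=5):
    """Return the best model (a tuple of variable names) of size n_stages,
    expanding a beam of the top beam_width models at each stage."""
    beam = [()]  # stage 0: the empty model
    for _ in range(n_stages):
        # Append every unused variable to every model surviving in the beam
        candidates = {
            tuple(sorted(model + (v,)))
            for model in beam
            for v in variables if v not in model
        }
        # Keep only the beam_width highest-scoring candidate models
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beam[0]

# Hypothetical discrimination weights standing in for per-model AUROC values
weights = {"asa_class": 0.85, "functional_status": 0.80, "age": 0.78,
           "albumin": 0.76, "sepsis": 0.70, "hematocrit": 0.68,
           "emergency": 0.66, "wound_class": 0.64}
score = lambda model: sum(weights[v] for v in model)

print(stepwise_select(list(weights), score, n_stages=3))
```

Because only the top 5 models survive each stage, the search evaluates far fewer regressions than exhaustively fitting every possible subset, at the cost of possibly missing a combination whose value emerges only later.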
The models with the highest AUROC value at each stage were plotted to observe the diminishing returns of AUROC by each additional variable added (Figure 2).
The models with the highest AUROC value were validated using patient data from a 110-bed hospital with a level IV trauma center that serves a community of approximately 25 000 people in California. A retrospective medical record review of 239 surgical patients from 2009 to 2010 was conducted to collect data on each variable of interest. Patients were chosen to represent a random sampling of common, low-mortality operations performed at this hospital (40 procedures on 153 patients) and less common, high-mortality procedures (18 procedures on 86 patients). Common procedures were found by ranking International Classification of Diseases, Ninth Revision (ICD-9) procedure codes. High-mortality procedures were found by ranking ICD-9 procedures among patients who died. Endoscopic procedures were excluded. A random number of patients from procedures in each group were chosen to obtain a representative sample of both common and high-mortality operations performed at this hospital.
Patient data from this hospital were used to validate the models by rerunning the original multivariate logistic regressions and calculating AUROC values. Pseudo-R2 values were also calculated for these models. Some variables, such as albumin, international normalized ratio, blood urea nitrogen, cancer status, ascites status, and surgical specialty of the surgeon, could not be obtained from the medical record reviews; models containing these variables could not be included in the validation.
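One common pseudo-R2 for logistic models such as these is McFadden's: 1 minus the ratio of the fitted model's log-likelihood to that of an intercept-only model, so values closer to 1 indicate a better fit. A minimal sketch, using made-up outcomes and predicted probabilities rather than the study's data:

```python
import math

def mcfadden_r2(y, p):
    """McFadden's pseudo-R2.
    y: 0/1 observed outcomes; p: model-predicted probabilities of y=1."""
    # Log-likelihood of the fitted model
    ll_model = sum(math.log(pi) if yi else math.log(1 - pi)
                   for yi, pi in zip(y, p))
    # Log-likelihood of an intercept-only model (predicts the base rate)
    base = sum(y) / len(y)
    ll_null = sum(math.log(base) if yi else math.log(1 - base) for yi in y)
    return 1 - ll_model / ll_null

# Illustrative validation-style data: 2 deaths among 8 patients
y = [1, 1, 0, 0, 0, 0, 0, 0]
p = [0.8, 0.6, 0.3, 0.2, 0.1, 0.1, 0.2, 0.1]
print(round(mcfadden_r2(y, p), 3))
```

Unlike the AUROC, which measures only ranking, the pseudo-R2 also rewards well-calibrated probabilities, which is why the two statistics are usefully reported together.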
Statistical analysis was performed with Stata statistical software, version 11.0 (StataCorp). Statistical significance was defined as P < .05. This study received approval from the University of California, San Diego, Institutional Review Board.
Data from 631 449 patients from 2005 to 2009 were considered from the NSQIP database to create the limited risk-adjustment model, and data from 239 patients from 2009 to 2010 from the validation hospital were used to assess the risk-adjustment model (Table 1). Mean age and sex distribution are similar between the 2 study populations. By race, Hispanics constitute most cases at the validation hospital, whereas whites constitute most cases in the NSQIP data set.
The American Society of Anesthesiologists (ASA) physical status classification had the highest AUROC value (0.8479) in a single-variable model to predict inpatient mortality (Table 2). The top variables were ASA classification, albumin, functional status, age, sepsis status, and preoperative hematocrit. Combinations of these variables made up the 2- and 3-variable models. In the 4-variable model, emergency status and wound classification were added as significant variables. In the 5-variable model, cancer status, surgeon specialty, and ascites emerged as significant variables. In the 6-variable model, weight loss also emerged as a significant variable, but it is possible this is a surrogate for cancer status.
Using patient data from the validation hospital, the model with the highest AUROC value was a 3-variable model with age, ASA classification, and functional status (AUROC value of 0.9398) (Table 2). The model with the next highest AUROC value was a 2-variable model with ASA classification and functional status (AUROC value of 0.9290).
AUROC values greater than 90% were achieved with only 3 variables (Figure 2). The AUROC values increased to 91% with 4-variable models and to almost 92% with 6-variable models. There is little additional gain in AUROC for a 5- or 6-variable model compared with a 3- or 4-variable model. Including all 66 preoperative variables resulted in an AUROC value of 0.9104 (pseudo-R2 = 0.3342), approximately the same value achieved with only 4 variables.
We found that 3 or 4 variables may be sufficient for adequate risk adjustment to measure surgical outcomes. We achieved AUROC values of greater than 90% with only 3 variables. On a scale of 0.5 to 1.0, with an AUROC value of 0.5 indicating that the model cannot distinguish between 2 groups any better than chance and an AUROC value of 1.0 indicating that the model completely discriminates between the 2 groups, an AUROC value greater than 90% is substantial.
Our data provide several examples of risk-adjustment models that may be appropriate for hospitals in resource-limited settings. In particular, a 3-variable model with ASA class, functional status, and age was found to have high discrimination within our nonurban validation hospital. However, the data presented allow for a wide range of possible risk-adjustment models, allowing surgical systems to choose the most appropriate model given their unique resources. For example, although it may be possible for hospital systems in one area to collect preoperative laboratory values, such as albumin or hematocrit, other hospital systems may find it easier to collect information on ASA classification or functional status.
Other studies found that a model based on only a few variables may provide enough discrimination to measure surgical outcomes. Rubinfeld et al9 found the AUROC value for mortality decreased only slightly from 0.907 using all variables to 0.902 using 10 variables and argue that only a few variables are required for predictive accuracy. Dimick et al10 found that limited models based on 5 or 12 variables had comparable discrimination to a 21-variable model using receiver operator characteristics. Birkmeyer et al11 also found high correlation between a 5-variable and a 20-variable morbidity risk model and recommended that the new version of the NSQIP have no more than 5 to 10 core covariates.
There is some concern that ASA class and functional status are not reliable measures because they are more subjective. Some data suggest that there is a lack of interrater reliability in assigning ASA class.12-14 Davenport et al15 found that although ASA class was the strongest single predictor of outcomes, combinations of other risk variables without ASA class were better predictors than ASA class alone. However, ASA class was significantly correlated with 57 of 59 NSQIP preoperative risk factors.15 In addition, Cohen et al16 did not find evidence that ASA class and functional status were inconsistently classified and argue that they improve model quality and should be used in surgical risk-adjusted assessments. Dimick et al10 also found that ASA class and functional status were the most important variables in all risk-adjustment models. Furthermore, ASA class and functional status were 2 of the most predictive preoperative risk variables of postoperative morbidity in the National Veterans Affairs Surgical Risk Study17 and have been shown to predict operative outcomes in specific procedures.18,19 Disagreement rates between ASA class and functional status, as well as other NSQIP variables, have also improved since implementation (functional status before operation: 11.38% in 2005 to 3.4% in 2008; ASA class: 2.65% in 2005 to 1.82% in 2008); the authors argue that this is possibly due to data collection training and ongoing support.20
This study is strengthened by the fact that we developed our model using data from a large multicenter database from multiple years. Another strength of this study is that it was validated using patient data from a smaller nonurban hospital, using data from both common procedures and less common, high-mortality procedures. Validating our study findings enabled us to judge the practicality of collecting such variables in a resource-limited setting and, in this case, in a setting that has not yet moved to electronic medical records. Our validation process also provided additional information as to which variables had the highest discrimination among this population. This study is also strengthened because we included data on all surgical patients. Some quality improvement programs focus specifically on certain surgical specialties. By using all patients in NSQIP and validating our models using a mix of surgical patients (including patients with the most common procedures performed and those with less common but higher-mortality procedures), our findings can be widely applicable to a variety of surgical fields.
One limitation of this study is that some of the top variables from our models created by the NSQIP data were unable to be collected from our validation hospital because they were not easily obtained through the paper medical record review. However, the 2- and 3-variable models using data from the validation hospital had very high AUROC values, indicating that the additional missing variables would be unlikely to significantly affect the results. Another limitation is that there are likely to be coding errors, both in the NSQIP data and in data from the validation hospital. However, these errors are likely to be evenly and randomly distributed and thus should not affect our conclusions. Furthermore, coding errors will also be a reality when this model is used, so any coding errors present in our current data are likely to be similar to those encountered by this model in practice.
Our study has global implications. Although participation in programs such as the NSQIP offers administrative support and comparison of outcomes among participating hospitals, the low-cost options reported can expand the number of hospitals that participate in risk-adjustment outcomes analysis and quality improvement programs. Our work also allows the expansion of risk-adjustment outcomes research to LMICs. With minimal training, 3 or 4 variables can be easily and efficiently collected by existing hospital personnel at small or resource-limited hospitals in both developed countries and LMICs with limited costs. From these variables, a hospital's observed-to-expected ratio can be calculated to make comparisons about outcomes. By offering a simplified risk-adjustment tool, we can compare surgical outcomes among hospitals on a global scale, regardless of the spectrum of surgical procedures offered or hospital resources.
The area of global surgery has focused primarily on issues of access, which are still problematic in many LMICs. However, we should also begin to examine the process and outcomes of a hospital's surgical system to develop more appropriate and cost-effective interventions. Evaluating surgical outcomes requires risk adjustment to take patient variability into account. Our study suggests that simple but sufficient risk adjustment can be achieved in these settings. Future validation in an LMIC setting would be valuable.
Future risk-adjustment models should also consider surgical complications and morbidity, in addition to mortality. Although in-hospital mortality is simple to collect and the ultimate outcome, other outcomes, such as complications and morbidity, should not be overlooked. Other important outcome indicators are disability-adjusted life-years, which can be used to measure reductions in premature death and disability as a result of an intervention.21,22 Disability-adjusted life-years are commonly used in LMICs, particularly in public health efforts aimed at infectious diseases. By considering disability-adjusted life-years as an outcome measurement, we can begin to quantify surgical outcomes in terms of the amount of reduction of death or disability and have a better understanding of the cost-effectiveness of surgical interventions, which is particularly crucial information in resource-limited settings.23
Furthermore, surgical quality assessments must include considerations of structure, process, and outcomes to evaluate and improve the entire system of surgical care.24 We encourage the World Health Organization to expand their Tool for Situational Analysis to Assess Emergency and Essential Surgical Care to include data collection on preoperative variables to perform adequate risk-adjustment analyses.25 With these additional data, the situational analysis tool can help record and compare risk-adjusted surgical outcomes within and among hospitals in LMICs. In conclusion, we propose that future risk-adjustment tools be based on 6 or fewer variables to allow for surgical outcomes to be measured and compared within and among hospitals in resource-limited settings.
Correspondence: Jamie E. Anderson, MPH, Department of Surgery, University of California, San Diego, 200 W Arbor Dr, Ste 8400, San Diego, CA 92103 (email@example.com).
Accepted for Publication: March 9, 2012.
Published Online: May 21, 2012. doi:10.1001/archsurg.2012.699
Author Contributions: Study concept and design: Anderson, Lassiter, Bickler, Talamini, and Chang. Acquisition of data: Anderson and Chang. Analysis and interpretation of data: Anderson, Lassiter, Bickler, and Chang. Drafting of the manuscript: Anderson, Bickler, and Chang. Critical revision of the manuscript for important intellectual content: Anderson, Lassiter, Bickler, Talamini, and Chang. Statistical analysis: Anderson, Lassiter, and Chang. Obtained funding: Anderson, Bickler, Chang, and Talamini. Administrative, technical, and material support: Bickler and Chang. Study supervision: Bickler and Chang.
Financial Disclosure: None reported.
Funding/Support: Ms Anderson was supported by award T35HL007491 from the National Heart, Lung, and Blood Institute. Dr Chang is partially supported by SCANNER grant R01 HS19913-01 awarded by the Agency for Healthcare Research and Quality. Drs Chang and Bickler are also partially supported by the Medical Education Partnership Initiative grant 1R24TW008910-01 awarded by the National Institutes of Health Fogarty International Center.
Previous Presentation: This paper was presented as a podium presentation at the 83rd Annual Meeting of the Pacific Coast Surgical Association; February 20, 2012; Napa Valley, California, and is published after peer review and revision.
Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.
Additional Contributions: The NSQIP and its participating hospitals are the source of data used in this research; they have not verified and are not responsible for the statistical validity of the data analysis or conclusions of the authors.