A, 786 of 1291 patients (60.9%) are categorized as having a long length of stay (LOS), while 3721 of 5561 patients (66.9%) are categorized as having a short LOS. B, 573 of 689 patients (83.2%) with a long intensive care unit (ICU) stay are correctly classified, as are 8052 of 11 928 patients (67.5%) with no ICU stay. C, Of 529 patients who needed a ventilator, 503 (95.1%) are in the medium- or high-risk categories. D, Of 952 patients who will be discharged to a skilled nursing facility (SNF), 904 (94.9%) are in the medium- or high-risk categories.
The calendar view provides a look at the upcoming weeks and which cases are most risky. A user can click on an individual case to get more details. Each outcome has its own calendar view.
The monitoring view allows the user to observe how the predictive modeling has been performing. The green bars indicate correct predictions, the yellow bars indicate the overestimation of risk, and the red bars indicate the underestimation of risk. As reflected, the model was designed to overestimate rather than underestimate risk.
eFigure 1. Distribution of Length of Stay (a) and ICU Length of Stay (b)
eTable 1. Predictor Variables Used
eFigure 2. Creating a Decision Rule for ICU Length of Stay
eFigure 3. Creating Decision Rule for Need for Ventilator
eFigure 4. Creating Decision Rule for Discharge to SNF
eTable 2. Performance of Decision Rules
eTable 3. Length of Stay Classifications
eTable 4. Performance in Training and Testing Data
eFigure 5. Area Under the Receiver Operating Characteristic Curve
eFigure 6. Area Under the Precision-Recall Curve
eFigure 7. Executive Summary Landing Page for the Tableau Dashboard
Goldstein BA, Cerullo M, Krishnamoorthy V, et al. Development and Performance of a Clinical Decision Support Tool to Inform Resource Utilization for Elective Operations. JAMA Netw Open. 2020;3(11):e2023547. doi:10.1001/jamanetworkopen.2020.23547
How can clinical departments implement a clinical decision support tool to predict expected resource use to prioritize elective inpatient surgical procedures?
In this prognostic study, predictive models for length of stay, intensive care unit length of stay, mechanical ventilator requirement, and discharge disposition to a skilled nursing facility were developed using historical case data abstracted from the electronic health records of 42 199 patients. These models were integrated into an interactive online dashboard with end-user input and iteratively tested.
Predictive modeling, in conjunction with other contextualizing factors, can be used to inform how to recommence elective inpatient procedures after the coronavirus disease 2019 (COVID-19) pandemic.
Hospitals ceased most elective procedures during the height of coronavirus disease 2019 (COVID-19) infections. As hospitals begin to recommence elective procedures, it is necessary to have a means to assess how resource intensive a given case may be.
To evaluate the development and performance of a clinical decision support tool to inform resource utilization for elective procedures.
Design, Setting, and Participants
In this prognostic study, predictive modeling was used on retrospective electronic health records data from a large academic health system comprising 1 tertiary care hospital and 2 community hospitals of patients undergoing scheduled elective procedures from January 1, 2017, to March 1, 2020. Electronic health records data on case type, patient demographic characteristics, service utilization history, comorbidities, and medications were abstracted and analyzed. Data were analyzed from April to June 2020.
Main Outcomes and Measures
Predictions of hospital length of stay, intensive care unit length of stay, need for mechanical ventilation, and need to be discharged to a skilled nursing facility. These predictions were generated using the random forests algorithm. Predicted probabilities were turned into risk classifications designed to give assessments of resource utilization risk.
Data from the electronic health records of 42 199 patients from 3 hospitals were abstracted for analysis. The median length of stay was 2.3 days (range, 1.3-4.2 days), 6416 patients (15.2%) were admitted to the intensive care unit, 1624 (3.8%) received mechanical ventilation, and 2843 (6.7%) were discharged to a skilled nursing facility. Predictive performance was strong, with an area under the receiver operating characteristic curve ranging from 0.76 to 0.93. Sensitivity of the high-risk and medium-risk groupings was set at 95%. The negative predictive value of the low-risk grouping was 99%. We integrated the models into a daily refreshing Tableau dashboard to guide decision-making.
Conclusions and Relevance
The clinical decision support tool is currently being used by surgical leadership to inform case scheduling. This work shows the importance of a learning health care environment in surgical care, using quantitative modeling to guide decision-making.
The novel coronavirus disease 2019 (COVID-19) has changed the provision of hospital- and clinic-based surgical care. As hospitals prepared for possible surges of infected patients requiring admission and possible intensive care unit (ICU) stay, entire institutions and health systems took stock of their resources to meet an uncertain demand. This included estimating an ever-fluctuating number of available beds, securing sufficient personal protective equipment and ventilators, minimizing staff shortages, and establishing protocols to mitigate against nosocomial infections. National specialty societies examined the lessons from their counterparts abroad1-3 and, along with federal and state agencies, issued guidelines for the evaluation of the urgency of procedures.4-6
Consequently, during the initial wave of infections starting in March of 2020, many hospitals—including our own—severely restricted elective procedures. Although few operations are optional, some are more urgent than others. This includes procedures such as joint arthroplasties, plastic and reconstructive operations, and even certain oncologic resections. Examples of procedures that were not included would be emergency procedures, such as laparotomy for penetrating trauma, and urgent cardiovascular interventions for patients already admitted to the hospital. National specialty societies published guidelines for determining how to selectively postpone operations without compromising outcomes.7,8 As this initial wave of COVID-19 infections slows in our region, we are beginning to address the backlog of operations that have been postponed and are starting to perform these procedures once more. Even still, we must contend with the potential for local surges in cases and a renewed demand for hospital resources.
With input from health system leaders and experts in infection control, our surgical leadership developed 4 principal questions that, taken together, could help determine how and when an elective inpatient operation could be performed. Specifically, for surgical patients: (1) How long would they stay in the hospital? (2) Would they require an ICU stay and for how long? (3) Would they need a ventilator? (4) Were they likely to be discharged to a skilled nursing facility (SNF)?
These questions were modular, applicable across specialties, and represented the individual hurdles to assessing perioperative patient flow. While these questions are not the only ones necessary to consider when deciding whether to perform a case—issues around medical need, overall backlog, and revenue are factored in—they provide a basis for understanding the real consequences of these scheduling decisions with regard to resource use during the pandemic.
While clinical decision support (CDS) has permeated medical care, there are few surgical CDS tools to predict resource utilization. Most work has been specialty and/or procedure specific9 or has addressed questions of comparative effectiveness.10,11 The role of predictive analytics in improving operating room efficiency has been touted for day-of-staffing and resource-allocation decisions.12,13 However, there remains a need for a CDS that could be used across all specialties to evaluate the demand for resources imposed by increasing case volumes.
In this prognostic study, we present both our process for developing a predictive model for each of these constraints and the platform across which this tool has been implemented and accessed. The platform enables surgeons to evaluate a current schedule of cases, their clinical characteristics, and the predicted resources required to perform them. This CDS enables surgical leadership to assess the imminent caseload, incorporate additional data on current hospital resource use, and determine whether a case can proceed or be postponed. Our overall study question was to see whether we could use historic data to develop a CDS tool for resource utilization applicable during the COVID-19 pandemic.
We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline for the development of predictive models. This study was declared exempt by the Duke University Health System (DUHS) institutional review board, and informed consent was waived, because the data were deidentified.
In March 2020, the DUHS curtailed elective operations owing to concerns regarding the spread of COVID-19. By the end of May, the DUHS recommenced these procedures on a more limited basis. The DUHS consists of 3 hospitals: Duke University Hospital, a tertiary care and level 1 trauma center (939 total beds and 88 ICU beds), and 2 community hospitals, Duke Regional Hospital (295 total beds and 22 ICU beds) and Duke Raleigh Hospital (185 total beds and 15 ICU beds). Since 2014, the DUHS has used a common electronic health record (EHR) for inpatient and outpatient reviewing of medical records; the ordering of laboratory tests, medications, and radiology studies; and operating room scheduling using the Epic platform.
At initiation, we engaged stakeholders representing key constituencies, including clinical operations, anesthesiology, critical care, case management, surgeons, and hospital or health system leadership. Weekly meetings included focused discussions to identify and refine model inputs and outputs with stakeholders. Hospital leadership stakeholders provided periodic feedback regarding outputs and the user interface. This engagement process strongly informed the selection of clinically and operationally impactful CDS outputs.
We abstracted EHR data on patients who underwent an elective inpatient procedure performed at DUHS hospitals from January 1, 2017, to March 1, 2020. While there is no formal designation for elective procedures within the EHR system, we included all procedures with the admission source of “Surgery Admit Inpatient.” This is a classification for procedures in which a patient is admitted following the surgical intervention for inpatient postoperative care and is not admitted via the emergency department. We further excluded any procedures performed on Saturday or Sunday. Procedures performed in an ambulatory setting were not included because they are rarely associated with inpatient resource utilization.
We sought to predict 4 outcomes for each patient: overall hospital length of stay (LOS), ICU LOS, mechanical ventilator requirement, and discharge to an SNF. The LOS was determined as the total number of hours a patient spent in the hospital. The ICU LOS was determined as the total number of hours a patient was boarded in an ICU (including nonconsecutive days). Because the LOS often has an extreme rightward skew,14-16 both the LOS and the ICU LOS were categorized into clinically meaningful ordinal groups with the input of critical care experts (eFigure 1 in the Supplement). The LOS was categorized into fewer than 2 days, 2 to fewer than 4 days, 4 to fewer than 7 days, and 7 days or longer. The ICU LOS was grouped into 0 days, fewer than 2 days, and 2 days or more.
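The study's analyses were performed in R; as a purely illustrative sketch (function names are hypothetical, cutoffs taken from the text), the ordinal grouping described above can be expressed as:

```python
def los_category(los_hours: float) -> str:
    """Map total hospital hours to the 4 ordinal LOS groups described above."""
    days = los_hours / 24.0
    if days < 2:
        return "<2 days"
    elif days < 4:
        return "2 to <4 days"
    elif days < 7:
        return "4 to <7 days"
    return ">=7 days"

def icu_los_category(icu_hours: float) -> str:
    """Map ICU hours (possibly nonconsecutive) to the 3 ICU LOS groups."""
    if icu_hours == 0:
        return "0 days"
    return "<2 days" if icu_hours / 24.0 < 2 else ">=2 days"
```

Grouping a skewed continuous outcome this way trades granularity for categories that map directly onto bed-planning decisions.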
We abstracted patient- and procedure-specific data that would be known prior to the procedure date. These included demographic characteristics, medication history, comorbidities, and service utilization history (detailed in eTable 1 in the Supplement). Variables with high rates of incompleteness (eg, laboratory values) were excluded, leaving 44 unique predictors.
We randomly divided the data into training and testing sets (two-thirds and one-third, respectively). We considered different analytic methods, including regularized regression and ordinal regression, and ultimately decided on a random forest algorithm because of its performance on the training data and its suitability for the analytic task (eAppendix in the Supplement).17
We developed 4 separate predictive models (one for each outcome defined). Each random forest was grown to 2000 trees. For LOS, we created a multiclass model, predicting the probability of being in each of the 4 classes described. For ICU LOS, we used a 2-stage approach.18-20 First, we modeled the need for an ICU stay (yes or no) in a binary classification model. Then, among operations that resulted in a need for an ICU stay, we modeled a second binary classification model, which was used to identify an ICU LOS of fewer than 2 days (short) and of 2 or more days. Binary classification models were constructed to predict the need for a ventilator and for discharge disposition (SNF).
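The 2-stage ICU LOS approach implies that probabilities for the 3 ICU categories can be obtained by chaining the 2 binary models via the chain rule. A minimal sketch, assuming each stage outputs a probability (the study used random forests in R; this Python fragment is illustrative only):

```python
def icu_two_stage(p_any_icu: float, p_long_given_icu: float) -> dict:
    """Combine the 2 binary models into class probabilities for
    {no ICU stay, short ICU stay (<2 d), long ICU stay (>=2 d)}.

    p_any_icu: stage 1 probability of any ICU stay.
    p_long_given_icu: stage 2 probability of a long stay, given an ICU stay.
    """
    return {
        "none": 1.0 - p_any_icu,
        "short": p_any_icu * (1.0 - p_long_given_icu),
        "long": p_any_icu * p_long_given_icu,
    }
```

Because stage 2 is fit only on operations that resulted in an ICU stay, it models the conditional probability, and the product recovers the joint probability of a long stay.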
For each binary classification model (the 2-part ICU LOS model, need for ventilator, and SNF discharge disposition), we evaluated overall model performance using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the calibration slope in the cross-validated training data and independent test data. We evaluated the LOS model using cross-entropy loss and misclassification loss.
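For reference, cross-entropy loss and misclassification loss for a multiclass model such as the LOS model can be computed as follows (an illustrative Python sketch, not the study's R code):

```python
import math

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class.

    probs: list of per-case class-probability lists; labels: true class indices.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def misclassification(probs, labels):
    """Share of cases whose highest-probability class is not the true class."""
    wrong = sum(max(range(len(p)), key=p.__getitem__) != y
                for p, y in zip(probs, labels))
    return wrong / len(labels)
```

Cross-entropy rewards well-calibrated probabilities, whereas misclassification loss scores only the top-ranked class; reporting both separates ranking quality from calibration.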
To facilitate decision-making, we generated predicted labels as opposed to predicted probabilities for each outcome, having found previously that such groupings are more interpretable to users.21 There are a variety of ways that cutpoints can be generated. We focused on optimizing the sensitivity (percentage of true events captured) of our classifications to mitigate concerns about exceeding available hospital capacity for postoperative care. Therefore, these classifications prioritized overestimation—rather than underestimation—of risk. The eAppendix in the Supplement details how we set model classification thresholds. In brief, we created the high- or medium-risk group to have a sensitivity of 95% (ie, most events would be in the higher-risk groups). Moreover, to engender confidence in the proposed classifications, we sought to optimize the positive predictive value—the percentage of patients in the high-risk group who will truly have the event. All cutpoints were created based on the cross-validated training data, and then the performance of these cutpoints was tested in the prespecified test set to evaluate model performance.
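Selecting a cutpoint that achieves a target sensitivity on cross-validated predictions can be sketched as follows (illustrative Python; the function name and interface are hypothetical, not the study's implementation):

```python
import math

def cutpoint_for_sensitivity(probs, labels, target=0.95):
    """Largest threshold t such that classifying prob >= t as positive
    still captures at least `target` of the true events (sensitivity).

    probs: predicted probabilities; labels: 1 for a true event, 0 otherwise.
    """
    # Scores of the true events, highest first.
    positives = sorted((p for p, y in zip(probs, labels) if y == 1),
                       reverse=True)
    n_needed = math.ceil(target * len(positives))
    # Threshold at the n_needed-th highest positive score captures
    # at least n_needed events.
    return positives[n_needed - 1]
```

Setting the threshold this low pulls many non-events into the higher-risk groups, which is why such rules overestimate rather than underestimate risk and why the positive predictive value of the high-risk group must be tuned separately.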
Statistical analyses were conducted in R version 3.6. The random forest model was fit with the ranger package.22
To aid decision support, we integrated the predictive models into a Tableau dashboard. Tableau is an interactive visualization tool, and we have described how we have integrated it with EHR data previously.23 Each morning by 6 am, scheduled cases for the next 30 days are extracted into a data table with the necessary clinical information to run the models. Our analytic script queries this table, generates predictions, and feeds them into the dashboard with other clinical information. We created 3 types of visualizations: an overall summary, a calendar interface for each outcome, and a model monitoring view. This allows surgeons and all coordinators of perioperative services to view both the predictors and the predicted outcomes for individual cases (ie, at the patient level), in the context of other upcoming scheduled cases.
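The daily scoring step can be sketched as follows (a hypothetical Python fragment; the actual pipeline feeds EHR extracts into Tableau, and every name here is invented for illustration):

```python
def score_schedule(cases, model, thresholds):
    """Score upcoming cases and attach a risk label for a dashboard view.

    cases: list of dicts of case-level predictors.
    model: callable returning a predicted event probability for one case.
    thresholds: (low/medium cutpoint, medium/high cutpoint).
    """
    lo, hi = thresholds
    rows = []
    for case in cases:
        p = model(case)
        label = "low" if p < lo else ("medium" if p < hi else "high")
        rows.append({**case, "risk_probability": p, "risk_label": label})
    return rows
```

In the workflow described above, a job of this shape would run each morning against the next 30 days of scheduled cases, with the resulting table serving as the dashboard's data source.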
From January 1, 2017, to March 1, 2020, we identified 42 199 elective surgical procedures across our 3 hospitals. Table 1 presents descriptive statistics across the 3 hospitals. The median age of the patients was 62 years (range, 49-71 years); 22 321 of 42 199 patients (52.9%) were female. As expected, patients at our community hospitals had a higher burden of chronic disease.24 The 5 most commonly performed procedures were knee arthroplasty (n = 3539), hip arthroplasty (n = 3263), shoulder reconstruction (n = 1227), cervical diskectomy (n = 1162), and microsurgery (n = 1112), accounting for 10 128 of all 42 199 procedures (24.0%). The median LOS was 2.3 days (range, 1.3-4.2 days), with 9.8% of patients having an LOS longer than 7 days (eFigure 1 in the Supplement). Of the 42 199 procedures, 6416 (15.2%) involved an ICU stay, with 3208 (5.0%) of the 6416 patients requiring 2 or more days in the ICU. In addition, of the 42 199 patients, 1624 (3.8%) required a ventilator, and 2842 (6.7%) were discharged to an SNF.
After dividing into training (n = 28 130) and testing (n = 14 069) sets, we found that the overall predictive performance was quite strong in the training (based on cross-validation) and testing data and generally well calibrated (eTable 4 in the Supplement). The strongest models were found for predicting need for ICU (AUROC = 0.94) and need for ventilator (AUROC = 0.92). Predictions were strong but a bit worse for discharge to an SNF (AUROC = 0.84) and long vs short ICU stay (AUROC = 0.76).
Using the training data results, we created decision-rule thresholds for each of the outcomes based on sensitivity and positive predictive value (eFigures 2, 3, and 4 in the Supplement). For classifying LOS, 3721 of 5561 patients (66.9%) with a 0- to 2-day LOS and 786 of 1291 patients (60.9%) with an LOS of 7 days or more were correctly classified (Figure 1A; eTable 3 in the Supplement). For ICU LOS, 2051 of 2141 patients (95.8%) with any ICU stay were classified as such. Moreover, the model correctly classified 8052 of 11 928 patients (67.5%) with no ICU visit and 573 of 689 patients (83.2%) with a long ICU stay. Only those with a short ICU stay tended to be misclassified (10% correct), reflective of both the poorer second-stage model and the desire for a decision rule with higher sensitivity (Figure 1B; eTable 2 in the Supplement). Of those needing a ventilator, 95% were predicted as being at high or medium risk, with the high-risk group having a positive predictive value of 70%. The negative predictive value of the low-risk group was more than 99% (Figure 1C; eTable 2 in the Supplement). Finally, for discharge to an SNF, we were able to generate a decision rule with a positive predictive value of 22% for the high-risk group and the desired 95% sensitivity for the combined medium- and high-risk groups (Figure 1D; eTable 2 in the Supplement). Additional performance metrics are presented in eFigures 5 and 6 in the Supplement.
To evaluate which factors were important predictors and what results to visualize in our dashboard, we examined the top 5 most important variables for each of the 5 models (Table 2). Patient age and the number of previous outpatient encounters, as well as specific service line and specialty division, were consistent predictors of higher resource utilization. We note that these are simply rankings of variables we considered important and are not associated with traditional metrics of statistical significance.
We integrated the predictive model into a dashboard that shows the scheduled cases over the next month and that is refreshed every morning. Figure 2 shows the calendar view for one of the outcomes, LOS. eFigure 7 in the Supplement shows the primary landing page, and Figure 3 shows a model evaluation page that allows for a real-time, ongoing monitoring of the model’s performance. The interactive nature allows decision makers to assess subsamples based on week, location, service line, or procedure grouping.
To guide decisions around recommencing nonurgent surgical procedures, we produced a novel CDS tool that assesses the risk of high resource utilization for scheduled cases. Our tool identifies those at highest risk to avoid exceeding hospital capacity. We integrated the CDS into a dashboard that is refreshed daily and provides a global assessment of upcoming cases. We believe that this tool, in conjunction with knowledge of currently available resources—including overall hospital and ICU beds, ventilator use, and area SNF capacity—will help leadership decide whether a case should proceed or be postponed.
Our CDS tool produces predictions of 4 outcomes: LOS, ICU LOS, need for ventilator, and need for discharge to an SNF. Overall, the models performed very well, with AUCs in the training and testing data ranging from 0.76 to 0.93. However, instead of presenting predicted probabilities, we presented the predictions as risk categories with known performance characteristics. To prevent the underestimation of resource needs and the misclassification of high-risk patients as low-risk patients, we set the low-risk threshold to include less than 5% of those with resource needs. Our low-risk categories all had high negative predictive values (approximately 99%), allowing us to safely consider that those designated as low risk are in fact low risk.
The most predictive variables were demographic, service utilization, and procedural factors. Interestingly, clinical information was not among the top predictors—although it was still a part of the overall models. This highlights the ability to use easily retrievable information available at the time of scheduling to develop a strong CDS.
We established a data-flow process in which information on scheduled cases—including predicted resource utilization—is integrated into a dashboard that is refreshed each morning. While most traditional CDS presents information on 1 patient at a time, our display allows decision makers to examine the patient landscape more globally. The CDS tool is applicable across our 3 hospitals, all surgical specialties, and all patient age groups. While different specialties likely have different risk factors,25 a machine learning approach allows the model to find the appropriate degree of heterogeneity. In contrast, many prior investigations into resource utilization have been specialty- or procedure-specific investigations.26-28 Our approach highlights how perioperative decision-making can be integrated into a hospital's ultimate responsibility to public health in a time of critical shortages of resources due to community infectious risks.
The CDS tool and associated dashboards were implemented on June 17, 2020. Owing to our local COVID-19 conditions, we have not had to make decisions regarding case prioritization. The tool has seen the greatest use by our case-management team, allowing them to do preplanning regarding who will be discharged to an SNF. We are also currently assessing how the tool can be used to do resource planning in a post–COVID-19 environment.
Overall, we view this as an ongoing learning process. Since launching the tool, we have evaluated alternative modeling approaches, such as specialty- and age-specific models. We have made modifications both to speed up the data-flow process and to amend aspects of the decision rule to better meet user needs. We created a model evaluation visual within an interactive visualization tool that allows the user to assess the performance of the risk model. This allows us—as the tool developers—to monitor performance and allows tool users to develop confidence in the quality of the CDS. Finally, we are investigating ways to more tightly align this tool with other dashboards indicating ICU census and local COVID-19 conditions.
The models described herein and the CDS they underpin provide both a proof of concept and a blueprint for other institutions. However, our approach is not without limitations. First, the specific model was constructed and validated using data generated in a single hospital system. This limits its external validity, although we hypothesize that this approach—using similar predictor variables—should perform comparably in similar settings (ie, for tertiary academic medical centers). Moreover, our analytic framework cannot provide insight into how to optimally integrate current resource availability with future resource needs from a surgical standpoint. The former is subject to various other factors, including community, day of week, and seasonal variation. To achieve precision sufficient for day-to-day decision-making, we determined that clinically meaningful groupings (eg, extended hospital stay and short vs long ICU stays) were preferable to predicted probabilities. As such, while the underlying predictions are fundamentally quantitative, our formulation of outputs is designed to augment and inform, but not replace, the surgeon’s decision to perform an operation. Therefore, this type of tool supports a system for determining whether hospital resources are at risk of being overwhelmed on any given day or week; it is to be used together with the background COVID-19 rate in the community, a current inpatient and ICU census, an understanding of deferred cases, and the medical necessity of any 1 patient. This highlights the complexity of making such decisions and the need for better integration of various CDS tools. Finally, our model is only as precise as the information interpretable from the medical record. Although patient-level data are subject to misspecification and miscoding, we have devised an approach that is iterative so that predictions can be updated with new inputs as decisions are made and acted on each day forward.
The framework that we present for building a formalized CDS tool for surgical resource utilization, in conjunction with the workflow of integrating EHR data with a dashboard, is highly replicable. As institutions attempt to chart a safe and sustainable path to a new normal, this work illustrates how surgical teams and hospital leadership can do so in a data-driven way, generating a learning health care environment.
Accepted for Publication: August 28, 2020.
Published: November 2, 2020. doi:10.1001/jamanetworkopen.2020.23547
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 Goldstein BA et al. JAMA Network Open.
Corresponding Author: Benjamin A. Goldstein, PhD, MPH, Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Rd, Durham, NC 27705 (firstname.lastname@example.org).
Author Contributions: Dr Goldstein had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Goldstein, Cerullo, Krishnamoorthy, Blitz, Mureebe, Webster, Stirling, Scales.
Acquisition, analysis, or interpretation of data: Goldstein, Cerullo, Krishnamoorthy, Webster, Dunston, Stirling, Gagnon, Scales.
Drafting of the manuscript: Goldstein, Cerullo.
Critical revision of the manuscript for important intellectual content: Cerullo, Krishnamoorthy, Blitz, Mureebe, Webster, Dunston, Stirling, Gagnon, Scales.
Statistical analysis: Goldstein, Dunston.
Administrative, technical, or material support: Cerullo, Krishnamoorthy, Blitz, Mureebe, Webster, Dunston, Stirling, Gagnon, Scales.
Supervision: Cerullo, Gagnon, Scales.
Conflict of Interest Disclosures: Dr Cerullo reported salary support provided by the Veterans Affairs (VA) Office of Academic Affiliations through the VA/National Clinician Scholars Program and the School of Medicine at Duke University. No other disclosures were reported.
Funding/Support: Dr Cerullo receives salary support via the National Clinician Scholars Program (Duke University and the Veterans Health Administration Health Services Research and Development, Durham VA Medical Center).
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The contents do not represent the views of the US Department of Veterans Affairs, the Veterans Health Administration, or the US Government.
Additional Contributions: We thank the following members of the working group: Raquel R. Bartz, MD, Oluwadamilola M. Fayanju, MD, MPHS, Bradley J. Goldstein, MD, PhD, Rachel A. Greenup, MD, MPH, E. Shelley Hwang, MD, N. Bora Keskin, PhD, Stephen Lane, BA, Michael E. Lipkin, MD, MBA, Karthik Raghunathan, MD, Jonathan C. Routh, MD, MPH, Betty C. Tong, MD, MPH, and Can Zhang, PhD. No one was financially compensated; they are all part of the Duke University Health System.