Key Points
Question: What is the performance of a new time-series machine learning method for predicting hospital discharge volume?
Findings: In this cohort study of daily hospital discharge volumes at 2 academic medical centers (101 867 patient discharges), predictions of discharge volume were well calibrated. These findings were achieved even with shorter training sets and infrequent retraining.
Meaning: These results appear to demonstrate the feasibility of deploying simple time-series methods to more precisely estimate hospital discharge volumes based on historical data, and may facilitate better matching of resources with clinical volume.
Importance
Forecasting the volume of hospital discharges has important implications for resource allocation and represents an opportunity to improve patient safety at periods of elevated risk.
Objective
To determine the performance of a new time-series machine learning method for forecasting hospital discharge volume compared with simpler methods.
Design, Setting, and Participants
A retrospective cohort study of daily hospital discharge volumes at 2 large, New England academic medical centers between January 1, 2005, and December 31, 2014 (hospital 1), or January 1, 2005, and December 31, 2010 (hospital 2), comparing time-series forecasting methods was performed. Data analysis was conducted from February 28, 2017, to August 30, 2018. Group-level data for all discharges from inpatient units were included. In addition to conventional methods, a technique originally developed for allocating data center resources was applied, and strategies varying the amount of prior data incorporated and the frequency of model updates were compared to identify the model application that optimized forecast accuracy.
Main Outcomes and Measures
Model calibration as measured by R2 and, secondarily, number of days with errors greater than 1 SD of daily volume.
Results
During the forecasted year, hospital 1 had 54 411 discharges (daily mean, 149) and hospital 2 had 47 456 discharges (daily mean, 130). The machine learning method was well calibrated at both sites (R2, 0.843 and 0.726, respectively) and made errors greater than 1 SD of daily volume on only 13 and 22 days, respectively, of the forecast year at the 2 sites. Last-value-carried-forward models performed somewhat less well (calibration R2, 0.781 and 0.596, respectively) with 13 and 46 errors of 1 SD or greater, respectively. More frequent retraining and training sets of longer than 1 year had minimal effects on the machine learning method’s performance.
Conclusions and Relevance
Volume of hospital discharges can be reliably forecast using simple carry-forward models as well as methods drawn from machine learning. The benefit of the latter does not appear to depend on extensive training data and may enable forecasts up to 1 year in advance with superior absolute accuracy compared with carry-forward models.
Variations in discharge volumes create a challenge for hospitals. Adequate staffing is essential for optimizing patient outcomes; however, these staff members are a significant source of fixed hospital cost.1-3 As such, volume-matched staffing is an important component in the goal of delivering high-value care. The biomedical literature includes many efforts to predict discharges at the level of hospital unit or clinical domain.4-6 Although these efforts are invaluable tools for discovery, the resource demand is such that they cannot typically be integrated into routine operations as a monitoring tool or scaled across all units; thus, there is a need for highly scalable forecasting approaches that are suitable for broad application and operational implementation.
Predicting time-series data—that is, using past information to forecast future values of the series—is an area of interest in the field of machine learning and statistics more broadly. Facebook recently released software implementing a Bayesian forecasting approach developed for allocation of computational resources.7 This method recognizes repeating patterns over weeks, months, years, and identified holidays. Recognizing that these secular trends are important drivers of hospital volume, we hypothesized that this method would also be well suited to hospital volume forecasting.
We further hypothesized that minimal dependence on tuning of hyperparameters, a challenge with many standard methods in machine learning, would make implementation practical and generalization possible. We therefore applied the Facebook forecasting method to predict discharge volume from 2 large academic medical centers. With an eye toward deployment of this system, we examined the importance of large training data sets (ie, considering longer vs shorter periods of time) and frequent training (ie, regenerating the model on a regular basis vs infrequently).8,9
The overall aim of the study was to understand this tool’s performance sufficiently to facilitate broader dissemination and application among hospital systems. To contextualize this understanding, we also applied simple previous-value-carried-forward and autoregressive approaches that have been studied by other investigators in the context of hospital volume forecasting.10-14
Methods
Overview and Data Set Generation
Hospital discharge data for each calendar date were extracted from the longitudinal electronic health records of 2 large, New England academic medical centers. Data covering different years were available from the 2 sites. At hospital 1, data from January 1, 2005, through December 31, 2014, were available, whereas at hospital 2, data from January 1, 2005, through December 31, 2010, were available. We analyzed time-series data in which the unit of analysis was calendar date. While hospital shifts do not correspond solely to such dates, the available data allowed reliable estimates of calendar dates only. No data were missing and, thus, no imputation strategy was required and all available data were included. Data analysis was conducted from February 28, 2017, to August 30, 2018. A datamart containing these data was generated with the i2b2, version 1.6 server software (i2b2 tranSMART Foundation), a computational framework for managing human health data.15,16
The Partners Human Research Committee approved all aspects of this study with waiver of informed consent. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
The primary learning task in this study was a forecast of daily hospital discharge volume for the last full year available for both hospitals (2010). This task was approached using 5 separate models for subsequent comparison: 3 simple variations on prior values carried forward, a seasonal autoregressive integrated moving average (SARIMA) model, and Facebook’s Prophet model (Facebook Inc).7 The primary outcome for comparison between models was prediction accuracy, measured by correlation between predicted value and actual observed value over the 1-year (2010) prediction horizon. This outcome was calculated as the linear model observed_day = β0 + β1 × forecasted_day. As each component of this model is interpretable, it is reported in whole with R2 values and their 95% CIs.17 To further characterize model performance in units of discharges, error was operationalized as the difference of the predicted and the observed number of discharges over the forecast period (forecasted_day − observed_day). Because the error can be negative and thus errors over the forecasting horizon could cancel one another, which may or may not be desirable depending on intended use, both total and total absolute error are reported.18 Except where noted in the secondary analysis, the forecasting horizon was 1 year.
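The calibration and error measures above can be sketched in a few lines. The study’s analysis was performed in R; this Python sketch, with made-up counts rather than study data, is for illustration only.

```python
# Illustrative implementation of the accuracy measures described above.
# `forecast` and `observed` are made-up daily discharge counts, not study data.

def calibration_r2(forecast, observed):
    """R^2 of the linear model observed_day = b0 + b1 * forecasted_day."""
    n = len(forecast)
    mx, my = sum(forecast) / n, sum(observed) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(forecast, observed))
    sxx = sum((x - mx) ** 2 for x in forecast)
    syy = sum((y - my) ** 2 for y in observed)
    return sxy * sxy / (sxx * syy)

def total_errors(forecast, observed):
    """Signed errors can cancel over the horizon; absolute errors cannot."""
    errors = [f - o for f, o in zip(forecast, observed)]
    return sum(errors), sum(abs(e) for e in errors)
```

On a toy 5-day series, `total_errors([150, 140, 155, 120, 160], [148, 138, 150, 130, 158])` returns `(1, 21)`: the signed total nearly cancels even though daily errors are larger, which is exactly why both quantities are reported.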
Prophet, released by Facebook Research in early 2017, is an open-source implementation (Python and R interfaces available) of a Bayesian forecaster with learned modeling of yearly and weekly seasonality, as well as prespecified holidays expected to be anomalous; it automatically detects change points in a growth curve. Conceptually, Prophet reframes forecasting as a curve-fitting problem using a decomposable time-series model that includes holidays, seasonality, and overall trend and makes use of nonlinear smoothers.19
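In the notation of the Prophet paper,19 this decomposition can be written as:

```latex
% Decomposable model underlying Prophet: trend + seasonality + holidays + noise
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

where g(t) is a piecewise trend with automatically detected change points, s(t) captures weekly and yearly seasonality, h(t) models the effects of prespecified holidays, and ε_t is the error term.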
The 3 carry-forward models were the corresponding day, 1 year earlier; the corresponding day, 1 week earlier; and the mean of these 2. For example, for the yearly comparison, the second Monday of 2010 would be compared with the second Monday of 2009, representing a simple means of forecasting volume that still takes into account day of week and seasonal effects. For the weekly comparison, the second Monday of 2010 would be predicted to have the same volume as the first Monday of 2010. The third carry-forward forecast for the second Monday of 2010 would be the mean of the prior 2 (second Monday of 2009 and first Monday of 2010).
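As a sketch, the 3 carry-forward forecasts reduce to date arithmetic. All names and values below are illustrative, not study code; “corresponding day, 1 year earlier” is implemented as 52 weeks back, which preserves the day-of-week alignment in the example above.

```python
# Illustrative sketch of the 3 carry-forward forecasts using date arithmetic.
from datetime import date, timedelta

# Toy series: each day's count is 100 plus its weekday index (Mon=0 .. Sun=6),
# so any weekday-aligned carry-forward reproduces the value exactly.
volume = {}
day = date(2009, 1, 1)
while day <= date(2010, 12, 31):
    volume[day] = 100 + day.weekday()
    day += timedelta(days=1)

def carry_forward_forecasts(target):
    """Return (prior-week, prior-year, mean) forecasts for a target date."""
    prior_week = volume[target - timedelta(weeks=1)]
    prior_year = volume[target - timedelta(weeks=52)]  # same weekday, ~1 year back
    return prior_week, prior_year, (prior_week + prior_year) / 2
```

For Monday, June 14, 2010, all 3 forecasts resolve to 100 in this toy series, since both reference dates are also Mondays.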
For the primary analysis, forecasting 2010 volume, Prophet was trained on all prior years (January 1, 2005, through December 31, 2009) and then used to predict the full 2010 calendar year. Hospital calendars were used to identify observed holidays at each site and these were used in training and forecasting of both the Prophet and SARIMA models. In all 5 cases, each hospital was modeled independently. All analysis was performed using R, version 3.4 with the R interface to Prophet, version 0.1.1.
Model Parameter Investigation
We next examined 2 important operational characteristics of Prophet relevant to clinical dissemination and operationalization of hospital discharges forecasting. First, we allowed the training data set to vary between 1 and 5 years for all years at either site with at least 5 years of prior data available for training. In other words, as before, 2010 would be predicted but this time using first only 2009, then 2009 and 2008, then 2009 to 2007, and so on back to 2005. In this analysis, the years available for only 1 of the 2 hospitals (2011-2014) were included as forecasting targets, subject to the 5-year training data limit for comparability. This variable reflects the amount of training data required to build a reliable prediction model, that is, whether a hospital with a single year of discharge data could benefit from application of this model and whether a hospital could reasonably expect accuracy to improve with additional data. This assessment of the consequence of additional training data comes from the machine learning literature on learning curves.20
Second, we compared the forecast accuracy of a model fit once a year vs refit on a monthly basis. In other words, as before, 2010 would be predicted, but this time the first fit of the year (2005-2009) would be used to forecast January 2010; next, 2005 through January 2010 would be used to predict February 2010, and so on through the end of the year. This variable provides guidance about how frequently a model should be regenerated and insight into how quickly forecast accuracy degrades with distance from the last true observation. This iterative refitting of a model over a shorter forecasting horizon is conceptually related to cross-validation.21 These follow-up secondary experiments were performed only for the Prophet model.
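The monthly-refit schedule amounts to an expanding-window loop. In this hedged sketch, `fit_and_forecast` is a stand-in mean forecaster (the study used Prophet), and all names are illustrative.

```python
# Sketch of the expanding-window, monthly-refit evaluation described above.
# fit_and_forecast is a trivial stand-in model; the study used Prophet.

def fit_and_forecast(train, horizon):
    """Forecast the next `horizon` days as the training-set mean."""
    mean = sum(train) / len(train)
    return [mean] * horizon

def monthly_refit_forecast(series, month_lengths):
    """Forecast the tail of `series` one month at a time, refitting on all
    data observed so far (the expanding training window)."""
    split = len(series) - sum(month_lengths)  # start of the target year
    forecasts = []
    for days_in_month in month_lengths:
        forecasts.extend(fit_and_forecast(series[:split], days_in_month))
        split += days_in_month  # the elapsed month joins the training data
    return forecasts
```

On a toy series of 30 ones followed by 6 twos, split into two 3-day “months,” the first month is forecast from the ones alone, while the second month’s fit also sees the first 3 twos and shifts upward accordingly.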
Results
Over the course of the primary outcome year, 2010, hospital 1 had 54 411 discharges (daily mean, 149) and hospital 2 had 47 456 discharges (daily mean, 130). For the primary outcome, accuracy of the 2010 forecast based on all prior data, the Prophet model was the most accurate of the 5 models at both hospitals (Table 1 and Figure 1). The mean absolute error of the 1-year forecast by the Prophet model at hospital 1 was 11.5 discharges per day and 11.7 discharges per day at hospital 2. Among the 3 carry-forward models, the mean of the prior week and prior year’s value had the highest accuracy (Table 1). The mean absolute error of the forecast by the mean of the prior week and prior year carried forward model at hospital 1 was 13.7 discharges per day and 14.3 discharges per day at hospital 2. To further characterize the forecast accuracy, we selected 3 error thresholds (1 SD of daily volume, 25 discharges, and 10 discharges) and compared the total number of days for which the absolute forecast error was above the threshold for the 2 best models (Prophet and the mean of the prior week and year). These performance metrics are presented in Table 2, with Prophet outperforming the mean carry-forward model in 5 of 6 comparisons. Prophet was well calibrated at both sites (R2, 0.843 and 0.726, respectively) and made errors greater than 1 SD of daily volume on only 13 and 22 days, respectively, of the forecast year at the 2 sites. Last-value-carried-forward models performed somewhat less well (calibration R2, 0.781 and 0.596, respectively) with 13 and 46 errors of 1 SD or greater, respectively.
We compared the total absolute forecast error and the total forecast error for both of the top-performing models (Table 3). In this comparison, the mean carry-forward model outperformed Prophet on the net error over the course of the full-year forecast: this model tended to overpredict and underpredict in equal measure, so negative and positive errors canceled each other over the course of the year. Prophet, by contrast, consistently overpredicted hospital volume but did so to a lesser extent than the mean carry-forward model, as indicated by the total absolute error in Table 3. Whether in terms of calibration (Table 1), days above error threshold (Table 2), or cumulative error over the full forecast horizon (Table 3 and Figure 2), the autoregressive model produced larger errors than the Prophet model.
In the secondary analysis, we assessed the consequences of training data and forecast window on the accuracy of Prophet model predictions. Additional training data, added 1 year at a time, slightly increased the accuracy of Prophet forecasts and are summarized in eFigure 1 in the Supplement. Similarly, refitting the model monthly—using a shorter forecast horizon—had a minimal association with accuracy (eFigure 2 and eTables 1-3 in the Supplement, which mirror Table 1, Table 2, and Table 3 using the shorter prediction window).
Discussion
In this effort to model volume of hospital discharge from 2 large academic medical centers spanning more than a decade, we found that an open-source tool intended to model server load predicted volume reliably, if imprecisely. The predictions were better calibrated than those made by autoregressive models and simple carry-forward of prior volumes. Moreover, the modest amount of training data required and the adequate performance for up to 365 days of follow-up suggest that this approach is feasible for essentially any hospital. It appears that the largest portion of forecast accuracy can be realized with a single annual forecasting effort based on only the prior year’s data. Unlike many methods in machine learning, the model training and forecasting reported herein can be replicated on an Intel i5-2400 system from 2011 in less than half an hour. In short, this method is neither data nor compute intensive and thus could be widely adopted. Given that the existing literature on forecasting using carry-forward models, conventional regression, autoregression, and more exotic models is mixed with respect to the most successful model, the Prophet model is of particular appeal as it both performs well and is highly usable in terms of computational, data, and human resources.10-14,22
Is the ability to reliably predict volume useful for quality and safety? Certainly at the extremes, matching staffing to patient load is important; studies suggest optimal patient to clinical staff ratios vary substantially by specialty but are associated with a range of outcomes, including mortality.23 Differences in risk and length of stay associated with discharge on weekends or at night further underscore the importance of such staffing decisions, although not all studies find such variability.24-28 Conversely, consistently erring on the side of overstaffing is likely to entail additional costs, consuming resources that could be better spent on other quality-improvement strategies. As such, even coarse predictions may allow hospital administrators to better balance staffing and patient needs. We are not the first to note the importance of holidays in forecasting hospital volume as these days are of particular relevance in staffing.13 Furthermore, we are interested in the possibility of using real-time deviation from forecasted volume at the nursing unit and clinical service level as a means of gaining insight into health system performance; however, this application requires additional work beyond the foundational effort reported here.
Limitations
We note several limitations in interpreting these results. First, while on average, errors are small, the absolute errors on any given day may be relatively large. At each of the 2 hospitals, the error exceeded 25 patients on fewer than 10% of the days. Although these errors are still less than those arising from a simpler prediction approach, they nonetheless indicate that a flexible staffing model is likely to be necessary even with optimal prediction. In addition, we emphasize that these estimates represent only a starting point. It is likely that further optimization, for example, taking into account weather or local rates of influenza infection in winter, or modeling individual units, would allow more precise near-term predictions.12 On the other hand, a strength of the approach studied here is that it is readily implemented at nearly any site without requiring other data streams or tuning of hyperparameters. The ease of fitting is of particular importance given the variability in model performance seen between the 2 hospital sites. This variability is consistent with the existing literature that shows variable results.10-14 As such, those looking to forecast volume should evaluate a range of models and consider adding additional variables beyond historical volume if forecasts are of insufficient accuracy.
We note an important principle of forecasting in general: these tools are best applied thoughtfully, with consideration of their strengths and limitations. For example, computers cannot be expected to incorporate externalities unavailable to them, such as changes in patient flow related to the availability of beds at other hospitals or to reimbursement.
Conclusions
For all the enthusiasm about machine learning in medicine, which seems to recur approximately every 30 years,29 its impact on real-world clinical practice remains modest; a recent commentary noted the mismatch between promise and concrete accomplishment.30 The present study suggests that straightforward application of existing software would allow reliable prediction of a critically important metric of hospital operation and that such application need not require prohibitively large data sets, computational resources, or the operational complexity of frequent updates. While more advanced models are being developed, time-series–based prediction offers the possibility of improving clinical planning in the near term.
Accepted for Publication: September 3, 2018.
Published: November 2, 2018. doi:10.1001/jamanetworkopen.2018.4087
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2018 McCoy TH Jr et al. JAMA Network Open.
Corresponding Author: Thomas H. McCoy Jr, MD, Center for Quantitative Health, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, 185 Cambridge St, Simches Research Bldg, Sixth Floor, Boston, MA 02114 (email@example.com).
Author Contributions: Drs McCoy and Perlis had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: McCoy, Perlis.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: All authors.
Critical revision of the manuscript for important intellectual content: Perlis.
Statistical analysis: McCoy.
Administrative, technical, or material support: Pellegrini.
Conflict of Interest Disclosures: Dr McCoy reported receiving grant support from The Stanley Center at the Broad Institute. Dr Perlis reported receiving grants from the National Human Genome Research Institute and from the National Institute of Mental Health and personal fees for service on scientific advisory boards or consulting to Genomind, Psy Therapeutics, and RID Ventures. No other disclosures were reported.
Disclaimer: Dr Perlis is an associate editor of JAMA Network Open, but he was not involved in any of the decisions regarding review of the manuscript or its acceptance.
References
B. Association of nursing overtime, nurse staffing, and unit occupancy with health care–associated infections in the NICU. Am J Perinatol. 2017;34(10):996-1002. doi:10.1055/s-0037-1601459
et al. Nurse staffing and patient outcomes: strengths and limitations of the evidence to inform policy and practice—a review and discussion paper based on evidence reviewed for the National Institute for Health and Care Excellence Safe Staffing guideline development. Int J Nurs Stud. 2016;63:213-225. doi:10.1016/j.ijnurstu.2016.03.012
et al. Mapping workforce configuration and operational models in Australian emergency departments: a national survey. Aust Health Rev. 2018;42(3):340-347. doi:10.1071/AH16231
W. Short-term forecasting of hospital discharge volume based on time series analysis. In: 2017 IEEE 19th International Conference on E-Health Networking, Applications and Services (Healthcom). Dalian, China: IEEE; 2017:1-6. doi:10.1109/HealthCom.2017.8210801
et al. Architecture of the open-source clinical research chart from Informatics for Integrating Biology and the Bedside. AMIA Annu Symp Proc. 2007:548-552.
LS. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. 3rd ed. Mahwah, NJ: Lawrence Erlbaum Associates Inc; 2002.
S. Predicting Patient Volumes in Hospital Medicine: A Comparative Study of Different Time Series Forecasting Methods. Evanston, IL: Northwestern University; 2014:1-13.
H. Weekend versus weekday hospital admission and outcomes during hospitalization for patients due to worsening heart failure: a report from Japanese Cardiac Registry of Heart Failure in Cardiology (JCARE-CARD). Heart Vessels. 2014;29(3):328-335. doi:10.1007/s00380-013-0359-5