Hospital daily outpatient visits forecasting using a combinatorial model based on ARIMA and SES models

Background Accurate forecasting of hospital outpatient visits is beneficial for the reasonable planning and allocation of healthcare resource to meet the medical demands. In terms of the multiple attributes of daily outpatient visits, such as randomness, cyclicity and trend, time series methods, ARIMA, can be a good choice for outpatient visits forecasting. On the other hand, the hospital outpatient visits are also affected by the doctors’ scheduling and the effects are not pure random. Thinking about the impure specialty, this paper presents a new forecasting model that takes cyclicity and the day of the week effect into consideration. Methods We formulate a seasonal ARIMA (SARIMA) model on a daily time series and then a single exponential smoothing (SES) model on the day of the week time series, and finally establish a combinatorial model by modifying them. The models are applied to 1 year of daily visits data of urban outpatients in two internal medicine departments of a large hospital in Chengdu, for forecasting the daily outpatient visits about 1 week ahead. Results The proposed model is applied to forecast the cross-sectional data for 7 consecutive days of daily outpatient visits over an 8-weeks period based on 43 weeks of observation data during 1 year. The results show that the two single traditional models and the combinatorial model are simplicity of implementation and low computational intensiveness, whilst being appropriate for short-term forecast horizons. Furthermore, the combinatorial model can capture the comprehensive features of the time series data better. Conclusions Combinatorial model can achieve better prediction performance than the single model, with lower residuals variance and small mean of residual errors which needs to be optimized deeply on the next research step.


Background
Due to a rapid growing aging population, medical resources in China are comparatively scarce, with disproportional distribution and generally insufficient medical investment. It causes a great gap between supply and demand, as well as the increasing attention on optimal management of healthcare resources, which makes accurate forecasts of future healthcare demand and resource availability more significant and critical. Outpatient department (OD) is the window of the hospital external service from the hospital actual operation, and is experiencing increasing stress from year to year because of the increasing patient volumes, the growing complexity of health conditions and so on. The ability to predict outpatient visits is crucial for resource planning and allocation as well as efficient appointment scheduling in OD aimed to avoiding overcrowding and providing high quality patient care service [1]. Moreover, as the key and major element to implement the hospital management, daily outpatient visits are directly related to the workload of medical examinations and hospitalization services. Precise and reliable forecasts of the different types of outpatients amount can contribute to allocate the key healthcare resources scientifically (such as medical equipment, surgical equipment and hospital beds). So that it is significant to make an accurate forecast for the daily outpatient visits in advance, in order to help hospital managers to make the right decisions to meet the expected healthcare demand effectively and timely.
In the field of health care system, increasing attention has been paid to the prediction of patient arrivals in hospital establishments, especially the prediction of Emergency department(ED)visits or admission which is substantially differ from the outpatient visits prediction [2][3][4][5]. According to ED data,patients arrive in an unplanned fashion beyond the control of the hospital, and the number of ED visits shows strong periodical, seasonal and stochastic fluctuations driven by factors such as climate, epidemic disease, the type of the day and socio demographic effects [6][7][8]. Based on these features, researchers used different methods to predict the number of ED visits hourly, daily and monthly [5,6,9]. In contrast, mainly affected by the current healthcare system, China's outpatient visit is more of a planned elective nature. Patients can self-refer to any provider they could afford when they feel sick, and they should be booked days or weeks in advance to seek outpatient service in most of large-scale hospitals. However, in light of the fixed team arrangement and the limited outpatient service resources, the hospital can only accept a certain amount of outpatient visits every day, and the number of OD visits in large-scale hospitals fluctuates much less violently on a weekly, monthly or yearly basis. Instead, the daily number of OD visits shows a periodic fluctuation on a weekly basis, with a strong day-in-week variation mostly attributed to the doctors' influence, which can be transferred to the downstream service nodes and have much influence on preparing the required reserve capacity in all sectors of hospital. Therefore, accurate prediction of the daily outpatient visits in 1 week can contribute to reasonable formulation of the weekly rotating shift schedule, which is of high importance to achieve an efficient routine management of outpatient services in hospitals. Meanwhile, it is also crucial for administrators in other sectors to optimize resource allocation and strategic planning.
Motivated by overcrowding and resource scheduling problems arising in healthcare systems in China, we choose a typical hospital-West China Hospital (WCH) as cooperated hospital, and consider a time-series forecasting problem of daily outpatient visits. WCH is a public tertiary teaching hospital in Chengdu, China, which has large-scale outpatients. It served about 4.9 million outpatients and emergency attendances in 2014, and the number of outpatient visits accounted for about a quarter of the total amount. Research shows the amount of outpatient attendance and medical service needs are significantly affected by patient's disease type, distance to healthcare providers and other factors, which present high level varying characteristics [10]. It's worth noting that Chengdu patient accounts for the largest proportion of outpatients by each specialty of WCH. According to its internal statistical analysis on outpatient amount in 9 common and main specialty care units, the proportion of outpatients living in Chengdu city reached 46.31% in 2014.Compared with patients from other cities of Sichuan province and those from other provinces, the arrival pattern of outpatients from Chengdu showed higher randomness, moreover, the daily outpatient volume fluctuated relatively frequently. Therefore, this paper chose two common disease types (Endocrine and Respiratory disease) as illustrations to study the daily volume prediction of Chengdu outpatients.

Literature
For the past few years, increasing attention has been accorded to time series models to predict demands for medical services, such as patient arrivals, hospital discharges and length of stay [2,3]. Compared to another classical method of regression, time series model [4,5] is important for healthcare managers to capture features of short-term fluctuations better [3]. Being a general class of linear model, the ARIMA model can perfectly capture linear patterns in a time series with minimum computational efforts, so most studies adopt them to describe the relationship between the variables or use them as the benchmark to test the effectiveness of combined models with mixed results [9]. Sun et al. [11] and Kadri et al. [1] developed ARIMA models for forecasting daily attendances at ED of hospitals to prove time-series analysis to be a useful, readily available tool for predicting ED workload. Li et al. adopted ARIMA model to forecast monthly outpatient visits in a general hospital in China [9]. However, relationship between target variable and factors in many time series is nonlinear and complex, many recent studies have focused on the use of machine learning techniques as alternatives to the traditional time series methods, such as Fuzzy logic, Artificial Neural Network(ANN), Genetic Algorithms, Support Vector Machine(SVM), Taylor expansion and non-parametric smoothing etc. [1,[12][13][14]. Cheng et al. [15] and Garg et al. [12] proposed new fuzzy time series methods to forecast the number of outpatient visits. Hadavandi et al. [1] presented a hybrid artificial intelligence model based on Genetic Fuzzy Systems to forecast monthly outpatient visits of the department of internal medicine with high accuracy. Wang et al. [4] developed a novel forecasting model based on hybridization the firefly algorithm and support vector regression to forecast the daily diarrhoeal outpatient visits in Shanghai. Moreover, because a real-world time series is complex in nature, and a single linear or nonlinear model may not be totally sufficient to identify all the characteristics of the time series datasets, so many hybrid or combined models have been proposed to complement each other to improve forecasting accuracy while the literature on this topic has expanded dramatically [1,16,17]. Although machine learning techniques based on data-driven and non-parametric models could outperform many traditional techniques, they also have some limitations, such as unknown or hard to describe the underlying relationships among datasets [15]. As we known, high prediction accuracy depends not only on modeling but also on understanding of the data features, which means that advanced, sophisticated, and simpler extrapolation methods must be associated with specific features of data [18]. Hence, this paper differs from previous studies, in which we attempt to simplify the outpatient visits prediction problem and improve the forecasting accuracy by capturing the intrinsic fluctuating characteristics of the hospital's daily outpatient visits.
The analysis and forecasting of hospital daily outpatient visits belongs to time series prediction problem, in which the features of randomness and cyclic fluctuations have the biggest effect on forecasting accuracy [18]. Due to the ability of effectively extracting the linear features of randomness and trend, ARIMA model has found extensive applications in forecasting hospital daily visits of ED and OD [6,19,20]. Furthermore, aiming to Fig. 1 The framework of proposed combinatorial model based on SARIMA and SES models better capture cyclicity over a period of week in daily time series data, ARIMA models have been extended and modified among which seasonal ARIMA (SARIMA) and multiplicative seasonal ARIMA (MSARIMA) models are the most widely used [3,7,21]. In addition to randomness and periodicity, the day of the week effect is also present in daily volume of patients in ED and OD as well as daily number of discharged patients [3,8], which refers to the situation where the time series at the same time point within 7 days a week during different weeks show non-linear trend variation and some change patterns can be found in the data from Monday to Sunday. Several researchers proposed seasonal regression model to capture the day of the week effect of time series data, but it may be problematic and lead to poor forecasts due to the nonlinear patterns. Because of the good performance in capturing the nonlinear pattern and description of both trend and seasonality in the data, exponential smoothing(ES) methods have since been among the most widely used forecasting procedures in industry and commerce as well as healthcare [7,22,23]. Automated ES approach [5] and simple seasonal ES model [7] were applied to forecast the demand of medical care in terms of monthly and daily visits in ED of hospitals. Previous study indicated that the ES model incorporated the time correlation knowledge, made full use of the historical information [16] and consequently yielded superior performance in modeling ED demand due to its simplicity, robustness and accuracy [7]. Moreover, it was also demonstrated that although model selection should be done individually (per series) when we deal with heterogeneous data as to capture the different features met in each series [18], forecasting models alone are not sufficient for decision making [24], and combining models could improve predictive accuracy [25]. While ES model which is based on a description of nonlinear feature of trend and seasonality and SARIMA model which aims to capture the linear feature of autocorrelations and periodicity in the data provide complementary approaches to the prediction problem, we try to develop a new model to forecast the daily number of OD visits by combing them.
This paper examines the issue of forecasting the daily outpatient visits in China's large hospitals which are often overloaded with various types of patients. The objective is to develop a methodology, present data and modeling details and provide results of modeling the daily outpatient visits. In order to achieve good forecasting performance, several time series models are built for different features in each series respectively, and a combinatorial model is obtained for short-term forecasting. First we construct a SARIMA model to capture the cyclicity and autocorrelation in daily time series data. In terms of non-linearity in weekly time series data of the same day over different weeks, single exponential smoothing (SES) model is adopted to treat the day of the week effect. After that, we establish a linear combinatorial model based on the above models through residual modification approach. Finally the combinatorial forecasting model is verified by empirical studies involving two time series data of Chengdu outpatients in departments of endocrinology and respiratory in WCH over the entire year of 2014. The daily outpatient visits are predicted about 1 week ahead using the proposed model, and the results indicate that the combinatorial model outperforms any of the above two single traditional models.

Methods
In this section, we describe several models for forecasting the daily outpatient visits. Two models on time series are built for different traits of the outpatient visit data, and the combinatorial forecasting model is established based on the residual modification idea: Where X 1 T represents the estimate value of daily outpatient visit for daily time series data, which is estimated by SARIMA model in "Seasonal ARIMA model" section; X 2 T represents the estimate value of daily outpatient visit for the day of the week time series (weekly time series) data, which is estimated by SES model in "Single exponential smoothing model" section; l i is the weighting coefficient of the single model i, calculated as describe in "Combinatorial model based on SARIMA and SES" section.
Seasonal ARIMA model ARIMA model is an extension of ARMA model, combining differential operation and ARMA model. Seasonal ARIMA model is to model on the time series presenting with obvious seasonal effect (cyclic effect), by smoothing over the time series through trend and seasonal differencing along with the fitting of ARMA model. Its theoretical model has the following form: where x 1 t and ε t are the observed value of hospital daily outpatient visits and random error at time period t, D is step length for single cycle, d is order of differential equation for extracting trend and B is backshift operator.
are the polynomials in B for AR and MA components, respectively.
Four steps are involved for modeling and forecasting using SARIMA model: smoothing over time series (differential operation), model recognition, parameter estimation and model diagnosis, and model forecasting.
In the smoothing step, data transformation is often needed to make the time-series stationary. Results or plots of autocorrelation function (ACF) and partial autocorrelation function (PACF) of the sample data are the basic tools for analyzing stationary and order of model. When trend and seasonality are observed in the time series, differencing and log transformation are applied to eliminate the fluctuation to stabilize the variance. Once the initial model is determined, the key difficulty is the estimation of the other model parameters specifically, To recognize whether the chosen model can describe the fluctuation pattern of time series data well, goodness-of-fit test is needed mainly through white noise test. If the residual series is not a white noise sequence, it is necessary to repeat above steps to create a new model.

Single exponential smoothing model
To deal with non-linearity in moving average interval, larger weights are assigned to short-term observations, while smaller weights to long-term observation, so as to achieve better forecasting result. Time series of daily outpatient visits of the same day in different weeks are modeled using SES model, and its general form is as follows: where α is smoothing constant, ranging from 0 to 1; x t ′ ;τ is the observed value of daily outpatient visits of day τ in week t ′ ; d x 2 t ′ ;τ is the smoothed value of daily outpatient visits of day τ in week t ′ and d x 2 t ′ þ1;τ is the smoothed value of daily outpatient visits of day τ in week t ′ + 1, which is selected to be the predicted value. The optimal value of α is found using an optimization method according to the criteria of minimum values of mean square error (MSE) and mean absolute percentage error (MAPE) between the predicted and observed values. The means of the observed values in the first 3 weeks in the same time series is taken as the initial value d x 2 1;τ for SES forecasting [25].

Combinatorial model based on SARIMA and SES Combinatorial model
In order to better capture the characteristics of autocorrelation, cyclicity and the day of the week effect of daily outpatient visits, a combinatorial model based on the two above models is built through residual modification idea in the form of weighted averaging. The formulations can be written as: where b x 1 t and x 2 t ′ ;τ are the estimated values of daily outpatient visits from SARIMA and SES, respectively; x t ′ ;τ is the predicted value of the weighted combinatorial model; and l i is weighting coefficient of single model i which is formulated as Where l 1 + l 2 = 1, and E i ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P n i¼1 e i 2 p . The weighing coefficient is determined by the error indicators of the single model, and E i is the error variance sum of the single model which is the indicator of forecasting accuracy. The lower the E i , the higher the forecasting accuracy of model i and hence the higher the importance in combinatorial forecasting model. And, the framework of proposed combinatorial model based on SARIMA and SES models is shown in Fig. 1.

Testing indicator
MAPE is used as the indicator of forecasting accuracy, calculated as Moreover, a cyclic variation pattern is found over the period of a week, while great differences exist among different points in a same cycle, such as fewer outpatient visits over the weekends.
To account for the cyclicity, daily outpatient visits data spanning over incomplete periods are removed. Therefore, the time series from week 2 to week 52 (totally 51 weeks) are used for model building. The models are fitted for the first 301 observations from week 2 to week 44 (N = 301, 84% of all data as training set, from January 1, 2014 to November 2, 2014). The remaining 56 observations from week 45 to week 52 are utilized for examining the forecasting ability of fitted models (T = 56, 16% as testing set, from November 3, 2014 to December 28, 2014). That is, time series of 43 weeks for model fitting and time series of 8 weeks for verification while predicting each week of last 2 months based on all observations 1 week before. We explain the process of how to predict the daily outpatient visits in the 45th week of time series in detail, and so on, for each of the rest 7 weeks.

Pre-processing of singularities on holidays
The daily outpatient visits present obvious fluctuations in weekly pattern and special day effect, and the decrease of visits during national holidays such as the Spring Festival, the May Day, the Mid-Autumn Festival, etc. are significant, mainly resulting from the patients' traditional customs and behavior habits. They can't real reflect the daily routine fluctuations of outpatient visits. Therefore, singularity is identified which is distributed outside of the scope of 2 times the standard deviation from the mean value in each the day of the week time series. Finally, ten singularities in time series data of EOV as well as twelve in time series data of ROV are picked out and pre-processed, by replacing them with the mean of the same day in the previous and the following weeks.
Building the seasonal ARIMA model Data analysis and differential operation The stationarity of the time series is assessed through autocorrelation chart (Figs. 4,5,6,7,8,9). The time series of EOV and ROV are non-stationary sequences and display strong cyclicity feature over the period of 7 days. Deterministic information is extracted from the original time series in these two departments both using first-order 7-step differencing. And the differential time series are verified as non-random series through white noise test. Then ARMA models are fitted and used for forecasting.

Model recognition and parameter estimation
According to BIC, the optimal model is the one with the minimum value of BIC function, which is ARIMA (1,1,7) model for EOV and ARIMA(2,1,7) model for ROV. Model parameters are estimated with leastsquares estimation and the parameters failing to pass the significance test are further adjusted. The estimated parameters are shown in Tables 1 and 2. According to residual analysis, P value of Ljung-Box test statistics at all lags is more than 0.05, indicating the reliability of the model design.
The model forecasting the EOV is fitted as and the model forecasting the ROV is fitted as Forecasting The fitted models are used for short-term forecasting on the time series and the 7-day predicted values of the following week are obtained (Figs. 10, 11). The MAPE of the fitted and predicted values is 16.90 and 19.31% respectively in endocrinology department, while 25.32 and 25.44% in respiratory department.

Building SES model
Aiming to capture the feature of the day of week effect, SES model is implemented for the time series of the day of the week (the same day in different weeks). The initial value for the smoothing d x 2 1;τ and the smoothing constant α are determined using the method described in the above section. The predicted values are calculated using   Table 3.

Building the combinatorial model
The combinatorial model by combining SARIMA and SES model is established. Through the residual modification approach, the weighting coefficients for different days within a week are calculated with formula (4-6). The predicted and fitted values are presented in Tables 4  and 5, while residuals comparison between ARIMA, SES and combinatorial models are presented in Table 6. In turn, the prediction of the remaining 7 weeks can be computed using the same approach and process, through which the prediction results are finally acquired (Tables 7, 8, 9, 10). As seen above, in the two empirical studies, the result of analysis show the combinatorial model has superior prediction performance, with small mean of residual errors and residuals variance.

Discussion
This paper examines the issue of forecasting the daily outpatient visits in China's large hospital. Daily visits data of Chengdu outpatients in departments of endocrinology and respiratory in WCH over 2014 are used for forecasting. The proposed models, SARIMA model and SES model, can forecast daily outpatient visits in the short term well alone and there are considerable improvements in forecasting performance by combining the two complementary approaches together.
One important criterions of model selection is to select the appropriate method that has the best trade-off between the goodness-of-fit and the complexity of modeling (number of parameters). In this paper, we choose SARIMA and SES models to capture linear and nonlinear patterns, describe the features of autocorrelation, trend and cyclicity as well as the day of the week effect from different sub-time-series data respectively, and then establish a linear combinatorial model based on the above models through residual modification approach. All of the three models have good performance in forecasting accuracy. SARIMA model has predicted the daily number of EOV with a MAPE value of 0.1177 whereas it is 0.1325 for SES model, and 0.1061 for combinatorial model. And for the time series of ROV, the MAPE value of SARIMA model is 0.1526 whereas it is 0.1360 for SES model, and 0.1349 for combinatorial model. Moreover, the daily visits fluctuation of weekends series is relatively complex, which cannot be effectively extracted by single time series model. In our study, the combinatorial model is superior in modifying the extreme value of deviation in daily outpatient volume than SARIMA model,especially for the value of weekend's data. For example, the results of weekend's volume prediction show that MAPE reduces from 13.13 and 28.49% to 11.68 and 23.44% for EOV and ROV respectively. In general, the two single traditional models and the combinatorial model are simplicity of implementation and low computational intensiveness, whilst being appropriate for short-term forecast horizons over a large number of items.
As indicated both by the result of preliminary data analysis and the weighting coefficients, the fluctuations of daily outpatient visits within 1 week show different characteristics. Linear fluctuation is predominant from Monday to Thursday (l 1 > 0.5) which can be better extracted by SARIAM model, while non-linear fluctuation is predominant from Friday to Sunday (l 2 > 0.5) which can be better captured by SES model. That is to say, the of ROV (0.54) is bigger than that of EOV (0.49), which means that the non-linear fluctuation of ROV is more significant. In addition to the autocorrelation in daily time series data, the time series of ROV are more easily affected by uncertain factors. So, the prediction models of different diseases can be further optimized based on their own particular features and influence factors to increase forecasting accuracy. The literatures show that there is no universal model applicable to any environment due to the inherent complexity of the time series data in the real world. The proposed model in this paper is designed to improve the forecasting accuracy from the data driven by incorporating the intrinsic characteristics of the historical time series data of hospital outpatient visits. It includes the randomness and cyclicity in the daily time series and the day of the week effect in weekly time series. According to this consideration, we adopt classical time series models and parametric estimation methods to ensure stability and convergence property. And based on this, we describe the combinatorial model to forecast the cross-sectional data for 7 consecutive days of daily outpatient visits over an 8-weeks period based on 43 weeks of observation data in two outpatient departments. As shown in the previous forecast results, the prediction performance of the combinatorial model is superior to the two single traditional models, as well as has the small mean of residual errors and better residuals variance. Thus it can be seen whether for the volume of ED visits or other time series data of healthcare system, the prediction model which is associated with effectively extracting meaningful features embedded implicitly in data is of great theoretical and practical significance. What's more, by analyzing the research of outpatient visits forecasting, it shows that the combinatorial model can better observe and extract the general regularity of daily time series data in two different departments from a finite training sample size, while the ARIMA model has significant difference in feature identification among the two different diseases. And data collection and prediction of more kinds of disease should be discussed in the further research so as to validate the stable capability of the proposed model to discover the intrinsic laws of time series data.
Moreover, the forecasting horizon of 7 days for outpatient visits prediction in our paper is determined based on the current circumstances of healthcare system and hospital management requirement in China. It's not hard to see China's outpatient visit is more of a planned elective nature, and in light of the total outpatient amount controlling policy for large hospitals, the limited outpatient service resources and the weekly fixed team arrangement in most large hospitals, daily fluctuation within 1 week changes significantly compared to no The daily outpatient visits forecasting about 1 week ahead have practical implications in hospital operation management, not only contributing to guiding the longterm resource planning and scheduling in OD or other related sectors, but also helping to make tactical resolutions for short-term adjustment in special day. For an example, to deal with the problems of relative large prediction deviations for some certain day within a week, some operation measures such as appointment overbooking or adding capacities on the spot can be adopted in outpatient service management. The impact of nonlinear variation of outpatient visits for Friday can be counteracted by reasonable management strategies, such as optimizing the work shift schedule of doctors who have strong effect of attracting patients and guiding the healthcare seeking behaviors of patients.
There are several limitations in this study. Firstly, the combinatorial model used in this paper is only for shortterm forecasting from Monday to Sunday with 7 days in advance. More studies are required for moderate-and long-term forecasting of daily outpatient visits. Secondly, in addition to the inherent features of time series, other influence factors also come into play, such as the planned supply of outpatient resource and patients' preference for certain doctors. The forecasting model can be optimized based on these features so as to achieve better prediction accuracy, especially for the daily visits prediction over the weekend. Thirdly, it is the lack of more discussions and analyses on alternative forecasting models, and reasons why they are not applicable. Fourthly, this paper only used outpatients' visit data of two specific departments in one large hospital for forecasting, and data collection and prediction of more kinds of disease should be discussed in the further research, especially the chronic diseases.

Conclusions
In conclusion, this paper contributes to exploration of prediction method to forecast outpatient visits in China's large hospitals. We compared the forecast accuracy of a SARIMA model, a SES model, and that of a combinatorial model by combining SARIMA model with SES model for short-term daily outpatient visits forecasting. All of the selection models are relatively simple in terms of implementation and of low computational intensiveness, whilst being appropriate for short-term prediction. And compared simple models, the combinatorial model can more effectively extract depth information from a finite training sample size, and achieve better performance for predicting daily outpatient visits 1 week ahead, with lower residuals variance and relative small mean of residual errors which needs to be optimized deeply on the next research step. Also, the results can be used to support the decisions of outpatient resource planning and scheduling. It helps hospital managers to implement periodical scheduling of available resources on the basis of periodic features, as well as the proactive scheduling of additional resources based on the increasing regularity caused by the day of the week effect.