Simplicity within complexity: Seasonality and predictability of hospital admissions in the province of Ontario 1988–2001, a population-based analysis

Background Seasonality is a common feature of communicable diseases. Less well understood is whether seasonal patterns occur for non-communicable diseases. The overall effect of seasonal fluctuations on hospital admissions has not been systematically evaluated. Methods This study employed time series methods on a population-based retrospective cohort of the fifty-two most common causes of hospital admission in the province of Ontario from 1988–2001. Seasonal patterns were assessed by spectral analysis and autoregressive methods. Predictive models were fit with regression techniques. Results The results show that 33 of the 52 most common admission diagnoses are moderately or strongly seasonal in occurrence; 96.5% of the predicted values were within the 95% confidence interval, and for 37 series all values were within the 95% confidence interval. Conclusion The study shows that hospital admissions have systematic patterns that can be understood and predicted with reasonable accuracy. These findings have implications for understanding disease etiology and for health care policy and planning.


Background
Health care is a complex human endeavor constituted by the interaction of multiple professions, organizations, industries, technologies and the public. Health itself is also a complex concept, with multiple determinants including genetic, socio-cultural, economic and environmental influences [1]. At the centre of this complex system is the hospital. Arguably, after a physician visit, the hospital admission represents the key event in the delivery of health care.
Do hospital admissions have consistent patterns? While individual diseases are extensively studied, there is a paucity of systematic approaches to the study of health care events. Epidemiology is not regarded as a science with the predictive accuracy and explanatory power of the physical sciences [2]. Health services research is in its scientific infancy and is directed towards policy and practice; however, recent trends in theoretical epidemiology have focused on more powerful computational approaches [3].
Using time series analysis, our research program investigates seasonality in the occurrence of health care events. Seasonality is an important aspect of disease manifestation as well as a clue to the etiology of disease. Our initial studies explored seasonality in hospital admissions in discrete disease categories, including asthma [4], falls [5] and aortic aneurysms [6]. Subsequently, we hypothesized and confirmed that hospital admissions, considered across the system as a whole, also demonstrate consistent seasonal effects [7].
Consistent seasonal behavior suggests the possibility of predictable behavior. To the best of our knowledge, there are no studies systematically evaluating the seasonality and predictability of multiple hospital admissions using health services data. We therefore assessed the seasonality and predictability of the most common causes of hospital admission in the province of Ontario, Canada.

Methods
We conducted a retrospective, population-based study to assess temporal patterns in hospitalisations for the 52 most common admission discharge diagnoses from April 1, 1988 to December 2001. Approximately 14 million residents of Ontario eligible for universal healthcare coverage during this time were included for analysis. The Canadian Institute for Health Information Discharge Abstract Database was used to obtain information on the most responsible diagnosis. This database records discharges from all Ontario acute care hospitals, documenting a scrambled patient identifier, date of admission and discharge, up to 16 diagnoses as coded by the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM), and up to 10 procedures.
Researchers using these databases have found that diagnoses and surgical procedures are coded with a high degree of accuracy. There is very little missing information in the Ontario databases; other studies have similarly found that less than 1 percent of the basic information on patients is missing in various provincial databases [8][9][10].
The 52 most common discharge diagnoses over the 10 years were identified by summing all admissions for each diagnosis and ranking the diagnoses by frequency of admission. Owing to the influence of obstetric-related admissions, we limited obstetric codes to singleton births. Categories of closely related health conditions (such as myocardial infarction) were combined.
Numerator data consisted of the total number of discharges in each month for each of the most responsible diagnoses. Denominator data were derived from annual census data for each age group of Ontario residents, provided by Statistics Canada. Monthly population estimates were derived through linear interpolation. All transfers from one acute care hospital to another within this study group were excluded from the analysis. To account for population changes over time, we analyzed monthly admission rates per 100,000.
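The denominator step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each annual Statistics Canada estimate refers to July 1 of its year, interpolates linearly between neighboring years, and holds the population constant outside the observed range. The names `monthly_population` and `rate_per_100k` are illustrative only; in the study the interpolation would be repeated for each age group.

```python
import bisect


def monthly_population(annual_pop: dict, year: int, month: int) -> float:
    """Linearly interpolate a monthly population from annual estimates.

    Assumption (hypothetical): each annual value is a July 1 estimate, so
    month `month` of `year` lies (month - 7) / 12 years from that anchor.
    """
    years = sorted(annual_pop)
    t = year + (month - 7) / 12  # fractional-year position of this month
    # Hold the population constant outside the range of available years.
    if t <= years[0]:
        return float(annual_pop[years[0]])
    if t >= years[-1]:
        return float(annual_pop[years[-1]])
    i = bisect.bisect_right(years, t)  # years[i - 1] <= t < years[i]
    y0, y1 = years[i - 1], years[i]
    weight = (t - y0) / (y1 - y0)
    return annual_pop[y0] + weight * (annual_pop[y1] - annual_pop[y0])


def rate_per_100k(admissions: float, population: float) -> float:
    """Monthly admission rate per 100,000 population."""
    return 100_000 * admissions / population
```

For example, with estimates of 10,000,000 for 1990 and 10,120,000 for 1991, January 1991 sits halfway between the two July anchors and receives 10,060,000; 5,030 admissions in that month then correspond to a rate of 50 per 100,000.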

Analytic method
This study employed time series methods to assess the presence of statistically significant seasonality, the strength of the seasonal effect and the predictability of each time series. A time series can be decomposed as the sum or product of trend, seasonal, and random components. Trend is the long-term movement of the series: a systematic component that changes over time and generally does not repeat itself within the time range of the available data. If the trend is eliminated, the time series consists of seasonal and random components.
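In symbols, with $T_t$, $S_t$ and $\varepsilon_t$ denoting the trend, seasonal and random components at month $t$ (notation introduced here for illustration), the two decompositions and the effect of removing the trend are:

```latex
\begin{aligned}
\text{additive:}\quad & x_t = T_t + S_t + \varepsilon_t \\
\text{multiplicative:}\quad & x_t = T_t \, S_t \, \varepsilon_t
  \;\Longrightarrow\; \log x_t = \log T_t + \log S_t + \log \varepsilon_t \\
\text{trend removed (additive case):}\quad & x_t - T_t = S_t + \varepsilon_t
\end{aligned}
```

The logarithm turns the multiplicative case into an additive one, which is why the same methods can be applied to both.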

Assessment of seasonality
Each series was analyzed with the same set of statistical techniques to assess the statistical significance of seasonal patterns and the consistency and magnitude of the seasonal effect. Spectral analyses were conducted to detect statistically significant seasonality. Spectral analysis detects periodicity in a time series by plotting the periodogram or spectral density of the series against the period or frequency [11]. Each series was de-trended using moving averages before spectral analysis. Two tests of the null hypothesis that the series is strictly white noise were conducted. The Fisher Kappa (FK) test is designed to detect a single major sinusoidal component buried in white noise, whereas the Bartlett Kolmogorov-Smirnov (BKS) test accumulates departures from the white noise hypothesis over all frequencies [12]. Finally, autoregression R-squared coefficients ($R^2_{\text{Autoreg}}$) were calculated. This measure is the coefficient of determination of an autoregressive model fitted to the data, and it can be used to quantify the strength of seasonality within a set of serially correlated observations such as time series data [13]. $R^2_{\text{Autoreg}}$ is interpreted in the same way as the coefficient of determination in classical regression: values from 0 to less than 0.4 represent nonexistent to weak seasonality, 0.4 to less than 0.7 moderate to strong seasonality, and 0.7 to 1 strong to perfect seasonality. The magnitude of $R^2_{\text{Autoreg}}$ shows how well the next value can be predicted when the seasonal component is the only predictor; in other words, it measures the contribution of seasonality to the total variation of the data. Thus $1 - R^2_{\text{Autoreg}}$ is the proportion of variance that remains unexplained [13].
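The spectral steps can be sketched in plain Python. This is an illustration, not a reproduction of the SAS procedures used in the study. Several choices here are assumptions: the centred 2×12 moving average used for de-trending; the restriction to Fourier frequencies $k/n$ with $1 \le k \le \lfloor (n-1)/2 \rfloor$; and the simplified cumulative-periodogram statistic standing in for the BKS test. The helper names are hypothetical. Fisher's kappa is computed as the largest periodogram ordinate divided by the mean ordinate. Its p-value comes from Fisher's exact distribution for the largest share $g = \kappa/m$ of the periodogram.

```python
import math


def detrend_2x12_ma(x):
    """Subtract a centred 2x12 moving average (assumed de-trending filter)."""
    out = []
    for t in range(6, len(x) - 6):
        trend = (0.5 * x[t - 6] + sum(x[t - 5:t + 6]) + 0.5 * x[t + 6]) / 12
        out.append(x[t] - trend)
    return out  # len(x) - 12 values


def periodogram(x):
    """Periodogram ordinates I_k at frequencies k/n, k = 1..floor((n-1)/2)."""
    n = len(x)
    mean = sum(x) / n
    ordinates = []
    for k in range(1, (n - 1) // 2 + 1):
        c = sum((v - mean) * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
        s = sum((v - mean) * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x))
        ordinates.append((c * c + s * s) / n)
    return ordinates


def fisher_kappa_test(ordinates):
    """Fisher's kappa (max / mean ordinate) and its exact p-value under white noise."""
    m = len(ordinates)
    g = max(ordinates) / sum(ordinates)  # share of the largest ordinate
    kappa = m * g
    p = 0.0
    # Fisher's alternating sum: P(G > g) = sum_j (-1)^(j-1) C(m, j) (1 - j g)^(m-1)
    for j in range(1, int(1 / g) + 1):
        p += (-1) ** (j - 1) * math.comb(m, j) * max(0.0, 1 - j * g) ** (m - 1)
    return kappa, min(1.0, max(0.0, p))


def cumulative_periodogram_ks(ordinates):
    """Max gap between the normalized cumulative periodogram and a uniform CDF.

    A simplified stand-in for Bartlett's Kolmogorov-Smirnov statistic; the
    exact standardization used by statistical packages may differ.
    """
    m = len(ordinates)
    total = sum(ordinates)
    running, d = 0.0, 0.0
    for k, value in enumerate(ordinates, start=1):
        running += value
        d = max(d, abs(running / total - k / m))
    return d
```

On a series with a linear trend plus a 12-month sinusoid, the periodogram of the de-trended series peaks at a period of 12 months. The kappa p-value is essentially zero, and the cumulative-periodogram statistic is far from zero.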
When the autoregression procedure is applied to observed data, it is important to validate the stationarity of the series, as $R^2_{\text{Autoreg}}$ may be underestimated when the seasonal variation is not stable. To account for this, data transformations were applied where appropriate to stabilize the seasonal variation [13]. All statistical analyses were performed using SAS (v8.2).
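The following is a minimal sketch of the autoregressive strength measure. The exact autoregressive specification of [13] is not given in this section. The sketch therefore assumes a single seasonal lag of 12 months fitted by ordinary least squares, with an optional log transformation as one example of a variance-stabilizing transformation. `r2_autoreg` and `seasonality_strength` are hypothetical names, and the cut-offs are those stated in the text. The series is assumed to be de-trended already.

```python
import math


def r2_autoreg(series, lag=12, log_transform=False):
    """R^2 of regressing y_t on y_{t-lag} (assumed single seasonal lag)."""
    y = [math.log(v) for v in series] if log_transform else list(series)
    pred, resp = y[:-lag], y[lag:]  # predictor y_{t-lag}, response y_t
    n = len(resp)
    mx, my = sum(pred) / n, sum(resp) / n
    sxx = sum((a - mx) ** 2 for a in pred)
    sxy = sum((a - mx) * (b - my) for a, b in zip(pred, resp))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(pred, resp))
    sst = sum((b - my) ** 2 for b in resp)
    return 1 - sse / sst


def seasonality_strength(r2):
    """Map R^2_Autoreg to the verbal categories used in the text."""
    if r2 < 0.4:
        return "nonexistent to weak"
    if r2 < 0.7:
        return "moderate to strong"
    return "strong to perfect"
```

A purely periodic 12-month series gives $R^2_{\text{Autoreg}} = 1$ (up to rounding), whether or not it is log-transformed first. Because a trending series would also be highly autocorrelated at lag 12, de-trending before applying this measure matters.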

Predictive modeling
Of the 160 monthly observations for each series, the first 148 (April 1988 to December 2000) were used to fit the model and estimate its parameters. The last 12 observations (January to December 2001) were set aside to assess the performance of the model. We applied first-order differencing to eliminate the trend [14] and then used a very simple regression model to predict 12 new monthly observations for each series. We compared the 12 observed values with the corresponding predicted values and recorded which observed values fell outside the 95 percent confidence interval.
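The hold-out evaluation can be expressed in a few lines. The sketch below assumes each series is a plain Python list of monthly rates in time order. The names are hypothetical, and the coverage summary simply counts how many held-out observations fall inside their interval bounds.

```python
def split_holdout(series, horizon=12):
    """Return (fitting part, last `horizon` values held out for evaluation)."""
    return series[:-horizon], series[-horizon:]


def interval_coverage(observed, lower, upper):
    """Share of observations inside [lower, upper] and indices of the misses."""
    inside = [lo <= obs <= hi for obs, lo, hi in zip(observed, lower, upper)]
    misses = [i for i, ok in enumerate(inside) if not ok]
    return sum(inside) / len(inside), misses
```

With 160 monthly values, `split_holdout` leaves 148 for fitting and 12 for evaluation. Pooling the coverage shares across all series gives an overall percentage of predictions inside their intervals.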
Suppose $n$ monthly observations $x_1, x_2, \ldots, x_n$ are available and we are interested in predicting the next $k$ unobserved data points $x_{n+1}, x_{n+2}, \ldots, x_{n+k}$ from the $n$ observed data points. Here we assume that the time series is an additive composition of trend, seasonal, and random components; the multiplicative case can be converted to an additive one by taking logarithms. The time plots of the series did not show large changes in the amplitude of either the seasonal or the irregular components, even where the level of the trend increased or decreased, so an additive model is appropriate. The first component to deal with is trend. Visual inspection of the time plots of the 52 series indicates different trend patterns, ranging from simple linear to more complex nonlinear shapes. We did not attempt to model the trend parametrically. Estimating the trend globally by a closed-form function of time may severely misestimate the true trend beyond the fitting period. Instead, we used first-order differencing to eliminate the trend component. The first-order difference of a time series $x_t$, $t = 1, 2, \ldots, n$, is the series $w_t$, $t = 2, 3, \ldots, n$, where $w_t = x_t - x_{t-1}$ [14]. Visual inspection of the time plots of the differenced series showed that the trend components had been eliminated. For monthly hospitalization rates it is reasonable to anticipate seasonal components of period 12 and 6 months, owing to seasonal variation in weather or administration (e.g. winter, Christmas, and the vacation season); this was confirmed by the spectral analysis. By changing which seasonal terms enter the regression equation, the series can be modeled at different seasonal orders.
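A short derivation shows why first-order differencing removes the trend while keeping the seasonal period. It assumes, for illustration, a locally linear trend $T_t = a + bt$ and a seasonal component with $S_t = S_{t-12}$ in the additive model $x_t = T_t + S_t + \varepsilon_t$:

```latex
\begin{aligned}
w_t = x_t - x_{t-1}
  &= (T_t - T_{t-1}) + (S_t - S_{t-1}) + (\varepsilon_t - \varepsilon_{t-1}) \\
  &= b + (S_t - S_{t-1}) + (\varepsilon_t - \varepsilon_{t-1}), \\
S_t - S_{t-1} &= S_{t-12} - S_{t-13}
  \quad\text{(the differenced seasonal term still has period 12).}
\end{aligned}
```

For a smooth nonlinear trend, $T_t - T_{t-1}$ is not constant but changes slowly from month to month, so most of the trend is still removed.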
In the regression model we included seasonal factors of period 12 and 6, represented by sine and cosine terms whose coefficients $\beta_i$ can be estimated within a linear regression framework. Having fitted the model, one can substitute $t = n+1, n+2, \ldots, n+k$ to estimate the next $k$ differenced observations with their corresponding confidence intervals. The predicted differenced values can be converted back to the original scale with the simple transformation $x_{n+j} = w_{n+j} + x_{n+j-1}$, $j = 1, 2, \ldots, k$. Confidence intervals can be transferred in a similar manner. For $j > 1$, the predicted value is substituted for $x_{n+j-1}$.
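A model consistent with this description, assuming an intercept $\beta_0$ plus one sine and cosine pair at each period, is

$$w_t = \beta_0 + \beta_1 \sin\frac{2\pi t}{12} + \beta_2 \cos\frac{2\pi t}{12} + \beta_3 \sin\frac{2\pi t}{6} + \beta_4 \cos\frac{2\pi t}{6} + \varepsilon_t.$$

The Python sketch below fits this assumed form by ordinary least squares on the differenced series. It forms approximate 95% prediction intervals using the normal quantile 1.96 (a t quantile would be slightly wider) and back-transforms by cumulating the point forecasts. Each interval keeps its differenced-scale half-width around the cumulated forecast, which is one reading of "transferred in a similar manner". All function names are hypothetical.

```python
import math


def design_row(t):
    """Intercept plus sine/cosine pairs at periods 12 and 6 (assumed form)."""
    row = [1.0]
    for period in (12, 6):
        angle = 2 * math.pi * t / period
        row += [math.sin(angle), math.cos(angle)]
    return row


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def forecast(x, k=12, z=1.96):
    """Forecast x_{n+1..n+k} from x_1..x_n (list index 0 holds x_1)."""
    n = len(x)
    w = [x[t] - x[t - 1] for t in range(1, n)]  # w_t for t = 2..n
    X = [design_row(t) for t in range(2, n + 1)]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xtw = [sum(r[i] * wt for r, wt in zip(X, w)) for i in range(p)]
    beta = solve(xtx, xtw)
    resid = [wt - sum(b * v for b, v in zip(beta, r)) for r, wt in zip(X, w)]
    s = math.sqrt(sum(e * e for e in resid) / (len(w) - p))
    preds, lower, upper = [], [], []
    prev = x[-1]  # x_n is observed; later steps reuse predictions
    for t in range(n + 1, n + k + 1):
        h = design_row(t)
        w_hat = sum(b * v for b, v in zip(beta, h))
        leverage = sum(hi * vi for hi, vi in zip(h, solve(xtx, h)))
        half = z * s * math.sqrt(1 + leverage)  # differenced-scale half-width
        x_hat = w_hat + prev  # x_{n+j} = w_{n+j} + x_{n+j-1}
        preds.append(x_hat)
        lower.append(x_hat - half)
        upper.append(x_hat + half)
        prev = x_hat
    return preds, lower, upper
```

Consider a noise-free series built from a linear trend plus harmonics at periods 12 and 6. Its differenced series lies exactly in the span of the regressors, so fitting on the first 148 months reproduces months 149 to 160 up to rounding error.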

Results
A total of 6,560,210 admissions were included in the analysis. Figures 1 and 2 provide examples of the heterogeneity of the time series. The time plots show visual evidence of non-linearity and clear seasonality.

Discussion
Hospital admissions in the province of Ontario show remarkable consistency and predictability of occurrence. A heterogeneous group of health conditions is represented in the sample, including surgical and medical conditions, acute and chronic diseases, and communicable and non-communicable diseases. The proposed model performs excellently in predicting one-year-ahead hospital admissions in the province of Ontario for the 52 most frequent admission series considered in this study.
Are these results of significance? We believe so. Most health care planning is based on what could be termed the 'invariance principle', which holds that all events are equally likely to happen and that hospitals should therefore be staffed and managed accordingly [15]. Our study indicates that demand for hospital services varies and can be predicted with a high degree of accuracy, so planning and resource allocation could be reorganized to reflect this knowledge. Furthermore, there are significant seasonal fluctuations in at least one third of the series analyzed, indicating that planning could be tailored to predictable demands. Understanding such seasonal patterns also promises to shed light on disease causality, as not all highly seasonal conditions can be explained by infectious diseases known to have seasonal occurrence.
Our study is limited to the context of Ontario and is applicable at a population level. Focusing on the most responsible diagnosis may bias the account of seasonal occurrence, although this bias is likely to be non-differential. In this study we focused on total counts for each most responsible diagnosis, which may obscure significant variation in rates by age and gender.

[Figure: Time plots (rates per 100,000 population) of highly seasonal hospital admission patterns: chronic obstructive pulmonary disease and bronchiolitis. Panels: COPD; Acute Bronchiolitis.]
The proposed methods are simple and stable. The prediction approach does not require model selection or any other sophisticated statistical methods. Selecting an appropriate seasonal model can be a challenging task in time series analysis. For example, the Box-Jenkins approach is popular for selecting linear time series models, but in this approach the analyst sometimes has to choose subjectively among several potentially appropriate models. Our regression model avoids this step.
First-order differencing eliminates the trend, and sine and cosine terms estimate the seasonal factors. The simple regression model works well for data ranging from highly seasonal to non-seasonal. Some series have seasonal factors that change over time. Even so, first-order differencing combined with the regression model forecasts future observations within the 95 percent confidence bounds. The confidence intervals around the predicted values are tight, reflecting the accuracy of the projections. This attenuates concerns expressed about the robustness of predictive models in epidemiology [16].

Conclusion
The results of this study demonstrate a simplicity underlying the complexity of hospital admissions. We believe these results are promising: they can lead to more rational planning of hospital resources and open up areas of exploration for understanding the determinants of disease causation, specifically in conditions with moderate to strong seasonality. Further research is necessary to look at whether more complex models have greater predictive …

[Figure: Time plots (rates per 100,000 population) of moderately seasonal hospital admission patterns with non-linear trend: coronary atherosclerosis and dehydration. Panels: Coronary Atherosclerosis; Dehydration.]