Introduction

Forecasting is about predicting future events based on foreknowledge acquired through a systematic process or intuition [1, 2]. Some of the earliest forms of health forecasting date back to the period of Hippocrates of Cos (460 BC–370 BC). Hippocrates studied the natural history of diseases and their major environmental sources (including food and water) [3], and believed that prognosis was an important part of medical treatment, because by forecasting disease outcome, the physician established his expertise for treating the patient [4]. He was able to forecast the occurrence of many diseases and conditions. One of the classical terms in medicine, ‘Hippocratic facies’, describes the procedure for forecasting impending death based on the observation of distinctive signs and symptoms that he identified [5]. The birth of forecasting as a science, however, is associated with weather forecasting and is credited to Francis Beaufort, who developed the popularly known scale for measuring wind force (the Beaufort scale), and Robert Fitzroy, who developed the Fitzroy barometer for measuring atmospheric pressure [6]. Forecasting has advanced over time and has increased in sophistication in many specialised areas, including the fields of health [7–10], economics and commerce [1, 11], sports [12, 13], environment (including meteorology) [14, 15], technology and politics [16–18].

Two approaches to forecasting, statistical and judgmental, are widely discussed in the forecasting literature [19]. An integration of both approaches has been discussed by some as the best way to obtain a more reliable forecast [19–22].

Monitoring population health, which includes demographic and health surveillances and epidemiological studies on disease surveillance, can generate very useful data that can be used in health forecasting. A reliable health forecast is important for health service delivery, because it can: (1) enhance preventive health care/services; (2) create alerts for the management of patient overflows (in situations of peak demand for health care services); and (3) significantly reduce the associated costs in supplies and staff redundancy.

Health forecasting is a useful tool for health service provision, but very few reviews on the subject exist. Some previous studies on health forecasting focused on very specific conditions, like ischaemic heart disease [23], chronic obstructive pulmonary disease (COPD) [24] and diabetes prevalence [25], or on aggregate health situations, such as emergency department visits [26, 27]. These individual studies adapted environmental, climatic and other factors as predictors in forecasting health. They are very specific and do not give information on general approaches that could guide the development of other health forecasts. An overview published by Ioannidis discussed the limits of forecasting in personalised medicine and focused only on challenges associated with this form of health forecasting [28]. A systematic review conducted by Wargon et al. [26] on models for forecasting focused only on the number of emergency department visits. More recently, a similar study was conducted by Boyle et al. [27], in which they reviewed and predicted emergency department admissions. The above-mentioned reviews on health forecasting had a very specific focus on emergency attendance. However, health forecasting has potential applications across a wider range of health issues. There is a dearth of information on the many possible applications of health forecasting in relation to health service delivery, and there seem to be no reports that have gathered the basic principles and procedures for developing pragmatic health forecasting schemes.

This paper describes the key issues of health forecasting, including definitions, principles of health forecasting, and the properties of health data that influence the choice of health forecasting methods. It also identifies the value of health forecasting in health service provision, and further presents the general challenges associated with developing and using health forecasting services.

In preparing this review, a search of the literature on health forecasting and statistical methods used in the analysis of health conditions/situations was conducted, using popular medical-related databases including PubMed (Medline) and then Google Scholar. Our search strategy included: “health” and “forecast*” and combinations of terms, including “principle*”, “data”, “predict*”, “model*”, “method*”, “challenge*”. Based on the titles and subsequently on the abstracts, articles unrelated to our objectives were excluded. Additional literature searches were done through citation mapping of key papers and the selected papers and documents were synthesized and summarized according to the set objective of this paper.

Defining health forecasting and related terminologies

Health forecasting is predicting health situations or disease episodes and forewarning future events. It is also a form of preventive medicine or preventive care that engages public health planning and is aimed at facilitating health care service provision in populations [8, 10, 29, 30]. Health forecasting has been commonly applied to emergency department visits, daily hospital attendance and admissions [27, 31–33].

There are important terms in forecasting that are worth noting because of the way in which they are used across various fields. The term prediction is mainly used across several fields of study to mean an opinion-based speculation with no explicit causal assumptions [34]. In the health forecasting literature, however, the terms prediction and prognosis could mean different things, even though they are sometimes used interchangeably and without clarity. The term prognosis refers to a forecasting of outcomes under no intervention, whilst prediction is used to mean forecasting health outcomes that are associated with some health-related intervention [28, 35]. Syndromic surveillance is another closely related concept that is well known in disease surveillance literature. This concept focuses on case detection and events that lead to/precede an outbreak, and involves detecting aberrations in the patterns of diseases and using this information to determine future outbreaks [36–38]. Syndromic surveillance was initially developed as an innovative electronic surveillance system and was aimed at improving early detection of outbreaks attributable to biologic terrorism or other causes [36].

Forecasting is a key component in the practice of medicine, with the main purpose of improving both health service provision and individual patient outcome [10, 24, 30, 39, 40]. For example, the United Kingdom Meteorological Office developed a health forecasting service for COPD, which provides health alerts to both individuals and health service providers through an automated call system [7, 24, 41]. This forecast combines a rule-based model that predicts risks based on environmental conditions, with an anticipatory care intervention to provide information, which is then communicated. The service enables patients and care providers to take precautionary actions to improve health service delivery and reduce COPD events [7, 10, 30, 42].

Principles of health forecasting

There are four main principles of health forecasting: (1) the measure of uncertainty and errors, (2) the focus, (3) the nature of data aggregation and how it affects accuracy, and (4) the horizon of health forecasting. These properties are not only hypothetically important, but also have applications that are exemplified in the literature, as discussed below.

Uncertainty and error of health forecasting

According to the definition of health forecasting, determining future health events or situations involves a degree of uncertainty, as it is virtually impossible to have a perfect (i.e. 100 % error free) prediction. We therefore describe the measurement of uncertainty and error of health forecasting as a principle in forecasting, because it is a basic requirement, and is also desirable for validation and determining the real value of a forecast. The data used is a major source of uncertainty and error, but this basic problem can partly be addressed methodologically, to obtain health forecasts with the least possible error [43].
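As an illustration of how forecast error can be measured, the sketch below computes three widely used summary measures: the mean absolute error (MAE), the root mean squared error (RMSE) and the mean absolute percentage error (MAPE). The choice of these measures, the function name `forecast_errors` and the admission counts are assumptions made for this example and are not taken from the studies reviewed here.

```python
import math


def forecast_errors(actual, forecast):
    """Return MAE, RMSE and MAPE (%) for paired actual/forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE is undefined when an actual value is zero, so skip those points.
    pct = [abs(e) / abs(a) for e, a in zip(errors, actual) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"mae": mae, "rmse": rmse, "mape": mape}


# Hypothetical daily admission counts and the forecasts issued for them.
actual = [20, 25, 30, 25]
forecast = [22, 24, 27, 25]
metrics = forecast_errors(actual, forecast)
```

Reporting more than one measure is a common precaution, since RMSE penalises large misses more heavily than MAE, and MAPE expresses the error relative to the size of the outcome.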

The focus of health forecasting

The focus of a health forecast relates to the central targeted issue that is being forecast. This is with reference to the basic unit of the health outcome measure that is being forecast. One focus of health forecasting is to predict population health outcome in terms of the number of events occurring within a space of time; for example, the forecasting of life expectancy and health expectancies [44]. Another focus is to determine the course of an ailment for a particular individual, which is usually referred to as prognosis [28]. These two categories are related to how the data is aggregated in health forecasting.

Data aggregation and accuracy of health forecasting

Forecasting a health condition or situation for a population aggregate of a particular problem, or for groups of the same family, presents a lesser challenge than doing so for an individual case. This is because by pooling the variances of the population-related factors (which are usually broad and well known), the behavior of the aggregated data can have very stable characteristics, even when the individuals within exhibit high degrees of randomness [45]. It is therefore easier to obtain a higher degree of accuracy in forecasting specific health events when using pooled population data versus data for specific individuals.
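The stabilising effect of aggregation can be illustrated with a small simulation. In the sketch below, each individual has an independent daily probability of a health event. The population size, number of days, event probability and random seed are hypothetical values chosen for illustration only, and independence between individuals is a simplifying assumption.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical settings: population size, days observed, daily event risk.
N_PEOPLE, N_DAYS, P_EVENT = 500, 200, 0.1


def coefficient_of_variation(series):
    """Standard deviation relative to the mean (relative variability)."""
    return statistics.pstdev(series) / statistics.mean(series)


# events[d][i] is 1 if person i has an event on day d, otherwise 0.
events = [
    [1 if random.random() < P_EVENT else 0 for _ in range(N_PEOPLE)]
    for _ in range(N_DAYS)
]

individual = [day[0] for day in events]    # one person's daily series
aggregate = [sum(day) for day in events]   # population daily counts

cv_individual = coefficient_of_variation(individual)
cv_aggregate = coefficient_of_variation(aggregate)
```

Under these assumptions the relative variability of the population series (`cv_aggregate`) is much smaller than that of the single individual (`cv_individual`), which is why pooled counts are the easier forecasting target.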

Horizons of health forecasting

A health forecasting horizon refers to the range of the period the forecast is intended to cover. The demand for a health forecast determines the forecast horizon (range), which could be short, medium or long term. There are no clearly defined boundaries to health forecast horizons in the literature. However, borrowing the common classifications from other disciplines such as finance, business or econometric forecasting, a short-range forecast horizon refers to a period of 1 day to a quarter of a year; a medium-range forecast horizon refers to a quarter of a year to a year; and long-range forecasts refer to a year to five or more years. These horizons are, however, not fixed for all situations, but rather may be defined in relation to the qualitative indicator being forecast (e.g. life expectancy), as well as its weighting over an extended time period. Major population health issues, such as life expectancy or future health expectancies [44], or the forecasting of some chronic disease prevalence (e.g. obesity and diabetes) in large populations [25, 46], are often forecast with a long range. Short-range and medium-range health forecasts are applicable to routine health service uptake (e.g. hospital visits), and some chronic disease exacerbations resulting from environmental exposures [7, 24]. The choice of a long-range, medium-range, or short-range forecast is critical in developing a forecast, as health forecasting horizons also have applications in the planning of health care service deliveries.
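The borrowed classification can be written as a simple decision rule. The sketch below is only an illustration: the function name `classify_horizon` is hypothetical, and expressing the cut-offs in days (91 days for a quarter, 365 days for a year) is an assumption, since the boundaries are not fixed in the literature.

```python
def classify_horizon(days):
    """Label a forecast horizon using the cut-offs borrowed from other fields."""
    if days < 1:
        raise ValueError("horizon must be at least one day")
    if days <= 91:        # 1 day up to about a quarter of a year
        return "short-range"
    if days <= 365:       # a quarter of a year up to a year
        return "medium-range"
    return "long-range"   # a year to five or more years


# Example: a weekly bed-demand forecast versus a ten-year prevalence forecast.
weekly = classify_horizon(7)
decade = classify_horizon(3650)
```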

The discussions around short-, medium- and long-range health forecasting do not identify some of the fundamental differences in assumptions between the various forecasting horizons. Yet these differences are important, since forecasting future events rests on the strong assumption that the current drivers or predictors will continue to follow the same trend over the forecast horizon. Hence, long-range forecasting models are prone to more “shocks” than short-range forecasts. “Shocks” here refers to disruptions of the equilibrium of the distribution, caused by a significant change in the magnitude of the forecast model predictor(s), which may then lead to a shift in the trend. Shocks also affect forecast errors, because shocks that occur between the time of the forecast and the realization of the outcome determine the error of the forecast. However, research on the mechanisms by which health forecasting models are developed to accommodate shocks at various thresholds is not explicit.

The principles discussed above also serve as a guide to creating simple decision tools for health forecasting, based on: the type, amount and distribution of data (the kernel density) that is required by a quantitative predictive model; the forecasting horizon for which the health forecast is being created; and then the degree of accuracy or error that is acceptable—taking into account the need for a parsimonious model. The type of data, as described elsewhere [47], refers to the classification of the data as either continuous (ratio or interval scales) or categorical (ordinal, nominal, or dichotomous scales); the amount of data simply refers to the sample size or total number of the unit of reference of the primary variable and its corresponding independent variables/predictors. The section below exemplifies a hypothetical approach and framework for developing a health forecasting scheme with simple decision tools.

A schematic approach to health forecasting

A framework for health forecasting is an essential guide. It is, however, uncommon in the literature and so the following framework, which presents a summary of the key processes involved in developing a general health forecasting service, is illustrated below (Fig. 1). The steps help to identify and broadly define the needs and tools of health forecasting. Further, they state the key processes involved in developing and perfecting a health forecasting scheme over time. Thus, the framework demonstrates a dynamic process in which the forecast models created at any time would be continuously improved to meet the purpose of the forecast or the client’s needs.

Fig. 1 A schematic approach to health forecasting

Step 1:

Identify the concepts and ideas that address an important health condition of great burden and ones that significantly cost the health service. Provide a precise specification of the health outcome to be forecast and a clear definition of the forecasting horizon;

Step 2:

Use the literature to identify causal or highly correlated variables that are associated with the identified health outcome measures in Step 1 (expert consultation may be required in building this domain knowledge);

Step 3:

Identify the data sources for both the health outcome measure (Step 1) and all of the potential predictors, and ascertain the availability and completeness (i.e. checking for gaps in the data series) of the data;

Step 4:

Prepare the data sets for basic statistical analyses, including descriptive patterns and the development of forecast algorithms. Some preliminary activities include data cleaning and management, and the generation of supplementary variables for further analyses;

Step 5:

Generate the predictive models and validate them using different sets of similar historical data;

Step 6:

Evaluate and determine the final lists of indicators needed for good predictive model(s) based on the practical access to their measures (data);

Step 7:

Develop very specific and tailor-made health forecast services for a specific purpose/client, and then periodically update the model(s).
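The validation in Step 5 can be sketched as a rolling-origin evaluation, in which a model is repeatedly fitted on past data only and scored on the next observation. This is one possible validation scheme and is offered as an assumption rather than as the procedure used in any study cited here. The two baseline forecasters and the weekly counts are hypothetical.

```python
def rolling_origin_mae(series, fit_predict, min_train):
    """Mean absolute one-step-ahead error over expanding training windows."""
    if not 0 < min_train < len(series):
        raise ValueError("min_train must leave at least one point to predict")
    errors = []
    for t in range(min_train, len(series)):
        prediction = fit_predict(series[:t])  # only past data is visible
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


def last_value(history):
    """Naive baseline: tomorrow looks like today."""
    return history[-1]


def historical_mean(history):
    """Baseline that forecasts the mean of everything seen so far."""
    return sum(history) / len(history)


# Hypothetical weekly counts of a health event with an upward drift.
counts = [10, 12, 11, 13, 12, 14, 13, 15]
mae_naive = rolling_origin_mae(counts, last_value, min_train=4)
mae_mean = rolling_origin_mae(counts, historical_mean, min_train=4)
```

Because the series drifts upward, the naive last-value baseline scores better here than the historical mean. A candidate model would be expected to beat both before it is carried forward to Steps 6 and 7.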

Patterns of health data and applications in forecasting

In health forecasting, the pattern of distribution of previous health data over a period of time (i.e. in the form of time series) is important for determining the choice of an appropriate forecasting method. Time series play an important role in many forecasting approaches, and have been extensively used in subject areas such as climate science, finance and econometrics. The patterns of health data in time series that are of importance to health forecasting are trend, seasonality, cyclicality, and randomness [48, 49].

Time series and health forecasting

Time series is defined by Shumway and Stoffer [50] as “a collection of random variables indexed according to the order they are obtained in time”. In the broader literature, time series is similarly defined as a collection of data points that are typically measured at successive and uniformly spaced time intervals. In relation to health forecasting, the importance of this second definition is the emphasis it places on the “uniformly spaced time intervals”, which is important in the use of health data for health forecasting. Thus, time series provides a statistical setting for describing seemingly random fluctuating health data and projecting the data series into the future [49, 50].

Trend is the long-term variation in a time series that is not influenced by irregular effects or seasonally related components in the data. For instance, in health data, an overall record of a progressively increasing incidence over a specified period would show an increasing trend, irrespective of any random or systematic fluctuations.
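A simple way to quantify a trend is to fit a straight line to the series by ordinary least squares, using the time index as the predictor. The sketch below assumes that the trend is linear, which is a simplification, and the incidence figures are hypothetical.

```python
def linear_trend(series):
    """Least-squares slope and intercept of a series against t = 0, 1, 2, ..."""
    n = len(series)
    if n < 2:
        raise ValueError("at least two observations are needed")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


# Hypothetical yearly incidence that rises by two cases per year.
slope, intercept = linear_trend([3, 5, 7, 9])
```

A positive slope corresponds to the increasing trend described above. Random or seasonal fluctuations around the line do not change its sign unless they dominate the series.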

When the pattern of health data (e.g. containing the incidence of health events/situations) is influenced by some periodic (long-term/short-term) fluctuations that are associated with other characteristics, it is described as cyclical. Cyclicality therefore refers to the extent to which disease incident data points are influenced by overall disease patterns. Seasonality is also a cyclic phenomenon, but is related to annual events, and is described as the predictable and repetitive positions of data points around the trend line within a year. A major difference between cyclical and seasonal patterns is that the former varies in length and magnitude, as compared to the latter. Chatfield describes how seasonality and cyclicality can be estimated either in an additive or multiplicative form [49]. Additive seasonality is estimated as a function of the sums of the de-seasonalized mean (m), the seasonal factor (S) and an error term (ε) (i.e. additive seasonality = m + S + ε). Multiplicative seasonality is defined by two functions, either the product of m, S and ε (multiplicative seasonality = m·S·ε), or the product of m and S and sum of ε (i.e. multiplicative seasonality = m·S + ε). In order to minimise the overall error, shorter cyclical effects that fall within the annual seasonal effect are best estimated with additive seasonality, whereas the effect of annual seasonality is best computed as “m·S·ε” [49].
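The additive and multiplicative forms can be illustrated by estimating one seasonal factor S per position in the cycle. The sketch below is a deliberately crude estimator: it takes m as the overall mean, ignores any trend and sets the error term aside. The function name and the quarterly counts are hypothetical.

```python
def seasonal_factors(series, period, multiplicative=False):
    """Estimate the overall mean m and one seasonal factor per cycle position.

    Additive factors are each seasonal mean minus m (so y ~ m + S);
    multiplicative factors are each seasonal mean divided by m (so y ~ m * S).
    """
    m = sum(series) / len(series)
    factors = []
    for k in range(period):
        values = series[k::period]          # all observations at position k
        season_mean = sum(values) / len(values)
        factors.append(season_mean / m if multiplicative else season_mean - m)
    return m, factors


# Hypothetical quarterly counts over two years with a peak in the third quarter.
quarterly = [10, 20, 30, 20, 10, 20, 30, 20]
m, additive = seasonal_factors(quarterly, period=4)
_, multiplicative = seasonal_factors(quarterly, period=4, multiplicative=True)
```

For these data m is 20, the additive factors are deviations in the units of the data (-10, 0, 10, 0) and the multiplicative factors are proportions of the mean (0.5, 1.0, 1.5, 1.0).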

Randomness is also a common feature of all time series data, and refers to unexpected distortions of existing or anticipated trends.

Lag refers to the lapse of time before an effect is manifested. Lags have proven useful in forecasting events globally, and are a feature of time series data that is widely exploited in many forecasting techniques, e.g. in autoregressive integrated moving average (ARIMA) models [27]. In developing health forecast models for a particular condition/situation, the key questions are: how many days should one go back in history to identify appropriate predictors, and how many lags should be included.
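The use of lags can be sketched as the construction of a data set in which each outcome is paired with predictor values from earlier time steps. The function name, the choice of lags 1 and 2, and the temperature and admission values below are hypothetical.

```python
def make_lagged_rows(outcome, predictor, lags):
    """Pair each outcome value with predictor values from earlier time steps."""
    max_lag = max(lags)
    rows = []
    # The first max_lag time steps are dropped: they lack a full history.
    for t in range(max_lag, len(outcome)):
        features = [predictor[t - lag] for lag in lags]
        rows.append((features, outcome[t]))
    return rows


temperature = [5, 3, 1, 4, 6]       # hypothetical daily predictor
admissions = [12, 14, 18, 13, 11]   # hypothetical daily outcome
rows = make_lagged_rows(admissions, temperature, lags=[1, 2])
```

Each additional lag costs one usable observation at the start of the series, which is one practical reason to limit how far back the model looks.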

The properties of time series mentioned above require specific treatment prior to any analysis, and they have been described more elaborately elsewhere [48–51]. The statistical forecasting models that involve time series analysis and are commonly used in health forecasting include moving average models, such as ARIMA, and smoothing techniques, e.g. the Holt-Winters methods. For instance, the Box–Jenkins ARIMA model is commonly used in fitting forecasting models when dealing with a non-stationary time series, and this model has been used extensively in health forecasting [27, 33, 52–55]. Stationarity is a feature of trend in a time series, and refers to the extent to which its statistical properties (such as the mean, variance and auto-correlation) remain constant over time. Smoothing models have also been used in health forecasting studies conducted by Medina et al. and Hyndman et al. [56–58]. In the study conducted by Champion et al. [33], the authors identified trend, seasonal variations and randomness/“noise” in the data distribution, but used a time series statistical package to automatically identify optimal models to forecast monthly emergency department presentations. The authors then compared forecasts from a simple seasonal exponential smoothing model with those from an ARIMA model. Similarly, the study conducted by Medina et al. identified seasonal oscillations and trends in the time series data (of the diseases they analyzed). The harmonics in the data distributions were handled as level and trend components by the multiplicative Holt-Winters forecasting method, which is also a smoothing technique in forecasting [56].
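Two of the building blocks mentioned above can be sketched in a few lines. First-order differencing is the "integrated" step that ARIMA uses to remove a trend from a non-stationary series, and simple exponential smoothing is the most basic member of the smoothing family. Neither function below is a full ARIMA or Holt-Winters implementation, and the smoothing weight `alpha` and the example values are assumptions chosen for illustration.

```python
def difference(series, lag=1):
    """Differences y[t] - y[t - lag]; lag=1 removes a linear trend."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level at the first observation
    for y in series[1:]:
        # New level is a weighted average of the latest value and the old level.
        level = alpha * y + (1 - alpha) * level
    return level


# A steadily rising series becomes constant after differencing.
detrended = difference([2, 4, 6, 8])
next_value = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
```

Seasonal variants such as Holt-Winters add separate trend and seasonal components to the level that is updated here. In practice an established statistical package would be used rather than hand-written code.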

Probabilistic health forecasting methods for peak events

Health forecasting techniques generally rely on modelling expectancy of the mean, but this is not useful for looking at extreme events. Nonetheless, extreme events represent the greatest test of a health system, because they expose the weaknesses of the system whenever they occur. A reliable method of modelling and predicting extreme events is therefore important. Quantile regression models (QRMs) and fractional polynomial models (FPMs) are potential probabilistic techniques that could be adopted for predicting extreme health situations/conditions.

Quantile regressions are extensions of the linear-regression models, and do not assume normality of the dependent variable. They model the conditional quantiles as functions of predictors, specifying changes in any conditional quantile [59, 60]. Unlike linear-regression models, QRMs have the ability to characterize the relationship between the dependent variable and the independent variable(s), particularly in the extremes of the distribution. They have common applications in medical reference charts, and could be used in preliminary medical diagnosis to identify unusual subjects by providing robust regressions for estimating extreme values [61]. QRMs also have the potential of predicting and forecasting extreme chronic respiratory illnesses like asthma. For instance, a QRM could be used to estimate extreme variations in daily asthma hospital admissions resulting from the changing patterns of selected meteorological and air quality indicators that are known to exacerbate asthma in a given location/area [62].
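The idea behind quantile regression can be illustrated with its loss function, often called the pinball or check loss, which penalises under-prediction and over-prediction asymmetrically. The sketch below fits only a constant, with no predictors, so it is not a quantile regression model in the full sense. The function names and the admission counts are hypothetical.

```python
def pinball_loss(actual, predicted, q):
    """Average check loss for quantile level q (0 < q < 1)."""
    total = 0.0
    for y, p in zip(actual, predicted):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actual)


def best_constant_quantile(values, q):
    """Observed value that minimises the pinball loss at level q."""
    candidates = sorted(set(values))
    return min(
        candidates,
        key=lambda c: pinball_loss(values, [c] * len(values), q),
    )


# Hypothetical daily admission counts on eleven days.
admissions = list(range(1, 12))
median_level = best_constant_quantile(admissions, 0.5)
peak_level = best_constant_quantile(admissions, 0.9)
```

Minimising the loss at q = 0.5 recovers the median, whereas q = 0.9 targets the upper tail, which is the region of interest when planning for peak demand. A full QRM replaces the constant with a function of predictors such as meteorological and air quality indicators.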

Williams [63] also showed how fractional polynomials could be used in modelling specific categories of dependent variables within a linear distribution of data, and thus target specific groups more precisely. In this study, the author used various categories of age groups as regressors to model a dichotomous health care demand. Logistic regression outputs of two arbitrary age-categorized models were then compared to a fractional polynomial model. The polynomial method of categorizing had clear advantages because it allowed a fuller representation of non-linear relationships between the predictor and outcome variables. This approach can be extended to a wide range of health situations or conditions.
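The transformations behind a fractional polynomial can be sketched as follows. The set of candidate powers and the convention that a power of 0 denotes the natural logarithm are assumptions taken from common practice in the fractional polynomial literature, and are not details reported from the study by Williams. The function names are hypothetical.

```python
import math

# Conventional candidate powers; 0 stands for the natural logarithm.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, power):
    """Fractional polynomial transformation of a positive predictor value."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if power == 0 else x ** power


def fp1_terms(x):
    """All first-degree candidate transformations of x, keyed by power."""
    return {p: fp_transform(x, p) for p in FP_POWERS}


# Candidate transformations of a hypothetical age of 4 years.
terms = fp1_terms(4)
```

In a first-degree model, each candidate transformation of a predictor such as age would be entered in turn into the regression (for example a logistic regression) and the best-fitting power retained, which avoids cutting age into arbitrary categories.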

Both approaches (QRM and FPM) can be adapted to suit extreme health forecasting.

The value of health forecasting in health services provision

Health services are the most important component of any health system. The World Health Organisation (WHO) reports that effective health service delivery requires some key resources, including information, finance, equipment, drugs and well motivated staff [64]. Given the ever-increasing demand for both the coverage and quality of health care services, health service delivery institutions and service providers struggle to tackle situations of excess demand, particularly those associated with peak events [65–67]. This is because front-line health delivery services and providers are not usually adequately informed and do not have adequate resources to meet the needs of a “higher than normal” demand for health care. Hence, improving the access, coverage and quality of health services depends on the ways these services are pre-informed, organised and managed. Even though there are still unanswered questions about how to improve the organisation and management of health service delivery in a manner that would help achieve a better and more equitable coverage and quality [64], there are equally untapped resources such as health forecasting, which can aid the process. Health forecasting services enable both individuals and service providers to anticipate situations, and hence take the necessary steps to manage peak or extreme events.

There are important features or health outcome measures that are considered to have a significant impact on the coping mechanisms of health service providers. These features include the total duration of the care/support being provided (also described in the literature as “length of stay” in the hospital or “spell duration”), and the periodic (daily) rates of attendance of patients to the general practitioner or emergency rooms. The length of stay provides some insights into the disease burden of a particular health condition. The length of stay, in combination with other related factors like demographic, diagnostic and temporal factors, can explain and forecast future events [68–70]. On the other hand, the daily rate of health events is a very useful indicator, which can be used in time series forecasting.

The challenges in developing and using health forecasts

The value of health forecasting has been mentioned in previous discussions, but there are a number of challenging issues to be noted and addressed in developing and using a health forecast. These include limitations in the scope and reliability of health data, the robustness of health forecasting tools and techniques, and the poor demand for health forecasting [11, 28]. In recent times, technological advances have enabled health indicators to be more easily and cheaply measured, and yet the record capture of important population health indicators is not very efficient, and the records are not easily accessible or validated [28]. In the practice of personalised medicine, for instance, there are slight prognostic effects attributable to a wide range of complex factors (including some unknown factors), and these factors usually intermingle (randomly) to generate clinical outcomes. Data limitation on these complex factors can pose a challenge in developing a reliable health forecast. Aside from the data and methodological limitations in developing a reliable health forecast, it is difficult to convincingly demonstrate the performance of a health forecasting model in realistic settings [71].

Health forecasting-related research has sometimes focused on methods or procedures for forecasting aggregate health conditions, or on situations like crowding at emergency departments and total admissions [72–74]. Even though these kinds of aggregate health forecasts are useful, health care providers would be better informed and prepared with condition-specific health forecasts. Therefore, health forecasts need to be more specific for particular health conditions. For example, the health forecast service provided by the United Kingdom Meteorological Office to some Primary Care Trusts (PCT) is very specific for conditions such as COPD [7, 8]. This kind of service is rare but useful.

Health forecasts are most valuable when they provide sufficient warning for timely, remedial action to be taken. Providers make critical decisions and resource allocations to meet the potential demand for health care services. Some of the complexities associated with these types of health care provider actions could range from providing basic social care for early symptoms, to using sophisticated staff and facilities and attending to extreme events [7, 24, 41]. Meanwhile, being able to meet the demand for a health forecast that provides ample time for preparatory activities often requires the use of a good forecasting technique and ample reliable data. It also comes with an additional compromise as to the precision and accuracy of the forecast [75]. Hence, finding a fine line between what is predictable vis-à-vis the demand for specific health forecast is a key challenge in health forecasting.

Another challenge in health forecasting relates to its practical use. A health forecast is usually developed to target the needs of susceptible individuals or institutions (health care providers). In any instance, there is a need for a technology with an intelligent early warning system that can communicate the forecast to the users. Automated telephone services, home visits/treatment, and direct health forecast (to individuals and service providers) are means through which some health forecast services have been delivered [76]. Although there have been some challenges and debates regarding the relevance of some of these existing health forecasting programmes, there are a couple of success stories which provide compelling evidence for their usage [7]. The case of the UK Meteorological Office’s COPD forecast, which was available to general practitioners in Bradford and Airedale, is an example. In 2009, Maheswaran et al. [77] evaluated this health forecasting alert service and failed to show that any change in admissions associated with the forecasting service was significant, and hence they challenged the effectiveness of the COPD forecast. Meanwhile, in a cross-sectional study on the acceptability and utility of this same service in England, Scotland and Wales, Marno et al. [7] concluded that the service was both viable and useful. Further research to improve or develop new approaches or schemes in health forecasting is therefore important and will contribute to easing disease burden.

Conclusion

Health forecasting is a dynamic process and requires frequent updates. This can be done with novel techniques and data, taking into consideration the principles of health forecasting. The methodologies currently used involve time series analyses with smoothing or moving average models and, less commonly, probabilistic forecasting models like QRM, which offer a useful alternative for predicting and forecasting extreme health events. The horizons of health forecasting are important but not classified in the literature, and so the approaches used for forecasting over various horizons have no common benchmarks to guide new health forecasts. The patterns of health data can be exploited in health forecasting, using time series analysis or other probabilistic techniques. Health forecasting is a valuable resource for enhancing and promoting health services provision, but it also has a number of drawbacks, which are related either to the data source, methodology or technology. This overview is presented to stimulate further discussions on standardizing health forecasting approaches and methods, so that it can be used as a tool to facilitate health care and health services delivery.