
2008 | Book

Forecasting with Exponential Smoothing

The State Space Approach

Authors: Professor Rob Hyndman, Professor Anne Koehler, Professor Keith Ord, Associate Professor Ralph Snyder

Publisher: Springer Berlin Heidelberg

Book series: Springer Series in Statistics

About this Book

Exponential smoothing methods have been around since the 1950s, and are still the most popular forecasting methods used in business and industry. However, a modeling framework incorporating stochastic models, likelihood calculation, prediction intervals and procedures for model selection was not developed until recently. This book brings together all of the important new results on the state space framework for exponential smoothing. It will be of interest to people wanting to apply the methods in their own areas as well as to researchers wanting to take the ideas in new directions. Part 1 provides an introduction to exponential smoothing and the underlying models. The essential details are given in Part 2, which also provides links to the most important papers in the literature. More advanced topics are covered in Part 3, including the mathematical properties of the models and extensions of the models for specific problems. Applications to particular domains are discussed in Part 4.

Table of Contents

Frontmatter

Introduction

1. Basic Concepts
Time series arise in many different contexts including minute-by-minute stock prices, hourly temperatures at a weather station, daily numbers of arrivals at a medical clinic, weekly sales of a product, monthly unemployment figures for a region, quarterly imports of a country, and annual turnover of a company. That is, time series arise whenever something is observed over time. While a time series may be observed either continuously or at discrete times, the focus of this book is on discrete time series that are observed at regular intervals over time.
2. Getting Started
Although exponential smoothing methods have been around since the 1950s, a modeling framework incorporating stochastic models, likelihood calculations, prediction intervals, and procedures for model selection was not developed until relatively recently, with the work of Ord et al. (1997) and Hyndman et al. (2002). In these (and other) papers, a class of state space models has been developed that underlies all of the exponential smoothing methods.
In this chapter, we provide an introduction to the ideas underlying exponential smoothing and the associated state space models. Many of the details will be skipped over in this chapter, but will be covered in later chapters.
Figure 2.1 shows the four time series from Fig. 1.1, along with point forecasts and 80% prediction intervals. These were all produced using exponential smoothing state space models. In each case, the particular models and all model parameters were chosen automatically with no intervention by the user. This demonstrates one very useful feature of state space models for exponential smoothing—they are easy to use in a completely automated way. In these cases, the models were able to handle data exhibiting a range of features, including very little trend, strong trend, no seasonality, a seasonal pattern that stays constant, and a seasonal pattern with increasing variation as the level of the series increases.
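To make this concrete, the sketch below automates the simplest of these models, ETS(A,N,N) (simple exponential smoothing), in Python: the smoothing parameter is chosen by minimizing the squared one-step errors, and an 80% interval is formed from the residual standard deviation. The toy series and the crude grid search are illustrative only; the book's procedure optimizes a full likelihood and selects automatically among the whole ETS class.

```python
# Minimal sketch: automated ETS(A,N,N) (simple exponential smoothing) forecasting.
# The series and the grid search are illustrative; the book's procedure optimizes
# a full likelihood and selects among all ETS models.
import numpy as np

def ses_innovations(y, alpha, level0):
    """Run the ETS(A,N,N) recursions; return one-step errors and final level."""
    level, errors = level0, []
    for obs in y:
        e = obs - level            # innovation: observation minus prediction
        errors.append(e)
        level = level + alpha * e  # level update
    return np.array(errors), level

y = np.array([17.5, 21.0, 16.6, 18.2, 19.9, 21.3, 20.5, 22.1, 23.4, 22.8])

# Choose alpha by minimizing the sum of squared one-step errors (grid search);
# seeding the level with y[0] is a simple heuristic.
best = min(((a, ses_innovations(y, a, y[0])) for a in np.linspace(0.01, 0.99, 99)),
           key=lambda t: np.sum(t[1][0] ** 2))
alpha, (errors, level) = best

sigma = np.std(errors, ddof=1)
z80 = 1.282                        # approx. 80% Gaussian quantile
print(f"alpha = {alpha:.2f}, point forecast = {level:.2f}")
print(f"80% interval for h=1: [{level - z80*sigma:.2f}, {level + z80*sigma:.2f}]")
```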

Essentials

3. Linear Innovations State Space Models
In Chap. 2, state space models were introduced for all 15 exponential smoothing methods. Six of these involved only linear relationships, and so are “linear innovations state space models”. In this chapter, we consider linear innovations state space models, including the six linear models of Chap. 2, but also any other models of the same form. The advantage of working with the general framework is that estimation and prediction methods for the general model automatically apply to the six special cases in Chap. 2 and other cases conforming to its structure. There is no need to derive these results on a case by case basis.
The general linear innovations state space model is introduced in Sect. 3.1. Section 3.2 provides a simple algorithm for computing the one-step prediction errors (or innovations); it is this algorithm which makes innovations state space models so appealing. Some of the properties of the models, including stationarity and stability, are discussed in Sect. 3.3. In Sect. 3.4 we discuss some basic innovations state space models that were introduced briefly in Chap. 2. Interesting variations on these models are considered in Sect. 3.5.
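For reference, the general linear innovations state space model of Sect. 3.1 has the form (in LaTeX notation)

y_t = \mathbf{w}^\top \mathbf{x}_{t-1} + \varepsilon_t, \qquad \mathbf{x}_t = \mathbf{F}\mathbf{x}_{t-1} + \mathbf{g}\varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}(0, \sigma^2),

where x_t is the state vector and w, F and g are fixed coefficients. The same error drives both equations, which is what distinguishes an innovations model from state space models with multiple sources of error.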
4. Nonlinear and Heteroscedastic Innovations State Space Models
In this chapter we consider a broader class of innovations state space models, which enables us to examine multiplicative structures for any or all of the trend, the seasonal pattern and the innovations process. This general class was introduced briefly in Sect. 2.5.2. As for the linear models introduced in the previous chapter, this discussion will pave the way for a general discussion of estimation and prediction methods later in the book.
One of the intrinsic advantages of the innovations framework is that we preserve the ability to write down closed-form expressions for the recursive relationships and point forecasts. In addition, the time series may be represented as a weighted sum of the innovations, where the weights for a given innovation depend only on the initial conditions and earlier innovations, so that the weight and the innovation are conditionally independent. As before, we refer to this structure as the innovations representation of the time series. We find that these models are inherently similar to those for the linear case.
The general innovations form of the state space model is introduced in Sect. 4.1 and various special cases are considered in Sect. 4.2. We then examine seasonal models in Sect. 4.3. Finally, several variations on the core models are examined in Sect. 4.4.
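For orientation, the general form of Sect. 4.1 replaces the fixed coefficients of the linear model with functions of the lagged state:

y_t = w(\mathbf{x}_{t-1}) + r(\mathbf{x}_{t-1})\,\varepsilon_t, \qquad \mathbf{x}_t = f(\mathbf{x}_{t-1}) + g(\mathbf{x}_{t-1})\,\varepsilon_t.

Multiplicative errors arise when r(\mathbf{x}_{t-1}) is proportional to the one-step forecast w(\mathbf{x}_{t-1}); taking r \equiv 1 with w, f and g linear recovers the models of Chap. 3.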
5. Estimation of Innovations State Space Models
For any innovations state space model, the initial (seed) states and the parameters are usually unknown, and therefore must be estimated. This can be done using maximum likelihood estimation, based on the innovations representation of the probability density function.
In Chap. 3 we outlined transformations (referred to as “general exponential smoothing”) that convert a linear time series of mutually dependent random variables into an innovations series of independent and identically distributed random variables. In the heteroscedastic and nonlinear cases, such a representation remains a viable approximation in most circumstances, an issue to which we return in Chap. 15. These innovations can be used to compute the likelihood, which is then optimized with respect to the seed states and the parameters. We introduce the basic methodology in Sect. 5.1. The estimation procedures discussed in this chapter assume a finite start-up; consideration of the infinite start-up case is deferred until Chap. 12.
Any numerical optimization procedure used for this task typically requires starting values for the quantities that are to be estimated. An appropriate choice of starting values is important. The likelihood function may not be unimodal, so a poor choice of starting values can result in sub-optimal estimates. Good starting values (i.e., values that are as close as possible to the optimal estimates) not only increase the chances of finding the true optimum, but typically reduce the computational load required during the search for the optimum solution. In Sect. 5.2 we will discuss plausible heuristics for determining the starting values.
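A minimal sketch of the estimation idea, for ETS(A,N,N) with Gaussian innovations: because the innovations variance can be concentrated out of the likelihood, maximizing it amounts to minimizing n log(SSE), where SSE is the sum of squared innovations. The toy data, starting values and optimizer choice below are illustrative only.

```python
# Minimal sketch: maximum likelihood estimation of ETS(A,N,N) via the
# innovations recursion. Toy data and starting values are illustrative.
import numpy as np
from scipy.optimize import minimize

y = np.array([17.5, 21.0, 16.6, 18.2, 19.9, 21.3, 20.5, 22.1, 23.4, 22.8])
n = len(y)

def neg_concentrated_loglik(params):
    """-2 log L (up to a constant) with sigma^2 concentrated out: n*log(SSE)."""
    alpha, level = params
    if not (0.0 < alpha < 1.0):
        return 1e10                  # crude bound handling for the search
    sse = 0.0
    for obs in y:
        e = obs - level              # one-step innovation
        sse += e * e
        level += alpha * e           # state update
    return n * np.log(sse)

# Heuristic starting values: moderate alpha, first observation as seed level.
res = minimize(neg_concentrated_loglik, x0=[0.5, y[0]], method="Nelder-Mead")
alpha_hat, level0_hat = res.x
print(f"alpha = {alpha_hat:.3f}, seed level = {level0_hat:.2f}")
```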
6. Prediction Distributions and Intervals
Point forecasts for each of the state space models were given in Table 2.1 (p. 18). It is also useful to compute the associated prediction distributions and prediction intervals for each model. In this chapter, we discuss how to compute these distributions and intervals.
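One generic route, which complements the analytical results in the chapter, is simulation: generate many future sample paths from the fitted model and read prediction intervals off the empirical quantiles. A sketch for ETS(A,N,N), with illustrative parameter values:

```python
# Minimal sketch: a simulated prediction distribution for ETS(A,N,N).
# Parameter values are illustrative; in practice they come from estimation.
import numpy as np

rng = np.random.default_rng(1)
alpha, level0, sigma, h, npaths = 0.3, 100.0, 5.0, 6, 10000

paths = np.empty((npaths, h))
for i in range(npaths):
    level = level0
    for t in range(h):
        e = rng.normal(0.0, sigma)   # draw an innovation
        paths[i, t] = level + e      # simulated observation
        level += alpha * e           # state update

# 80% prediction interval at each horizon from the simulated quantiles.
lower, upper = np.percentile(paths, [10, 90], axis=0)
print(np.round(lower, 1), np.round(upper, 1))
```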
7. Selection of Models
One important step in the forecasting process is the selection of a model that could have generated the time series and would, therefore, be a reasonable choice for producing forecasts and prediction intervals. As we have seen in Chaps. 2–4, there are many specific models within the general innovations state space model (2.12). There are also many approaches that one might implement in a model selection process. In Sect. 7.1, we will describe the use of information criteria for selecting among the innovations state space models. These information criteria have been developed specifically for time series data and are based on maximized likelihoods. We will consider four commonly recommended information criteria and one relatively new information criterion. Then, in Sect. 7.2, we will use the MASE from Chap. 2 to develop measures for comparing model selection procedures. These measures will be used in Sects. 7.2.2 and 7.2.3 to compare the five information criteria with each other, and the commonly applied prediction validation method for model selection using the M3 competition data (Makridakis and Hibon 2000) and a hospital data set. We also compare the results with the application of damped trend models for all time series. Finally, some implications of these comparisons will be given in Sect. 7.3.
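The criteria themselves are simple penalized functions of the maximized log-likelihood. As a sketch, three standard formulas are shown below, where k counts all estimated quantities (parameters and seed states) and n is the series length; the chapter's precise definitions, and the remaining criteria it compares, are given in the text. The numerical inputs are illustrative.

```python
# Minimal sketch: standard information criteria from a maximized log-likelihood.
# logL, k and n are illustrative values, not results from the book.
import math

def info_criteria(logL, k, n):
    aic = -2 * logL + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = -2 * logL + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

print(info_criteria(logL=-152.3, k=3, n=60))
```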

Further Topics

8. Normalizing Seasonal Components
In exponential smoothing methods, the m seasonal components are combined with level and trend components to indicate changes to the time series that are caused by seasonal effects. It is sometimes desirable to report the value of these m seasonal components, and then it is important for them to make intuitive sense. For example, in the additive seasonal model ETS(A,A,A), the seasonal components are added to the other components of the model. If one seasonal component is positive, there must be at least one other seasonal component that is negative, and the average of the m seasonal components should be 0. When the average value of the m additive seasonal components at time t is 0, the seasonal components are said to be normalized. Similarly, we say that multiplicative seasonal components are normalized if the average of the m multiplicative seasonal components at time t is 1.
Normalized seasonal components can be used to seasonally adjust the data. To calculate the seasonally adjusted data when the model contains an additive seasonal component, it is necessary to subtract the seasonal component from the data. For a multiplicative seasonal component, the data should be divided by the seasonal component.
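The normalization arithmetic itself is elementary, as the sketch below shows; the chapter's contribution is carrying it out within the state space recursions so that the components remain normalized as they are updated. The component values are illustrative quarterly figures (m = 4).

```python
# Minimal sketch: normalizing a vector of m seasonal components.
# Values are illustrative quarterly components (m = 4).
import numpy as np

s_add = np.array([3.1, -1.2, 0.8, -2.3])
s_add_norm = s_add - s_add.mean()      # additive: force the average to 0

s_mult = np.array([1.25, 0.80, 1.10, 0.95])
s_mult_norm = s_mult / s_mult.mean()   # multiplicative: force the average to 1

print(s_add_norm, s_add_norm.mean())   # average is now 0
print(s_mult_norm, s_mult_norm.mean()) # average is now 1
```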
9. Models with Regressor Variables
Up to this point in the book, we have considered models based upon a single series. However, in many applications, additional information may be available in the form of input or regressor variables; the name may be rather opaque, but we prefer it to the commonly-used but potentially misleading description of independent variables. We then refer to the series of interest as the dependent series. Regressor series may represent either explanatory or intervention variables.
An explanatory variable is one that provides the forecaster with additional information. For example, futures prices for petroleum products can foreshadow changes for consumers in prices at the pump. Despite the term “explanatory” we do not require a causal relationship between the input and dependent variables, but rather a series that is available in timely fashion to improve the forecasting process. Thus, stock prices or surveys of consumer sentiment are explanatory in this sense, even though they may not have causal underpinnings in their relationship with a dependent variable.
An intervention is often represented by an indicator variable taking values 0 and 1, although more general forms are possible. These variables may represent planned changes (e.g., the introduction of new legislation) or unusual events that are recognized only in retrospect (e.g., extreme weather conditions). Indicator variables may also be used to flag unusual observations or outliers; if such values are not identified they can distort the estimates of other parameters in the model.
10. Some Properties of Linear Models
In this chapter, we discuss some of the mathematical properties of the linear innovations state space models described in Chap. 3. These results are based on Hyndman et al. (2008).
We provide conditions that ensure the model is of minimal dimension (Sect. 10.1) and conditions that guarantee the model is stable (Sect. 10.2). We will see that the non-seasonal models are already of minimal dimension, but that the seasonal models are slightly larger than necessary. The normalized seasonal models, introduced in Chap. 8, are of minimal dimension.
The stability conditions discussed in Sect. 10.2 can be used to derive the associated parameter space. We find that the usual parameter restrictions (requiring all smoothing parameters to lie between 0 and 1) do not always lead to stable models. Exact parameter restrictions are derived for all the linear models.
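As a sketch of the stability check: for a linear innovations model, stability requires all eigenvalues of the discount matrix D = F - g w' to lie strictly inside the unit circle. The code below applies this to the local trend model ETS(A,A,N); the smoothing parameter values are illustrative.

```python
# Minimal sketch: stability check for the local trend model ETS(A,A,N).
# Stable iff all eigenvalues of D = F - g w' lie strictly inside the unit circle.
import numpy as np

F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
w = np.array([1.0, 1.0])
g = np.array([0.5, 0.1])               # illustrative alpha, beta

D = F - np.outer(g, w)                 # the "discount" matrix
eigvals = np.linalg.eigvals(D)
print(eigvals, "stable:", bool(np.all(np.abs(eigvals) < 1)))
```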
11. Reduced Forms and Relationships with ARIMA Models
The purpose of this chapter is to examine the links between the (linear) innovations state space models and autoregressive integrated moving average (ARIMA) models, frequently called “Box-Jenkins models” because Box and Jenkins (1970) proposed a complete methodology for identification, estimation and prediction with these models. We will show that when the state variables are eliminated from a linear innovations state space model, an ARIMA model is obtained. This ARIMA form of the state space model is called its reduced form.
We begin the chapter with a brief summary of ARIMA models and their properties. In Sect. 11.2 we obtain reduced forms for the simple cases of the local level model, ETS(A,N,N), and the local trend model, ETS(A,A,N). Then, in Sect. 11.3 we show how to put a general linear innovations state space model into an ARIMA reduced form. (Causal) stationarity and invertibility conditions for the reduced form model are developed in Sect. 11.4, and we explore the links with causal stationarity and stability of the corresponding innovations state space model.
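The local level case conveys the flavor of these reduced-form results. For ETS(A,N,N),

y_t = \ell_{t-1} + \varepsilon_t, \qquad \ell_t = \ell_{t-1} + \alpha\varepsilon_t,

and eliminating the level from the two equations gives

y_t - y_{t-1} = \varepsilon_t - (1 - \alpha)\varepsilon_{t-1},

an ARIMA(0,1,1) process with moving average parameter \theta_1 = \alpha - 1.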
12. Linear Innovations State Space Models with Random Seed States
Exponential smoothing was used in Chap. 5 to generate the one-step-ahead prediction errors needed to evaluate the likelihood function when estimating the parameters of an innovations state space model. It relied on a fixed seed state vector to initialize the associated recurrence relationships, something that was rationalized by recourse to a finite start-up assumption. The focus is now changed to stochastic processes that can be taken to have begun prior to the period of the first observed time series value, and which, as a consequence, have a random seed state vector. The resulting theory of estimation and prediction is suitable for applications in economics and finance where observations rarely cover the entire history of the generating process.
The Kalman filter (Kalman 1960) can be used in place of exponential smoothing. Like exponential smoothing, it generates one-step-ahead prediction errors, but it works with random seed states. It is an enhanced version of exponential smoothing that is used to update the moments of states and associated quantities by conditioning on successive observations of a time series. It will be seen that it was devised for stationary time series and that it cannot be adapted for nonstationary time series without major modifications.
An alternative to the Kalman filter is an information filter, which also conditions on successive observations. However, instead of having a primary focus on the manipulation of moments of associated random quantities, it relies on linear stochastic equations. By using an information filter, the problems encountered with the Kalman filter for nonstationary data conveniently disappear. An information filter can be applied to both stationary and nonstationary time series without modification. The version presented here is an adaptation of the Paige and Saunders (1977) information filter to the linear innovations state space model context.
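A minimal sketch of the Kalman recursions for the simplest case, a local level model with a random seed state, illustrates the moment-updating logic described above; it is not the information filter presented in the chapter, and all numerical values are illustrative.

```python
# Minimal sketch: scalar Kalman filter for a local level model with a random
# seed state: y_t = l_t + eps_t, l_t = l_{t-1} + eta_t. Illustrative values only.
import numpy as np

y = np.array([102.0, 99.5, 101.3, 104.2, 103.1, 105.8])
sigma_eps2, sigma_eta2 = 4.0, 1.0   # measurement and state noise variances
m, P = 100.0, 1000.0                # vague prior: seed state mean and variance

for obs in y:
    P = P + sigma_eta2              # predict: state follows a random walk
    v = obs - m                     # one-step prediction error
    S = P + sigma_eps2              # prediction error variance
    K = P / S                       # Kalman gain
    m = m + K * v                   # update state moments given the observation
    P = (1 - K) * P
    print(f"error {v:+6.2f}, filtered level {m:7.2f}, variance {P:.3f}")
```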
13. Conventional State Space Models
The primary purpose of this book is to demonstrate that the innovations form of the state space model provides a simple but flexible approach to forecasting time series. However, for reasons that are not completely clear, the innovations form has been largely over-shadowed in the literature by another version of the state space model that has multiple sources of randomness. We refer to this version as the multi-disturbance or multiple source of error (MSOE) model. The two approaches are compared and contrasted in this chapter. When we are comparing the two frameworks directly, both the finite and infinite start-up assumptions are valid; however, when the two are compared via their ARIMA reduced forms, the infinite start-up assumption will be used. The emphasis will be almost exclusively upon linear state space models, because, as we shall see in Sect. 13.4, the MSOE formulation becomes difficult to manage in the nonlinear case.
In Chap. 2, we introduced the local level and local trend models, together with their seasonal extensions. It will be seen that these innovations, or single source of error (SSOE), models all have their counterparts within a multiple source of error framework. It is often thought that the MSOE provides a better modeling framework than the SSOE because the multiple sources of error appear to allow greater generality. We will show that any MSOE model has an innovations representation, so that this viewpoint cannot be correct.
A general definition of the state space framework is presented in Sect. 13.1. It is seen to encompass both the innovations and the multiple disturbance forms of the state space model. Several important special cases of the MSOE are also given. A general approach to estimation is given in Sect. 13.2. Reduced forms of the MSOE models are examined in Sect. 13.3. The SSOE and MSOE approaches are then compared in Sect. 13.4.
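The local level model makes the contrast concrete. The SSOE (innovations) form is

y_t = \ell_{t-1} + \varepsilon_t, \qquad \ell_t = \ell_{t-1} + \alpha\varepsilon_t,

with a single error series, whereas its MSOE counterpart, the random walk plus noise model, is

y_t = \ell_t + \varepsilon_t, \qquad \ell_t = \ell_{t-1} + \eta_t,

with \varepsilon_t and \eta_t independent. Both have ARIMA(0,1,1) reduced forms, but the SSOE form admits the wider range of moving average parameters, which is one way of seeing that the second error source adds no generality.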
14. Time Series with Multiple Seasonal Patterns
Time series that exhibit multiple seasonal patterns can arise from many sources. For example, both daily and weekly cycles can be seen in Fig. 14.1 for hourly utility demand data and in Fig. 14.2 for hourly vehicle counts on a freeway. Usually when we discuss seasonal patterns we mean patterns that change with the seasons of the year for weekly, quarterly, or monthly data. In this chapter any periodic pattern of fixed length will be considered to be a seasonal pattern and called a cycle. It is easy to think of many examples where multiple seasonal cycles would occur, say for hours of the day within days of the week, such as hospital admissions, demand for public transportation, telephone calls to call centers, requests for cash at ATMs, and accessing computer websites. The seasonal innovations state space models in Tables 2.2 and 2.3 are designed for one seasonal cycle in which the seasonal components are revised only once during every cycle. Thus, the objective of this chapter is to extend some of those models to handle more frequent observations with more than one cycle. Another objective is the ability to revise the seasonal components more often than once every seasonal cycle. For example, in the case of hourly traffic count data we would like to be able to revise the seasonal components for a weekly cycle (a cycle of length 168) more frequently than once a week.
Phillip Gould, Farshid Vahid-Araghi
15. Nonlinear Models for Positive Data
In Chap. 4 we considered a class of nonlinear and heteroscedastic innovations state space models and developed their properties. At that time we noted that the Gaussian distribution was not always an appropriate distribution for the error process. Nevertheless, we claimed that the Gaussian likelihood would often provide a reasonable framework for parameter estimation. We also used the Gaussian distribution to construct prediction intervals. Our aim in this chapter is to examine the structure of these nonlinear exponential smoothing state space models in greater detail and to check the conditions under which the use of the Gaussian distribution provides an appropriate approximation.
Why should these issues concern us? First of all, we may note that most of the series that we encounter in business applications are strictly positive, such as sales, prices, etc. Of course, there are many exceptions, most notably the returns on an investment (although the underlying stock or bond price is strictly positive even here). Nevertheless, the linear models of Chap. 3 are widely used for such series, so why should we pay particular attention to the nonlinear models? The first reason is that, under the Gaussian assumption, the forecast variances may be undefined. Second, we find that there are some difficult specification problems associated with models strictly defined on the positive half line; we examine these questions in greater detail in Sect. 15.1. We then explore purely multiplicative models in Sect. 15.2 in order to identify possible solutions to these difficulties. Section 15.3 contains some distributional results for the ETS(M,N,N) model, where the innovations are from a lognormal or gamma distribution. In Sect. 15.4, we examine the extent to which the Gaussian distribution can serve as a reasonable approximation, notwithstanding the theoretical objections noted earlier. We need to consider parameter estimation, point forecasting, interval forecasting and simulation. We find that the Gaussian approximation typically works well for the first two issues, has a somewhat mixed record for interval estimation and may lead to problems in long series simulations.
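To fix ideas, the purely multiplicative local level model ETS(M,N,N) has y_t = \ell_{t-1}(1 + \varepsilon_t) and \ell_t = \ell_{t-1}(1 + \alpha\varepsilon_t). The sketch below simulates it with 1 + \varepsilon_t drawn from a lognormal distribution with unit mean, so that the simulated series stays strictly positive; all numerical values are illustrative.

```python
# Minimal sketch: simulating ETS(M,N,N) with lognormal multiplicative shocks,
# keeping the simulated series strictly positive. Illustrative values only.
import numpy as np

rng = np.random.default_rng(7)
alpha, level, n, sigma = 0.2, 50.0, 12, 0.1

series = []
for _ in range(n):
    # Draw 1 + eps_t from a lognormal with mean 1, so E[y_t | l_{t-1}] = l_{t-1}.
    shock = rng.lognormal(mean=-0.5 * sigma**2, sigma=sigma)
    eps = shock - 1.0
    series.append(level * (1.0 + eps))   # observation equation
    level *= (1.0 + alpha * eps)         # state equation
print(np.round(series, 2))
```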
Muhammad Akram
16. Models for Count Data
Time series are often formed from counts. The number of accidents per month at an intersection, the number of cardiac cases per day presenting at an emergency center, the number of power failures each month in a geographical region, and the weekly demand for a slow moving inventory are all examples of time series of counts. Such data are non-negative and integer-valued.
The models in earlier chapters can be used with count data when counts are large because a Gaussian distribution typically provides a good fit to an empirical distribution of large counts. The latter is typically symmetric, and although a Gaussian distribution spills over into the negative part of the real line, the probability of a negative value implied by a fitted Gaussian distribution is usually very small.
However, the earlier models are not appropriate when counts are small. Counts cannot be negative, yet the probability of a negative value implied by a fitted Gaussian distribution is usually not negligible in this circumstance. Moreover, empirical distributions of low count data are typically positively skewed rather than symmetric. A common practice that retains a role for the Gaussian distribution in the presence of low counts is to model the data after a log transformation. However, this approach fails on count time series which contain zeros, and does not take account of the discrete nature of the sample space.
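A quick calculation illustrates the low-count problem: fitting a Gaussian to counts with a small mean (and Poisson-like variance) places non-negligible probability below zero, while for large counts that probability vanishes. The numbers below are illustrative.

```python
# Minimal sketch: the Gaussian approximation breaks down for low counts.
# For a small mean the fitted normal puts substantial mass below zero;
# for a large mean that mass is negligible. Illustrative values only.
import numpy as np
from scipy.stats import norm

for mean in (2.0, 200.0):
    sd = np.sqrt(mean)                   # Poisson-like: variance equals mean
    p_neg = norm.cdf(0.0, loc=mean, scale=sd)
    print(f"mean {mean:5.0f}: P(negative) under fitted Gaussian = {p_neg:.4f}")
```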
17. Vector Exponential Smoothing
In earlier chapters we have considered only univariate models; we now proceed to examine multi-series extensions and to compare the multi-series innovations models with other multi-series schemes. We shall refer to our approach as the vector exponential smoothing (VES) framework. The innovations framework is similar to the structural time series models advocated by Harvey (1989) in that both rely upon unobserved components, but there is a fundamental difference: in keeping with the earlier developments in this book, each time series has only one source of error.
The VES models are introduced in Sect. 17.1; special cases of the general model are then discussed in Sect. 17.2. An inferential framework is then developed in Sect. 17.3 for the VES models, building upon our earlier results for the univariate schemes.
The most commonly used multivariate time series models are those defined within the ARIMA framework. Interestingly, this approach also has only one source of randomness for each time series. Thus, the vector versions of the ARIMA framework (VARIMA), and special cases such as vector autoregression (VAR) and vector moving average (VMA), may be classified as innovations approaches to time series analysis (Lütkepohl 2005). We compare the VES framework with existing approaches in Sect. 17.4. As in Chap. 11, when we consider equivalences between vector innovations models and the VARIMA forms, we will make the infinite start-up assumption.
Finally, we compare the performance of VES models to VAR and other existing state space alternatives, first in an empirical study of exchange rates (Sect. 17.5), and then across a range of different time series taken from a large macroeconomic database, in Sect. 17.6.
Ashton de Silva

Applications

18. Inventory Control Applications
Since the pioneering work of Brown (1959), it has been a common practice to use exponential smoothing methods to forecast demand in computerized inventory control systems. It transpired that exponential smoothing often produced good point forecasts. However, the methods proposed to measure the risk associated with the predictions typically ignored the effect of random changes to the states, and so seriously underestimated the level of risk as a consequence (Johnston and Harrison 1986; Snyder et al. 1999; Graves 1999). The innovations state space model provides the statistical underpinnings of exponential smoothing and may be used to derive measures of prediction risk that are properly consistent with the use of these forecasting methods, and which, as a consequence, allow for random changes in the states.
There is often, however, a twist to the problem of predicting demand in inventory systems. Predictions are typically made from sales data that are recorded for accounting purposes. Sales, however, may be lost during shortages, in which case sales are a corrupted reflection of demand. Without proper precautions, the use of sales data can lead to forecasts of demand with a downward bias. This problem is considered in Sect. 18.1.
Once obtained, predictions of demand and the uncertainty surrounding them are used as inputs to replenishment decisions. The details of how this is done depend in part on the decision rules employed to determine the timing and size of replenishment orders. There is an extensive literature on inventory control; see, for example, Silver et al. (1998) for comprehensive coverage. Section 18.2 provides some insights into the problem of properly integrating the demand models underlying the exponential smoothing methods of forecasting with common inventory models.
19. Conditional Heteroscedasticity and Applications in Finance
In 1900, Louis Bachelier published the findings of his doctoral research on stock prices; his empirical results indicated that stock prices behaved like a random walk. However, this study was overlooked for the next 50 years. Then, in 1953, Maurice Kendall published his analysis of stock market prices in which he suggested that price changes were essentially random. Such a claim ran counter to the perceived wisdom of the times, but the empirical studies that followed confirmed Kendall’s claim and ultimately led to the path-breaking work of Black and Scholes (1973) and Merton (1973) on the Efficient Market Hypothesis. In essence, the Black-Scholes theory states that prices will move randomly in an efficient market. Intuitively, we may argue that if prices were predictable, trading would quickly take place to erode the implied advantage. Of course, the theory does not apply to insider knowledge exploited by the few!
We discuss the Black–Scholes model briefly in Sect. 19.1 and relate it to our analysis of discrete time processes. This development leads naturally to conditionally heteroscedastic processes, which is the subject of Sect. 19.2. Then, in Sect. 19.3 we examine time series that evolve over time in both their conditional mean and conditional variance structures. We conclude with a re-analysis of the US gasoline price data considered earlier in Chap. 9, which illustrates the value of conditionally heteroscedastic models in the construction of prediction intervals.
20. Economic Applications: The Beveridge–Nelson Decomposition
Two features that characterize most macroeconomic time series are sustained long run growth and fluctuations around the growth path. These features are often called “trend” and “cycle” respectively, and the macroeconomic, econometric and statistical literatures contain a variety of techniques for decomposing economic time series into components that are roughly aligned with these notions. Most popular in the macroeconomic literature is the use of the Hodrick-Prescott (1980) filter for trend-cycle decomposition, followed by the Beveridge-Nelson (1981) decomposition, the decomposition implied by Harvey’s (1985) unobserved component model, and a myriad of other contenders. Some, but not all, of these decomposition methods are based on statistical models, and there is vigorous debate about which of these methods leads to series that best capture the concepts of economic growth and business cycles. Canova (1998) provides an excellent survey of various methods that are used to decompose economic data, and he also outlines the motivational and quantitative differences between them. He is careful to point out that the “cycles” which result from statistical filters need not have a close correspondence with the classical ideas that underlie business cycle dating exercises undertaken by think-tanks such as the National Bureau of Economic Research in the USA. He also emphasizes that such a correspondence is not even desirable. Alternative decomposition techniques extract different types of information from the data, and each can be used to focus on different aspects of economic theory. Which filter is appropriate depends on the question at hand.
The aim of this chapter is to familiarize readers with the BN decomposition of economic time series, because it provides an interesting application of the linear innovations state space approach to economics. The treatise provided here focusses on the decomposition of a single variable (in practice, this is usually the logarithm of the gross domestic product (GDP) of a country) into just two components. We do not consider more general linear innovations state space approaches that might explicitly account for seasonality, because economic theory has little to say about seasonal effects, and economists typically think in terms of seasonally adjusted data. We also do not consider local trend models, because there is very little empirical evidence that macroeconomic time series follow processes that are integrated of order two, and economic theory does not distinguish between different types of low frequency data. A state space framework that can be used to undertake the BN decomposition is outlined in Sect. 20.1. This framework is based on Anderson et al. (2006) and uses the perfect correlation between permanent and transitory components to recast the BN decomposition as an innovations state space model. It turns out that this state space approach avoids a computational problem associated with other techniques for estimating the BN permanent component (see Miller 1988; Newbold 1990; Morley 2002), and it also facilitates the direct estimation of various measures of “persistence in output.”
Chin Nam Low, Heather M. Anderson
Backmatter
Metadata
Title
Forecasting with Exponential Smoothing
Authors
Professor Rob Hyndman
Professor Anne Koehler
Professor Keith Ord
Associate Professor Ralph Snyder
Copyright Year
2008
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-71918-2
Print ISBN
978-3-540-71916-8
DOI
https://doi.org/10.1007/978-3-540-71918-2