
International Journal of Forecasting

Volume 33, Issue 1, January–March 2017, Pages 199-213

Forecasting stochastic processes using singular spectrum analysis: Aspects of the theory and application

https://doi.org/10.1016/j.ijforecast.2016.01.003

Abstract

This paper presents theoretical results on the properties of forecasts obtained by using singular spectrum analysis to forecast time series that are realizations of stochastic processes. The mean squared forecast errors are derived under broad regularity conditions, and it is shown that, in practice, the forecasts obtained will converge to their population ensemble counterparts. The theoretical results are illustrated by examining the performances of singular spectrum analysis forecasts when applied to autoregressive processes and a random walk process. Simulation experiments suggest that the asymptotic properties developed are reflected in the behaviour of observed finite samples. Empirical applications using real world data sets indicate that forecasts based on singular spectrum analysis are competitive with other methods currently in vogue.

Introduction

Singular spectrum analysis (SSA) is a nonparametric technique that is designed for use in signal extraction and the prediction of irregular time series that may exhibit non-stationary and nonlinear properties, as well as intermittent or transient behaviour. The development of SSA is often attributed to researchers working in the physical sciences, namely Broomhead and King (1986), Vautard and Ghil (1989) and Vautard, Yiou, and Ghil (1992), although many of the basic building blocks were outlined by Basilevsky and Hum (1979) in a socioeconomic setting, and an early formulation of some of the key ideas can be found in the work of Prony (1795). An introduction to SSA is presented by Elsner and Tsonis (1996), and a more detailed examination of the methodology, with an emphasis on the algebraic structure and algorithms, is provided by Golyandina, Nekrutkin, and Zhigljavsky (2001).

The application of SSA to forecasting has gained popularity over recent years (see, for example, Hassani, Heravi, & Zhigljavsky, 2009; Hassani, Soofi, & Zhigljavsky, 2010; Hassani & Zhigljavsky, 2009; and Thomakos, Wang, & Wille, 2002, for applications in business and economics), and the general finding appears to be that SSA performs well. These studies have examined SSA forecasts by investigating real world applications and comparing the performance of SSA with that of benchmarks such as ARIMA models and Holt–Winters procedures. However, with real world data the true data generating mechanism is not known, and making a comparison with such benchmarks does not convey the full picture: knowing that SSA outperforms a benchmark serves only to show that the benchmark is suboptimal, and therefore does not provide an appropriate baseline.

In this paper, our purpose is to provide what we believe to be the first theoretical analysis of the forecasting performance of SSA under appropriate regularity conditions concerning the true data generating mechanism. We present a formulation of the SSA mean squared forecast error (MSFE) for a general class of processes. The usefulness of such formulae lies not only in providing a neat mathematical characterization of the SSA forecast error, but also in allowing a comparison to be made between SSA and the optimal mean squared error solution for a known random process. The minimum mean squared error (MMSE) predictor obviously provides a (gold) standard against which all other procedures can be measured.

Irrespective of the actual structure of the observed process, SSA forecasts are obtained by calculating a linear recurrence formula (LRF) that is used to construct a prediction of the future value(s) of the realized time series. Given a univariate time series of length N, the coefficients of the LRF are computed from a spectral decomposition of an m×n Hankel matrix known as the trajectory matrix. The dimension m is called the window length, and n = N − m + 1 is referred to as the window width. The Gramian of the trajectory matrix is constructed for a given window length, and its eigenvalue decomposition is evaluated. This decomposition is then used to split the observed series into a signal component, constructed from k eigentriples of the Hankel matrix (the first k left and right singular vectors and their associated singular values), and a residual. The resulting signal plus noise decomposition is then employed to produce a forecast via the LRF coefficients. Details are presented in the following section, where we outline the basic structure of the calculations underlying the construction of an SSA(m,k) model and the associated forecasts.
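To make the sequence of calculations concrete, the following is a minimal sketch of recurrent SSA(m,k) forecasting, assuming the standard construction just described (Hankel embedding, eigendecomposition of the Gramian, diagonal averaging, and forecasting via the LRF). The function name `ssa_forecast` and all variable names are illustrative and are not taken from the paper.

```python
# A minimal sketch of SSA(m, k) forecasting via the linear recurrence formula,
# following the standard recurrent SSA construction; not the authors' code.
import numpy as np

def ssa_forecast(x, m, k, h):
    """Return h forecasts of the series x from an SSA(m, k) decomposition."""
    x = np.asarray(x, dtype=float)
    N = x.size
    n = N - m + 1                                    # window width
    # Trajectory (Hankel) matrix: column t holds (x(t), ..., x(t+m-1))'
    X = np.column_stack([x[t:t + m] for t in range(n)])
    # Eigendecomposition of the Gramian XX'; keep the k leading eigenvectors
    eigval, U = np.linalg.eigh(X @ X.T)
    Uk = U[:, np.argsort(eigval)[::-1][:k]]
    # Rank-k approximation of X, then anti-diagonal averaging gives the signal
    S = Uk @ (Uk.T @ X)
    signal = np.array([np.diag(S[:, ::-1], d).mean()
                       for d in range(n - 1, -m, -1)])
    # LRF coefficients from the last components of the k leading eigenvectors;
    # the next value is a' times the previous m-1 signal values (in time order)
    pi = Uk[-1, :]
    nu2 = float(pi @ pi)                             # verticality coefficient
    if nu2 >= 1.0:
        raise ValueError("LRF is undefined when the verticality coefficient is 1")
    a = (Uk[:-1, :] @ pi) / (1.0 - nu2)
    # Apply the recurrence to the reconstructed signal to obtain the forecasts
    z = list(signal)
    for _ in range(h):
        z.append(float(np.dot(a, z[-(m - 1):])))
    return np.array(z[-h:])

# Illustrative usage on a noisy sine wave (arbitrary parameter choices)
t = np.arange(200)
y = np.sin(2 * np.pi * t / 20) + 0.1 * np.random.default_rng(1).standard_normal(200)
print(ssa_forecast(y, m=24, k=2, h=5))
```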

Section 3 presents the theoretical MSFE of an SSA(m,k) model under very broad assumptions. The formulae that we derive indicate how the use of different values of m, a tuning parameter, and k, a modeling parameter, will interact to influence the MSFE obtained from a given SSA(m,k) model. In Section 4, it is shown that, when appropriate regularity conditions are satisfied, the SSA forecasts constructed in practice, and their associated MSFE estimates, will converge to their theoretical population ensemble counterparts.

Section 5 illustrates the theoretical results obtained in Sections 3 and 4. In forecasting applications, it is common practice to assume implicitly that the fitted model is correct, and therefore that the forecasting formulae derived from the model are appropriate; however, such an assumption rarely holds true. In general, the true data generating process (DGP) is unknown, and the fitted model will, at best, provide only a close approximation to the true DGP. Hence, the expectation is that the forecasting performance of a fitted model will be sub-optimal, and it is natural to ask in what ways and to what extent the forecasting performance of the fitted model will fall short. In an attempt to address this question, Section 5 examines the MSFE performances of different SSA(m,k) models and compares them with those of the optimal MSE predictors for known DGPs.

Section 6 demonstrates the application of SSA forecasting to different real world time series. It shows that SSA forecasts can provide considerable improvements in empirical MSFE performances over the conventional benchmark models that have been used previously to characterize these series. Section 7 presents a brief conclusion.

Section snippets

The mechanics of SSA forecasting

Singular spectrum analysis (SSA) is based on the basic idea that there is an isomorphism between an observed time series $\{x(t): t=1,\ldots,N\}$ and the vector space of $m\times n$ Hankel matrices, defined by the mapping
$$\{x(t): t=1,\ldots,N\}\;\longrightarrow\;\mathbf{X}=\begin{bmatrix} x(1) & x(2) & \cdots & x(n) \\ x(2) & x(3) & \cdots & x(n+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(m) & x(m+1) & \cdots & x(N) \end{bmatrix}=[\mathbf{x}_1:\cdots:\mathbf{x}_n],$$
where $m$ is a preassigned window length, $n=N-m+1$, $\mathbf{x}_t=(x(t),x(t+1),\ldots,x(t+m-1))'$, and the so-called trajectory matrix $\mathbf{X}=[x(i+t-1)]$ for $i=1,\ldots,m$ and $t=1,\ldots,n$. Let $\ell_1\geq\ell_2\geq\cdots\geq\ell_m>0$ denote the eigenvalues of $\mathbf{X}\mathbf{X}'$ arranged in
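As a concrete illustration of this embedding, the following small sketch (with arbitrary toy values, not taken from the paper) builds the m×n trajectory matrix for a short series, checks its Hankel structure, and evaluates the ordered eigenvalues of the Gramian XX':

```python
# Illustrative construction of the trajectory matrix and the eigenvalues of
# its Gramian XX' for a toy series; all values here are arbitrary.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])     # N = 7
m = 3                                                  # window length
n = x.size - m + 1                                     # window width, n = N - m + 1 = 5
X = np.column_stack([x[t:t + m] for t in range(n)])    # column t holds (x(t+1), ..., x(t+m))'
# Hankel structure: each entry depends only on i + t
assert all(X[i, t] == x[i + t] for i in range(m) for t in range(n))
eigvals = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]   # eigenvalues of XX', descending
print(X)
print(eigvals)                                          # ell_1 >= ell_2 >= ... >= ell_m >= 0
```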

Theoretical properties of SSA forecasts

Let us assume that the data-generating mechanism of the underlying stochastic process $x(t)$ is such that there exists a $k<m$ for which the $m$-lagged vectors of the trajectory matrix $\mathbf{X}$ can be modeled as
$$\mathbf{x}_t=\Phi\mathbf{z}_t+\boldsymbol{\varepsilon}_t=\mathbf{s}_t+\boldsymbol{\varepsilon}_t,\qquad t=1,\ldots,n,$$
where $\Phi=[\boldsymbol{\varphi}_1:\cdots:\boldsymbol{\varphi}_k]$ is an $m\times k$ coefficient matrix, and $\mathbf{z}_t=(\zeta_{1t},\ldots,\zeta_{kt})'$ is a zero mean stochastic process with contemporaneous covariance matrix $\Lambda=\mathrm{diag}\{\lambda_1,\ldots,\lambda_k\}$, denoted $\mathbf{z}_t\sim(0,\Lambda)$ henceforth, that is orthogonal to $\boldsymbol{\varepsilon}_t\sim(0,\sigma^2\mathbf{I})$, where $\mathbf{I}$ is the $m$th order identity. The specification
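One immediate consequence of this specification, spelled out here as a worked step for clarity (it is implied by, not quoted from, the snippet above): since z_t and ε_t are orthogonal and have zero means, the contemporaneous second-moment matrix of the lagged vectors has a "signal plus noise" spiked structure,

```latex
\[
\begin{aligned}
\mathrm{E}\!\left[\mathbf{x}_t\mathbf{x}_t'\right]
  &= \mathrm{E}\!\left[(\Phi\mathbf{z}_t+\boldsymbol{\varepsilon}_t)
                        (\Phi\mathbf{z}_t+\boldsymbol{\varepsilon}_t)'\right] \\
  &= \Phi\,\mathrm{E}\!\left[\mathbf{z}_t\mathbf{z}_t'\right]\Phi'
     + \mathrm{E}\!\left[\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t'\right]
   = \Phi\Lambda\Phi' + \sigma^{2}\mathbf{I}.
\end{aligned}
\]
```

In particular, if the columns of Φ are orthonormal, the leading k eigenvalues of this matrix are λ_i + σ², i = 1,…,k, and the remaining m−k eigenvalues all equal σ², which is the sense in which the first k eigentriples capture the signal s_t = Φz_t and the remainder capture noise.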

Consistent parameter estimation

Before proceeding, let us note that the population ensemble forecasting parameters and MSFE values presented in Lemma 3, Theorem 1 and Proposition 1 will not be available to the practitioner. However, they can be estimated from the data using obvious "plug in" estimates. Thus, $A_{t+j}$, $j=1,\ldots,h$, can be estimated by substituting $\mathbf{a}$ for $\boldsymbol{\alpha}$, $\Gamma_{m+h-1}$ can be estimated using $\sum_{t=1}^{n}\boldsymbol{\xi}(t)\boldsymbol{\xi}(t)'/n$, where $\boldsymbol{\xi}(t)=(x(t),\ldots,x(t+m+h-1))'$ and $n=N-m-h+1$, and $\Sigma_{m+h-1}$ can be estimated by replacing $\Upsilon_{mk}\mathbf{u}\,\mathbf{G}\,\Upsilon_{mk}\mathbf{u}$ with, in an
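The moment "plug in" mentioned above is straightforward to implement; the sketch below covers only the estimator of Γ_{m+h−1} described in the text (the sample average of outer products of the (m+h)-dimensional lagged vectors ξ(t)), with illustrative function and variable names.

```python
# Plug-in estimate of Gamma_{m+h-1} from the lagged vectors
# xi(t) = (x(t), ..., x(t+m+h-1))', averaged over t = 1, ..., N - m - h + 1.
import numpy as np

def estimate_gamma(x, m, h):
    x = np.asarray(x, dtype=float)
    p = m + h                       # dimension of xi(t)
    n = x.size - m - h + 1          # number of available lagged vectors
    Xi = np.column_stack([x[t:t + p] for t in range(n)])   # p x n matrix of xi(t)
    return (Xi @ Xi.T) / n          # sample second-moment matrix
```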

Forecasting an AR(1) process

Consider a zero mean AR(1) process $x(t)=\phi x(t-1)+\varepsilon(t)$, where $\varepsilon(t)\sim WN(0,\sigma^2)$ and $|\phi|<1$. The autocovariance generating function of this process is $\gamma(z)=\gamma(0)\sum_{i=-\infty}^{\infty}\phi^{|i|}z^{i}$, where $\gamma(0)=\sigma^2/(1-\phi^2)$, and Assumption 1 is satisfied with $\Gamma=\gamma(0)\,T\{1,\phi,\ldots,\phi^{m-1}\}$. For an AR(1) process, the optimal MSE forecast of $x(t+j)$ given $x(\tau)$, $\tau\leq t$, is $x(t+j\,|\,t)=\phi^{j}x(t)$, $j=1,2,\ldots,h$, with a MSFE of $\mathrm{MSFE}_{AR(1)}(j)=\sigma^{2}(1-\phi^{2j})/(1-\phi^{2})$ for the $j$th forecast horizon.
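These AR(1) quantities are easy to verify numerically. The following sketch, with arbitrary parameter values, compares the closed-form MSFE σ²(1−φ^{2j})/(1−φ²) with a Monte Carlo estimate of the error of the optimal predictor φ^j x(t); Gaussian innovations are used here purely for illustration.

```python
# Numerical check of the AR(1) optimal-forecast MSFE quoted above.
import numpy as np

rng = np.random.default_rng(0)
phi, sigma2, h, n_rep = 0.7, 1.0, 5, 100_000

# Closed-form MSFE for horizons j = 1, ..., h
theoretical = sigma2 * (1 - phi ** (2 * np.arange(1, h + 1))) / (1 - phi ** 2)

# Monte Carlo: draw x(t) from the stationary distribution, simulate forward,
# and record the errors of the optimal forecasts phi**j * x(t).
errors = np.zeros((n_rep, h))
for r in range(n_rep):
    xt = rng.normal(0.0, np.sqrt(sigma2 / (1 - phi ** 2)))   # stationary draw of x(t)
    x = xt
    for j in range(h):
        x = phi * x + rng.normal(0.0, np.sqrt(sigma2))
        errors[r, j] = x - phi ** (j + 1) * xt
empirical = (errors ** 2).mean(axis=0)

print("theoretical MSFE:", np.round(theoretical, 3))
print("Monte Carlo MSFE:", np.round(empirical, 3))
```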

Empirical applications

When examining real world data sets, the predictive performance of a model can be evaluated only by comparing it with that of other competing models, as the true DGP is unknown in practice, and therefore the optimal MSE predictor used for the theoretical processes examined in the previous section is no longer available for analysis. We have therefore selected three different time series that have been examined elsewhere in the literature, and used models that have previously been fitted to these data sets

Concluding remarks

The theoretical examination of SSA forecasting presented above indicates that the loss in MSFE performance of SSA relative to the optimal MSE predictor is both process-specific and variable in nature, and need not be severe. However, when applied to different real world time series, SSA can exhibit considerable improvements in empirical MSFE performances over the conventional benchmark models that have been used to characterize the series previously.

The contrast between the inferior performance


References (26)

  • Davidson, J. (1994). Stochastic limit theory: An introduction for econometricians.
  • Elsner, J.B., & Tsonis, A.A. (1996). Singular spectrum analysis: A new tool in time series analysis.
  • Golyandina, N., Nekrutkin, V., & Zhigljavsky, A. (2001). Analysis of time series structure: SSA and related techniques.

M. Atikur Rahman Khan is an OCE Postdoctoral Fellow at the Commonwealth Scientific and Industrial Research Organisation (CSIRO). He completed his M.Sc. in Statistics at the National University of Singapore and his Ph.D. in Econometrics at the Department of Econometrics and Business Statistics, Monash University. He is a member of the International Statistical Institute and the Statistical Society of Australia. He was awarded a Postgraduate Publication Award at Monash University for carrying out this work. His research interests include time series analysis and forecasting, predictive modelling, and privacy-preserving data analytics.

D.S. Poskitt holds a chair in the Department of Econometrics and Business Statistics, Monash University, having previously been a member of the Department of Statistics and Econometrics, Australian National University. He is a member of the Econometric Society, the Institute of Mathematical Statistics and the Australian and New Zealand Statistical Society, and a Fellow of the Royal Statistical Society. He has published extensively in the area of statistical time series analysis and is an Associate Editor of the Journal of Time Series Analysis. He is a recipient of an American Statistical Association Award for Outstanding Statistical Application, and an Econometric Theory Multa Scripsit Award.
