Expert Systems with Applications

Volume 41, Issue 15, 1 November 2014, Pages 6596-6610

Nonlinear time series forecasting with Bayesian neural networks

https://doi.org/10.1016/j.eswa.2014.04.035

Abstract

Bayesian learning provides a natural way to model nonlinear structures with artificial neural networks owing to its capability to cope with model complexity. In this paper, an evolutionary Monte Carlo (MC) algorithm is proposed to train Bayesian neural networks (BNNs) for time series forecasting. This approach, called Genetic MC, is based on a Gaussian approximation with recursive hyperparameters. Genetic MC integrates MC simulations with genetic algorithms and fuzzy membership functions. In the applications, Genetic MC is compared with traditional neural network and time series techniques in terms of forecasting performance over the weekly sales of a finance magazine.

Introduction

In the literature, many stochastic processes are described over time. In these processes, quantities in recent time periods are influenced by their past values; this structure is the basis of time series methodology. The approach of Box and Jenkins (1970) is the classic reference on time series techniques for modeling such functional structures (Davidson & Mackinnon, 1993). Let $\{y_t : t = 0, \pm 1, \pm 2, \ldots\}$ be a sequence depending on time. A model can be defined as
$$y_t - a_1 y_{t-1} - \cdots - a_p y_{t-p} = e_t - b_1 e_{t-1} - \cdots - b_q e_{t-q}$$
where the $a$'s and $b$'s are parameters, $a_p \neq 0$, $b_q \neq 0$, and $\{e_t\}$ is a white-noise disturbance, i.e., an uncorrelated random variable with mean zero and finite variance $\sigma^2$. This model is called an autoregressive moving average (ARMA) time series model of order $(p, q)$, denoted ARMA$(p, q)$. An ARMA$(p, q)$ model can also be written in the form
$$y_t = c + \sum_{i=1}^{p} a_i y_{t-i} - \sum_{j=1}^{q} b_j e_{t-j} + e_t$$
where $c$ denotes the constant term, and $y_{t-i}$ and $e_{t-j}$ denote the lagged values in the model. Ordinary least squares (OLS), which assumes the stationarity condition for time series models, is mostly used to estimate the parameters (Fuller, 1996).
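As a minimal illustration of how such an ARMA(p, q) model might be fitted in practice (not part of the original study; the generated series, the chosen order, and the statsmodels call below are assumptions used only for demonstration):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Generate a placeholder stationary AR(1) series of 214 points
# (214 matches the paper's sample size; the data itself is synthetic).
rng = np.random.default_rng(0)
e = rng.normal(size=214)
y = np.zeros(214)
for t in range(1, 214):
    y[t] = 0.6 * y[t - 1] + e[t]

# ARMA(p, q) corresponds to ARIMA order (p, 0, q): no differencing.
model = ARIMA(y, order=(2, 0, 1))
fit = model.fit()
print(fit.summary())              # estimated a_i, b_j, constant c, and sigma^2
forecast = fit.forecast(steps=4)  # out-of-sample forecasts
```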

Instead of linear time series models, nonlinear models such as the bilinear and the threshold autoregressive (AR) models can be used. Although these models generally perform well, they have inherent limitations (Liang, 2005). First of all, determining the most suitable model requires an expert; otherwise it is possible to construct an incorrect functional structure. Secondly, some kinds of nonlinear behavior might not be captured because of the preferred functional structure. To overcome these limitations, artificial neural networks (ANNs) are often used to model nonlinear time series. However, when ANNs are trained with the mean squared empirical risk as the error function, the model complexity must be handled with additional criteria.

Bayesian neural networks (BNNs) provide a flexible way to model nonlinear problems owing to their capability to cope with the complexity issue. Besides, they ensure a natural interpretation of the estimates and the predictions performed over the estimated models. For this reason, BNNs are quite useful in regression, time series, classification and density estimation problems. Essentially, Bayesian treatments of learning in ANNs are typically based on the Gaussian approximation, ensemble learning, and Markov chain Monte Carlo (MCMC) simulations, the latter known as the full Bayesian approach. For ANNs, the Gaussian approximation, known as Laplace's method, was introduced by Buntine and Weigend (1991) and Mackay (1992). This approach models the posterior distribution by a Gaussian distribution centered locally at a mode of the posterior distribution of the parameters (Mackay, 1992). Ensemble learning was introduced by Hinton and van Camp (1993), in which the approximating distribution is fitted globally by minimizing a Kullback–Leibler divergence rather than locally. Within the context of full Bayes, Neal (1992) introduced advanced Bayesian simulation methods in which MCMC simulations are used to generate parameter samples from the posterior distribution. However, MCMC techniques can be computationally expensive and also suffer from difficulties in assessing convergence. For this reason, Neal (1992) integrated Bayesian learning with the Hybrid Monte Carlo (HMC) method to overcome these shortcomings. Afterwards, the Bayesian applications of ANNs were reviewed in detail in Mackay (1995), Bishop (1995), and Neal (1996).
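For reference, the Gaussian (Laplace) approximation referred to above can be summarized as follows; this is the standard textbook formulation (Mackay, 1992; Bishop, 1995) rather than notation taken from this paper:

```latex
% Laplace (Gaussian) approximation: expand \ln p(w \mid D) to second order
% around a posterior mode w_{MAP}, giving a local Gaussian approximation.
p(\mathbf{w} \mid D) \;\approx\; \mathcal{N}\!\left(\mathbf{w} \,\middle|\, \mathbf{w}_{\mathrm{MAP}},\, \mathbf{A}^{-1}\right),
\qquad
\mathbf{A} \;=\; -\nabla\nabla \ln p(\mathbf{w} \mid D)\Big|_{\mathbf{w} = \mathbf{w}_{\mathrm{MAP}}}.
```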

In the literature, there are remarkable studies focusing on specific problems related to ANNs from the Bayesian perspective. For instance, Insua and Muller (1998), Marrs (1998), and Holmes and Mallick (1998) worked on selecting the number of hidden neurons with growing and pruning algorithms for the dimensionality problem in ANNs. Freitas (2000) incorporated particle filters and sequential Monte Carlo (MC) methods into BNNs. Liang and Wong (2001) proposed an evolutionary MC algorithm which samples the parameters of ANNs from the Boltzmann distribution using the mutation, crossover and exchange operations defined in genetic algorithms (GAs). Chua and Goh (2003) proposed a hybrid Bayesian back-propagation approach to multivariate modeling with ANNs; they used a stochastic gradient descent algorithm integrated with evolutionary operators to produce the parameters of the ANNs. Liang (2005) and Lord, Xie, and Zhang (2007) used truncated Poisson priors to determine the number of neurons in the hidden layers, and estimated the parameters of the ANNs via the evolutionary MC algorithms proposed by Liang and Wong (2001). Lampinen and Vehtari (2001) and Vanhatalo and Vehtari (2006) improved a hybrid and reversible MCMC algorithm based on Neal (1996). Marwala (2007) adapted the mutation and crossover operators defined in GAs to Bayesian learning, and estimated the parameters using a Genetic MC algorithm. Mirikitani (2010) proposed a probabilistic approach to recursive second-order training of recurrent neural networks using regularization hyperparameters. Goodrich (2011) developed a powerful methodology for estimating the full residual uncertainty in the network weights and making predictions by using a modified Jeffreys prior combined with a Metropolis MCMC method. Martens and Sutskever (2011) addressed the long-outstanding problem of how to effectively train recurrent neural networks on complex and difficult sequence modeling problems which may contain long-term data dependencies. Kocadağlı (2012, 2013a, 2013b) integrated hierarchical Bayesian learning with GAs and fuzzy numbers to estimate the parameters of ANNs.

In the context of nonlinear time series forecasting with ANNs and BNNs, there are also remarkable studies in the literature. For instance, Faraway and Chatfield (1998) used complexity criteria to determine the number of neurons in the hidden layer of a feed-forward neural network; Brahim-Belhouari and Bermak (2004) performed time series forecasting with Gaussian processes from a non-stationarity perspective; Teräsvirta, van Dijk, and Medeiros (2005) investigated smooth transition autoregressions and neural networks for macroeconomic time series; Liang (2005) proposed an evolutionary MC algorithm to estimate the parameters of BNNs for nonlinear time series forecasting; Hippert and Taylor (2010) utilized Bayesian techniques for controlling model complexity and selecting inputs in an ANN for short-term time series forecasting; and Mirikitani (2010) proposed a probabilistic approach to estimate the parameters of ANNs using regularization hyperparameters for time series forecasting.

In order to measure model complexity and to estimate the parameters and hyperparameters of BNNs, hybrid methods are mostly preferred because of their superior performance against the classical time series techniques. To train BNNs, MCMC techniques are mostly integrated with gradient optimization algorithms, the simulated annealing method, or evolutionary algorithms. HMC approaches with gradient search algorithms provide a desirable performance against the mentioned problems because they cope with the random-walk behavior encountered in MCMC treatments, in addition to reducing the training time. As is known in the literature, gradient descent with momentum, conjugate gradients, quasi-Newton and Levenberg–Marquardt are efficient gradient algorithms that can be integrated with MCMC techniques. However, gradient algorithms might not be able to explore the whole space freely in high-dimensional parameter cases with many local optima, and they do not work without derivative information. In the context of ANNs and nonlinear problems, the shortcomings of these algorithms are discussed in detail (Bishop, 1995; Kocadağlı, 2012; Liang, 2005; Neal, 1992). In order to handle nonlinear functions without their derivative information and to reduce the training time spent on parameter estimation in ANNs, GAs have recently become quite popular.
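To make the role of gradient information in HMC concrete, the following is a minimal, generic sketch of a single Hybrid Monte Carlo update (a leapfrog trajectory followed by a Metropolis acceptance test); it is a standard textbook construction, not code from this paper, and the function names `neg_log_posterior` and `grad` are hypothetical placeholders:

```python
import numpy as np

def hmc_step(w, neg_log_posterior, grad, step_size=0.01, n_leapfrog=20,
             rng=np.random.default_rng()):
    """One generic HMC update: leapfrog trajectory plus Metropolis acceptance."""
    p = rng.normal(size=w.shape)                        # auxiliary momentum
    w_new = w.copy()
    p_new = p - 0.5 * step_size * grad(w)               # initial half step
    for _ in range(n_leapfrog):
        w_new = w_new + step_size * p_new               # full position step
        p_new = p_new - step_size * grad(w_new)         # full momentum step
    p_new = p_new + 0.5 * step_size * grad(w_new)       # correct last step to a half step
    current_h = neg_log_posterior(w) + 0.5 * p @ p      # total "energy" before
    proposed_h = neg_log_posterior(w_new) + 0.5 * p_new @ p_new
    if rng.random() < np.exp(current_h - proposed_h):   # Metropolis test
        return w_new
    return w
```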

GAs, developed by John Holland in the 1970s, are heuristic optimization methods based on the concepts of natural evolution, and belong to the larger class of evolutionary algorithms (Goldberg, 1989; Holland, 1975). They consist of artificial operators such as selection, mutation, crossover and migration, which are components of the natural evolution process. In the selection process of GAs, possibilistic (or linguistic) and probabilistic uncertainties arise when the parents (the parameter vectors) that will create the next generation are being selected into the mating pool (Kocadağlı, 2012). In this paper, a specific fuzzy membership function is proposed to measure the possibilistic uncertainty encountered in the selection process. The possibilistic uncertainty concept was first introduced by Zadeh (1965). According to Zadeh (1965), making decisions about processes that show nonrandom uncertainty, such as the uncertainty in natural language, has been demonstrated to be less than perfect. In these situations, the membership function is the main key for decision making under uncertainty. Essentially, such a framework provides a natural way of dealing with problems in which the source of imprecision is the absence of sharply defined criteria of class membership rather than the presence of random variables (Zadeh, 1965). Thus, an intuitively plausible semantic description of imprecise properties of data in a natural system can easily be made with fuzzy theory (Ross, 1995).
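As background, a fuzzy membership function simply maps a quantity to a degree of membership in [0, 1]. The paper's specific membership function is not reproduced in this snippet; the sigmoid-shaped function below, defined over hypothetical normalized fitness values, is only a generic illustration of the idea:

```python
import numpy as np

def sigmoid_membership(fitness, midpoint=0.5, steepness=10.0):
    """Generic sigmoid membership: degree in [0, 1] to which a normalized
    fitness value is considered a 'good' parent (illustrative only)."""
    return 1.0 / (1.0 + np.exp(-steepness * (fitness - midpoint)))

# Example: membership degrees for a small, hypothetical population.
normalized_fitness = np.array([0.15, 0.40, 0.55, 0.90])
print(sigmoid_membership(normalized_fitness))   # low fitness -> low membership
```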

To summarize, the aim of this study is to introduce an evolutionary Monte Carlo algorithm called Genetic MC to train BNNs for time series forecasting. This approach is based on the Gaussian approximation with recursive hyperparameters of BNNs. Specifically, Genetic MC integrates GAs with Simulated Annealing and the Metropolis–Hastings method, and uses the recursive hyperparameters discussed in Mackay (1995) and Bishop (1995) that control the noise variance and the widths of the parameters in the Gaussian approximation. Besides, Genetic MC determines the temperature values used in Simulated Annealing and measures the possibilistic uncertainty in the selection process of the GAs via specific fuzzy membership functions. This paper consists of five sections. The first section is devoted to the basic definitions, the literature survey, and the aim and scope of the study. The second section briefly covers the general structure of ANNs, the error functions and model complexity. In the third section, Bayesian learning and the Gaussian approximation with recursive hyperparameters are given. In the fourth section, Genetic MC and its structure are introduced. In the last section, Genetic MC is compared with traditional neural network and time series techniques in terms of forecasting performance over the weekly sales of a finance magazine, and then the analysis results are discussed and interpreted in detail.

Section snippets

Artificial neural network structures

In this study, an ANN with three layers is used, as demonstrated in Fig. 1. That is, the network is composed of the input, hidden and output layers. According to the network structure given in Fig. 1, the mathematical relation between the inputs and the sth output for the ith observation can be formulated as follows:
$$f_s(\mathbf{x}_i, \boldsymbol{\theta}) = b_s^{II} + \sum_{k=1}^{m} w_{sk}^{II} \, A_k\!\left(\mathbf{w}_k^{I}\mathbf{x}_i + b_k^{I}\right), \qquad s = 1, 2, \ldots, r$$
where $\mathbf{y}_i = [y_{1i}\; y_{2i}\; \ldots\; y_{ri}]$ and $\mathbf{x}_i = [x_{1i}\; x_{2i}\; \ldots\; x_{pi}]$ are the target and input vectors of the ith observation, respectively; $\mathbf{w}_k^{I} = [w_{1k}^{I}\; w_{2k}^{I}\; \ldots\; w_{pk}^{I}]$ is a row vector that
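A minimal numerical sketch of this forward pass (with a tanh activation and arbitrary layer sizes chosen only for illustration; the paper's actual activation functions and dimensions may differ) is:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, r = 4, 6, 1                      # inputs, hidden neurons, outputs (illustrative sizes)

W1 = rng.normal(size=(m, p)); b1 = rng.normal(size=m)   # input-to-hidden weights and biases
W2 = rng.normal(size=(r, m)); b2 = rng.normal(size=r)   # hidden-to-output weights and biases

def forward(x, activation=np.tanh):
    """f_s(x, theta) = b2_s + sum_k W2_sk * A_k(W1_k x + b1_k)."""
    hidden = activation(W1 @ x + b1)
    return W2 @ hidden + b2

x_i = rng.normal(size=p)               # one hypothetical input vector
print(forward(x_i))
```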

Bayesian learning

According to the basic rule within the learning framework, simpler models should be preferred to unnecessarily complex ones. This principle is called Occam's razor, and it is actually embodied automatically and quantitatively in Bayesian methods without a penalty function, since a complex model is automatically self-penalizing under Bayes' rule. However, maximum likelihood model choice would lead us inevitably to implausible, over-parameterized models that generalize poorly (Mackay,
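In the standard Gaussian-approximation treatment (Mackay, 1992; Bishop, 1995), a Gaussian prior with hyperparameter $\alpha$ and Gaussian observation noise with hyperparameter $\beta$ yield a posterior whose negative logarithm is, up to an additive constant, the familiar regularized error; the formula below uses textbook notation, which may differ from this paper's:

```latex
% Negative log-posterior under a Gaussian prior (alpha) and Gaussian noise (beta):
S(\mathbf{w}) \;=\; \frac{\beta}{2}\sum_{i=1}^{N}\bigl\{f(\mathbf{x}_i,\mathbf{w}) - y_i\bigr\}^{2}
\;+\; \frac{\alpha}{2}\sum_{j=1}^{W} w_j^{2}.
```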

Genetic Monte Carlo algorithms for Gaussian approximation

In this section, a novel Bayesian learning algorithm called Genetic MC is introduced. The purpose of developing a novel Bayesian learning algorithm is to control complexity using the effective hyperparameters $\alpha_{\mathrm{eff}}$ and $\beta_{\mathrm{eff}}$, and to estimate the parameters in the Gaussian approximation of BNNs accurately. This approach integrates GAs with Simulated Annealing and the Metropolis–Hastings method, and uses the recursive hyperparameters discussed by Mackay (1992) and Bishop (1995) that control the
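The full algorithm is not reproduced in this snippet. The self-contained sketch below only illustrates the general structure described above (GA-style crossover/mutation proposals accepted through a Metropolis-style test under a simulated-annealing temperature schedule); the target function, selection rule, and all parameter values are hypothetical placeholders, and the recursive hyperparameter updates and fuzzy selection of the actual method are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def neg_log_posterior(w):
    """Placeholder quadratic 'energy' standing in for a BNN negative log-posterior."""
    return 0.5 * np.sum(w ** 2)

def select_parents(population, energies, n_parents=10):
    """Plain truncation selection; the paper instead weighs this choice with
    fuzzy membership degrees."""
    order = np.argsort(energies)
    return population[order[:n_parents]]

def crossover_and_mutate(parents, n_children=20, mutation_scale=0.1):
    """Uniform crossover between random parent pairs plus Gaussian mutation."""
    children = []
    for _ in range(n_children):
        a, b = parents[rng.integers(len(parents), size=2)]
        mask = rng.random(a.shape) < 0.5
        children.append(np.where(mask, a, b) + rng.normal(scale=mutation_scale, size=a.shape))
    return children

def genetic_mc(dim=5, pop_size=30, temperatures=np.geomspace(5.0, 0.05, 100)):
    population = rng.normal(size=(pop_size, dim))
    energies = np.array([neg_log_posterior(w) for w in population])
    for T in temperatures:                                   # annealing schedule
        parents = select_parents(population, energies)
        for child in crossover_and_mutate(parents):
            i = rng.integers(pop_size)                       # candidate replaces a current member
            delta = neg_log_posterior(child) - energies[i]
            if rng.random() < np.exp(min(0.0, -delta / T)):  # Metropolis-style acceptance at temperature T
                population[i], energies[i] = child, neg_log_posterior(child)
    return population

samples = genetic_mc()
print(samples.mean(axis=0))   # population concentrated near the mode after annealing
```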

Applications

In this paper, the weekly sales of a finance magazine between 01/06/2008 and 10/30/2011 are handled, and time series analysis is performed over this series, which is given in Fig. 4. The series consists of 214 observations in total. To check the normality of this series, the Jarque–Bera test is used. According to the result of this test, the normality hypothesis cannot be rejected, since the p-value = 0.1427 > α = 0.05. Moreover, one can see by means of the time series graphs and boxplot that
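For reference, the Jarque–Bera check described here could be reproduced along the following lines; the actual sales series is not available, so a synthetic array stands in for it:

```python
import numpy as np
from scipy import stats

# Placeholder standing in for the 214 weekly sales observations (not the real data).
sales = np.random.default_rng(1).normal(loc=1000.0, scale=50.0, size=214)

jb_stat, p_value = stats.jarque_bera(sales)
print(f"JB statistic = {jb_stat:.4f}, p-value = {p_value:.4f}")
# Normality is not rejected at alpha = 0.05 when the p-value exceeds 0.05
# (the paper reports p-value = 0.1427 for the actual series).
```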

Results and discussion

According to the analysis results, it is possible to attain better performance using BNNs and ANNs than with the classical time series techniques. However, to train the feed-forward ANN, the gradient descent algorithm based on minimizing the MSE requires some complicated approaches, such as early stopping for controlling the model complexity and determining a suitable learning rate for the gradient search. If the training process realized by gradient descent is stopped too early, then under-fitting is

Conclusions

In this paper, an evolutionary Monte Carlo algorithm was proposed to train BNNs in the context of time series forecasting. The proposed approach integrated Monte Carlo simulations with GAs and fuzzy membership functions; thus, BNNs were trained by a hybrid system for time series forecasting. Basically, this approach utilizes the Gaussian approximation with recursive hyperparameters in Bayesian learning. By means of the recursive hyperparameters, the approximation and the estimation

Acknowledgments

This work was supported by The Scientific and Technological Research Council of Turkey (TUBİTAK) while the corresponding author was visiting the Institute for Integrating Statistics in Decision Sciences at The George Washington University, USA. The data used in the application were provided by Turkuvaz Distribution and Marketing Co.

References (50)

  • L. Chambers (2000). Genetic algorithms.
  • C. G. Chua et al. (2003). Nonlinear modeling with confidence estimation using Bayesian neural networks. International Journal for Numerical and Analytical Methods in Geomechanics.
  • R. Davidson et al. (1993). Estimation and inference in econometrics.
  • J. Faraway et al. (1998). Time series forecasting with neural networks: A comparative study using the air line data. Journal of the Royal Statistical Society: Series C (Applied Statistics).
  • de Freitas, J. F. G. (2000). Bayesian methods for neural networks (Ph.D. thesis). Trinity College University of...
  • W. A. Fuller (1996). Introduction to statistical time series.
  • S. Geman et al. (1992). Neural networks and the bias/variance dilemma. Neural Computation.
  • J. Gill (2008). Bayesian methods: A social and behavioral sciences approach.
  • D. E. Goldberg (1989). Genetic algorithms in search, optimization, and machine learning.
  • R. M. Golden (1996). Mathematical methods for neural network analysis and design.
  • Goodrich, M. S. (2011). Markov chain Monte Carlo Bayesian learning for neural networks. In Selected papers at MODSIM...
  • M. T. Hagan et al. (1996). Neural network design.
  • W. K. Hastings (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika.
  • Hinton, G. E. & van Camp, D. (1993). Keeping neural networks simple by minimizing the description length of the...
  • J. H. Holland (1975). Adaptation in natural and artificial systems.