Nonlinear time series forecasting with Bayesian neural networks
Introduction
In the literature, many stochastic processes are described over time. In these processes, the quantities observed in recent periods are influenced by their past values, and this structure is the basis of time series methodology. The approach of Box and Jenkins (1970) is the classic reference on time series techniques for modeling this functional structure (Davidson & Mackinnon, 1993). Let {yt : t ∈ {0, ±1, ±2, …}} be a sequence depending on time. A model can be defined as

yt = a1yt−1 + a2yt−2 + ⋯ + apyt−p + et + b1et−1 + b2et−2 + ⋯ + bqet−q,

where the a’s and b’s are parameters, ap ≠ 0, bq ≠ 0, and {et} is a white-noise disturbance, i.e. a sequence of uncorrelated random variables with mean zero and finite variance σ2. This model is called an autoregressive moving average (ARMA) time series model with order (p, q), denoted ARMA(p, q). An ARMA(p, q) model can also be written in the form

yt = c + Σi=1..p ai yt−i + et + Σj=1..q bj et−j,

where c denotes the constant term, and yt−i and et−j denote the lagged values of the series and of the disturbances. Ordinary least squares (OLS), which assumes stationarity of the time series, is most often used to estimate the parameters (Fuller, 1996).
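As an illustration, the ARMA(p, q) recursion above can be simulated directly. The sketch below is illustrative only (the function name and parameter defaults are our own) and uses only the Python standard library:

```python
import random

def simulate_arma(n, a=(0.6,), b=(0.3,), c=0.0, sigma=1.0, seed=42, burn=100):
    """Simulate y_t = c + sum_i a_i*y_{t-i} + e_t + sum_j b_j*e_{t-j},
    with e_t ~ N(0, sigma^2). Early observations (burn-in) are dropped
    so the returned series is close to its stationary behavior."""
    rng = random.Random(seed)
    p, q = len(a), len(b)
    y, e = [], []
    for t in range(n + burn):
        et = rng.gauss(0.0, sigma)
        yt = c + et
        for i in range(1, p + 1):          # AR part: a_i * y_{t-i}
            if t - i >= 0:
                yt += a[i - 1] * y[t - i]
        for j in range(1, q + 1):          # MA part: b_j * e_{t-j}
            if t - j >= 0:
                yt += b[j - 1] * e[t - j]
        y.append(yt)
        e.append(et)
    return y[burn:]

series = simulate_arma(200)  # an ARMA(1, 1) sample path of length 200
```

With the default coefficients (a1 = 0.6, b1 = 0.3) the AR root lies inside the unit circle, so the simulated process satisfies the stationarity condition that OLS estimation assumes.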
As alternatives to linear time series models, nonlinear models such as the bilinear and the threshold autoregressive (AR) models can be used. Although these models generally perform well, they have inherent limitations (Liang, 2005). First, determining the most suitable model requires an expert; otherwise, an incorrect functional structure may be constructed. Second, some kinds of nonlinear behavior might not be captured by the chosen functional structure. To overcome these limitations, artificial neural networks (ANNs) are widely used to model nonlinear time series. However, when ANNs are trained using the mean squared empirical risk as the error function, model complexity criteria must be handled carefully.
Bayesian neural networks (BNNs) provide a flexible way to model nonlinear problems owing to their ability to cope with the complexity issue. Besides, they provide a natural interpretation of the estimates and of the predictions made from the estimated models. For this reason, BNNs are quite useful in regression, time series, classification and density estimation problems. Essentially, Bayesian treatments of learning in ANNs are typically based on the Gaussian approximation, ensemble learning and Markov Chain Monte Carlo (MCMC) simulations, the latter known as the full Bayesian approach. For ANNs, the Gaussian approximation, known as Laplace’s method, was introduced by Buntine and Weigend (1991) and Mackay (1992). This approach models the posterior distribution by a Gaussian distribution centered locally at a mode of the posterior distribution of the parameters (Mackay, 1992). Ensemble learning was introduced by Hinton and van Camp (1993), in which the approximating distribution is fitted globally, by minimizing a Kullback–Leibler divergence, rather than locally. Within the context of full Bayes, Neal (1992) introduced advanced Bayesian simulation methods in which MCMC simulations are used to generate parameter samples from the posterior distribution. However, MCMC techniques can be computationally expensive and also suffer from difficulties in assessing convergence. For this reason, Neal (1992) integrated Bayesian learning with the Hybrid Monte Carlo (HMC) method to overcome these shortcomings. Afterwards, Bayesian applications to ANNs were reviewed in detail in Mackay (1995), Bishop (1995) and Neal (1996).
In the literature, there are remarkable studies focusing on specific problems related to ANNs from a Bayesian perspective. For instance, Insua and Muller (1998), Marrs (1998) and Holmes and Mallick (1998) worked on selecting the number of hidden neurons with growing and pruning algorithms for the dimensionality problem in ANNs. Freitas (2000) incorporated particle filters and sequential Monte Carlo (MC) methods into BNNs. Liang and Wong (2001) proposed the evolutionary MC algorithm, which samples the ANN parameters from the Boltzmann distribution using the mutation, crossover and exchange operations defined in genetic algorithms (GAs). Chua and Goh (2003) proposed a hybrid Bayesian back-propagation approach to multivariate modeling in ANNs; they used the stochastic gradient descent algorithm integrated with evolutionary operators to produce the ANN parameters. Liang (2005) and Lord, Xie, and Zhang (2007) used truncated Poisson priors to determine the numbers of neurons in the hidden layers, and estimated the ANN parameters via the evolutionary MC algorithm proposed by Liang and Wong (2001). Lampinen and Vehtari (2001) and Vanhatalo and Vehtari (2006) improved a hybrid and reversible MCMC algorithm based on Neal (1996). Marwala (2007) adapted the mutation and crossover operators defined in GAs to Bayesian learning, and estimated the parameters using a Genetic MC algorithm. Mirikitani (2010) proposed a probabilistic approach to the recursive second-order training of recurrent neural networks using regularization hyperparameters. Goodrich (2011) developed a powerful methodology for estimating the full residual uncertainty in the network weights and making predictions by using a modified Jeffreys prior combined with a Metropolis MCMC method.
Martens and Sutskever (2011) addressed the long-standing problem of how to effectively train recurrent neural networks on complex and difficult sequence modeling problems that may contain long-term data dependencies. Kocadağlı (2012, 2013a, 2013b) integrated hierarchical Bayesian learning with GAs and fuzzy numbers to estimate the parameters in ANNs.
In the context of nonlinear time series forecasting with ANNs and BNNs, there are remarkable studies in the literature. For instance, Faraway and Chatfield (1998) used complexity criteria to determine the number of neurons in the hidden layer of a feedforward neural network; Brahim-Belhouari and Bermak (2004) performed time series forecasting using Gaussian processes from a non-stationarity perspective; Teräsvirta, van Dijk, and Medeiros (2005) investigated smooth transition autoregressions and neural networks for macroeconomic time series; Liang (2005) proposed an evolutionary MC algorithm to estimate the parameters of BNNs for nonlinear time series forecasting; Hippert and Taylor (2010) utilized Bayesian techniques for controlling model complexity and selecting inputs in an ANN for short-term time series forecasting; and Mirikitani (2010) proposed a probabilistic approach to estimating the ANN parameters using regularization hyperparameters for time series forecasting.
To measure model complexity and to estimate the parameters and hyperparameters of BNNs, hybrid methods are mostly preferred because of their superior performance over classical time series techniques. To train BNNs, MCMC techniques are usually integrated with gradient optimization algorithms, the simulated annealing method or evolutionary algorithms. HMC approaches with gradient search algorithms perform well against the problems mentioned above because they suppress the random-walk behavior encountered in MCMC treatments while also reducing training time. Gradient descent with momentum, conjugate gradients, quasi-Newton and Levenberg–Marquardt are efficient gradient algorithms that can be integrated with MCMC techniques. However, gradient algorithms might not be able to explore the whole parameter space freely in high-dimensional cases with many local optima, and they do not work without derivative information. In the context of ANNs and nonlinear problems, the shortcomings of these algorithms are discussed in detail in Bishop (1995), Kocadağlı (2012), Liang (2005) and Neal (1992). To handle nonlinear functions without derivative information and to reduce the training time spent on parameter estimation in ANNs, GAs have recently become quite popular.
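The MCMC building block that such hybrid schemes combine with gradient or evolutionary moves is the Metropolis(–Hastings) accept/reject step. The following minimal random-walk sampler on a toy one-dimensional posterior is a generic sketch (not the paper's algorithm) and illustrates the random-walk behavior discussed above:

```python
import math
import random

def metropolis(log_post, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis: propose x' = x + N(0, step^2) and accept
    with probability min(1, post(x') / post(x)). Only the log-posterior
    up to an additive constant is needed."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        if math.log(rng.random()) < lp_prop - lp:  # accept/reject step
            x, lp = prop, lp_prop
        samples.append(x)  # rejected proposals repeat the current state
    return samples

# Toy target: standard normal log-density (up to a constant).
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
mean = sum(draws) / len(draws)
```

Because each proposal is a small perturbation of the current state, successive draws are strongly correlated; this is exactly the random-walk behavior that HMC and evolutionary moves are designed to reduce.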
GAs, developed by John Holland in the 1970s, are heuristic optimization methods based on the concepts of natural evolution, and belong to the larger class of evolutionary algorithms (Goldberg, 1989; Holland, 1975). They consist of artificial operators such as selection, mutation, crossover and migration, which are components of the natural evolution process. In the selection process of GAs, possibilistic (or linguistic) and probabilistic uncertainties arise when the parents (the parameter vectors) that will create the next generation are being selected into the mating pool (Kocadağlı, 2012). In this paper, a specific fuzzy membership function is proposed to measure the possibilistic uncertainty encountered in the selection process. The concept of possibilistic uncertainty was first introduced by Zadeh (1965). According to Zadeh (1965), decision making about processes that exhibit nonrandom uncertainty, such as the uncertainty in natural language, has been demonstrated to be less than perfect. In these situations, the membership function is the main key for decision making under uncertainty. Essentially, such a framework provides a natural way of dealing with problems in which the source of imprecision is the absence of sharply defined criteria of class membership rather than the presence of random variables (Zadeh, 1965). Thus, fuzzy theory easily permits an intuitively plausible semantic description of the imprecise properties of data arising in natural systems (Ross, 1995).
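To make the idea concrete, a membership function can grade each candidate parent by how close its fitness lies to the best fitness in the population, and selection can then be weighted by these degrees. The sketch below uses a simple linear membership purely for illustration; it is not the specific fuzzy membership function proposed in this paper:

```python
import random

def fuzzy_membership(fitness, best, worst):
    """Degree in [0, 1] of the fuzzy set 'fit enough to be a parent':
    candidates near the best fitness get membership near 1.
    (Illustrative linear form, not the paper's function.)"""
    if best == worst:
        return 1.0
    return (fitness - worst) / (best - worst)

def select_parent(population, fitnesses, rng):
    """Roulette-wheel selection weighted by fuzzy membership degrees."""
    best, worst = max(fitnesses), min(fitnesses)
    degrees = [fuzzy_membership(f, best, worst) for f in fitnesses]
    total = sum(degrees) or 1.0
    r, acc = rng.random() * total, 0.0
    for individual, d in zip(population, degrees):
        acc += d
        if acc >= r:
            return individual
    return population[-1]

rng = random.Random(0)
pop = [[0.1], [0.5], [0.9]]          # candidate parameter vectors
fit = [1.0, 2.0, 5.0]                # their fitness values
parent = select_parent(pop, fit, rng)
```

The membership degrees capture the possibilistic ("how fit is fit enough?") uncertainty, while the roulette draw captures the probabilistic uncertainty of the selection step.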
To summarize, the aim of this study is to introduce an evolutionary Monte Carlo algorithm, called Genetic MC, to train BNNs for time series forecasting. This approach is based on the Gaussian approximation with recursive hyperparameters of BNNs. Specifically, Genetic MC integrates GAs with simulated annealing and the Metropolis–Hastings method, and uses the recursive hyperparameters discussed in Mackay (1995) and Bishop (1995), which control the noise variance and the widths of the parameters in the Gaussian approximation. Besides, Genetic MC determines the temperature values used in simulated annealing and measures the possibilistic uncertainty in the selection process of the GAs via specific fuzzy membership functions. This paper consists of five sections. The first section covers the basic definitions, the literature survey, and the aim and scope of the study. The second section briefly covers the general structure of ANNs, the error functions and model complexity. In the third section, Bayesian learning and the Gaussian approximation with recursive hyperparameters are given. In the fourth section, the Genetic MC algorithm and its structure are introduced. In the last section, Genetic MC is compared with traditional neural networks and time series techniques in terms of forecasting performance on the weekly sales of a finance magazine, and then the analysis results are discussed and interpreted in detail.
Section snippets
Artificial neural network structures
In this study, an ANN with three layers is used, as demonstrated in Fig. 1. That is, the network is composed of input, hidden and output layers. According to the network structure given in Fig. 1, the mathematical relation between the inputs and the sth output for the ith observation can be formulated as follows, where yi = [y1i y2i … yri] and xi = [x1i x2i … xpi] are the target and input vectors of the ith observation, respectively; is a row vector that
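A forward pass through such a three-layer network can be sketched as follows. The weight values are arbitrary, and the tanh hidden activation with linear outputs is an assumption for illustration (the snippet above does not show the paper's exact activation functions):

```python
import math

def forward(x, V, v0, W, w0):
    """Forward pass of a three-layer feedforward network:
    V  : hidden-from-input weights (one row per hidden neuron)
    v0 : hidden biases
    W  : output-from-hidden weights (one row per output neuron)
    w0 : output biases
    Hidden units use tanh; output units are linear."""
    hidden = [math.tanh(sum(vk * xk for vk, xk in zip(row, x)) + b)
              for row, b in zip(V, v0)]
    return [sum(wj * hj for wj, hj in zip(row, hidden)) + b
            for row, b in zip(W, w0)]

# Two inputs, three hidden neurons, one output (weights chosen arbitrarily).
V = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
v0 = [0.0, 0.1, -0.1]
W = [[0.7, -0.5, 0.2]]
w0 = [0.05]
y = forward([1.0, 2.0], V, v0, W, w0)
```

For time series forecasting, the input vector xi would typically hold lagged observations yt−1, …, yt−p, and the output would be the one-step-ahead forecast.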
Bayesian learning
According to a basic rule within the learning framework, simpler models should be preferred to unnecessarily complex ones. This principle is called Occam’s razor, and it is embodied automatically and quantitatively in Bayesian methods without a penalty function, since a complex model is automatically self-penalizing under Bayes’ rule. By contrast, maximum likelihood model choice would lead inevitably to implausible over-parameterized models that generalize poorly (Mackay,
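One step of the recursive hyperparameter re-estimation used with the Gaussian approximation (in the style discussed by Mackay (1992) and Bishop (1995)) can be sketched as below. Here γ, the effective number of well-determined parameters, is assumed to be supplied (in practice it is computed from the Hessian eigenvalues), and the toy numbers are illustrative:

```python
def reestimate_hyperparameters(E_W, E_D, gamma, N):
    """One MacKay-style recursive update for the Gaussian approximation:
    alpha (prior precision) controls the width of the weight prior,
    beta (noise precision) controls the noise variance; E_W and E_D are
    the weight-decay and data error terms, N the number of data points,
    gamma the effective number of well-determined parameters."""
    alpha_eff = gamma / (2.0 * E_W)        # alpha <- gamma / (2 E_W)
    beta_eff = (N - gamma) / (2.0 * E_D)   # beta  <- (N - gamma) / (2 E_D)
    return alpha_eff, beta_eff

# Toy numbers: 100 observations, gamma = 8.2 effective parameters.
alpha, beta = reestimate_hyperparameters(E_W=4.1, E_D=45.9, gamma=8.2, N=100)
```

Iterating this update alongside parameter estimation lets the data itself set the trade-off between fitting the observations (β) and keeping the weights small (α), which is how the complexity control described above is realized without a hand-tuned penalty.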
Genetic Monte Carlo algorithms for Gaussian approximation
In this section, a novel Bayesian learning algorithm called Genetic MC is introduced. The purpose of developing this algorithm is to control complexity using the efficient hyperparameters αeff and βeff, and to estimate the parameters in the Gaussian approximation of BNNs accurately. The approach integrates GAs with simulated annealing and the Metropolis–Hastings method, and uses the recursive hyperparameters discussed by Mackay (1992) and Bishop (1995) that control the
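The simulated annealing ingredient of such a hybrid scheme reduces to a temperature-controlled Metropolis acceptance rule. The sketch below is generic: the paper determines its temperatures via fuzzy membership functions, which are not reproduced here, so a standard geometric cooling schedule is shown instead for illustration:

```python
import math
import random

def anneal_accept(delta_error, temperature, rng):
    """Metropolis-style acceptance used in simulated annealing: always
    accept an improvement; accept a worse candidate with probability
    exp(-delta_error / T), so high temperatures explore the space
    and low temperatures exploit good regions."""
    if delta_error <= 0:
        return True
    return rng.random() < math.exp(-delta_error / temperature)

def geometric_schedule(t0, ratio, steps):
    """A common (illustrative) cooling schedule: T_k = T_0 * ratio**k."""
    return [t0 * ratio ** k for k in range(steps)]

rng = random.Random(1)
temps = geometric_schedule(t0=10.0, ratio=0.9, steps=5)
accepted = anneal_accept(0.5, temps[0], rng)  # early, hot step: likely accepted
```

Embedded in a GA, this rule decides whether a mutated or crossed-over parameter vector replaces its parent, which is how annealing and evolutionary search can be combined in one sampler.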
Applications
In this paper, the weekly sales of a finance magazine between 01/06/2008 and 10/30/2011 are handled, and a time series analysis is performed on this series, shown in Fig. 4. The series consists of 214 observations in total. To check the normality of the series, the Jarque–Bera test is used. According to the result of this test, it can be said that the series comes from a normal distribution, since p-value = 0.1427 > α = 0.05. Moreover, one can see by means of the time series graphs and boxplot that
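The Jarque–Bera statistic used above compares the sample skewness S and kurtosis K with their values under normality, JB = n/6 · (S² + (K − 3)²/4), which is asymptotically chi-square with 2 degrees of freedom (so the p-value is exp(−JB/2)). A minimal sketch on synthetic data (the magazine sales themselves are not reproduced here) is:

```python
import math
import random

def jarque_bera(x):
    """Jarque-Bera normality test: JB = n/6 * (S^2 + (K - 3)^2 / 4).
    Returns (JB, p), where p = exp(-JB/2) is the chi-square(2)
    survival function evaluated at JB."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    S = m3 / m2 ** 1.5                         # sample skewness
    K = m4 / m2 ** 2                           # sample kurtosis
    jb = n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)

# Synthetic stand-in for the 214-observation series.
rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(214)]
jb, p = jarque_bera(sample)
```

As in the analysis above, a p-value larger than α = 0.05 means the null hypothesis of normality is not rejected.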
Results and discussion
According to the analysis results, it is possible to attain better performance with BNNs and ANNs than with the classical time series techniques. However, to train a feedforward ANN, the gradient descent algorithm based on minimizing MSE requires some complicated approaches, such as early stopping for controlling the model complexity and determining a suitable learning rate for the gradient search. If the training process realized by gradient descent is stopped too early, then lower-fitting is
Conclusions
In this paper, an evolutionary Monte Carlo algorithm was proposed to train BNNs in the context of time series forecasting. The proposed approach integrated Monte Carlo simulations with GAs and fuzzy membership functions; thus, BNNs were trained by a hybrid system for time series forecasting. Basically, this approach utilizes the Gaussian approximation with recursive hyperparameters in Bayesian learning. By means of the recursive hyperparameters, the approximation and the estimation
Acknowledgments
This work was supported by The Scientific and Technological Research Council of Turkey (TUBİTAK) while the corresponding author was visiting the Institute for Integrating Statistics in Decision Sciences at The George Washington University, USA. The data used in the application were provided by Turkuvaz Distribution and Marketing Co.
References (50)
- Brahim-Belhouari, S., & Bermak, A. (2004). Gaussian process for nonstationary time series prediction. Computational Statistics & Data Analysis.
- Hippert, H. S., & Taylor, J. W. (2010). An evaluation of Bayesian techniques for controlling model complexity and selecting inputs in a neural network for short-term load forecasting. Neural Networks.
- Marwala, T. (2007). Bayesian training of neural networks using genetic programming. Pattern Recognition Letters.
- Teräsvirta, T., van Dijk, D., & Medeiros, M. C. (2005). Linear models, smooth transition autoregressions, and neural networks for forecasting macroeconomic time series: A re-examination. International Journal of Forecasting.
- Zadeh, L. A. (1965). Fuzzy sets. Information and Control.
- Bishop, C. M. (1995). Neural networks for pattern recognition.
- Bishop, C. M. (2006). Pattern recognition and machine learning.
- Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis, forecasting and control.
- Buntine, W. L., & Weigend, A. S. (1991). Bayesian back-propagation. Complex Systems.
- Castellano, G., Fanelli, A. M., & Pelillo, M. (1997). An iterative pruning algorithm for feedforward neural networks. IEEE Transactions on Neural Networks.