
A robust recurrent simultaneous perturbation stochastic approximation training algorithm for recurrent neural networks

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Training recurrent neural networks (RNNs) introduces considerable computational complexity because of the need for gradient evaluations, and achieving fast convergence with low computational cost remains a challenging, open problem. The transient response of the learning process is also a critical issue, especially for online applications. Conventional RNN training algorithms such as backpropagation through time and real-time recurrent learning do not adequately meet these requirements because they often converge slowly, and if a large learning rate is chosen to speed up training, the process may become unstable and the weights may diverge. In this paper, a novel RNN training algorithm, the robust recurrent simultaneous perturbation stochastic approximation (RRSPSA) algorithm, is developed with a specially designed recurrent hybrid adaptive parameter and adaptive learning rates. RRSPSA is a twin-engine simultaneous perturbation stochastic approximation (SPSA) type of RNN training algorithm: it uses three specially designed adaptive parameters to maximize training speed for a recurrent training signal while retaining weight convergence properties, and, like the original SPSA algorithm, it requires only two objective function measurements per iteration. Weight convergence and system stability of RRSPSA are proved in the sense of a Lyapunov function. Computer simulations demonstrate the applicability of the theoretical results.
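The core SPSA mechanism the abstract refers to, estimating the full gradient from only two loss measurements per iteration regardless of how many weights are trained, can be sketched as follows. This is a minimal, generic SPSA step for illustration only, not the authors' RRSPSA update: the names loss_fn, theta, a_k and c_k are illustrative assumptions, and the fixed gains stand in for the adaptive learning rates and recurrent hybrid adaptive parameters that the paper designs and analyzes.

```python
import numpy as np

def spsa_step(loss_fn, theta, a_k, c_k, rng):
    """One generic SPSA update (illustrative, not the paper's RRSPSA).

    loss_fn : callable mapping a flattened weight vector to a scalar loss
    theta   : current weight vector (e.g. all RNN weights flattened)
    a_k     : step size; the paper replaces this with adaptive learning rates
    c_k     : perturbation gain (the parameter c in the notation list)
    rng     : numpy.random.Generator
    """
    # Simultaneously perturb every weight with an independent +/-1 sign.
    delta = rng.choice([-1.0, 1.0], size=theta.shape)

    # Only two measurements of the objective, as in the original SPSA.
    loss_plus = loss_fn(theta + c_k * delta)
    loss_minus = loss_fn(theta - c_k * delta)

    # Two-sided simultaneous-perturbation gradient estimate.
    g_hat = (loss_plus - loss_minus) / (2.0 * c_k * delta)

    # Gradient-descent-style update of all weights at once.
    return theta - a_k * g_hat
```

In RRSPSA the gains are not fixed constants: the adaptive learning rates and the recurrent hybrid adaptive parameter are chosen so that weight convergence and stability can be established through a Lyapunov function, which is the contribution the abstract emphasizes.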


Abbreviations

k : Discrete time step

m : Output dimension

n_I : Input dimension

n_h : Hidden neuron number

p^v : = m × n_h. Dimension of the output layer weight vector

p^w : = n_I × n_h. Dimension of the hidden layer weight vector

l : Number of time-delayed outputs

u_i(k) : \(i=1,\ldots,m.\) External input at time step k

x(k) : \(\in R^{n_{I}}.\) Neural network input vector

\(\Updelta x(k)\) : \(\in R^{n_{I}}.\) Perturbation vector on the input x(k − 1)

y(k) : \(\in R^{m}.\) Desired output vector

\(\hat{y}(k)\) : \(\in R^{m}.\) Estimated output vector

\(\varepsilon(k)\) : \(\in R^{m}.\) Total disturbance vector

e(k) : \(\in R^{m}.\) Output estimation error vector

\(\overline{V}(k)\) : \(\in R^{p^{v}}.\) Estimated weight vector of the output layer

\(\overline{V}^{*}(k)\) : \(\in R^{p^{v}}.\) Ideal weight vector of the output layer

\(\tilde{\overline{V}}(k)\) : \(\in R^{p^{v}}.\) Weight estimation error vector of the output layer

\(\overline{W}(k)\) : \(\in R^{p^{w}}.\) Estimated weight vector of the hidden layer

\(\overline{W}^{*}(k)\) : \(\in R^{p^{w}}.\) Ideal weight vector of the hidden layer

\(\tilde{\overline{W}}(k)\) : \(\in R^{p^{w}}.\) Weight estimation error vector of the hidden layer

\(\hat{W}_{j,:}(k)\) : \(\in R^{n_{I}}.\) The jth row vector of the hidden layer weight matrix \(\hat{W}(k)\)

δ^v(k) : Equivalent approximation errors of the loss function of the output layer

δ^w(k) : Equivalent approximation errors of the loss function of the hidden layer

c : Perturbation gain parameter of the SPSA

\(\Updelta^{v}\) : \(\in R^{p^v}.\) Perturbation vector of the output layer

r^v : \(\in R^{p^v}.\) Perturbation vector of the output layer

\(\Updelta^{w}\) : \(\in R^{p^w}.\) Perturbation vector of the hidden layer

r^w : \(\in R^{p^w}.\) Perturbation vector of the hidden layer

\(H(\overline{W}(k),x(k))\) : \(=H(k)\in R^{m\times p^w}.\) Nonlinear activation function matrix

h_j(k) : \(=h\left(\hat{W}_{j,:}(k)x(k)\right).\) The nonlinear activation function; the scalar element of \(H(\overline{W}(k),x(k))\)

α^v : Adaptive learning rate of the output layer

α^w : Adaptive learning rate of the hidden layer

α^v, α^w : Positive scalars

ρ^v : Normalization factor of the output layer

ρ^w : Normalization factor of the hidden layer

β^v : Recurrent hybrid adaptive parameter of the output layer

β^w : Recurrent hybrid adaptive parameter of the hidden layer

μ_j(k) : \(\in R;1\leq j\leq n_{h}.\) Mean value of the input vectors of the jth hidden layer neuron

\(\tilde{\mu}_{j}(k)\) : \(\in R;1\leq j\leq n_{h}.\) Mean value of the hidden layer weight vector of the jth hidden layer neuron

τ : Positive scalar

λ : Positive gain parameter of the threshold function

η : A small perturbation parameter
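Taken together, the notation above suggests a single-hidden-layer recurrent structure: the network input x(k) collects the external inputs and the l time-delayed outputs, each hidden neuron j applies the scalar activation h to \(\hat{W}_{j,:}(k)x(k)\), and the output layer weights map the hidden activations to the m-dimensional estimate \(\hat{y}(k)\). The sketch below is only one plausible reading of that notation (the paper's H(k) matrix and the exact feedback arrangement may differ), and all function and variable names, as well as the choice of tanh for h, are illustrative assumptions.

```python
import numpy as np

def rnn_forward(W_hat, V_hat, u_k, y_delayed):
    """Forward pass of the two-layer recurrent structure implied by the notation.

    W_hat     : (n_h, n_I) hidden layer weight matrix, rows W_hat[j, :]
    V_hat     : (m, n_h) output layer weight matrix
    u_k       : (m,) external input u_i(k), i = 1..m
    y_delayed : list of the l previous output vectors, fed back as inputs
    """
    # Network input x(k): external inputs stacked with l time-delayed outputs.
    x_k = np.concatenate([u_k] + list(y_delayed))   # shape (n_I,)

    # Hidden layer: h_j(k) = h(W_hat[j, :] @ x(k)); tanh is an assumed choice of h.
    hidden = np.tanh(W_hat @ x_k)                   # shape (n_h,)

    # Output layer: estimated output y_hat(k).
    return V_hat @ hidden                           # shape (m,)
```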


Author information


Corresponding author

Correspondence to Zhao Xu.


Cite this article

Xu, Z., Song, Q. & Wang, D. A robust recurrent simultaneous perturbation stochastic approximation training algorithm for recurrent neural networks. Neural Comput & Applic 24, 1851–1866 (2014). https://doi.org/10.1007/s00521-013-1436-5

