
A robust recurrent simultaneous perturbation stochastic approximation training algorithm for recurrent neural networks

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Training recurrent neural networks (RNNs) introduces considerable computational complexity because of the need for gradient evaluations, and achieving fast convergence with low computational cost remains a challenging, open problem. The transient response of the learning process is also a critical issue, especially for online applications. Conventional RNN training algorithms such as backpropagation through time and real-time recurrent learning do not adequately meet these requirements because they often converge slowly, and if a large learning rate is chosen to speed up training, the process may become unstable and the weights may diverge. In this paper, a novel RNN training algorithm, the robust recurrent simultaneous perturbation stochastic approximation (RRSPSA) algorithm, is developed with a specially designed recurrent hybrid adaptive parameter and adaptive learning rates. RRSPSA is a twin-engine simultaneous perturbation stochastic approximation (SPSA) type of RNN training algorithm: it uses three specially designed adaptive parameters to maximize training speed for a recurrent training signal while retaining weight convergence properties, and, like the original SPSA algorithm, it requires only two objective function measurements per iteration. Weight convergence and system stability of RRSPSA are proved in the sense of a Lyapunov function. Computer simulations demonstrate the applicability of the theoretical results.
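The core SPSA mechanism the abstract refers to, estimating the full gradient from only two loss measurements per iteration regardless of how many weights are trained, can be sketched as follows. This is a minimal, generic SPSA step for illustration only, not the authors' RRSPSA update: the names loss_fn, theta, a_k and c_k are illustrative assumptions, and the fixed gains stand in for the adaptive learning rates and recurrent hybrid adaptive parameters that the paper designs and analyzes.

```python
import numpy as np

def spsa_step(loss_fn, theta, a_k, c_k, rng):
    """One generic SPSA update (illustrative, not the paper's RRSPSA).

    loss_fn : callable mapping a flattened weight vector to a scalar loss
    theta   : current weight vector (e.g. all RNN weights flattened)
    a_k     : step size; the paper replaces this with adaptive learning rates
    c_k     : perturbation gain (the parameter c in the notation list)
    rng     : numpy.random.Generator
    """
    # Simultaneously perturb every weight with an independent +/-1 sign.
    delta = rng.choice([-1.0, 1.0], size=theta.shape)

    # Only two measurements of the objective, as in the original SPSA.
    loss_plus = loss_fn(theta + c_k * delta)
    loss_minus = loss_fn(theta - c_k * delta)

    # Two-sided simultaneous-perturbation gradient estimate.
    g_hat = (loss_plus - loss_minus) / (2.0 * c_k * delta)

    # Gradient-descent-style update of all weights at once.
    return theta - a_k * g_hat
```

In RRSPSA the gains are not fixed constants: the adaptive learning rates and the recurrent hybrid adaptive parameter are chosen so that weight convergence and stability can be established through a Lyapunov function, which is the contribution the abstract emphasizes.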


Abbreviations

k : Discrete time step

m : Output dimension

n_I : Input dimension

n_h : Hidden neuron number

p^v : = m × n_h. Dimension of the output layer weight vector

p^w : = n_I × n_h. Dimension of the hidden layer weight vector

l : Number of time-delayed outputs

u_i(k) : \(i=1,\ldots,m.\) External input at time step k

x(k) : \(\in R^{n_{I}}.\) Neural network input vector

\(\Updelta x(k)\) : \(\in R^{n_{I}}.\) Perturbation vector on the input x(k − 1)

y(k) : \(\in R^{m}.\) Desired output vector

\(\hat{y}(k)\) : \(\in R^{m}.\) Estimated output vector

\(\varepsilon(k)\) : \(\in R^{m}.\) Total disturbance vector

e(k) : \(\in R^{m}.\) Output estimation error vector

\(\overline{V}(k)\) : \(\in R^{p^{v}}.\) Estimated weight vector of the output layer

\(\overline{V}^{*}(k)\) : \(\in R^{p^{v}}.\) Ideal weight vector of the output layer

\(\tilde{\overline{V}}(k)\) : \(\in R^{p^{v}}.\) Weight estimation error vector of the output layer

\(\overline{W}(k)\) : \(\in R^{p^{w}}.\) Estimated weight vector of the hidden layer

\(\overline{W}^{*}(k)\) : \(\in R^{p^{w}}.\) Ideal weight vector of the hidden layer

\(\tilde{\overline{W}}(k)\) : \(\in R^{p^{w}}.\) Weight estimation error vector of the hidden layer

\(\hat{W}_{j,:}(k)\) : \(\in R^{n_{I}}.\) The jth row vector of the hidden layer weight matrix \(\hat{W}(k)\)

δ^v(k) : Equivalent approximation errors of the loss function of the output layer

δ^w(k) : Equivalent approximation errors of the loss function of the hidden layer

c : Perturbation gain parameter of the SPSA

\(\Updelta^{v}\) : \(\in R^{p^v}.\) Perturbation vector of the output layer

r^v : \(\in R^{p^v}.\) Perturbation vector of the output layer

\(\Updelta^{w}\) : \(\in R^{p^w}.\) Perturbation vector of the hidden layer

r^w : \(\in R^{p^w}.\) Perturbation vector of the hidden layer

\(H(\overline{W}(k),x(k))\) : \(=H(k)\in R^{m\times p^w}.\) Nonlinear activation function matrix

h_j(k) : \(=h\left(\hat{W}_{j,:}(k)x(k)\right).\) The nonlinear activation function; the scalar element of \(H(\overline{W}(k),x(k))\)

α^v : Adaptive learning rate of the output layer

α^w : Adaptive learning rate of the hidden layer

α^v, α^w : Positive scalars

ρ^v : Normalization factor of the output layer

ρ^w : Normalization factor of the hidden layer

β^v : Recurrent hybrid adaptive parameter of the output layer

β^w : Recurrent hybrid adaptive parameter of the hidden layer

μ_j(k) : \(\in R;1\leq j\leq n_{h}.\) Mean value of the input vectors of the jth hidden layer neuron

\(\tilde{\mu}_{j}(k)\) : \(\in R;1\leq j\leq n_{h}.\) Mean value of the hidden layer weight vector of the jth hidden layer neuron

τ : Positive scalar

λ : Positive gain parameter of the threshold function

η : A small perturbation parameter
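Taken together, the notation above suggests a single-hidden-layer recurrent structure: the network input x(k) collects the external inputs and the l time-delayed outputs, each hidden neuron j applies the scalar activation h to \(\hat{W}_{j,:}(k)x(k)\), and the output layer weights map the hidden activations to the m-dimensional estimate \(\hat{y}(k)\). The sketch below is only one plausible reading of that notation (the paper's H(k) matrix and the exact feedback arrangement may differ), and all function and variable names, as well as the choice of tanh for h, are illustrative assumptions.

```python
import numpy as np

def rnn_forward(W_hat, V_hat, u_k, y_delayed):
    """Forward pass of the two-layer recurrent structure implied by the notation.

    W_hat     : (n_h, n_I) hidden layer weight matrix, rows W_hat[j, :]
    V_hat     : (m, n_h) output layer weight matrix
    u_k       : (m,) external input u_i(k), i = 1..m
    y_delayed : list of the l previous output vectors, fed back as inputs
    """
    # Network input x(k): external inputs stacked with l time-delayed outputs.
    x_k = np.concatenate([u_k] + list(y_delayed))   # shape (n_I,)

    # Hidden layer: h_j(k) = h(W_hat[j, :] @ x(k)); tanh is an assumed choice of h.
    hidden = np.tanh(W_hat @ x_k)                   # shape (n_h,)

    # Output layer: estimated output y_hat(k).
    return V_hat @ hidden                           # shape (m,)
```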


Author information


Corresponding author

Correspondence to Zhao Xu.


Cite this article

Xu, Z., Song, Q. & Wang, D. A robust recurrent simultaneous perturbation stochastic approximation training algorithm for recurrent neural networks. Neural Comput & Applic 24, 1851–1866 (2014). https://doi.org/10.1007/s00521-013-1436-5

