
2018 | Chapter

5. Recurrent Neural Networks

Authors: Anthony L. Caterini, Dong Eui Chang

Published in: Deep Neural Networks in a Mathematical Framework

Publisher: Springer International Publishing

Abstract

We applied the generic neural network framework from Chap. 3 to specific network structures in the previous chapter. Multilayer Perceptrons and Convolutional Neural Networks fit squarely into that framework, and we were also able to modify it to capture Deep Auto-Encoders. We now extend the generic framework even further to handle Recurrent Neural Networks (RNNs), the sequence-parsing network structure containing a recurring latent, or hidden, state that evolves at each layer of the network. This involves the development of new notation, but we remain as consistent as possible with previous chapters. The specific layout of this chapter is as follows. We first formulate a generic, feed-forward recurrent neural network. We calculate gradients of loss functions for these networks in two ways: Real-Time Recurrent Learning (RTRL) and Backpropagation Through Time (BPTT). Using our notation for vector-valued maps, we derive these algorithms directly over the inner product space in which the parameters reside. We then proceed to formally represent a vanilla RNN, which is the simplest form of RNN, and we formulate RTRL and BPTT for that as well. At the end of the chapter, we briefly mention modern RNN variants in the context of our generic framework.
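
The chapter derives RTRL and BPTT directly over the inner product space in which the parameters live; purely as a concrete companion, here is a minimal NumPy sketch of both algorithms for a vanilla RNN. Everything in it is an illustrative assumption rather than the book's notation: the parameter names W, U, V, b, c, the tanh nonlinearity, the affine readout, and the summed squared-error loss.

    import numpy as np

    # Vanilla RNN (assumed form):  h_i = tanh(W h_{i-1} + U x_i + b),
    # readout yhat_i = V h_i + c, loss L = 0.5 * sum_i ||yhat_i - y_i||^2.

    def forward(params, xs, h0):
        W, U, V, b, c = params
        hs, yhats = [h0], []
        for x in xs:
            hs.append(np.tanh(W @ hs[-1] + U @ x + b))
            yhats.append(V @ hs[-1] + c)
        return hs, yhats

    def bptt(params, xs, ys, h0):
        """Backpropagation Through Time: one forward pass, then a single
        backward sweep through the unrolled graph, accumulating gradients."""
        W, U, V, b, c = params
        hs, yhats = forward(params, xs, h0)
        dW, dU, dV = np.zeros_like(W), np.zeros_like(U), np.zeros_like(V)
        db, dc = np.zeros_like(b), np.zeros_like(c)
        dh_next = np.zeros_like(h0)            # gradient arriving from step i+1
        for i in reversed(range(len(xs))):
            e = yhats[i] - ys[i]               # dL/dyhat_i
            dV += np.outer(e, hs[i + 1]); dc += e
            dh = V.T @ e + dh_next             # direct + downstream contributions
            dz = (1.0 - hs[i + 1] ** 2) * dh   # back through tanh
            dW += np.outer(dz, hs[i]); dU += np.outer(dz, xs[i]); db += dz
            dh_next = W.T @ dz                 # pass gradient on to step i-1
        return dW, dU, dV, db, dc

    def rtrl_dW(params, xs, ys, h0):
        """Real-Time Recurrent Learning, shown for W only (U, b analogous):
        push the sensitivity S_i = dh_i/dvec(W) forward in time, so gradients
        are available online without storing the state history."""
        W, U, V, b, c = params
        n = h0.size
        S = np.zeros((n, W.size))              # dh_i/dvec(W), row-major vec
        dvecW, h = np.zeros(W.size), h0
        for x, y in zip(xs, ys):
            h_prev = h
            h = np.tanh(W @ h_prev + U @ x + b)
            dzdW = np.kron(np.eye(n), h_prev)  # dz_a/dW[a,b] = h_prev[b]
            S = (1.0 - h ** 2)[:, None] * (W @ S + dzdW)
            e = (V @ h + c) - y
            dvecW += (V.T @ e) @ S             # immediate dL_i/dh_i, times S_i
        return dvecW.reshape(W.shape)

    # Sanity check with illustrative dimensions: the two algorithms agree.
    rng = np.random.default_rng(0)
    n, m, p, T = 4, 3, 2, 5
    params = (rng.standard_normal((n, n)), rng.standard_normal((n, m)),
              rng.standard_normal((p, n)), rng.standard_normal(n),
              rng.standard_normal(p))
    xs = [rng.standard_normal(m) for _ in range(T)]
    ys = [rng.standard_normal(p) for _ in range(T)]
    assert np.allclose(bptt(params, xs, ys, np.zeros(n))[0],
                       rtrl_dW(params, xs, ys, np.zeros(n)))

The trade-off the sketch exposes is the one the chapter works out in general: BPTT stores all of the hidden states and makes one cheap backward sweep, while RTRL keeps only an n-by-n² sensitivity matrix, independent of sequence length, at a much higher per-step cost.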


Footnotes
1
We have adopted a slightly different indexing convention in this chapter: notice that \(f_i\) takes in \(h_{i-1}\) and outputs \(h_i\), as opposed to the previous chapters, where we evolved the state variable according to \(x_{i+1} = f_i(x_i)\). This indexing convention is more natural for RNNs: as we will see, with this adjustment the \(i\)th prediction depends on \(h_i\) rather than on \(h_{i+1}\).
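Side by side, the two conventions read

    \[
      h_i = f_i(h_{i-1}) \quad \text{(this chapter)}
      \qquad \text{versus} \qquad
      x_{i+1} = f_i(x_i) \quad \text{(previous chapters)},
    \]

so the prediction at step \(i\) can be written directly in terms of \(h_i\).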
 
2
We use \(\overline{e}_j\) here instead of simply \(e_j\), since we already have \(e_i\) defined in (5.8) and will continue to use it throughout this section.
 
Metadata
Title
Recurrent Neural Networks
Authors
Anthony L. Caterini
Dong Eui Chang
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-75304-1_5
