
2019 | OriginalPaper | Chapter

7. Recurrent Neural Networks

Authors: Uday Kamath, John Liu, James Whitaker

Published in: Deep Learning for NLP and Speech Recognition

Publisher: Springer International Publishing


Abstract

In the previous chapter, CNNs provided a way for neural networks to learn a hierarchy of weights, resembling n-gram classification on text. This approach proved very effective for sentiment analysis and, more broadly, text classification. One disadvantage of CNNs, however, is their inability to model contextual information over long sequences.
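To make the contrast concrete, here is a minimal sketch of a vanilla RNN cell in Python/NumPy that threads a single hidden state through a sequence; the weight names and shapes are illustrative assumptions, not the chapter's notation or implementation. Because the hidden state at each step depends on all earlier inputs, it can in principle carry contextual information across the whole sequence.

    import numpy as np

    def rnn_forward(inputs, hidden_size, seed=0):
        """Toy vanilla RNN forward pass over a sequence of input vectors.

        inputs: array of shape (seq_len, input_size).
        Returns one hidden state per time step. U (input-to-hidden),
        W (hidden-to-hidden), and b are randomly initialized here
        purely for illustration.
        """
        rng = np.random.default_rng(seed)
        seq_len, input_size = inputs.shape
        U = rng.normal(scale=0.1, size=(hidden_size, input_size))
        W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        b = np.zeros(hidden_size)

        h = np.zeros(hidden_size)      # initial hidden state ("history vector")
        states = []
        for x_t in inputs:             # the same weights are reused at every step
            h = np.tanh(U @ x_t + W @ h + b)
            states.append(h)
        return np.array(states)

    # Example: a sequence of five random 3-dimensional "word vectors"
    states = rnn_forward(np.random.randn(5, 3), hidden_size=4)
    print(states.shape)  # (5, 4): one hidden state per time step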


Footnotes
1
This statement is made in the context of basic CNNs and RNNs. The CNN vs. RNN superiority debate in sequential contexts is an active area of research.
 
2
This history vector will be called the hidden state later on for obvious reasons.
 
3
It is common to split the single weight matrix W of an RNN in Eq. (7.3) into two separate weight matrices, here U and W. Doing this allows for a lower computational cost and forces separation between the hidden state and input in the early stages of training.
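A hedged reconstruction of what this split looks like, assuming Eq. (7.3) has the standard vanilla-RNN form (the chapter's exact notation may differ): instead of applying a single matrix to the concatenated input and previous hidden state,
\[ h_t = \tanh\left( W \, [\, x_t \, ; \, h_{t-1} \,] + b \right), \]
one uses separate input-to-hidden and hidden-to-hidden matrices,
\[ h_t = \tanh\left( U x_t + W h_{t-1} + b \right). \]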
 
4
The derivative of the \(\tanh \) activation function is bounded between 0 and 1. This has the effect of shrinking the gradient in these circumstances.
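Concretely, the derivative of \(\tanh \) satisfies
\[ \frac{d}{dz}\tanh(z) = 1 - \tanh^2(z) \in (0, 1], \]
so backpropagation through many time steps repeatedly multiplies by factors no larger than 1 (and typically much smaller), which drives the gradient toward zero.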
 
5
This is not to say that academic benchmarks are not relevant, but rather to point out the importance of domain and technological understanding for domain adaptation.
 
Metadata
Title
Recurrent Neural Networks
Authors
Uday Kamath
John Liu
James Whitaker
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-14596-5_7
