
29.01.2020 | Original Article

Learning deep hierarchical and temporal recurrent neural networks with residual learning

Authors: Tehseen Zia, Assad Abbas, Usman Habib, Muhammad Sajid Khan

Published in: International Journal of Machine Learning and Cybernetics | Issue 4/2020


Abstract

Learning both hierarchical and temporal dependencies can be crucial for recurrent neural networks (RNNs) to deeply understand sequences. To this end, a unified RNN framework is required that eases the learning of both deep hierarchical and temporal structures by allowing gradients to propagate backward from both ends without vanishing. Residual learning (RL) has emerged as an effective and inexpensive method to facilitate the backward propagation of gradients. The benefit of RL has so far been shown separately for learning deep hierarchical representations and for learning temporal dependencies; however, little effort has been made to unify these findings into a single framework for learning deep RNNs. In this study, we argue that approximating identity mappings is crucial for optimizing both hierarchical and temporal structures. We propose a framework, hierarchical and temporal residual RNNs, that learns RNNs by approximating identity mappings across both hierarchical and temporal structures. To validate the proposed method, we examine the efficacy of shortcut connections for training deep RNN structures on sequence learning problems. Experiments on the Penn Treebank, Hutter Prize, and IAM-OnDB datasets demonstrate the utility of the framework in terms of accuracy and computational complexity. We show that, even for large datasets, investing parameters in network depth while reducing the size of the RNN "state" can yield computational benefits.
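The core mechanism the abstract describes — letting each update approximate an identity mapping so that gradients can flow back through both the layer stack (hierarchical axis) and the time steps (temporal axis) — can be illustrated with a minimal PyTorch sketch. This is not the authors' exact formulation; the class name ResidualHTRNN is hypothetical, and it assumes equal input and hidden sizes so the identity shortcuts need no projection. Each cell's output is added to its previous hidden state (temporal shortcut), and each layer's input skips around the cell (hierarchical shortcut), so every transformation only learns a residual around an identity mapping.

```python
# Minimal sketch (assumed formulation, not the paper's exact equations) of a
# stacked RNN with identity shortcuts along both depth and time.
import torch
import torch.nn as nn


class ResidualHTRNN(nn.Module):
    """Stacked RNN with residual connections in depth and in time.

    Hypothetical illustration: assumes input size == hidden size so the
    identity shortcuts require no learned projection.
    """

    def __init__(self, size: int, num_layers: int):
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.RNNCell(size, size) for _ in range(num_layers)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, size)
        seq_len, batch, size = x.shape
        h = [torch.zeros(batch, size, device=x.device) for _ in self.cells]
        outputs = []
        for t in range(seq_len):
            inp = x[t]
            for l, cell in enumerate(self.cells):
                # Temporal shortcut: the previous state is carried forward
                # unchanged, and the cell only contributes a residual.
                h_new = h[l] + cell(inp, h[l])
                # Hierarchical shortcut: the layer input skips around the
                # cell on its way up the stack.
                inp = inp + h_new
                h[l] = h_new
            outputs.append(inp)
        return torch.stack(outputs)  # (seq_len, batch, size)


# Usage: a 3-layer model over a toy sequence.
model = ResidualHTRNN(size=32, num_layers=3)
y = model(torch.randn(10, 4, 32))  # seq_len=10, batch=4, feature=32
print(y.shape)  # torch.Size([10, 4, 32])
```

Because both shortcuts are plain additions, the backward pass always has an identity path to every earlier layer and time step, which is the property the framework relies on to train deeper and longer-unrolled RNNs.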

Metadata
Title
Learning deep hierarchical and temporal recurrent neural networks with residual learning
Authors
Tehseen Zia
Assad Abbas
Usman Habib
Muhammad Sajid Khan
Publication date
29.01.2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01063-0
