
2019 | OriginalPaper | Chapter

9. Attention and Memory Augmented Networks

Authors: Uday Kamath, John Liu, James Whitaker

Published in: Deep Learning for NLP and Speech Recognition

Publisher: Springer International Publishing


Abstract

In deep learning, as we have seen in the previous chapters, there are well-established architectures for handling spatial and temporal data: various forms of convolutional and recurrent networks, respectively. However, when the data exhibits dependencies such as out-of-order access or long-term dependencies, most of the standard architectures discussed so far are not suitable. Consider a specific example from the bAbI dataset, in which a set of stories/facts is presented, a question is asked, and the answer must be inferred from the stories. As shown in Fig. 9.1, finding the right answer requires out-of-order access to the facts as well as long-term dependencies.
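Since Fig. 9.1 is not reproduced here, the following minimal, self-contained Python sketch stands in for it. The toy facts, the question, and the bag-of-words embed helper are illustrative assumptions rather than the chapter's own code; the addressing step, a softmax over fact-question inner products, is a simplified instance of the content-based attention used by memory-augmented networks.

    import numpy as np

    # Toy bAbI-style task: facts are presented, a question is asked, and the
    # answer must be inferred by attending to the relevant facts, which may
    # appear out of order and far apart in the story.
    facts = [
        "John moved to the bedroom",
        "Mary grabbed the football",
        "Sandra went to the kitchen",
        "Mary travelled to the garden",
    ]
    question = "Where is the football"

    # Hypothetical helper: bag-of-words embedding over a shared vocabulary.
    vocab = sorted({w for s in facts + [question] for w in s.lower().split()})

    def embed(sentence):
        v = np.zeros(len(vocab))
        for w in sentence.lower().split():
            v[vocab.index(w)] += 1.0
        return v

    M = np.stack([embed(f) for f in facts])  # memory: one row per fact
    q = embed(question)                      # query vector

    # Content-based addressing: a softmax over inner products scores each
    # stored fact against the question.
    scores = M @ q
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()

    for f, a in zip(facts, attn):
        print(f"{a:.2f}  {f}")
    # The largest weight falls on "Mary grabbed the football". In a full
    # memory network, a second hop would re-query with the attended fact to
    # surface "Mary travelled to the garden" and yield the answer "garden".

Even in this toy form, the sketch illustrates why the task defeats plain recurrent models: the two facts that jointly determine the answer are non-adjacent, so the model must address memory by content rather than by position.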


Metadata
Title
Attention and Memory Augmented Networks
Authors
Uday Kamath
John Liu
James Whitaker
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-14596-5_9
