
2017 | OriginalPaper | Chapter

Sequence Generation with Target Attention

Authors: Yingce Xia, Fei Tian, Tao Qin, Nenghai Yu, Tie-Yan Liu

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

The source-target attention mechanism (briefly, source attention) has become a key component in a wide range of sequence generation tasks, such as neural machine translation, image captioning, and open-domain dialogue generation. In these tasks, the attention mechanism, which typically controls the information flow from the encoder to the decoder, allows every component of the target sequence to be generated while relying on different source components. While source attention has attracted much research interest, little work has examined whether the generation of the target sequence can additionally benefit from attending back to itself, which is intuitively motivated by the nature of attention. To investigate this question, in this paper we propose a new target-target attention mechanism (briefly, target attention). While generating the target sequence, the target attention mechanism takes into account the relationship between the component to be generated and its preceding context within the target sequence, so that it can better maintain coherence and improve the readability of the generated sequence. Furthermore, it complements the information from source attention and thereby further enhances semantic adequacy. After designing an effective approach to incorporate target attention into the encoder-decoder framework, we conduct extensive experiments on both neural machine translation and image captioning. Experimental results clearly demonstrate the effectiveness of our design, which integrates both source and target attention for sequence generation tasks.
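To make the mechanism described above concrete, the following is a minimal NumPy sketch of a single decoding step that combines source attention over the encoder states with target attention over the previously generated decoder states (per footnote 3, the target-side memory is empty before step j = 2). All names (attend, decoder_step, the toy dot-product scoring, and the tanh state update) are illustrative assumptions for exposition; the paper's actual model uses learned attention parameters and a recurrent decoder cell, which this sketch does not reproduce.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    # Dot-product attention: score each memory row against the query,
    # normalize the scores, and return the weighted sum as a context vector.
    scores = memory @ query          # shape (n,)
    weights = softmax(scores)        # shape (n,)
    return weights @ memory          # shape (d,)

def decoder_step(s_prev, y_prev_emb, src_memory, tgt_memory):
    # s_prev     : previous decoder state, shape (d,)
    # y_prev_emb : embedding of the previously generated token, shape (d,)
    # src_memory : encoder hidden states h_1..h_{Tx}, shape (Tx, d)
    # tgt_memory : list of earlier decoder states s_1..s_{j-1}; empty for j < 2
    c_src = attend(s_prev, src_memory)                 # source-target attention
    if tgt_memory:                                     # target attention from j >= 2 on
        c_tgt = attend(s_prev, np.stack(tgt_memory))   # target-target attention
    else:
        c_tgt = np.zeros_like(s_prev)
    # Toy state update; a real decoder would use a GRU/LSTM cell with learned weights.
    return np.tanh(s_prev + y_prev_emb + c_src + c_tgt)

# Toy usage: decode three steps over a length-5 "source".
d, Tx = 8, 5
rng = np.random.default_rng(0)
src_memory = rng.normal(size=(Tx, d))
s, tgt_memory = np.zeros(d), []
for j in range(1, 4):
    s = decoder_step(s, rng.normal(size=d), src_memory, tgt_memory)
    tgt_memory.append(s)            # grow the target-side memory for later steps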

Footnotes
1
In some scenarios such as image captioning, the source-side input is not in a typical sequential form. For ease of exposition, and at the cost of slight inaccuracy, we still use “sequence-to-sequence” as a general name for these scenarios.
 
2
For ease of reference, \([T_x]\) denotes the set \(\{1, 2, \cdots, T_x\}\).
 
3
For \(j<2\), let \(M_{\text{tgt}}^1=\emptyset\). Target attention starts to work when \(j\ge 2\).
 
4
We focus on word-level translation rather than subword-level approaches such as BPE [20]. The reason is that BPE cannot be applied to some languages like Chinese, although it works well for others like German and Czech.
 
7
For Zh\(\rightarrow\)En, each source sentence \(x\) has four references \(y^{(j)}\), \(j\in \{1,2,3,4\}\). To calculate the perplexities, we simply regard them as four individual sentence pairs \((x, y^{(j)})\).
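As a hedged illustration of this convention, the Python sketch below expands each multi-reference sentence into individual pairs before averaging; corpus_perplexity and neg_log_likelihood are hypothetical names introduced here, not functions from the paper.

import math

def corpus_perplexity(data, neg_log_likelihood):
    # data: list of (x, refs) pairs, where refs holds the four references y^(j).
    # neg_log_likelihood(x, y): the model's total negative log-likelihood of y given x.
    total_nll, total_tokens = 0.0, 0
    for x, refs in data:
        for y in refs:               # treat each reference as its own pair (x, y^(j))
            total_nll += neg_log_likelihood(x, y)
            total_tokens += len(y)
    return math.exp(total_nll / total_tokens)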
 
8
Although the coverage model in [26] can eliminate repeated translations, it is based on source-target attention rather than target-target attention. Therefore, [26] can be further combined with our proposed target-target attention; we leave this as future work.
 
Literature
1. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (2015)
2. Cheng, J., Dong, L., Lapata, M.: Long short-term memory-networks for machine reading. In: EMNLP (2016)
3. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: EMNLP, pp. 1724–1734 (2014)
4. Denil, M., Bazzani, L., Larochelle, H., de Freitas, N.: Learning where to attend with deep architectures for image tracking. Neural Comput. 24(8), 2151–2184 (2012)
5. Dyer, C., Chahuneau, V., Smith, N.A.: A simple, fast, and effective reparameterization of IBM model 2. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 644–649 (2013)
6. He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T., Ma, W.Y.: Dual learning for machine translation. In: Advances in Neural Information Processing Systems, pp. 820–828 (2016)
8. Jean, S., Cho, K., Memisevic, R., Bengio, Y.: On using very large target vocabulary for neural machine translation. In: The Annual Meeting of the Association for Computational Linguistics (2015)
9. Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128–3137 (2015)
11. Koehn, P.: Statistical significance tests for machine translation evaluation. In: EMNLP, pp. 388–395 (2004)
12. Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)
13. Luong, T., Cho, K., Manning, C.: Tutorial: neural machine translation. ACL 2016 tutorial (2016)
14. Meng, F., Lu, Z., Li, H., Liu, Q.: Interactive attention for neural machine translation. In: COLING (2016)
15. Mnih, V., Heess, N., Graves, A., et al.: Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, pp. 2204–2212 (2014)
16. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: The Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2002)
17. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International Conference on Machine Learning, vol. 28, pp. 1310–1318 (2013)
18. Robertson, S.: Understanding inverse document frequency: on theoretical arguments for IDF. J. Documentation 60(5), 503–520 (2004)
19. Rush, A.M., Chopra, S., Weston, J.: A neural attention model for abstractive sentence summarization. In: EMNLP, pp. 379–389 (2015)
20. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: The Annual Meeting of the Association for Computational Linguistics (2016)
22. Shao, L., Gouws, S., Britz, D., Goldie, A., Strope, B., Kurzweil, R.: Generating long and diverse responses with neural conversation models. arXiv preprint arXiv:1701.03185 (2017)
23. Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., Liu, Y.: Minimum risk training for neural machine translation. In: The Annual Meeting of the Association for Computational Linguistics (2016)
24. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
25. Tu, Z., Liu, Y., Shang, L., Liu, X., Li, H.: Neural machine translation with reconstruction. In: AAAI (2017)
26. Tu, Z., Lu, Z., Liu, Y., Liu, X., Li, H.: Modeling coverage for neural machine translation. In: The Annual Meeting of the Association for Computational Linguistics, pp. 76–85 (2016)
27. Vedantam, R., Lawrence Zitnick, C., Parikh, D.: CIDEr: consensus-based image description evaluation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4566–4575 (2015)
28. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2015)
29. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
30. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A.C., Salakhutdinov, R., Zemel, R.S., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning, vol. 14, pp. 77–81 (2015)
31. Yang, Z., Hu, Z., Deng, Y., Dyer, C., Smola, A.: Neural machine translation with recurrent attention modeling. arXiv preprint arXiv:1607.05108 (2016)
32. Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2, 67–78 (2014)
Metadata
Title
Sequence Generation with Target Attention
Authors
Yingce Xia
Fei Tian
Tao Qin
Nenghai Yu
Tie-Yan Liu
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71249-9_49
