
2017 | OriginalPaper | Chapter

Sequence Generation with Target Attention

Authors: Yingce Xia, Fei Tian, Tao Qin, Nenghai Yu, Tie-Yan Liu

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

The source-target attention mechanism (briefly, source attention) has become a key component in a wide range of sequence generation tasks, such as neural machine translation, image captioning, and open-domain dialogue generation. In these tasks, the attention mechanism, which typically controls the information flow from the encoder to the decoder, allows every component of the target sequence to be generated while relying on different source components. While source attention has attracted much research interest, little work has examined whether the generation of the target sequence can additionally benefit from attending back to itself, which is intuitively motivated by the nature of attention. To investigate this question, in this paper we propose a new target-target attention mechanism (briefly, target attention). While generating the target sequence, the target attention mechanism takes into account the relationship between the component to be generated and its preceding context within the target sequence, so that it can better maintain coherence and improve the readability of the generated sequence. Furthermore, it complements the information from source attention and thereby further enhances semantic adequacy. After designing an effective approach to incorporate target attention into the encoder-decoder framework, we conduct extensive experiments on both neural machine translation and image captioning. Experimental results clearly demonstrate the effectiveness of our design, which integrates both source and target attention for sequence generation tasks.
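To make the mechanism described above concrete, the following is a minimal NumPy sketch of a single decoding step that combines source attention over the encoder states with target attention over the previously generated decoder states (per footnote 3, the target-side memory is empty before step j = 2). All names (attend, decoder_step, the toy dot-product scoring, and the tanh state update) are illustrative assumptions for exposition; the paper's actual model uses learned attention parameters and a recurrent decoder cell, which this sketch does not reproduce.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    # Dot-product attention: score each memory row against the query,
    # normalize the scores, and return the weighted sum as a context vector.
    scores = memory @ query          # shape (n,)
    weights = softmax(scores)        # shape (n,)
    return weights @ memory          # shape (d,)

def decoder_step(s_prev, y_prev_emb, src_memory, tgt_memory):
    # s_prev     : previous decoder state, shape (d,)
    # y_prev_emb : embedding of the previously generated token, shape (d,)
    # src_memory : encoder hidden states h_1..h_{Tx}, shape (Tx, d)
    # tgt_memory : list of earlier decoder states s_1..s_{j-1}; empty for j < 2
    c_src = attend(s_prev, src_memory)                 # source-target attention
    if tgt_memory:                                     # target attention from j >= 2 on
        c_tgt = attend(s_prev, np.stack(tgt_memory))   # target-target attention
    else:
        c_tgt = np.zeros_like(s_prev)
    # Toy state update; a real decoder would use a GRU/LSTM cell with learned weights.
    return np.tanh(s_prev + y_prev_emb + c_src + c_tgt)

# Toy usage: decode three steps over a length-5 "source".
d, Tx = 8, 5
rng = np.random.default_rng(0)
src_memory = rng.normal(size=(Tx, d))
s, tgt_memory = np.zeros(d), []
for j in range(1, 4):
    s = decoder_step(s, rng.normal(size=d), src_memory, tgt_memory)
    tgt_memory.append(s)            # grow the target-side memory for later steps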

Footnotes
1
In some scenarios such as image captioning, the source-side input is not in a typical sequential form. For ease of exposition, and at the cost of slight inaccuracy, we still use “sequence-to-sequence” as a general name for these scenarios.
 
2
For ease of reference, \([T_x]\) denotes the set \(\{1, 2, \cdots, T_x\}\).
 
3
For \(j<2\), let \(M_{\text{tgt}}^1=\emptyset\). Target attention starts to work when \(j\ge 2\).
 
4
We focus on word-level translation rather than subword-level approaches such as BPE [20]. The reason is that BPE cannot be applied to some languages like Chinese, although it works well for others like German and Czech.
 
7
For Zh\(\rightarrow\)En, each source sentence \(x\) has four references \(y^{(j)}\), \(j\in \{1,2,3,4\}\). To calculate the perplexities, we simply regard them as four individual sentence pairs \((x, y^{(j)})\).
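As a hedged illustration of this convention, the Python sketch below expands each multi-reference sentence into individual pairs before averaging; corpus_perplexity and neg_log_likelihood are hypothetical names introduced here, not functions from the paper.

import math

def corpus_perplexity(data, neg_log_likelihood):
    # data: list of (x, refs) pairs, where refs holds the four references y^(j).
    # neg_log_likelihood(x, y): the model's total negative log-likelihood of y given x.
    total_nll, total_tokens = 0.0, 0
    for x, refs in data:
        for y in refs:               # treat each reference as its own pair (x, y^(j))
            total_nll += neg_log_likelihood(x, y)
            total_tokens += len(y)
    return math.exp(total_nll / total_tokens)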
 
8
Although the coverage model in [26] can eliminate repeated translations, it is based on source-target attention rather than target-target attention. Therefore, [26] can be further combined with our proposed target-target attention; we leave this as future work.
 
Literature
1. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (2015)
2. Cheng, J., Dong, L., Lapata, M.: Long short-term memory-networks for machine reading. In: EMNLP (2016)
3. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: EMNLP, pp. 1724–1734 (2014)
4. Denil, M., Bazzani, L., Larochelle, H., de Freitas, N.: Learning where to attend with deep architectures for image tracking. Neural Comput. 24(8), 2151–2184 (2012)
5. Dyer, C., Chahuneau, V., Smith, N.A.: A simple, fast, and effective reparameterization of IBM model 2. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 644–649 (2013)
6. He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T., Ma, W.Y.: Dual learning for machine translation. In: Advances in Neural Information Processing Systems, pp. 820–828 (2016)
8. Jean, S., Cho, K., Memisevic, R., Bengio, Y.: On using very large target vocabulary for neural machine translation. In: The Annual Meeting of the Association for Computational Linguistics (2015)
9. Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128–3137 (2015)
11. Koehn, P.: Statistical significance tests for machine translation evaluation. In: EMNLP, pp. 388–395 (2004)
12. Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015)
13. Luong, T., Cho, K., Manning, C.: Tutorial: neural machine translation. ACL 2016 tutorial (2016)
14. Meng, F., Lu, Z., Li, H., Liu, Q.: Interactive attention for neural machine translation. In: COLING (2016)
15. Mnih, V., Heess, N., Graves, A., et al.: Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, pp. 2204–2212 (2014)
16. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: The Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2002)
17. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International Conference on Machine Learning, vol. 28, pp. 1310–1318 (2013)
18. Robertson, S.: Understanding inverse document frequency: on theoretical arguments for IDF. J. Documentation 60(5), 503–520 (2004)
19. Rush, A.M., Chopra, S., Weston, J.: A neural attention model for abstractive sentence summarization. In: EMNLP, pp. 379–389 (2015)
20. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: The Annual Meeting of the Association for Computational Linguistics (2016)
22. Shao, L., Gouws, S., Britz, D., Goldie, A., Strope, B., Kurzweil, R.: Generating long and diverse responses with neural conversation models. arXiv preprint arXiv:1701.03185 (2017)
23. Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., Liu, Y.: Minimum risk training for neural machine translation. In: The Annual Meeting of the Association for Computational Linguistics (2016)
24. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
25. Tu, Z., Liu, Y., Shang, L., Liu, X., Li, H.: Neural machine translation with reconstruction. In: AAAI (2017)
26. Tu, Z., Lu, Z., Liu, Y., Liu, X., Li, H.: Modeling coverage for neural machine translation. In: The Annual Meeting of the Association for Computational Linguistics, pp. 76–85 (2016)
27. Vedantam, R., Lawrence Zitnick, C., Parikh, D.: CIDEr: consensus-based image description evaluation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4566–4575 (2015)
28. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2015)
29. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
30. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A.C., Salakhutdinov, R., Zemel, R.S., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning, vol. 14, pp. 77–81 (2015)
31. Yang, Z., Hu, Z., Deng, Y., Dyer, C., Smola, A.: Neural machine translation with recurrent attention modeling. arXiv preprint arXiv:1607.05108 (2016)
32. Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2, 67–78 (2014)
Metadata
Title
Sequence Generation with Target Attention
Authors
Yingce Xia
Fei Tian
Tao Qin
Nenghai Yu
Tie-Yan Liu
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71249-9_49
