
2021 | OriginalPaper | Chapter

Sentence-State LSTMs For Sequence-to-Sequence Learning

Authors : Xuefeng Bai, Yafu Li, Zhirui Zhang, Mingzhou Xu, Boxing Chen, Weihua Luo, Derek Wong, Yue Zhang

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing


Abstract

Transformer is currently the dominant method for sequence-to-sequence problems. In contrast, RNNs have become less popular due to their lack of parallelization capability and relatively lower performance. In this paper, we propose to use a parallelizable variant of bi-directional LSTMs (BiLSTMs), namely sentence-state LSTMs (S-LSTM), as an encoder for sequence-to-sequence tasks. The complexity of S-LSTM is only \(\mathcal {O}(n)\), compared to \(\mathcal {O}(n^2)\) for Transformer. On four neural machine translation benchmarks, we empirically find that S-LSTM achieves significantly better performance than BiLSTM and convolutional neural networks (CNNs). Compared to Transformer, our model gives competitive performance while being 1.6 times faster during inference.
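To make the parallel recurrence behind this claim concrete, the sketch below illustrates one S-LSTM-style encoder step in PyTorch: all word states are refreshed simultaneously from their local window, the input embedding, and a global sentence state, so each step costs \(\mathcal {O}(n)\). This is a minimal sketch only; the gating is simplified relative to the full S-LSTM of Zhang, Liu and Song (2018), and the module and variable names (SLSTMStepSketch, word_proj, sent_proj) are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class SLSTMStepSketch(nn.Module):
        """One parallel S-LSTM-style step: every word state and the global
        sentence state are updated simultaneously, so a step costs O(n)."""

        def __init__(self, d: int):
            super().__init__()
            self.word_proj = nn.Linear(4 * d, 2 * d)   # gate + candidate per word
            self.sent_proj = nn.Linear(2 * d, 2 * d)   # gate + candidate for the sentence state

        def forward(self, x, h, g):
            # x, h: (batch, n, d) word embeddings / previous word states
            # g   : (batch, d)    previous global sentence state
            pad = torch.zeros_like(h[:, :1])
            left = torch.cat([pad, h[:, :-1]], dim=1)        # h_{i-1}
            right = torch.cat([h[:, 1:], pad], dim=1)        # h_{i+1}
            g_exp = g.unsqueeze(1).expand_as(h)

            # Update all word states in parallel from the local window,
            # the input token, and the global sentence state.
            word_in = torch.cat([left + right, h, x, g_exp], dim=-1)  # (batch, n, 4d)
            gate, cand = self.word_proj(word_in).chunk(2, dim=-1)
            w = torch.sigmoid(gate)
            h_new = w * torch.tanh(cand) + (1 - w) * h

            # Update the sentence state from the mean of the new word states.
            sent_in = torch.cat([h_new.mean(dim=1), g], dim=-1)       # (batch, 2d)
            s_gate, s_cand = self.sent_proj(sent_in).chunk(2, dim=-1)
            s = torch.sigmoid(s_gate)
            g_new = s * torch.tanh(s_cand) + (1 - s) * g
            return h_new, g_new

    # Usage sketch: a few recurrent steps over a toy batch act as the encoder.
    if __name__ == "__main__":
        batch, n, d = 2, 7, 16
        step = SLSTMStepSketch(d)
        x = torch.randn(batch, n, d)
        h, g = torch.zeros(batch, n, d), torch.zeros(batch, d)
        for _ in range(4):            # each step is O(n) and fully parallel over positions
            h, g = step(x, h, g)
        print(h.shape, g.shape)       # torch.Size([2, 7, 16]) torch.Size([2, 16])

Because each step touches every position once (rather than attending over all position pairs), stacking a fixed number of such steps keeps the encoder's cost linear in sentence length.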


Footnotes
1. LDC2000T46, LDC2000T47, LDC2000T50, LDC2003E14, LDC2005T10, LDC2002E18, LDC2007T09, LDC2004T08.
 
Metadata
Title
Sentence-State LSTMs For Sequence-to-Sequence Learning
Authors
Xuefeng Bai
Yafu Li
Zhirui Zhang
Mingzhou Xu
Boxing Chen
Weihua Luo
Derek Wong
Yue Zhang
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-88480-2_9
