
2021 | Original Paper | Book Chapter

Sentence-State LSTMs For Sequence-to-Sequence Learning

Authors: Xuefeng Bai, Yafu Li, Zhirui Zhang, Mingzhou Xu, Boxing Chen, Weihua Luo, Derek Wong, Yue Zhang

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing


Abstract

The Transformer is currently the dominant approach to sequence-to-sequence problems. In contrast, RNNs have become less popular due to their limited parallelization and relatively lower performance. In this paper, we propose to use a parallelizable variant of bi-directional LSTMs (BiLSTMs), namely the sentence-state LSTM (S-LSTM), as an encoder for sequence-to-sequence tasks. The complexity of the S-LSTM is only \(\mathcal {O}(n)\), compared to the \(\mathcal {O}(n^2)\) of the Transformer. On four neural machine translation benchmarks, we empirically find that the S-LSTM achieves significantly better performance than BiLSTMs and convolutional neural networks (CNNs). Compared to the Transformer, our model gives competitive performance while being 1.6 times faster during inference.
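To make the claimed \(\mathcal {O}(n)\) cost concrete, below is a minimal, hypothetical Python sketch of an S-LSTM-style encoder step. It is not the authors' implementation, and the gating is deliberately simplified (the S-LSTM of Zhang et al. 2018 uses a richer set of gates); it only illustrates the structural idea: at every recurrence step, all word states are updated in parallel from their immediate neighbours and a global sentence state, so one step touches each position once.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SimplifiedSLSTM:
    """Illustrative (simplified) sentence-state LSTM encoder step.

    Each step updates all n word states in parallel from their left/right
    neighbours and a global sentence state, so one step costs O(n) rather
    than the O(n^2) of full self-attention.
    """

    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        # one fused projection of [x_i, h_{i-1}, h_i, h_{i+1}, g] into 3 gates + candidate
        self.W = rng.normal(scale=0.1, size=(4 * d, 5 * d))
        self.b = np.zeros(4 * d)

    def step(self, x, h, g):
        """x, h: (n, d) word embeddings and states; g: (d,) sentence state."""
        n, d = h.shape
        left = np.vstack([np.zeros((1, d)), h[:-1]])    # h_{i-1}, zero-padded at the start
        right = np.vstack([h[1:], np.zeros((1, d))])    # h_{i+1}, zero-padded at the end
        ctx = np.concatenate([x, left, h, right, np.tile(g, (n, 1))], axis=1)
        z = ctx @ self.W.T + self.b
        i_g, f_g, o_g, cand = np.split(z, 4, axis=1)
        i_g, f_g, o_g = sigmoid(i_g), sigmoid(f_g), sigmoid(o_g)
        h_new = o_g * np.tanh(f_g * h + i_g * np.tanh(cand))
        g_new = np.tanh(h_new.mean(axis=0))             # aggregate word states into the new sentence state
        return h_new, g_new

# usage: a fixed, small number of recurrence steps over a length-7 "sentence"
enc = SimplifiedSLSTM(d=16)
x = np.random.default_rng(1).normal(size=(7, 16))
h, g = np.zeros_like(x), np.zeros(16)
for _ in range(3):
    h, g = enc.step(x, h, g)
print(h.shape, g.shape)                                 # (7, 16) (16,)

Because the number of recurrence steps is a small constant rather than the sentence length, the overall encoder cost stays linear in n, which is the property the abstract contrasts with the quadratic self-attention of the Transformer.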

Footnotes
1. LDC2000T46, LDC2000T47, LDC2000T50, LDC2003E14, LDC2005T10, LDC2002E18, LDC2007T09, LDC2004T08.
 
Metadata
Title
Sentence-State LSTMs For Sequence-to-Sequence Learning
Authors
Xuefeng Bai
Yafu Li
Zhirui Zhang
Mingzhou Xu
Boxing Chen
Weihua Luo
Derek Wong
Yue Zhang
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-88480-2_9
