
2019 | Original Paper | Chapter

Efficient Transformer-Based Sentence Encoding for Sentence Pair Modelling

Authors : Mahtab Ahmed, Robert E. Mercer

Published in: Advances in Artificial Intelligence

Publisher: Springer International Publishing


Abstract

Modelling a pair of sentences is important for many NLP tasks, such as textual entailment (TE), paraphrase identification (PI), semantic relatedness (SR) and question-answer pairing (QAP). Most sentence pair modelling work has looked only at the local context when generating a distributed sentence representation, without considering the mutual information found in the other sentence. The proposed attentive encoder uses the representation of one sentence, generated by a multi-head Transformer encoder, to guide multi-branch attention towards the most semantically relevant words of the other sentence. Evaluating this novel sentence encoder on the TE, PI, SR and QAP tasks shows notable improvements over the standard Transformer encoder as well as other current state-of-the-art models.
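The idea of using one sentence's encoded representation to attend over the words of the other, with several attention branches whose outputs are combined, can be sketched in a few lines. This is an illustrative NumPy toy, not the authors' implementation: the per-branch random projections, the branch count, and the simple averaging of branch outputs are assumptions made for the sake of a minimal example.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_branch_attention(guide, words, n_branches=4, seed=0):
    """Sketch: an encoded `guide` sentence vector attends over the
    word vectors of the other sentence; each branch applies its own
    projection, and the branch summaries are averaged."""
    rng = np.random.default_rng(seed)
    d = guide.shape[-1]
    outputs = []
    for _ in range(n_branches):
        W = rng.standard_normal((d, d)) / np.sqrt(d)   # per-branch projection (assumed)
        scores = (guide @ W) @ words.T / np.sqrt(d)    # scaled dot-product scores
        weights = softmax(scores)                      # attention over the other sentence's words
        outputs.append(weights @ words)                # attention-weighted summary
    return np.mean(outputs, axis=0)                    # combine branches (assumed: mean)

# toy example: a 5-word sentence with 8-dimensional embeddings
guide = np.ones(8)
words = np.random.default_rng(1).standard_normal((5, 8))
summary = guided_branch_attention(guide, words)
print(summary.shape)  # -> (8,)
```

In the paper's setting, `guide` would come from the multi-head Transformer encoding of the first sentence, and the resulting summary of the second sentence would feed the downstream pair-classification layer.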


Metadata
Title
Efficient Transformer-Based Sentence Encoding for Sentence Pair Modelling
Authors
Mahtab Ahmed
Robert E. Mercer
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-18305-9_12
