VSCA: A Sentence Matching Model Incorporating Visual Perception

Authors: Zhe Zhang, Guangli Xiao, Yurong Qian, Mengnan Ma, Hongyong Leng, Tao Zhang

Published in: Cognitive Computation | Issue 1/2023 (published online 08.12.2022)

Abstract

Stacking multiple layers of attention networks can significantly improve a model’s performance. However, this also increases the model’s time and space complexity and makes it difficult for the model to capture fine-grained information in the underlying features. We propose a novel sentence matching model (VSCA) with a new attention mechanism based on variational autoencoders (VAE): it exploits the contextual information in sentences to construct a basic attention feature map and combines it with a VAE to generate multiple sets of related attention feature maps for fusion. Furthermore, VSCA introduces a spatial attention mechanism inspired by visual perception to capture multilevel semantic information. Experimental results show that the proposed model outperforms pretrained models such as BERT on the LCQMC dataset and performs well on the PAWS-X dataset. Our work consists of two parts: the first compares the proposed sentence matching model with state-of-the-art pretrained models such as BERT; the second investigates the application of VAE and spatial attention mechanisms in NLP. The results on the related datasets show that the proposed method performs satisfactorily and that VSCA captures rich attention information and detailed information with lower time and space complexity. This work provides insights into the application of VAE and spatial attention mechanisms in NLP.
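To make the two mechanisms concrete, below is a minimal, hypothetical PyTorch sketch of what the abstract describes: a basic attention feature map built from an encoded sentence pair, a VAE-style sampler that generates related attention maps for fusion, and a CBAM-like spatial attention gate. All module names, the mean-fusion choice, and the hyperparameters are our illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class VAEAttention(nn.Module):
    """Build a basic attention feature map from two encoded sentences, then
    use a VAE-style reparameterization to sample several related maps and
    fuse them with the base map (sketch; hyperparameters are assumptions)."""

    def __init__(self, map_size: int, latent_dim: int = 32, num_samples: int = 4):
        super().__init__()
        self.num_samples = num_samples
        self.to_mu = nn.Linear(map_size, latent_dim)      # encoder: map -> mean
        self.to_logvar = nn.Linear(map_size, latent_dim)  # encoder: map -> log-variance
        self.decode = nn.Linear(latent_dim, map_size)     # decoder: latent -> map

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a: (batch, len_a, dim), b: (batch, len_b, dim)
        base = torch.bmm(a, b.transpose(1, 2))            # (batch, len_a, len_b)
        flat = base.flatten(1)
        mu, logvar = self.to_mu(flat), self.to_logvar(flat)
        maps = [base]
        for _ in range(self.num_samples):
            eps = torch.randn_like(mu)
            z = mu + torch.exp(0.5 * logvar) * eps        # reparameterization trick
            maps.append(self.decode(z).view_as(base))
        return torch.stack(maps, dim=1).mean(dim=1)       # simple mean fusion


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: pool across channels, convolve the
    pooled maps, and gate the input with a sigmoid mask."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W), e.g. stacked attention feature maps
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate


# Toy usage: a batch of 2 sentence pairs, lengths 12 and 10, hidden dim 64.
a, b = torch.randn(2, 12, 64), torch.randn(2, 10, 64)
fused = VAEAttention(map_size=12 * 10)(a, b)              # (2, 12, 10)
out = SpatialAttention()(fused.unsqueeze(1))              # map as a 1-channel image
```

In the paper, the sampler and the fusion would presumably be trained jointly with the matching objective (including the usual VAE KL term); the sketch omits training and shows only the forward pass.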


Metadata
Title
VSCA: A Sentence Matching Model Incorporating Visual Perception
Authors
Zhe Zhang
Guangli Xiao
Yurong Qian
Mengnan Ma
Hongyong Leng
Tao Zhang
Publication date
08.12.2022
Publisher
Springer US
Published in
Cognitive Computation / Issue 1/2023
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-022-10074-8