
2018 | Original Paper | Book Chapter

Semantic Analysis Using Pairwise Sentence Comparison with Word Embeddings

Authors: Vijay Krishna Menon, Sabdhi M., Harikumar K., Soman K.P.

Published in: Intelligent Systems Technologies and Applications

Publisher: Springer International Publishing


Abstract

Comparing the semantics of a pair of sentences is an interesting yet unstructured problem. Semantic analysis remains elusive because the semantics of natural language constructs cannot be measured directly, let alone compared with one another. Methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) capture broader semantics between documents, but their contribution to pairwise comparison tasks, which require deeper semantics, may be limited. In this paper we present a local-alignment-based scoring scheme for sentence pairs using word embeddings, and show how it can be used as a feature for popular text analysis tasks such as summarization, paraphrase comparison, topic profiling and other semantic comparison tasks. We also present a theoretical analysis of the metrics used in this approach and a separability argument using t-SNE plots. Furthermore, we detail our Spark implementation model for pairwise comparison and summarization.
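To make the idea of local alignment scoring over word embeddings more concrete, the following is a minimal sketch and not the chapter's actual method. The abstract does not give the recurrence, so the sketch assumes a Smith-Waterman-style dynamic program in which the substitution score is the cosine similarity of two word vectors minus a threshold, with a linear gap penalty. The function names, the `threshold` and `gap` parameters, and the two-dimensional toy vectors are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical 2-d toy embeddings; real word vectors would come from a
# trained model and have far more dimensions.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "feline": [0.9, 0.1],
    "sat": [0.0, 1.0],
    "rested": [0.1, 0.9],
    "stock": [-1.0, 0.0],
    "fell": [0.0, -1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_alignment_score(sent_a, sent_b, embeddings, threshold=0.5, gap=0.5):
    """Smith-Waterman-style local alignment score for two token lists.

    Assumed scoring (not taken from the chapter): aligning two words
    contributes cosine(word_a, word_b) - threshold, and skipping a word
    in either sentence costs `gap`. Tokens without a vector are dropped.
    """
    a = [embeddings[w] for w in sent_a if w in embeddings]
    b = [embeddings[w] for w in sent_b if w in embeddings]
    prev = [0.0] * (len(b) + 1)  # previous row of the DP table
    best = 0.0
    for i in range(1, len(a) + 1):
        curr = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = cosine(a[i - 1], b[j - 1]) - threshold
            curr[j] = max(
                0.0,                 # start a new local alignment
                prev[j - 1] + sub,   # align word i with word j
                prev[j] - gap,       # skip a word of sentence A
                curr[j - 1] - gap,   # skip a word of sentence B
            )
            best = max(best, curr[j])
        prev = curr
    return best


paraphrase = local_alignment_score(["cat", "sat"], ["feline", "rested"], TOY_EMBEDDINGS)
unrelated = local_alignment_score(["cat", "sat"], ["stock", "fell"], TOY_EMBEDDINGS)
print(paraphrase > unrelated)
```

Under these assumptions, a score like this could serve as one pairwise feature for the downstream tasks the abstract lists; because the best local score grows with the length of the aligned span, it would probably need normalizing before scores from sentence pairs of different lengths are compared.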


Metadata
Title
Semantic Analysis Using Pairwise Sentence Comparison with Word Embeddings
Authors
Vijay Krishna Menon
Sabdhi M.
Harikumar K.
Soman K.P.
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-68385-0_23