Published in: Pattern Recognition and Image Analysis 4/2019

01-10-2019 | ARTIFICIAL INTELLIGENCE TECHNIQUES IN PATTERN RECOGNITION AND IMAGE ANALYSIS

Estimation of the Closeness to a Semantic Pattern of a Topical Text without Construction of Periphrases

Authors: D. V. Mikhaylov, G. M. Emelyanov

Abstract

The paper addresses the problem of numerically estimating how close a topical text is to the most rational linguistic variant (i.e., the semantic pattern, or sense standard) of describing the knowledge fragment it represents, without constructing paraphrases. The problem arises in the targeted selection of textual information that maximizes the useful semantic component with respect to the tasks the user is solving. Practical applications include selecting papers for scientific publication and designing training courses and educational portals. In the proposed solution, the closeness estimate is based on splitting the words of each phrase of the text into classes by their TF-IDF values relative to the texts of a corpus formed in advance by an expert. The analyzed texts are abstracts of scientific papers together with their titles. The proposed estimate of closeness to the sense standard makes it possible to rank articles both by the significance of the described knowledge fragments for a given subject area and by the non-redundancy of the description itself. The semantic images of the texts closest to the semantic pattern are specified by the words with the highest TF-IDF values: when such words stand next to each other in the linear sequence of a phrase, they are most probably semantically related and form key combinations with words whose TF-IDF values are close to average. To classify word combinations as key ones, an interpretation of the TF-IDF metric is introduced that counts the simultaneous occurrences of all words of the analyzed combination in phrases of an individual document.
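
The two TF-IDF ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the class boundaries are chosen or how the combination metric is normalized, so the equal-width split into three classes, the phrase-share form of the term frequency, and all function names and sample data below are assumptions made for illustration only.

```python
import math

def flatten(doc):
    """A document is a list of phrases; a phrase is a list of word tokens."""
    return [w for phrase in doc for w in phrase]

def tf_idf(word, doc, corpus):
    """Classic TF-IDF of a word in `doc` relative to the documents of `corpus`."""
    tokens = flatten(doc)
    tf = tokens.count(word) / len(tokens)
    df = sum(1 for d in corpus if word in flatten(d))
    return tf * math.log(len(corpus) / df) if df else 0.0

def split_into_classes(phrase, doc, corpus):
    """Split the words of one phrase into three classes by TF-IDF value.
    Assumption: equal-width intervals between the minimum and maximum score."""
    scores = {w: tf_idf(w, doc, corpus) for w in phrase}
    lo, hi = min(scores.values()), max(scores.values())
    step = (hi - lo) / 3 or 1.0  # avoid a zero step when all scores are equal
    classes = {"high": [], "middle": [], "low": []}
    for w in dict.fromkeys(phrase):  # keep phrase order, drop repeated words
        if scores[w] >= lo + 2 * step:
            classes["high"].append(w)
        elif scores[w] >= lo + step:
            classes["middle"].append(w)
        else:
            classes["low"].append(w)
    return classes

def combination_tf_idf(words, doc, corpus):
    """TF-IDF of a word combination, counted over phrases.
    Assumption: TF is the share of phrases of `doc` that contain all the words;
    DF is the number of corpus documents with at least one such phrase."""
    def hits(d):
        return sum(1 for phrase in d if all(w in phrase for w in words))
    tf = hits(doc) / len(doc)
    df = sum(1 for d in corpus if hits(d) > 0)
    return tf * math.log(len(corpus) / df) if df else 0.0

# Hypothetical toy corpus: three documents, each a list of tokenized phrases.
corpus = [
    [["semantic", "pattern", "of", "text"], ["text", "closeness", "estimate"]],
    [["image", "analysis", "of", "text"]],
    [["neural", "network", "of", "image"]],
]
doc = corpus[0]
classes = split_into_classes(doc[0], doc, corpus)
key_score = combination_tf_idf(["semantic", "pattern"], doc, corpus)
common_score = combination_tf_idf(["of", "text"], doc, corpus)
```

In this toy corpus, a word that occurs in every document ("of") gets a zero score and falls into the lowest class, while a combination found in the phrases of only one document ("semantic", "pattern") scores higher than one that also occurs in another document's phrases ("of", "text").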

Metadata
Title: Estimation of the Closeness to a Semantic Pattern of a Topical Text without Construction of Periphrases
Authors: D. V. Mikhaylov, G. M. Emelyanov
Publication date: 01-10-2019
Publisher: Pleiades Publishing
Published in: Pattern Recognition and Image Analysis, Issue 4/2019
Print ISSN: 1054-6618
Electronic ISSN: 1555-6212
DOI: https://doi.org/10.1134/S1054661819040114
