Published in: Pattern Recognition and Image Analysis 3/2020

01.07.2020 | ARTIFICIAL INTELLIGENCE TECHNIQUES IN PATTERN RECOGNITION AND IMAGE ANALYSIS

Hierarchization of Topical Texts Based on the Estimate of Proximity to the Semantic Pattern without Paraphrasing

Authors: D. V. Mikhaylov, G. M. Emelyanov

Abstract

The paper addresses the problem of numerically estimating the mutual semantic dependence of topical texts with respect to the most rational (i.e., standard) variants of describing the knowledge fragments they represent. The proximity of a text to the standard is evaluated without searching for paraphrases. The problem is relevant to determining the significance of information sources for the tasks a user performs; one example is finding the optimal order of working with primary sources when forming a student's individual educational trajectory. In the proposed solution, the proximity of a text to the standard is assessed by dividing the words of each of its phrases into classes according to their TF-IDF values relative to the texts of a corpus formed in advance by an expert. The analyzed texts are abstracts of scientific articles together with their titles. The paper considers the principles of ranking and subsequently hierarchizing the texts of an original collection based on variants of the assessment relative to the title and to the phrase closest to the standard. The semantic images of the texts closest to the standard are determined by the words with the highest TF-IDF values. When such words are adjacent in the linear sequence of a phrase, they are most likely related in meaning and, together with words whose TF-IDF values are close to the average, form key combinations. Analyzing the occurrence of words with the highest TF-IDF values in different texts of the collection makes it possible to assess the relationship between their standards, which serves as the basis for assessing the complementarity of the texts in meaning.
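
To make the class division described in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It computes TF-IDF values for the words of one phrase relative to a small tokenized corpus and splits the words into three classes. The smoothed IDF formula, the function names `tf_idf` and `split_into_classes`, and the threshold rule (mean plus or minus a fraction of the standard deviation) are all assumptions made for illustration; the abstract does not specify how the classes are formed.

```python
import math
import statistics
from collections import Counter


def tf_idf(phrase_words, corpus):
    """Return {word: TF-IDF} for one tokenized phrase relative to a corpus.

    `corpus` is a list of tokenized texts (lists of words).
    """
    counts = Counter(phrase_words)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / len(phrase_words)
        df = sum(1 for doc in corpus if word in doc)
        # Assumption: smoothed IDF, so unseen words do not cause division by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = tf * idf
    return scores


def split_into_classes(scores, width=0.5):
    """Split words into 'high', 'middle', and 'low' TF-IDF classes.

    Hypothetical rule: a word is 'high' (or 'low') if its value lies more than
    `width` standard deviations above (or below) the mean of the phrase.
    """
    classes = {"high": [], "middle": [], "low": []}
    if not scores:
        return classes
    values = list(scores.values())
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    for word, value in scores.items():
        if value > mean + width * spread:
            classes["high"].append(word)
        elif value < mean - width * spread:
            classes["low"].append(word)
        else:
            classes["middle"].append(word)
    return classes


if __name__ == "__main__":
    # Toy corpus standing in for the expert-formed corpus of abstracts.
    corpus = [
        ["semantic", "pattern", "text"],
        ["text", "corpus", "expert"],
        ["text", "ranking", "title"],
    ]
    phrase = ["semantic", "pattern", "text"]
    print(split_into_classes(tf_idf(phrase, corpus)))
```

In this sketch, words that occur in few corpus texts end up in the "high" class and words common to all texts end up in the "low" class. Under the assumptions above, adjacent "high" words in a phrase would be the candidates for the key combinations the abstract mentions.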

Metadata
Title
Hierarchization of Topical Texts Based on the Estimate of Proximity to the Semantic Pattern without Paraphrasing
Authors
D. V. Mikhaylov
G. M. Emelyanov
Publication date
01.07.2020
Publisher
Pleiades Publishing
Published in
Pattern Recognition and Image Analysis / Issue 3/2020
Print ISSN: 1054-6618
Electronic ISSN: 1555-6212
DOI
https://doi.org/10.1134/S1054661820030207
