D.V. Mikhaylov. Born in 1974. Graduated from Yaroslav-the-Wise Novgorod State University, Novgorod, in 1997. Received his PhD (Kandidat Nauk) and Doctoral (Doktor Nauk) degrees in Physics and Mathematics in 2003 and 2013, respectively. From 2000 to 2007, he worked at the Department of Computer Software of Novgorod State University. He is now a Docent of the Department of Information Technologies and Systems at the same university. Since 2002, he has been a member of the Russian Association for Pattern Recognition and Image Analysis. Scientific interests: computational linguistics and artificial intelligence. He is the author of 43 papers in the field of pattern recognition and image analysis.
G.M. Emelyanov. Born in 1943. Graduated from the Leningrad Institute of Electrical Engineering in 1966. Received his PhD (Kandidat Nauk) and Doctoral (Doktor Nauk) degrees in 1971 and 1990, respectively. From 1993 to 2003, he was Dean of the Faculty of Mathematics and Computer Science at Yaroslav-the-Wise Novgorod State University. He is now a Professor of the Department of Information Technologies and Systems at the same university. Scientific interests: construction of problem-oriented computing systems for image processing and analysis. He is the author of 98 publications in the field of pattern recognition and image analysis.
Translated by A.M. Khaitin
The paper considers the problem of numerically estimating, without paraphrasing, how close a topical text is to the most rational linguistic variant (i.e., the semantic pattern, or sense standard) of the description of the knowledge fragment it represents. This problem is relevant to the targeted selection of text information that maximizes the useful semantic component with respect to the tasks the user is solving. Practical applications include the selection of papers for scientific publishing and the design of training courses and educational portals. In the suggested solution, the estimate of the closeness of a text to the semantic pattern is based on splitting the words of each of its phrases into classes by the value of the TF-IDF metric relative to the texts of a corpus formed in advance by an expert. Abstracts of scientific papers, together with their titles, are analyzed. The suggested numerical estimate of closeness to the sense standard makes it possible to rank articles both by the significance of the described knowledge fragments for a given subject area and by the non-redundancy of the description itself. Here, the semantic images of the texts closest to the semantic pattern specify the words with the highest TF-IDF values; when such words are adjacent in the linear sequence of a phrase, they are most probably semantically related and form key combinations with words whose TF-IDF value is close to average. To classify word combinations as key ones, an interpretation of the TF-IDF metric is introduced that estimates the number of simultaneous occurrences of all words of the analyzed combination in the phrases of an individual document.
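The two TF-IDF computations mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`word_tfidf`, `combo_tfidf`, `split_into_classes`), the toy corpus, the logarithmic IDF, and the rule of cutting the ranked words into equal-sized classes are all assumptions made for illustration; the abstract does not state the exact formulas or the splitting rule. A document is modelled as a list of phrases, and a phrase as a list of tokens.

```python
import math

def word_tfidf(word, doc, corpus):
    """Classic TF-IDF of a word in `doc` relative to `corpus` (assumed form)."""
    tokens = [t for phrase in doc for t in phrase]
    tf = tokens.count(word) / len(tokens)
    # Document frequency: documents with at least one phrase containing the word.
    df = sum(1 for d in corpus if any(word in p for p in d))
    return tf * math.log(len(corpus) / df) if df else 0.0

def combo_tfidf(words, doc, corpus):
    """Hypothetical phrase-level variant: TF counts the phrases of `doc`
    in which all words of the combination occur together."""
    ws = set(words)
    tf = sum(1 for p in doc if ws <= set(p)) / len(doc)
    # Document frequency: documents with at least one such phrase.
    df = sum(1 for d in corpus if any(ws <= set(p) for p in d))
    return tf * math.log(len(corpus) / df) if df else 0.0

def split_into_classes(phrase, doc, corpus, n_classes=3):
    """Rank the distinct words of a phrase by descending TF-IDF and cut the
    ranking into `n_classes` groups (equal-sized cut is an assumption)."""
    scores = {w: word_tfidf(w, doc, corpus) for w in phrase}
    ranked = sorted(scores, key=lambda w: -scores[w])
    classes = [[] for _ in range(n_classes)]
    for i, w in enumerate(ranked):
        classes[i * n_classes // len(ranked)].append(w)
    return classes

# Toy corpus (illustrative data only): three documents of tokenized phrases.
corpus = [
    [["tfidf", "metric", "ranks", "words"], ["the", "metric", "is", "simple"]],
    [["the", "parser", "builds", "trees"]],
    [["the", "corpus", "is", "small"]],
]
doc = corpus[0]
high, middle, low = split_into_classes(doc[1], doc, corpus)
```

In this sketch, a word that occurs in every document of the corpus (here `"the"`) receives a zero score and falls into the lowest class, while a word concentrated in a single document lands in the highest one. A pair of adjacent words could then be treated as a candidate key combination when `combo_tfidf` for the pair is high.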
- Estimation of the Closeness to a Semantic Pattern of a Topical Text without Construction of Periphrases
D. V. Mikhaylov
G. M. Emelyanov
- Pleiades Publishing