Published in: Automatic Documentation and Mathematical Linguistics 3/2021

01.05.2021 | AUTOMATED TEXT PROCESSING

A New Method of Automatic Text Document Classification

Author: V. A. Yatsko



Abstract

This paper describes the procedures and specific features of a new method of automatic classification that is based on calculating the deviations of the stop-word distribution from the Zipfian score. To neutralize discrepancies in text lengths, the author describes and applies a text undersampling methodology. The concept of an iterative threshold level is introduced to reduce text dimensionality to several dozen units. To evaluate the method’s efficiency, the author has developed indicators of discriminative and similarative power that underlie a generalized efficiency score. Fourteen tests, including a comparison with the cosine similarity measure, demonstrated the high efficiency of the proposed method for authorship attribution of fiction texts and clustering of political texts.
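The general idea of comparing a stop-word distribution with a Zipfian expectation can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not give the formula for the Zipfian score, the stop-word list, the undersampling procedure, or the threshold rule. The sketch therefore assumes the textbook form of Zipf's law (the expected count at rank r is the top count divided by r), and the names `STOP_WORDS`, `stop_word_counts`, `zipf_deviations`, and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given in the abstract.
STOP_WORDS = ["the", "of", "and", "to", "a", "in"]


def stop_word_counts(text):
    """Return the count of each stop word, in STOP_WORDS order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in STOP_WORDS)
    return [counts[w] for w in STOP_WORDS]


def zipf_deviations(counts):
    """Return observed minus expected counts under a textbook Zipf model.

    Assumption: counts are ranked in descending order and the expected
    count at rank r equals the top count divided by r.
    """
    ranked = sorted(counts, reverse=True)
    top = ranked[0] if ranked else 0
    return [obs - top / rank for rank, obs in enumerate(ranked, start=1)]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Under these assumptions, two texts could be compared by passing their `stop_word_counts` through `zipf_deviations` and measuring how far apart the resulting vectors are. `cosine_similarity` applied to the raw count vectors corresponds to the baseline measure that the abstract mentions.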
References
2.
Yatsko, V.A., Automatic text classification method based on Zipf’s law, Autom. Doc. Math. Linguist., 2015, vol. 49, no. 3, pp. 83–88.
3.
Yatsko, V.A., A methodology of using a concordancer and table processor for authorship attribution, Autom. Doc. Math. Linguist., 2020, vol. 54, no. 5, pp. 269–274.
7.
Haj-Yahia, Z., Sieg, A., and Deleris, L.A., Towards unsupervised text classification leveraging experts and word embeddings, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 371–379. https://www.aclweb.org/anthology/P19-1036.pdf.
14.
Free eBooks – Project Gutenberg, 2021. https://www.gutenberg.org/.
16.
Yatsko, V.A., Starikov, M.S., and Butakov, A.V., Automatic genre recognition and adaptive text summarization, Autom. Doc. Math. Linguist., 2010, vol. 44, no. 3, pp. 111–120.
Metadata
Title
A New Method of Automatic Text Document Classification
Author
V. A. Yatsko
Publication date
01.05.2021
Publisher
Pleiades Publishing
Published in
Automatic Documentation and Mathematical Linguistics / Issue 3/2021
Print ISSN: 0005-1055
Electronic ISSN: 1934-8371
DOI
https://doi.org/10.3103/S0005105521030080
