Published in: Automatic Documentation and Mathematical Linguistics 3/2021

01-05-2021 | AUTOMATED TEXT PROCESSING

A New Method of Automatic Text Document Classification

Author: V. A. Yatsko


Abstract

This paper describes the procedures and specific features of applying a new method of automatic classification based on calculating the deviations of the stop-word distribution from the Zipfian score. To neutralize discrepancies in text lengths, the author describes and applies a text undersampling methodology. The concept of an iterative threshold level is introduced to reduce text dimensionality to several dozen units. To evaluate the method’s efficiency, the author has developed indicators of discriminative and similarative power that underlie a generalized efficiency score. Fourteen tests, including a comparison with the cosine similarity measure, demonstrated the high efficiency of the proposed method for authorship attribution of fiction texts and clustering of political texts.
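The abstract gives no formulas, so the following Python sketch is only an illustration of the general ideas it names, not the author's procedure. It assumes a simple Zipfian expectation (the count expected at rank r is the top count divided by r), uses truncation to a common length as one possible form of undersampling, and includes the standard cosine similarity measure for comparison. The stop-word list and all function names are hypothetical.

```python
from collections import Counter
import math
import re

# Hypothetical, very short stop-word list used only for illustration.
STOP_WORDS = ("the", "of", "and", "a", "to", "in")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def undersample(tokens, length):
    """Truncate a token list to a common length.

    Assumption: truncation stands in for the undersampling step;
    the abstract does not say how undersampling is performed.
    """
    return tokens[:length]


def zipf_deviations(tokens, stop_words=STOP_WORDS):
    """Return observed-minus-expected counts for each stop word.

    Stop words are ranked by observed count (ties broken alphabetically).
    Assumption: the expected count at rank r is top_count / r.
    """
    counts = Counter(t for t in tokens if t in stop_words)
    ranked = sorted(stop_words, key=lambda w: (-counts[w], w))
    top = counts[ranked[0]]
    return {w: counts[w] - top / r for r, w in enumerate(ranked, start=1)}


def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Under these assumptions, two texts could be compared by undersampling both to the same length, computing their deviation values over the same stop-word list, and measuring the distance between the resulting vectors.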
Metadata
Title
A New Method of Automatic Text Document Classification
Author
V. A. Yatsko
Publication date
01-05-2021
Publisher
Pleiades Publishing
Published in
Automatic Documentation and Mathematical Linguistics / Issue 3/2021
Print ISSN: 0005-1055
Electronic ISSN: 1934-8371
DOI
https://doi.org/10.3103/S0005105521030080
