Published in: Automatic Documentation and Mathematical Linguistics 5/2022

01.10.2022 | AUTOMATED TEXT PROCESSING

Patterns of Using the Z-Score for Text Classification Purposes

Written by: V. A. Yatsko

Abstract

This paper describes procedures for using the Z-score in text document classification. The author tested the efficiency of this approach on authorship attribution and genre classification tasks, based on an analysis of the distribution of stop words. The paper finds that calculating the score from raw counts of stop words produces a negative result, whereas calculating it from the deviations of stop-word frequencies from the Zipfian score yields higher classification efficiency. A comparison with the previously developed Y-method showed that the Z-score is more efficient for text classification.
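
The abstract does not spell out the computation, so the following is only a minimal sketch of the general idea under stated assumptions, not the author's procedure. It uses the standard definition z = (x - mean) / standard deviation, a tiny hypothetical stop-word list (not Fox's list), and an assumed Zipfian expectation in which the count at rank r equals the top count divided by r. All function names are illustrative.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical, tiny stop-word list for illustration only (not Fox's list).
STOP_WORDS = ["the", "of", "and", "to", "a", "in"]


def stop_word_counts(text):
    """Count each stop word in a lowercased, whitespace-split text."""
    counts = Counter(w for w in text.lower().split() if w in STOP_WORDS)
    return [counts[w] for w in STOP_WORDS]


def zipf_deviations(counts):
    """Deviation of each observed count from an assumed Zipfian expectation.

    Assumption: the expected count at rank r is (top count) / r, with ranks
    assigned by descending observed count.
    """
    ranked = sorted(counts, reverse=True)
    top = ranked[0]
    return [c - top / r for r, c in enumerate(ranked, start=1)]


def z_scores(values):
    """Standard z-score: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread, so every score is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


text = "the cat and the dog ran to the end of a road in the rain"
counts = stop_word_counts(text)          # raw stop-word counts
deviations = zipf_deviations(counts)     # deviations from the Zipfian expectation
scores = z_scores(deviations)            # standardized feature vector
```

A vector such as `scores` could then serve as the feature representation of a text when comparing it against class profiles; how the paper actually performs that comparison is not stated in the abstract.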
Footnotes
1
Fox’s list and the application are available at http://iatskota.wixsite.com/yatsko.
 
2
https://www.gutenburg.
 
References
2.
Yatsko, V.A., Y-method of text classification, Grani Poznaniya, 2021, no. 3, pp. 52–56. http://grani.vspu.ru/jurnal/79.
3.
Z-score, 2022. https://www.sciencedirect.com/topics/engineering/z-score.
5.
Kummer, O. and Savoy, J., Feature selection in sentiment analysis, Conf. en recherche d’information et applications, Bordeaux, 2012, pp. 273–284. http://www.asso-aria.org/coria/2012/273.pdf.
6.
Pandey, A. and Jain, A., Comparative analysis of KNN algorithm using various normalization techniques, Int. J. Comput. Network Inf. Secur., 2017, no. 11, pp. 36–42. http://j.mecs-press.net/ijcnis/ijcnis-v9-n11/IJCNIS-V9-N11-4.pdf.
7.
Westergaard, D. and Jensen, L., Z scores for text mining, 2018. https://figshare.com/articles/dataset/Z_scores_for_text_mining/5340514.
8.
Liu, B., Li, X., Lee, W.S., and Yu, Ph.S., Text classification by labeling words, AAAI’04: Proc. 19th Natl. Conf. on Artificial Intelligence, 2004, pp. 425–430. https://www.cs.uic.edu/~liub/publications/aaai04-labelWords.pdf.
9.
Mahinovs, A. and Tiwari, A., Text classification method review, Cranfield, UK: Cranfield Univ., 2007. https://dspace.lib.cranfield.ac.uk/bitstream/handle/1826/1860/mahinovs.pdf?sequence=1&isAllowed=y.
Metadata
Title
Patterns of Using the Z-Score for Text Classification Purposes
Written by
V. A. Yatsko
Publication date
01.10.2022
Publisher
Pleiades Publishing
Published in
Automatic Documentation and Mathematical Linguistics / Issue 5/2022
Print ISSN: 0005-1055
Electronic ISSN: 1934-8371
DOI
https://doi.org/10.3103/S0005105522050041
