01.10.2022 | AUTOMATED TEXT PROCESSING
Patterns of Using the Z-Score for Text Classification Purposes
Published in: Automatic Documentation and Mathematical Linguistics | Issue 5/2022
Abstract
This paper describes procedures for using the Z-score to classify text documents. The author tested the efficiency of this approach on authorship attribution and genre classification tasks, based on an analysis of the distribution of stop words. The paper finds that calculating the score from the raw counts of stop words produces a negative result, whereas calculating it from the deviations of stop-word frequencies from the Zipfian score yields higher classification efficiency. A comparison with the previously developed Y-method showed that the Z-score is more efficient for text classification.
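The abstract does not state the formulas, so the following is only a minimal sketch of the general idea, not the author's procedure. It assumes the standard Z-score, (x - mean) / standard deviation, computed per stop word against a reference set of documents, and it assumes a simple Zipfian expectation in which the word of rank r has frequency f(1) / r. The stop-word list and the names `relative_frequencies`, `zipf_deviations`, and `z_scores` are hypothetical and chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical, illustrative stop-word list (not taken from the paper).
STOP_WORDS = ["the", "of", "and", "to", "a", "in"]


def relative_frequencies(text, stop_words=STOP_WORDS):
    """Relative frequency of each stop word among all tokens of the text."""
    tokens = text.lower().split()
    counts = Counter(t for t in tokens if t in stop_words)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {w: counts[w] / total for w in stop_words}


def zipf_deviations(freqs):
    """Observed frequency minus an assumed Zipfian expectation f(1) / rank.

    Words are ranked by observed frequency within the document itself.
    """
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    top = freqs[ranked[0]]
    return {w: freqs[w] - top / rank for rank, w in enumerate(ranked, start=1)}


def z_scores(values, reference):
    """Standard Z-score of each feature against a list of reference documents.

    `values` maps word -> feature for one document; `reference` is a list of
    such mappings. A zero standard deviation yields a Z-score of 0.0.
    """
    scores = {}
    for word, x in values.items():
        column = [doc[word] for doc in reference]
        mu, sigma = mean(column), pstdev(column)
        scores[word] = 0.0 if sigma == 0 else (x - mu) / sigma
    return scores
```

Under these assumptions, the two variants contrasted in the abstract differ only in the feature passed to `z_scores`: raw stop-word counts or frequencies in one case, and the output of `zipf_deviations` in the other. How the resulting scores are turned into a class decision is not described in the abstract and is left out here.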