2007 | Original Paper | Book Chapter
A Simple Probability Based Term Weighting Scheme for Automated Text Classification
Authors: Ying Liu, Han Tong Loh
Published in: New Trends in Applied Artificial Intelligence
Publisher: Springer Berlin Heidelberg
In automated text classification, tfidf is often considered the default term weighting scheme and has been widely reported in the literature. However, tfidf does not directly reflect a term's category membership. Inspired by an analysis of various feature selection methods, we propose a simple probability based term weighting scheme that directly utilizes two critical information ratios, i.e., relevance indicators. These relevance indicators are supported by probability estimates that embody category membership. Our experimental study on two data sets, including Reuters-21578, demonstrates that the proposed probability based term weighting scheme significantly outperforms tfidf with both a Bayesian classifier and Support Vector Machines (SVM).
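The abstract does not give the weighting formula, so the following is only a minimal sketch to contrast the two ideas, under stated assumptions. It assumes the two relevance indicators are the ratios A/B and A/C, where for a term and a category A counts in-category documents containing the term, B counts in-category documents without it, and C counts out-of-category documents containing it, combined as tf · log(1 + (A/B)·(A/C)). That exact form, the toy corpus, the function names (`tfidf_weight`, `contingency_counts`, `prob_weight`) and the `floor` guard against division by zero are all hypothetical illustrations, not the chapter's published implementation.

```python
import math

# Hypothetical toy corpus: each document is a set of tokens with one category label.
DOCS = [
    {"wheat", "price", "export"},
    {"wheat", "crop"},
    {"oil", "price"},
    {"oil", "crude", "export"},
]
LABELS = ["grain", "grain", "crude", "crude"]


def tfidf_weight(tf, n_docs, df):
    """Classic tf * log(N / df); category labels play no role."""
    return tf * math.log(n_docs / df)


def document_frequency(docs, term):
    """Number of documents that contain `term`."""
    return sum(1 for tokens in docs if term in tokens)


def contingency_counts(docs, labels, term, category):
    """Return (A, B, C) for `term` with respect to `category`.

    A: documents in `category` that contain the term
    B: documents in `category` that do not contain the term
    C: documents outside `category` that contain the term
    """
    a = b = c = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if label == category:
            if present:
                a += 1
            else:
                b += 1
        elif present:
            c += 1
    return a, b, c


def prob_weight(tf, a, b, c, floor=1.0):
    """Assumed form: tf * log(1 + (A/B) * (A/C)).

    `floor` is a hypothetical guard: a B or C below it is replaced by it,
    so a term seen only inside one category does not divide by zero.
    """
    ratio_b = a / max(b, floor)  # assumed relevance indicator 1: A/B
    ratio_c = a / max(c, floor)  # assumed relevance indicator 2: A/C
    return tf * math.log(1 + ratio_b * ratio_c)


if __name__ == "__main__":
    n = len(DOCS)
    for term in ("wheat", "price"):
        df = document_frequency(DOCS, term)
        a, b, c = contingency_counts(DOCS, LABELS, term, "grain")
        print(term, round(tfidf_weight(1, n, df), 3), round(prob_weight(1, a, b, c), 3))
```

In this toy corpus, "wheat" and "price" each appear in two of four documents, so tfidf gives both the same weight, log 2. The assumed probability based weight for the category "grain" gives "wheat" log 5 (it occurs only in grain documents) but "price" only log 2, which illustrates the abstract's point that tfidf ignores category membership.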