2008 | OriginalPaper | Buchkapitel
Semi-supervised Document Classification with a Mislabeling Error Model
verfasst von : Anastasia Krithara, Massih R. Amini, Jean-Michel Renders, Cyril Goutte
Erschienen in: Advances in Information Retrieval
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model [6] for text classification where the training set is partially labeled. The proposed approach iteratively labels the unlabeled documents and estimates the probabilities of its labeling errors. These probabilities are then taken into account in the estimation of the new model parameters before the next round. Our approach outperforms an earlier semi-supervised extension of PLSA introduced by [9] which is based on the use of
fake labels
. However, it maintains its simplicity and ability to solve multiclass problems. In addition, it gives valuable information about the most uncertain and difficult classes to label. We perform experiments over the
20Newsgroups
,
WebKB
and
Reuters
document collections and show the effectiveness of our approach over two other semi-supervised algorithms applied to these text classification problems.