2005 | OriginalPaper | Book chapter
Using Term Clustering and Supervised Term Affinity Construction to Boost Text Classification
Authors: Chong Wang, Wenyuan Wang
Published in: Advances in Knowledge Discovery and Data Mining
Publisher: Springer Berlin Heidelberg
Measuring similarity is a crucial step in many machine learning problems. Traditional cosine similarity cannot represent semantic relationships between terms. This paper explores a kernel-based similarity measure built on term clustering. An affinity matrix of terms is constructed from term co-occurrence, in both unsupervised and supervised ways. Normalized cut is then used to cluster the terms, which cuts off noisy edges. A diffusion kernel is adopted to measure the kernel-like similarity of terms within the same cluster. Experiments demonstrate that our methods give satisfactory results, even when the training set is small.
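The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it covers only an unsupervised co-occurrence affinity (the supervised construction is not specified here), uses a two-way spectral relaxation of normalized cut, and takes the diffusion kernel as exp(-beta * L) with L the graph Laplacian of each cluster. The function names, the toy document-term matrix `X`, and the value of `beta` are all assumptions made for illustration.

```python
import numpy as np

def cooccurrence_affinity(X):
    """Assumed unsupervised affinity: number of documents in which two terms co-occur."""
    B = (X > 0).astype(float)
    A = B.T @ B
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def normalized_cut_bipartition(A):
    """Two-way normalized cut via the usual spectral relaxation."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)        # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]      # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)       # split terms by sign

def diffusion_kernel(A, beta=0.5):
    """K = exp(-beta * L), L = D - A, computed by eigendecomposition."""
    L = np.diag(A.sum(axis=1)) - A
    w, V = np.linalg.eigh(L)
    return (V * np.exp(-beta * w)) @ V.T

def term_similarity(X, beta=0.5):
    """Block term-similarity matrix: diffusion kernel inside each cluster, zero across."""
    A = cooccurrence_affinity(X)
    labels = normalized_cut_bipartition(A)
    S = np.zeros_like(A)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        S[np.ix_(idx, idx)] = diffusion_kernel(A[np.ix_(idx, idx)], beta)
    return S, labels

def kernel_similarity(x, y, S):
    """Cosine-like document similarity under the term kernel S."""
    return (x @ S @ y) / np.sqrt((x @ S @ x) * (y @ S @ y))

# Toy corpus (hypothetical): terms 0 and 1 co-occur often, as do terms 2 and 3;
# the last document adds one weak, "noisy" link between terms 1 and 2.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
], dtype=float)

S, labels = term_similarity(X)
doc_a = np.array([1.0, 0.0, 0.0, 0.0])  # contains only term 0
doc_b = np.array([0.0, 1.0, 0.0, 0.0])  # contains only term 1
print("cluster labels:", labels)
print("cosine similarity:", doc_a @ doc_b)
print("kernel similarity:", kernel_similarity(doc_a, doc_b, S))
```

In this toy setting the two documents share no term, so their cosine similarity is zero, while the kernel similarity is positive because terms 0 and 1 fall in the same cluster. The weak edge between terms 1 and 2 is removed by the cut, so terms in different clusters contribute no similarity.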