Skip to main content

2000 | OriginalPaper | Buchkapitel

Centroid-Based Document Classification: Analysis and Experimental Results

verfasst von : Eui-Hong (Sam) Han, George Karypis

Erschienen in: Principles of Data Mining and Knowledge Discovery

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

In this paper we present a simple linear-time centroid-based document classification algorithm, that despite its simplicity and robust performance, has not been extensively studied and analyzed. Our experiments show that this centroidbased classifier consistently and substantially outperforms other algorithms such as Naive Bayesian, k-nearest-neighbors, and C4.5, on a wide range of datasets. Our analysis shows that the similarity measure used by the centroid-based scheme allows it to classify a new document based on how closely its behavior matches the behavior of the documents belonging to different classes. This matching allows it to dynamically adjust for classes with different densities and accounts for dependencies between the terms in the different classes.

Metadaten
Titel
Centroid-Based Document Classification: Analysis and Experimental Results
verfasst von
Eui-Hong (Sam) Han
George Karypis
Copyright-Jahr
2000
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/3-540-45372-5_46