2005 | OriginalPaper | Buchkapitel
Subspace Clustering of Text Documents with Feature Weighting K-Means Algorithm
verfasst von : Liping Jing, Michael K. Ng, Jun Xu, Joshua Zhexue Huang
Erschienen in: Advances in Knowledge Discovery and Data Mining
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
This paper presents a new method to solve the problem of clustering large and complex text data. The method is based on a new subspace clustering algorithm that automatically calculates the feature weights in the
k
-means clustering process. In clustering sparse text data the feature weights are used to discover clusters from subspaces of the document vector space and identify key words that represent the semantics of the clusters. We present a modification of the published algorithm to solve the sparsity problem that occurs in text clustering. Experimental results on real-world text data have shown that the new method outperformed the
Standard KMeans
and
Bisection-KMeans
algorithms, while still maintaining efficiency of the
k-means
clustering process.