2001 | OriginalPaper | Book chapter

A Scalable Hierarchical Algorithm for Unsupervised Clustering

Author: Daniel Boley

Published in: Data Mining for Scientific and Engineering Applications

Publisher: Springer US

Top-down hierarchical clustering can be done in a scalable way. Here we describe a scalable unsupervised clustering algorithm designed for large datasets from a variety of applications. The method constructs a tree of nested clusters top-down, where each cluster in the tree is split according to its leading principal direction. A fast principal direction solver keeps the overall method fast. The algorithm can be applied to any dataset whose entries can be embedded in a high-dimensional Euclidean space, and it takes full advantage of any sparsity present in the data. We show the performance of the method on text document data, in terms of both scalability and cluster quality. We demonstrate the versatility of the method across domains by showing results from text documents, human cancer gene expression data, and astrophysical data. For the last domain, we use an out-of-core variant of the underlying method that can efficiently cluster large datasets using only a relatively small memory partition.
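
The top-down splitting described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation, and several choices are assumptions made for illustration only: plain power iteration stands in for the fast principal direction solver, a cluster is split by the sign of each point's projection onto the principal direction through the cluster centroid, and the next cluster to split is the one with the largest scatter. All function names are hypothetical, and the sketch uses dense lists, so it does not exploit sparsity.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def principal_direction(rows, iters=200, seed=0):
    """Estimate the leading principal direction by power iteration.

    Assumption: power iteration on A^T A (A = centered data) stands in
    for the fast solver mentioned in the abstract.
    """
    mu = mean_vector(rows)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in mu]
    for _ in range(iters):
        # Apply A^T (A v) without forming the covariance matrix.
        proj = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered))
             for j in range(len(mu))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points identical: no direction to find
            break
        v = [x / norm for x in w]
    return mu, v


def scatter(rows):
    """Total squared distance of the points from their centroid."""
    mu = mean_vector(rows)
    return sum((x - m) ** 2 for r in rows for x, m in zip(r, mu))


def split_cluster(data, indices):
    """Split one cluster by the sign of each point's projection (assumed rule)."""
    rows = [data[i] for i in indices]
    mu, v = principal_direction(rows)
    left, right = [], []
    for i in indices:
        score = sum((x - m) * d for x, m, d in zip(data[i], mu, v))
        (left if score <= 0.0 else right).append(i)
    return left, right


def divisive_clustering(data, n_clusters):
    """Grow leaf clusters top-down, splitting the largest-scatter leaf each time."""
    leaves = [list(range(len(data)))]
    while len(leaves) < n_clusters:
        candidates = [c for c in leaves if len(c) > 1]
        if not candidates:
            break
        target = max(candidates, key=lambda c: scatter([data[i] for i in c]))
        left, right = split_cluster(data, target)
        if not left or not right:  # degenerate split: stop early
            break
        leaves.remove(target)
        leaves.extend([left, right])
    return leaves
```

Calling `divisive_clustering(data, k)` on a list of equal-length numeric vectors returns up to `k` lists of row indices, one per leaf of the tree. Because each iteration only multiplies the data by a vector, the same idea carries over to sparse representations, although this sketch does not attempt that.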

Metadata
Title
A Scalable Hierarchical Algorithm for Unsupervised Clustering
Author
Daniel Boley
Copyright year
2001
Publisher
Springer US
DOI
https://doi.org/10.1007/978-1-4615-1733-7_21