2011 | OriginalPaper | Buchkapitel
Tolerance Rough Set Theory Based Data Summarization for Clustering Large Datasets
verfasst von : Bidyut Kr. Patra, Sukumar Nandi
Erschienen in: Transactions on Rough Sets XIV
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
Finding clusters in large datasets is an interesting challenge in many fields of Science and Technology. Many clustering methods have been successfully developed over the years. However, most of the existing clustering methods need multiple data scans to get converged. Therefore, these methods cannot be applied for cluster analysis in large datasets. Data summarization can be used as a pre-processing step to speed up classical clustering methods for large datasets. In this paper, we propose a data summarization scheme based on tolerance rough set theory termed
rough bubble.
Rough bubble
utilizes leaders clustering method to collect sufficient statistics of the dataset, which can be used to cluster the dataset. We show that proposed summarization scheme outperforms recently introduced
data bubble
as a summarization scheme when agglomerative hierarchical clustering (single-link) method is applied to it. We also introduce a technique to reduce the number of distance computations required in leaders clustering method. Experiments are conducted with synthetic and real world datasets which show effectiveness of our methods for large datasets.