Top

Soft Computing

Published in:

01-05-2011 | Focus

Efficient approaches for summarizing subspace clusters into k representatives

Authors: Guanhua Chen, Xiuli Ma, Dongqing Yang, Shiwei Tang, Meng Shuai, Kunqing Xie

Published in: Soft Computing | Issue 5/2011

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

A major challenge in subspace clustering is that subspace clustering may generate an explosive number of clusters with high computational complexity, which severely restricts the usage of subspace clustering. The problem gets even worse with the increase of the data’s dimensionality. In this paper, we propose to summarize the set of subspace clusters into k representative clusters to alleviate the problem. Typically, subspace clusters can be clustered further into k groups, and the set of representative clusters can be selected from each group. In such a way, only the most representative subspace clusters will be returned to user. Unfortunately, when the size of the set of representative clusters is specified, the problem of finding the optimal set is NP-hard. To solve this problem efficiently, we present two approximate methods: PCoC and HCoC. The greatest advantage of our methods is that we only need a subset of subspace clusters as the input instead of the complete set of subspace clusters. Precisely, only the clusters in low-dimensional subspaces are computed and assembled into representative clusters in high-dimensional subspaces. The approximate results can be found in polynomial time. Our performance study shows both the effectiveness and efficiency of these methods.

previous article A hierarchical multiclass support vector machine incorporated with holistic triple learning units

next article A modified hybrid method of spatial credibilistic clustering and particle swarm optimization

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Afrati F, Gionis A, Mannila H (2004) Approximating a collection of frequent sets. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’04), pp 12–19

Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of international conference on management of data (SIGMOD’98), pp 94–105

Assent I, Krieger R, Müller E, Seidl T (2007) DUSC: dimensionality unbiased subspace clustering. In: Proceedings of the seventh IEEE international conference on data mining (ICDM’07), pp 409–414

Baumgartner C, Kailing K, Kriegel H-P, Kroger P, Plant C (2004) Subspace selection for clustering high dimensional data. In: Proceedings of the fourth IEEE international conference on data mining (ICDM’04), pp 11–18

Bohm C, Kailing K, Kriegel H-P, Kroger P (2004) Density connected clustering with local subspace preferences. In: Proceedings of the fourth IEEE international conference on data mining (ICDM’04), pp 27–34

Chen G, Ma X, Yang D, Tang S (2008) Discovering the skyline of subspace clusters in high-dimensional data, In: Proceedings of the fifth international conference on fuzzy systems and knowledge discovery (FSKD ‘08), vol 2, pp 439–443

Cheng CH, Fu AC, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’99), pp 84–93

Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second ACM SIGKDD international conference on knowledge discovery and data mining (KDD’96), pp 226–231

Jin R, Abu-Ata M, Xiang Y, Ruan N (2008) Effective and efficient itemset pattern summarization: regression-based approaches, In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’08), pp 399–407

Kriegel HP, Kroger P, Renz M, Wurst S (2005) A generic framework for efficient subspace clustering of high-dimensional data. In: Proceedings of the fifth IEEE international conference on data mining (ICDM’05)

Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Lecture notes in computer science: database theory, ICDT, pp 398–416

Wang C, Parthasarathy S (2006) Summarizing itemset patterns using probabilistic models. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’06), pp 730–735

Xin D, Han J, Yan X, Cheng H (2005) Mining compressed frequent-pattern sets. In: Proceedings of the 31st international conference on very large data bases (VLDB’05), pp 709–720

Yan X, Cheng H, Han J, Xin D (2005) Summarizing itemset patterns: a profile-based approach. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD’05), pp 314–323

Title: Efficient approaches for summarizing subspace clusters into k representatives
Authors: Guanhua Chen
Xiuli Ma
Dongqing Yang
Shiwei Tang
Meng Shuai
Kunqing Xie
Publication date: 01-05-2011
Publisher: Springer-Verlag
Published in: Soft Computing / Issue 5/2011
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-010-0552-8

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 5/2011

A hierarchical multiclass support vector machine incorporated with holistic triple learning units

An approach to parameters estimation of a chromatography model using a clustering genetic algorithm based inverse model

A modified hybrid method of spatial credibilistic clustering and particle swarm optimization

On the solution of the fuzzy Sylvester matrix equation

Recent progress in natural computation and knowledge discovery: an ICNC’09-FSKD’09 special issue

Image annotation by incorporating word correlations into multi-class SVM

Premium Partner