Skip to main content
Erschienen in: Pattern Analysis and Applications 4/2020

22.04.2020 | Theoretical advances

ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data

verfasst von: Kavan Fatehi, Mohsen Rezvani, Mansoor Fateh

Erschienen in: Pattern Analysis and Applications | Ausgabe 4/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The curse of dimensionality in high-dimensional data is one of the major challenges in data clustering. Recently, a considerable amount of literature has been published on subspace clustering to address this challenge. The main objective of the subspace clustering is to discover clusters embedded in any possible combination of the attributes. Previous studies have mostly been generating redundant subspace clusters, leading to clustering accuracy loss and also increasing the running time. In this paper, a bottom-up density-based approach is proposed for clustering of high-dimensional data. We employ the cluster structure as a similarity measure to generate the optimal subspaces which result in raising the accuracy of the subspace clustering. Using this idea, we propose an iterative algorithm to discover similar subspaces using the similarity in the features of subspaces. At each iteration of this algorithm, it first determines similar subspaces, then combines them to generate higher-dimensional subspaces, and finally re-clusters the subspaces. The algorithm repeats these steps and converges to the final clusters. Experiments on various synthetic and real datasets show that the results of the proposed approach are significantly better in both quality and runtime comparing to the state of the art on clustering high-dimensional data. The accuracy of the proposed method is around 34% higher than the CLIQUE algorithm and around 6% higher than DiSH.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Achtert E, Böhm C, Kriegel H-P, Kröger P, Müller-Gorman I, Zimek A (2007) Detection and visualization of subspace cluster hierarchies. In: DASFAA , vol 4443, pp 152–163. Springer Achtert E, Böhm C, Kriegel H-P, Kröger P, Müller-Gorman I, Zimek A (2007) Detection and visualization of subspace cluster hierarchies. In: DASFAA , vol 4443, pp 152–163. Springer
2.
Zurück zum Zitat Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. In: ACM SIGKDD record, vol 28, pp 61–72. ACM Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. In: ACM SIGKDD record, vol 28, pp 61–72. ACM
3.
Zurück zum Zitat Agrawal R, Gehrke JE, Gunopulos D, Raghavan P (1999) Automatic subspace clustering of high dimensional data for data mining applications, Dec 14. US Patent 6,003,029 Agrawal R, Gehrke JE, Gunopulos D, Raghavan P (1999) Automatic subspace clustering of high dimensional data for data mining applications, Dec 14. US Patent 6,003,029
4.
Zurück zum Zitat Aksehirli E, Goethals B, Müller E (2015) Efficient cluster detection by ordered neighborhoods. In: International conference on big data analytics and knowledge discovery, pp 15–27. Springer Aksehirli E, Goethals B, Müller E (2015) Efficient cluster detection by ordered neighborhoods. In: International conference on big data analytics and knowledge discovery, pp 15–27. Springer
5.
Zurück zum Zitat Assent I, Krieger R, Müller E, Seidl T (2007) Dusc: dimensionality unbiased subspace clustering. In: Seventh IEEE international conference on data mining, 2007. ICDM 2007, pp 409–414. IEEE Assent I, Krieger R, Müller E, Seidl T (2007) Dusc: dimensionality unbiased subspace clustering. In: Seventh IEEE international conference on data mining, 2007. ICDM 2007, pp 409–414. IEEE
6.
Zurück zum Zitat Bohm C, Railing K, Kriegel H-P, Kroger P (2004) Density connected clustering with local subspace preferences. In: Fourth IEEE international conference on data mining, 2004. ICDM’04, pp 27–34. IEEE Bohm C, Railing K, Kriegel H-P, Kroger P (2004) Density connected clustering with local subspace preferences. In: Fourth IEEE international conference on data mining, 2004. ICDM’04, pp 27–34. IEEE
7.
Zurück zum Zitat Bouveyron C, Brunet-Saumard C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78MathSciNetCrossRef Bouveyron C, Brunet-Saumard C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78MathSciNetCrossRef
8.
Zurück zum Zitat Chen X, Ye Y, Xu X, Huang JZ (2012) A feature group weighting method for subspace clustering of high-dimensional data. Pattern Recognit 45(1):434–446CrossRef Chen X, Ye Y, Xu X, Huang JZ (2012) A feature group weighting method for subspace clustering of high-dimensional data. Pattern Recognit 45(1):434–446CrossRef
9.
Zurück zum Zitat Chu Y-H, Huang J-W, Chuang K-T, Yang D-N, Chen M-S (2010) Density conscious subspace clustering for high-dimensional data. IEEE Trans Knowl Data Eng 22(1):16–30CrossRef Chu Y-H, Huang J-W, Chuang K-T, Yang D-N, Chen M-S (2010) Density conscious subspace clustering for high-dimensional data. IEEE Trans Knowl Data Eng 22(1):16–30CrossRef
10.
Zurück zum Zitat De Raedt L, Jaeger M, Lee SD, Mannila H (2010) A theory of inductive query answering. In: Inductive databases and constraint-based data mining, pp 79–103. Springer De Raedt L, Jaeger M, Lee SD, Mannila H (2010) A theory of inductive query answering. In: Inductive databases and constraint-based data mining, pp 79–103. Springer
11.
12.
Zurück zum Zitat Dongen S (2000) Performance criteria for graph clustering and Markov cluster experiments. CWI (Centre for Mathematics and Computer Science) Dongen S (2000) Performance criteria for graph clustering and Markov cluster experiments. CWI (Centre for Mathematics and Computer Science)
13.
Zurück zum Zitat Esmin AA, Coelho RA, Matwin S (2015) A review on particle swarm optimization algorithm and its variants to clustering high-dimensional data. Artif Intell Rev 44(1):23–45CrossRef Esmin AA, Coelho RA, Matwin S (2015) A review on particle swarm optimization algorithm and its variants to clustering high-dimensional data. Artif Intell Rev 44(1):23–45CrossRef
14.
Zurück zum Zitat Gan G, Ng MK-P (2015) Subspace clustering using affinity propagation. Pattern Recognit 48(4):1455–1464CrossRef Gan G, Ng MK-P (2015) Subspace clustering using affinity propagation. Pattern Recognit 48(4):1455–1464CrossRef
15.
Zurück zum Zitat Gan G, Ng MK-P (2015) Subspace clustering with automatic feature grouping. Pattern Recognit 48(11):3703–3713CrossRef Gan G, Ng MK-P (2015) Subspace clustering with automatic feature grouping. Pattern Recognit 48(11):3703–3713CrossRef
16.
Zurück zum Zitat Goil S, Nagesh H, Choudhary A (1999) Mafia: efficient and scalable subspace clustering for very large data sets. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining, pp 443–452. ACM Goil S, Nagesh H, Choudhary A (1999) Mafia: efficient and scalable subspace clustering for very large data sets. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining, pp 443–452. ACM
17.
Zurück zum Zitat Hu J, Pei J (2017) Subspace multi-clustering: a review. Knowl Inf Syst 56:1–28 Hu J, Pei J (2017) Subspace multi-clustering: a review. Knowl Inf Syst 56:1–28
18.
Zurück zum Zitat Kailing K, Kriegel H-P, Kröger P (2004) Density-connected subspace clustering for high-dimensional data. In: Proceedings of the 2004 SIAM international conference on data mining, SIAM, pp 246–256 Kailing K, Kriegel H-P, Kröger P (2004) Density-connected subspace clustering for high-dimensional data. In: Proceedings of the 2004 SIAM international conference on data mining, SIAM, pp 246–256
19.
Zurück zum Zitat Kriegel H-P, Kroger P, Renz M, Wurst S (2005) A generic framework for efficient subspace clustering of high-dimensional data. In: Fifth IEEE international conference on data mining, pp 8–pp. IEEE Kriegel H-P, Kroger P, Renz M, Wurst S (2005) A generic framework for efficient subspace clustering of high-dimensional data. In: Fifth IEEE international conference on data mining, pp 8–pp. IEEE
20.
Zurück zum Zitat Kuncheva LI, Hadjitodorov ST (2004) Using diversity in cluster ensembles. In: 2004 IEEE international conference on systems, man and cybernetics, vol 2, pp 1214–1219. IEEE Kuncheva LI, Hadjitodorov ST (2004) Using diversity in cluster ensembles. In: 2004 IEEE international conference on systems, man and cybernetics, vol 2, pp 1214–1219. IEEE
21.
Zurück zum Zitat Mai ST, Assent I, Storgaard M (2016) Anydbc: an efficient anytime density-based clustering algorithm for very large complex datasets. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1025–1034. ACM Mai ST, Assent I, Storgaard M (2016) Anydbc: an efficient anytime density-based clustering algorithm for very large complex datasets. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1025–1034. ACM
22.
Zurück zum Zitat McWilliams B, Montana G (2014) Subspace clustering of high-dimensional data: a predictive approach. Data Min Knowl Discov 28(3):736–772MathSciNetCrossRef McWilliams B, Montana G (2014) Subspace clustering of high-dimensional data: a predictive approach. Data Min Knowl Discov 28(3):736–772MathSciNetCrossRef
23.
Zurück zum Zitat Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 977–986. ACM Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 977–986. ACM
24.
Zurück zum Zitat Nie F, Wang X, Jordan MI, Huang H (2016) The constrained Laplacian rank algorithm for graph-based clustering. In: AAAI, pp 1969–1976 Nie F, Wang X, Jordan MI, Huang H (2016) The constrained Laplacian rank algorithm for graph-based clustering. In: AAAI, pp 1969–1976
25.
Zurück zum Zitat Nie F, Yuan J, Huang H (2014) Optimal mean robust principal component analysis. In: International conference on machine learning, pp 1062–1070 Nie F, Yuan J, Huang H (2014) Optimal mean robust principal component analysis. In: International conference on machine learning, pp 1062–1070
26.
Zurück zum Zitat Ntoutsi I, Zimek A, Palpanas T, Kröger P, Kriegel H-P (2012) Density-based projected clustering over high dimensional data streams. In: Proceedings of the 2012 SIAM international conference on data mining, pp 987–998. SIAM Ntoutsi I, Zimek A, Palpanas T, Kröger P, Kriegel H-P (2012) Density-based projected clustering over high dimensional data streams. In: Proceedings of the 2012 SIAM international conference on data mining, pp 987–998. SIAM
27.
Zurück zum Zitat Procopiuc CM, Jones M, Agarwal PK, Murali T (2002) A monte carlo algorithm for fast projective clustering. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, pp 418–427. ACM Procopiuc CM, Jones M, Agarwal PK, Murali T (2002) A monte carlo algorithm for fast projective clustering. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, pp 418–427. ACM
28.
Zurück zum Zitat Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef
29.
Zurück zum Zitat Sim K, Gopalkrishnan V, Zimek A, Cong G (2013) A survey on enhanced subspace clustering. Data Min Knowl Discov 26(2):332–397MathSciNetCrossRef Sim K, Gopalkrishnan V, Zimek A, Cong G (2013) A survey on enhanced subspace clustering. Data Min Knowl Discov 26(2):332–397MathSciNetCrossRef
30.
Zurück zum Zitat Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617MathSciNetMATH Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617MathSciNetMATH
31.
Zurück zum Zitat Wu B, Wilamowski BM (2017) A fast density and grid based clustering method for data with arbitrary shapes and noise. IEEE Trans Ind Inf 13:1620–1628CrossRef Wu B, Wilamowski BM (2017) A fast density and grid based clustering method for data with arbitrary shapes and noise. IEEE Trans Ind Inf 13:1620–1628CrossRef
32.
Zurück zum Zitat Yu Z, Luo P, You J, Wong H-S, Leung H, Wu S, Zhang J, Han G (2016) Incremental semi-supervised clustering ensemble for high dimensional data clustering. IEEE Trans Knowl Data Eng 28(3):701–714CrossRef Yu Z, Luo P, You J, Wong H-S, Leung H, Wu S, Zhang J, Han G (2016) Incremental semi-supervised clustering ensemble for high dimensional data clustering. IEEE Trans Knowl Data Eng 28(3):701–714CrossRef
33.
Zurück zum Zitat Zhu B, Mozo A, Ordozgoiti B (2016) PSCEG: an unbiased parallel subspace clustering algorithm using exact grids. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) Zhu B, Mozo A, Ordozgoiti B (2016) PSCEG: an unbiased parallel subspace clustering algorithm using exact grids. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)
34.
Zurück zum Zitat Zhu L, Cao L, Yang J, Lei J (2014) Evolving soft subspace clustering. Appl Soft Comput 14:210–228CrossRef Zhu L, Cao L, Yang J, Lei J (2014) Evolving soft subspace clustering. Appl Soft Comput 14:210–228CrossRef
Metadaten
Titel
ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data
verfasst von
Kavan Fatehi
Mohsen Rezvani
Mansoor Fateh
Publikationsdatum
22.04.2020
Verlag
Springer London
Erschienen in
Pattern Analysis and Applications / Ausgabe 4/2020
Print ISSN: 1433-7541
Elektronische ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-020-00884-7

Weitere Artikel der Ausgabe 4/2020

Pattern Analysis and Applications 4/2020 Zur Ausgabe

Premium Partner