nach oben

Pattern Analysis and Applications

Erschienen in:

22.04.2020 | Theoretical advances

ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data

verfasst von: Kavan Fatehi, Mohsen Rezvani, Mansoor Fateh

Erschienen in: Pattern Analysis and Applications | Ausgabe 4/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

The curse of dimensionality in high-dimensional data is one of the major challenges in data clustering. Recently, a considerable amount of literature has been published on subspace clustering to address this challenge. The main objective of the subspace clustering is to discover clusters embedded in any possible combination of the attributes. Previous studies have mostly been generating redundant subspace clusters, leading to clustering accuracy loss and also increasing the running time. In this paper, a bottom-up density-based approach is proposed for clustering of high-dimensional data. We employ the cluster structure as a similarity measure to generate the optimal subspaces which result in raising the accuracy of the subspace clustering. Using this idea, we propose an iterative algorithm to discover similar subspaces using the similarity in the features of subspaces. At each iteration of this algorithm, it first determines similar subspaces, then combines them to generate higher-dimensional subspaces, and finally re-clusters the subspaces. The algorithm repeats these steps and converges to the final clusters. Experiments on various synthetic and real datasets show that the results of the proposed approach are significantly better in both quality and runtime comparing to the state of the art on clustering high-dimensional data. The accuracy of the proposed method is around 34% higher than the CLIQUE algorithm and around 6% higher than DiSH.

Vorheriger Artikel Automatic plagiarism detection in obfuscated text

Nächster Artikel Unsupervised visual domain adaptation via discriminative dictionary evolution

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Achtert E, Böhm C, Kriegel H-P, Kröger P, Müller-Gorman I, Zimek A (2007) Detection and visualization of subspace cluster hierarchies. In: DASFAA , vol 4443, pp 152–163. Springer

Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. In: ACM SIGKDD record, vol 28, pp 61–72. ACM

Agrawal R, Gehrke JE, Gunopulos D, Raghavan P (1999) Automatic subspace clustering of high dimensional data for data mining applications, Dec 14. US Patent 6,003,029

Aksehirli E, Goethals B, Müller E (2015) Efficient cluster detection by ordered neighborhoods. In: International conference on big data analytics and knowledge discovery, pp 15–27. Springer

Assent I, Krieger R, Müller E, Seidl T (2007) Dusc: dimensionality unbiased subspace clustering. In: Seventh IEEE international conference on data mining, 2007. ICDM 2007, pp 409–414. IEEE

Bohm C, Railing K, Kriegel H-P, Kroger P (2004) Density connected clustering with local subspace preferences. In: Fourth IEEE international conference on data mining, 2004. ICDM’04, pp 27–34. IEEE

Bouveyron C, Brunet-Saumard C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78MathSciNetCrossRef

Chen X, Ye Y, Xu X, Huang JZ (2012) A feature group weighting method for subspace clustering of high-dimensional data. Pattern Recognit 45(1):434–446CrossRef

Chu Y-H, Huang J-W, Chuang K-T, Yang D-N, Chen M-S (2010) Density conscious subspace clustering for high-dimensional data. IEEE Trans Knowl Data Eng 22(1):16–30CrossRef

10.

De Raedt L, Jaeger M, Lee SD, Mannila H (2010) A theory of inductive query answering. In: Inductive databases and constraint-based data mining, pp 79–103. Springer

11.

Deng Z, Choi K-S, Jiang Y, Wang J, Wang S (2016) A survey on soft subspace clustering. Inf Sci 348:84–106MathSciNetCrossRef

12.

Dongen S (2000) Performance criteria for graph clustering and Markov cluster experiments. CWI (Centre for Mathematics and Computer Science)

13.

Esmin AA, Coelho RA, Matwin S (2015) A review on particle swarm optimization algorithm and its variants to clustering high-dimensional data. Artif Intell Rev 44(1):23–45CrossRef

14.

Gan G, Ng MK-P (2015) Subspace clustering using affinity propagation. Pattern Recognit 48(4):1455–1464CrossRef

15.

Gan G, Ng MK-P (2015) Subspace clustering with automatic feature grouping. Pattern Recognit 48(11):3703–3713CrossRef

16.

Goil S, Nagesh H, Choudhary A (1999) Mafia: efficient and scalable subspace clustering for very large data sets. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining, pp 443–452. ACM

17.

Hu J, Pei J (2017) Subspace multi-clustering: a review. Knowl Inf Syst 56:1–28

18.

Kailing K, Kriegel H-P, Kröger P (2004) Density-connected subspace clustering for high-dimensional data. In: Proceedings of the 2004 SIAM international conference on data mining, SIAM, pp 246–256

19.

Kriegel H-P, Kroger P, Renz M, Wurst S (2005) A generic framework for efficient subspace clustering of high-dimensional data. In: Fifth IEEE international conference on data mining, pp 8–pp. IEEE

20.

Kuncheva LI, Hadjitodorov ST (2004) Using diversity in cluster ensembles. In: 2004 IEEE international conference on systems, man and cybernetics, vol 2, pp 1214–1219. IEEE

21.

Mai ST, Assent I, Storgaard M (2016) Anydbc: an efficient anytime density-based clustering algorithm for very large complex datasets. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1025–1034. ACM

22.

McWilliams B, Montana G (2014) Subspace clustering of high-dimensional data: a predictive approach. Data Min Knowl Discov 28(3):736–772MathSciNetCrossRef

23.

Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 977–986. ACM

24.

Nie F, Wang X, Jordan MI, Huang H (2016) The constrained Laplacian rank algorithm for graph-based clustering. In: AAAI, pp 1969–1976

25.

Nie F, Yuan J, Huang H (2014) Optimal mean robust principal component analysis. In: International conference on machine learning, pp 1062–1070

26.

Ntoutsi I, Zimek A, Palpanas T, Kröger P, Kriegel H-P (2012) Density-based projected clustering over high dimensional data streams. In: Proceedings of the 2012 SIAM international conference on data mining, pp 987–998. SIAM

27.

Procopiuc CM, Jones M, Agarwal PK, Murali T (2002) A monte carlo algorithm for fast projective clustering. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, pp 418–427. ACM

28.

Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef

29.

Sim K, Gopalkrishnan V, Zimek A, Cong G (2013) A survey on enhanced subspace clustering. Data Min Knowl Discov 26(2):332–397MathSciNetCrossRef

30.

Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617MathSciNetMATH

31.

Wu B, Wilamowski BM (2017) A fast density and grid based clustering method for data with arbitrary shapes and noise. IEEE Trans Ind Inf 13:1620–1628CrossRef

32.

Yu Z, Luo P, You J, Wong H-S, Leung H, Wu S, Zhang J, Han G (2016) Incremental semi-supervised clustering ensemble for high dimensional data clustering. IEEE Trans Knowl Data Eng 28(3):701–714CrossRef

33.

Zhu B, Mozo A, Ordozgoiti B (2016) PSCEG: an unbiased parallel subspace clustering algorithm using exact grids. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)

34.

Zhu L, Cao L, Yang J, Lei J (2014) Evolving soft subspace clustering. Appl Soft Comput 14:210–228CrossRef

Titel: ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data
verfasst von: Kavan Fatehi
Mohsen Rezvani
Mansoor Fateh
Publikationsdatum: 22.04.2020
Verlag: Springer London
Erschienen in: Pattern Analysis and Applications / Ausgabe 4/2020
Print ISSN: 1433-7541
Elektronische ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-020-00884-7

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 4/2020

Multi-view region-adaptive multi-temporal DMM and RGB action recognition

TimeFly algorithm: a novel behavior-inspired movie recommendation paradigm

Margin setting algorithm for pattern classification via spheres

An instance and variable selection approach in pixel-based classification for automatic white blood cells segmentation

Cost-sensitive sample shifting in feature space

Local binary pattern-based on-road vehicle detection in urban traffic scene

Premium Partner