Top

Soft Computing

Published in:

11-05-2018 | Methodologies and Application

Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding

Authors: Xian Fang, Zhixin Tie, Yinan Guan, Shanshan Rao

Published in: Soft Computing | Issue 14/2019

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

A novel density-based clustering algorithm named QCC is presented recently. Although the algorithm has proved its strong robustness, it is still necessary to manually determine the two input parameters, including the number of neighbors (k) and the similarity threshold value (\(\alpha \)), which severely limits the promotion of the algorithm. In addition, the QCC does not perform excellently when confronting the datasets with relatively high dimensions. To overcome these defects, firstly, we define a new method for computing local density and introduce the strategy of potential entropy into the original algorithm. Based on this idea, we propose a new QCC clustering algorithm (QCC-PE). QCC-PE can automatically extract optimal value of the parameter k by optimizing potential entropy of data field. By this means, the optimized parameter can be calculated from the datasets objectively rather than the empirical estimation accumulated from a large number of experiments. Then, t-distributed stochastic neighbor embedding (tSNE) is applied to the model of QCC-PE and further brings forward a method based on tSNE (QCC-PE-tSNE), which preprocesses high-dimensional datasets by dimensionality reduction technique. We compare the performance of the proposed algorithms with QCC, DBSCAN, and DP in the synthetic datasets, Olivetti Face Database, and real-world datasets respectively. Experimental results show that our algorithms are feasible and effective and can often outperform the comparisons.

previous article An approach using Dempster–Shafer evidence theory to fuse multi-source observations for dam safety estimation

next article Tauberian theorems for statistical summability methods of sequences of fuzzy numbers

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings 1998 ACM sigmod international conference on management of data, vol 27, pp 94–105

Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings on 1999 ACM sigmod international conference on management of data, vol 28, pp 49–60

Barbieri F, Mazzoni A, Logothetis NK, Panzeri S, Brunel N (2014) Stimulus dependence of local field potential spectra: experiment versus theory. J Neurosci 34(44):14589–14605CrossRef

Carpenter GA, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Gr Image Process 37(1):54–115CrossRefMATH

Carpenter GA, Grossberg S (1990) ART 3: hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Netw 3(2):129–152CrossRef

Cassisi C, Ferro A, Giugno R, Pigola G, Pulvirenti A (2013) Enhancing density-based clustering: parameter reduction and outlier detection. Inf Syst 38(3):317–330CrossRef

Chang H, Yeung DY (2008) Robust path-based spectral clustering. Pattern Recogn 41(1):191–203CrossRefMATH

Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619CrossRef

Ding SF, Du MJ, Sun TF, Xu X, Xue Y (2017) An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood. Knowl-Based Syst 133:294–313CrossRef

Ding SF, Jia HJ, Du MJ, Xue Y (2018) A semi-supervised approximate spectral clustering algorithm based on HMRF model. Inf Sci 429:215–228MathSciNetCrossRef

Du MJ, Ding SF, Jia HJ (2016) Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl-Based Syst 99:135–145CrossRef

Dutta M, Mahanta AK, Pujari AK (2005) QROCK: a quick version of the ROCK algorithm for clustering of categorical data. Pattern Recogn Lett 26(15):2364–2373CrossRef

Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of international conference on knowledge discovery and data mining, vol 96, pp 226–231

Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172

Gisbrecht A, Schulz A, Hammer B (2015) Parametric nonlinear dimensionality reduction using kernel t-SNE. Neurocomputing 147:71–82CrossRef

Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of the 15th international conference on data engineering, pp 512–521

Horn D, Gottlieb A (2002) Algorithm for data clustering in pattern recognition problems based on quantum mechanics. Phys Rev Lett 88(1):1–4

Huang JL, Zhu QS, Yang LJ, Cheng DD, Wu QW (2017) QCC: a novel clustering algorithm based on quasi-cluster centers. Mach Learn 106(3):337–357MathSciNetCrossRefMATH

Karypis G, Han EH, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. Computer 32(8):68–75CrossRef

Kohonen T (1998) The self-organizing map. Neurocomputing 21(1):1–6MathSciNetCrossRefMATH

Kumar KM, Reddy ARM (2016) A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recogn 58:39–48CrossRef

Li YL, Shen Y (2010) An automatic fuzzy c-means algorithm for image segmentation. Soft Comput 14(2):123–128CrossRef

Liew AW, Yan H (2003) An adaptive spatial fuzzy clustering algorithm for 3-D MR image segmentation. IEEE Trans Med Imaging 22(9):1063–1075CrossRef

Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297

Madan S, Dana KJ (2015) Modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) for visual clustering. Pattern Anal Appl 19:1–18

Mehmood R, Zhang G, Bie R, Dawood H, Ahmad H (2016) Clustering by fast search and find of density peaks via heat diffusion. Neurocomputing 208:210–217CrossRef

Omran MGH, Engelbrecht AP, Salman A (2007) An overview of clustering methods. Intell Data Anal 11(6):583–605CrossRef

Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2):3336–3341CrossRef

Rasmussen CE (2000) The infinite gaussian mixture model. Adv Neural Inf Process Syst 12:554–560

Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef

Tomasev N, Radovanovic M, Mladenic D, Lvanovic M (2014) The role of hubness in clustering high-dimensional data. IEEE Trans Knowl Data Eng 26(3):739–751CrossRef

Van der Maaten LJP (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221–3245MathSciNetMATH

Van der Maaten LJP, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605MATH

Wang SL, Wang DK, Li CY, Li Y, Ding GY (2016) Clustering by fast search and find of density peaks with data field. Chin J Electron 25(3):397–402CrossRef

Wang W, Yang J, Muntz RR (1997) STING: a statistical information grid approach to spatial data mining. In: International conference on very large data bases, Inc, pp 186–195

Wu YC (2014) A top-down information theoretic word clustering algorithm for phrase recognition. Inf Sci 275:213–225CrossRef

Xie J, Gao H, Xie W, Liu X, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf Sci 354:19–40CrossRef

Xu DK, Tian YJ (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2(2):165–193MathSciNetCrossRef

Zahn CT (1971) Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans Comput 100(1):68–86CrossRefMATH

Zang WK, Ren LY, Zhang WQ, Liu XY (2017) Automatic density peaks clustering using DNA genetic algorithm optimized data field and Gaussian process. Int J Pattern Recognit Artif Intell 31(8):1750023MathSciNetCrossRef

Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. Acm Sigmod Record 25(2):103–114CrossRef

Zhang T, Ramakrishnan R, Livny M (1997) BIRCH: a new data clustering algorithm and its applications. Data Min Knowl Disc 1(2):141–182CrossRef

Title: Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding
Authors: Xian Fang
Zhixin Tie
Yinan Guan
Shanshan Rao
Publication date: 11-05-2018
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing / Issue 14/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-018-3221-y

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 14/2019

A formal model of semantic computing

EQ-algebras based on fuzzy hyper EQ-filters

Utilization of nature-inspired algorithms for gas condensate reservoir optimization

An Arabic text categorization approach using term weighting and multiple reducts

Secondary load frequency control for multi-microgrids: HiL real-time simulation

On weak implication algebra

Premium Partner