Skip to main content
Top
Published in: Soft Computing 14/2019

11-05-2018 | Methodologies and Application

Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding

Authors: Xian Fang, Zhixin Tie, Yinan Guan, Shanshan Rao

Published in: Soft Computing | Issue 14/2019

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

A novel density-based clustering algorithm named QCC is presented recently. Although the algorithm has proved its strong robustness, it is still necessary to manually determine the two input parameters, including the number of neighbors (k) and the similarity threshold value (\(\alpha \)), which severely limits the promotion of the algorithm. In addition, the QCC does not perform excellently when confronting the datasets with relatively high dimensions. To overcome these defects, firstly, we define a new method for computing local density and introduce the strategy of potential entropy into the original algorithm. Based on this idea, we propose a new QCC clustering algorithm (QCC-PE). QCC-PE can automatically extract optimal value of the parameter k by optimizing potential entropy of data field. By this means, the optimized parameter can be calculated from the datasets objectively rather than the empirical estimation accumulated from a large number of experiments. Then, t-distributed stochastic neighbor embedding (tSNE) is applied to the model of QCC-PE and further brings forward a method based on tSNE (QCC-PE-tSNE), which preprocesses high-dimensional datasets by dimensionality reduction technique. We compare the performance of the proposed algorithms with QCC, DBSCAN, and DP in the synthetic datasets, Olivetti Face Database, and real-world datasets respectively. Experimental results show that our algorithms are feasible and effective and can often outperform the comparisons.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
go back to reference Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings 1998 ACM sigmod international conference on management of data, vol 27, pp 94–105 Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings 1998 ACM sigmod international conference on management of data, vol 27, pp 94–105
go back to reference Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings on 1999 ACM sigmod international conference on management of data, vol 28, pp 49–60 Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings on 1999 ACM sigmod international conference on management of data, vol 28, pp 49–60
go back to reference Barbieri F, Mazzoni A, Logothetis NK, Panzeri S, Brunel N (2014) Stimulus dependence of local field potential spectra: experiment versus theory. J Neurosci 34(44):14589–14605CrossRef Barbieri F, Mazzoni A, Logothetis NK, Panzeri S, Brunel N (2014) Stimulus dependence of local field potential spectra: experiment versus theory. J Neurosci 34(44):14589–14605CrossRef
go back to reference Carpenter GA, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Gr Image Process 37(1):54–115CrossRefMATH Carpenter GA, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Gr Image Process 37(1):54–115CrossRefMATH
go back to reference Carpenter GA, Grossberg S (1990) ART 3: hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Netw 3(2):129–152CrossRef Carpenter GA, Grossberg S (1990) ART 3: hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Netw 3(2):129–152CrossRef
go back to reference Cassisi C, Ferro A, Giugno R, Pigola G, Pulvirenti A (2013) Enhancing density-based clustering: parameter reduction and outlier detection. Inf Syst 38(3):317–330CrossRef Cassisi C, Ferro A, Giugno R, Pigola G, Pulvirenti A (2013) Enhancing density-based clustering: parameter reduction and outlier detection. Inf Syst 38(3):317–330CrossRef
go back to reference Chang H, Yeung DY (2008) Robust path-based spectral clustering. Pattern Recogn 41(1):191–203CrossRefMATH Chang H, Yeung DY (2008) Robust path-based spectral clustering. Pattern Recogn 41(1):191–203CrossRefMATH
go back to reference Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619CrossRef Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619CrossRef
go back to reference Ding SF, Du MJ, Sun TF, Xu X, Xue Y (2017) An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood. Knowl-Based Syst 133:294–313CrossRef Ding SF, Du MJ, Sun TF, Xu X, Xue Y (2017) An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood. Knowl-Based Syst 133:294–313CrossRef
go back to reference Ding SF, Jia HJ, Du MJ, Xue Y (2018) A semi-supervised approximate spectral clustering algorithm based on HMRF model. Inf Sci 429:215–228MathSciNetCrossRef Ding SF, Jia HJ, Du MJ, Xue Y (2018) A semi-supervised approximate spectral clustering algorithm based on HMRF model. Inf Sci 429:215–228MathSciNetCrossRef
go back to reference Du MJ, Ding SF, Jia HJ (2016) Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl-Based Syst 99:135–145CrossRef Du MJ, Ding SF, Jia HJ (2016) Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl-Based Syst 99:135–145CrossRef
go back to reference Dutta M, Mahanta AK, Pujari AK (2005) QROCK: a quick version of the ROCK algorithm for clustering of categorical data. Pattern Recogn Lett 26(15):2364–2373CrossRef Dutta M, Mahanta AK, Pujari AK (2005) QROCK: a quick version of the ROCK algorithm for clustering of categorical data. Pattern Recogn Lett 26(15):2364–2373CrossRef
go back to reference Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of international conference on knowledge discovery and data mining, vol 96, pp 226–231 Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of international conference on knowledge discovery and data mining, vol 96, pp 226–231
go back to reference Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172 Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172
go back to reference Gisbrecht A, Schulz A, Hammer B (2015) Parametric nonlinear dimensionality reduction using kernel t-SNE. Neurocomputing 147:71–82CrossRef Gisbrecht A, Schulz A, Hammer B (2015) Parametric nonlinear dimensionality reduction using kernel t-SNE. Neurocomputing 147:71–82CrossRef
go back to reference Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of the 15th international conference on data engineering, pp 512–521 Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of the 15th international conference on data engineering, pp 512–521
go back to reference Horn D, Gottlieb A (2002) Algorithm for data clustering in pattern recognition problems based on quantum mechanics. Phys Rev Lett 88(1):1–4 Horn D, Gottlieb A (2002) Algorithm for data clustering in pattern recognition problems based on quantum mechanics. Phys Rev Lett 88(1):1–4
go back to reference Huang JL, Zhu QS, Yang LJ, Cheng DD, Wu QW (2017) QCC: a novel clustering algorithm based on quasi-cluster centers. Mach Learn 106(3):337–357MathSciNetCrossRefMATH Huang JL, Zhu QS, Yang LJ, Cheng DD, Wu QW (2017) QCC: a novel clustering algorithm based on quasi-cluster centers. Mach Learn 106(3):337–357MathSciNetCrossRefMATH
go back to reference Karypis G, Han EH, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. Computer 32(8):68–75CrossRef Karypis G, Han EH, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. Computer 32(8):68–75CrossRef
go back to reference Kumar KM, Reddy ARM (2016) A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recogn 58:39–48CrossRef Kumar KM, Reddy ARM (2016) A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recogn 58:39–48CrossRef
go back to reference Li YL, Shen Y (2010) An automatic fuzzy c-means algorithm for image segmentation. Soft Comput 14(2):123–128CrossRef Li YL, Shen Y (2010) An automatic fuzzy c-means algorithm for image segmentation. Soft Comput 14(2):123–128CrossRef
go back to reference Liew AW, Yan H (2003) An adaptive spatial fuzzy clustering algorithm for 3-D MR image segmentation. IEEE Trans Med Imaging 22(9):1063–1075CrossRef Liew AW, Yan H (2003) An adaptive spatial fuzzy clustering algorithm for 3-D MR image segmentation. IEEE Trans Med Imaging 22(9):1063–1075CrossRef
go back to reference Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297 Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297
go back to reference Madan S, Dana KJ (2015) Modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) for visual clustering. Pattern Anal Appl 19:1–18 Madan S, Dana KJ (2015) Modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) for visual clustering. Pattern Anal Appl 19:1–18
go back to reference Mehmood R, Zhang G, Bie R, Dawood H, Ahmad H (2016) Clustering by fast search and find of density peaks via heat diffusion. Neurocomputing 208:210–217CrossRef Mehmood R, Zhang G, Bie R, Dawood H, Ahmad H (2016) Clustering by fast search and find of density peaks via heat diffusion. Neurocomputing 208:210–217CrossRef
go back to reference Omran MGH, Engelbrecht AP, Salman A (2007) An overview of clustering methods. Intell Data Anal 11(6):583–605CrossRef Omran MGH, Engelbrecht AP, Salman A (2007) An overview of clustering methods. Intell Data Anal 11(6):583–605CrossRef
go back to reference Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2):3336–3341CrossRef Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2):3336–3341CrossRef
go back to reference Rasmussen CE (2000) The infinite gaussian mixture model. Adv Neural Inf Process Syst 12:554–560 Rasmussen CE (2000) The infinite gaussian mixture model. Adv Neural Inf Process Syst 12:554–560
go back to reference Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef
go back to reference Tomasev N, Radovanovic M, Mladenic D, Lvanovic M (2014) The role of hubness in clustering high-dimensional data. IEEE Trans Knowl Data Eng 26(3):739–751CrossRef Tomasev N, Radovanovic M, Mladenic D, Lvanovic M (2014) The role of hubness in clustering high-dimensional data. IEEE Trans Knowl Data Eng 26(3):739–751CrossRef
go back to reference Van der Maaten LJP (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221–3245MathSciNetMATH Van der Maaten LJP (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221–3245MathSciNetMATH
go back to reference Van der Maaten LJP, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605MATH Van der Maaten LJP, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605MATH
go back to reference Wang SL, Wang DK, Li CY, Li Y, Ding GY (2016) Clustering by fast search and find of density peaks with data field. Chin J Electron 25(3):397–402CrossRef Wang SL, Wang DK, Li CY, Li Y, Ding GY (2016) Clustering by fast search and find of density peaks with data field. Chin J Electron 25(3):397–402CrossRef
go back to reference Wang W, Yang J, Muntz RR (1997) STING: a statistical information grid approach to spatial data mining. In: International conference on very large data bases, Inc, pp 186–195 Wang W, Yang J, Muntz RR (1997) STING: a statistical information grid approach to spatial data mining. In: International conference on very large data bases, Inc, pp 186–195
go back to reference Wu YC (2014) A top-down information theoretic word clustering algorithm for phrase recognition. Inf Sci 275:213–225CrossRef Wu YC (2014) A top-down information theoretic word clustering algorithm for phrase recognition. Inf Sci 275:213–225CrossRef
go back to reference Xie J, Gao H, Xie W, Liu X, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf Sci 354:19–40CrossRef Xie J, Gao H, Xie W, Liu X, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf Sci 354:19–40CrossRef
go back to reference Zahn CT (1971) Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans Comput 100(1):68–86CrossRefMATH Zahn CT (1971) Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans Comput 100(1):68–86CrossRefMATH
go back to reference Zang WK, Ren LY, Zhang WQ, Liu XY (2017) Automatic density peaks clustering using DNA genetic algorithm optimized data field and Gaussian process. Int J Pattern Recognit Artif Intell 31(8):1750023MathSciNetCrossRef Zang WK, Ren LY, Zhang WQ, Liu XY (2017) Automatic density peaks clustering using DNA genetic algorithm optimized data field and Gaussian process. Int J Pattern Recognit Artif Intell 31(8):1750023MathSciNetCrossRef
go back to reference Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. Acm Sigmod Record 25(2):103–114CrossRef Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. Acm Sigmod Record 25(2):103–114CrossRef
go back to reference Zhang T, Ramakrishnan R, Livny M (1997) BIRCH: a new data clustering algorithm and its applications. Data Min Knowl Disc 1(2):141–182CrossRef Zhang T, Ramakrishnan R, Livny M (1997) BIRCH: a new data clustering algorithm and its applications. Data Min Knowl Disc 1(2):141–182CrossRef
Metadata
Title
Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding
Authors
Xian Fang
Zhixin Tie
Yinan Guan
Shanshan Rao
Publication date
11-05-2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 14/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3221-y

Other articles of this Issue 14/2019

Soft Computing 14/2019 Go to the issue

Premium Partner