Skip to main content
Erschienen in: Neural Computing and Applications 11/2019

26.07.2018 | Original Article

A local cores-based hierarchical clustering algorithm for data sets with complex structures

verfasst von: Dongdong Cheng, Qingsheng Zhu, Jinlong Huang, Quanwang Wu, Lijun Yang

Erschienen in: Neural Computing and Applications | Ausgabe 11/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Hierarchical clustering is of great importance in data analysis. Although there are a number of hierarchical clustering algorithms including agglomerative methods, divisive methods and hybrid methods, most of them are sensitive to noise points, suffer from high computational cost and cannot effectively discover clusters with complex structures. When recognizing patterns from complex structures, humans intuitively tend to discover obvious clusters in dense regions firstly and then deal with objects on the border. Inspired by this idea, we propose a local cores-based hierarchical clustering algorithm called HCLORE. The proposed method first partitions the data set into several clusters by finding local cores, instead of optimizing an objective function through iteration like K-means; then temporarily removes points with lower local density, so that the boundary between clusters is clearer; after that merges clusters according to a newly defined similarities between clusters; and finally points with lower local density are assigned to the same clusters as their local cores belong to. The experimental results on synthetic data sets and real data sets show that our algorithm is more effective and efficient than existing methods when processing data sets with complex structures.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
The MATLAB code is available upon request.
 
Literatur
1.
Zurück zum Zitat Bouguettaya A, Yu Q, Liu XM, Zhou XM, Song A (2015) Efficient agglomerative hierarchical clustering. Expert Syst Appl 42(5):2785–2797CrossRef Bouguettaya A, Yu Q, Liu XM, Zhou XM, Song A (2015) Efficient agglomerative hierarchical clustering. Expert Syst Appl 42(5):2785–2797CrossRef
2.
Zurück zum Zitat Chang H, Yeung DY (2008) Robust path-based spectral clustering. Pattern Recognit 41(1):191–203CrossRef Chang H, Yeung DY (2008) Robust path-based spectral clustering. Pattern Recognit 41(1):191–203CrossRef
3.
Zurück zum Zitat Chen WY, Song YQ, Bai HJ, Lin CJ, Chang EY (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33(3):568–586CrossRef Chen WY, Song YQ, Bai HJ, Lin CJ, Chang EY (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33(3):568–586CrossRef
4.
Zurück zum Zitat Chen Y, Tang S, Zhou L, Wang C, Du J, Wang T, Pei S (2016) Decentralized clustering by finding loose and distributed density cores. Inf Sci (in press) Chen Y, Tang S, Zhou L, Wang C, Du J, Wang T, Pei S (2016) Decentralized clustering by finding loose and distributed density cores. Inf Sci (in press)
5.
Zurück zum Zitat Cheng D, Zhu Q, Huang J, Yang L, Wu Q (2017) Natural neighbor-based clustering algorithm with local representatives. Knowl-Based Syst 123(C):238–253CrossRef Cheng D, Zhu Q, Huang J, Yang L, Wu Q (2017) Natural neighbor-based clustering algorithm with local representatives. Knowl-Based Syst 123(C):238–253CrossRef
6.
Zurück zum Zitat Du MJ, Ding SF, Jia HJ (2016) Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl-Based Syst 99:135–145CrossRef Du MJ, Ding SF, Jia HJ (2016) Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl-Based Syst 99:135–145CrossRef
7.
Zurück zum Zitat Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96:226–231 Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96:226–231
8.
9.
Zurück zum Zitat Fu LM, Medico E (2007) Flame, a novel fuzzy clustering method for the analysis of dna microarray data. BMC Bioinform 8:3CrossRef Fu LM, Medico E (2007) Flame, a novel fuzzy clustering method for the analysis of dna microarray data. BMC Bioinform 8:3CrossRef
10.
Zurück zum Zitat Gionis A, Mannila H (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30CrossRef Gionis A, Mannila H (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30CrossRef
11.
Zurück zum Zitat Ha J, Seok S, Lee JS (2014) Robust outlier detection using the instability factor. Knowl-Based Syst 63:15–23CrossRef Ha J, Seok S, Lee JS (2014) Robust outlier detection using the instability factor. Knowl-Based Syst 63:15–23CrossRef
12.
Zurück zum Zitat Huang J, Zhu Q, Yang L, Cheng D, Wu Q (2017) Qcc: a novel clustering algorithm based on quasi-cluster centers. Mach Learn 106(3):337–357MathSciNetCrossRef Huang J, Zhu Q, Yang L, Cheng D, Wu Q (2017) Qcc: a novel clustering algorithm based on quasi-cluster centers. Mach Learn 106(3):337–357MathSciNetCrossRef
13.
Zurück zum Zitat Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651–666CrossRef Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651–666CrossRef
14.
Zurück zum Zitat Jain AK, Law MHC (2005) Data clustering: a user’s dilemma. Pattern Recognit Mach Intell Proc 3776:1–10CrossRef Jain AK, Law MHC (2005) Data clustering: a user’s dilemma. Pattern Recognit Mach Intell Proc 3776:1–10CrossRef
15.
Zurück zum Zitat Jia HJ, Ding SF, Meng LH, Fan SY (2014) A density-adaptive affinity propagation clustering algorithm based on spectral dimension reduction. Neural Comput Appl 25(7–8):1557–1567CrossRef Jia HJ, Ding SF, Meng LH, Fan SY (2014) A density-adaptive affinity propagation clustering algorithm based on spectral dimension reduction. Neural Comput Appl 25(7–8):1557–1567CrossRef
16.
Zurück zum Zitat Karypis G, Han EH, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. Computer 32(8):68–75CrossRef Karypis G, Han EH, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. Computer 32(8):68–75CrossRef
17.
Zurück zum Zitat King B (1967) Step-wise clustering procedures. J Am Stat Assoc 62(317):86–101CrossRef King B (1967) Step-wise clustering procedures. J Am Stat Assoc 62(317):86–101CrossRef
18.
Zurück zum Zitat Liang Z, Chen P (2016) Delta-density based clustering with a divide-and-conquer strategy: 3dc clustering. Pattern Recognit Lett 73:52–59CrossRef Liang Z, Chen P (2016) Delta-density based clustering with a divide-and-conquer strategy: 3dc clustering. Pattern Recognit Lett 73:52–59CrossRef
19.
Zurück zum Zitat Lin CR, Chen MS (2005) Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. IEEE Trans Knowl Data Eng 17(2):145–159CrossRef Lin CR, Chen MS (2005) Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. IEEE Trans Knowl Data Eng 17(2):145–159CrossRef
21.
Zurück zum Zitat Lv Y, Ma T, Tang M, Cao J, Tian Y, Al-Dhelaan A, Al-Rodhaan M (2016) An efficient and scalable density-based clustering algorithm for datasets with complex structures. Neurocomputing 171(C):9–22CrossRef Lv Y, Ma T, Tang M, Cao J, Tian Y, Al-Dhelaan A, Al-Rodhaan M (2016) An efficient and scalable density-based clustering algorithm for datasets with complex structures. Neurocomputing 171(C):9–22CrossRef
22.
Zurück zum Zitat MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. Oakland, CA, USA, pp 281–297 MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. Oakland, CA, USA, pp 281–297
23.
Zurück zum Zitat Moss WW, Hendrick JA (1973) Numerical taxonomy. Ann Rev Entomol 18:227–258CrossRef Moss WW, Hendrick JA (1973) Numerical taxonomy. Ann Rev Entomol 18:227–258CrossRef
24.
Zurück zum Zitat Papadimitriou CH, Steiglitz K (1982) Combinatorial optimization: algorithms and complexity. Courier Corporation, North ChelmsfordMATH Papadimitriou CH, Steiglitz K (1982) Combinatorial optimization: algorithms and complexity. Courier Corporation, North ChelmsfordMATH
25.
Zurück zum Zitat Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef
26.
Zurück zum Zitat Rousseeuw PJ (2009) Finding groups in data: an introduction to cluster analysis. Wiley, London Rousseeuw PJ (2009) Finding groups in data: an introduction to cluster analysis. Wiley, London
27.
Zurück zum Zitat Samaria FS, Hater AC (2014) Parameterisation of a stochastic model for human face identification. In: Proceedings of the second IEEE workshop on applications of computer vision. IEEE, pp 138–142 Samaria FS, Hater AC (2014) Parameterisation of a stochastic model for human face identification. In: Proceedings of the second IEEE workshop on applications of computer vision. IEEE, pp 138–142
28.
Zurück zum Zitat Shao JM, He X, Bohm C, Yang QL, Plant C (2013) Synchronization-inspired partitioning and hierarchical clustering. IEEE Trans Knowl Data Eng 25(4):893–905CrossRef Shao JM, He X, Bohm C, Yang QL, Plant C (2013) Synchronization-inspired partitioning and hierarchical clustering. IEEE Trans Knowl Data Eng 25(4):893–905CrossRef
29.
Zurück zum Zitat Wang G, Son Q (2016) Automatic clustering via outward statistical testing on density metrics. IEEE Trans Knowl Data Eng 28(8):1971–1985CrossRef Wang G, Son Q (2016) Automatic clustering via outward statistical testing on density metrics. IEEE Trans Knowl Data Eng 28(8):1971–1985CrossRef
31.
Zurück zum Zitat Xie JY, Gao HC, Xie WX, Liu XH, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf Sci 354:19–40CrossRef Xie JY, Gao HC, Xie WX, Liu XH, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf Sci 354:19–40CrossRef
32.
Zurück zum Zitat Xu J, Wang G, Deng W (2016) Denpehc: density peak based efficient hierarchical clustering. Inf Sci 373:200–218CrossRef Xu J, Wang G, Deng W (2016) Denpehc: density peak based efficient hierarchical clustering. Inf Sci 373:200–218CrossRef
35.
Zurück zum Zitat Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. In: ACM SIGMOD record, vol 25. ACM, pp 103–114 Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. In: ACM SIGMOD record, vol 25. ACM, pp 103–114
36.
Zurück zum Zitat Zhang X, Wang W, Norvag K, Sebag M (2010) K-ap: generating specified k clusters by efficient affinity propagation. In: 2010 IEEE 10th international conference on data mining (ICDM). IEEE, pp 1187–1192 Zhang X, Wang W, Norvag K, Sebag M (2010) K-ap: generating specified k clusters by efficient affinity propagation. In: 2010 IEEE 10th international conference on data mining (ICDM). IEEE, pp 1187–1192
37.
Zurück zum Zitat Zhu QS, Feng J, Huang JL (2016) Natural neighbor: a self-adaptive neighborhood method without parameter k. Pattern Recognit Lett 80:30–36CrossRef Zhu QS, Feng J, Huang JL (2016) Natural neighbor: a self-adaptive neighborhood method without parameter k. Pattern Recognit Lett 80:30–36CrossRef
Metadaten
Titel
A local cores-based hierarchical clustering algorithm for data sets with complex structures
verfasst von
Dongdong Cheng
Qingsheng Zhu
Jinlong Huang
Quanwang Wu
Lijun Yang
Publikationsdatum
26.07.2018
Verlag
Springer London
Erschienen in
Neural Computing and Applications / Ausgabe 11/2019
Print ISSN: 0941-0643
Elektronische ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3641-8

Weitere Artikel der Ausgabe 11/2019

Neural Computing and Applications 11/2019 Zur Ausgabe

Premium Partner