Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 3/2021

20.09.2020 | Original Article

GDPC: generalized density peaks clustering algorithm based on order similarity

verfasst von: Xiaofei Yang, Zhiling Cai, Ruijia Li, William Zhu

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 3/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Clustering is a fundamental approach to discover the valuable information in data mining and machine learning. Density peaks clustering is a typical density based clustering and has received increasing attention in recent years. However DPC and most of its improvements still suffer from some drawbacks. For example, it is difficult to find peaks in the sparse cluster regions; assignment for the remaining points tends to cause Domino effect, especially for complicated data. To address the above two problems, we propose generalized density peaks clustering algorithm (GDPC) based on a new order similarity, which is calculated by the order rank of Euclidean distance between two samples. The order similarity can help us to find peaks in the sparse regions. In addition, a two-step assignment is used to weaken Domino effect. In general, GDPC can not only discover clusters in datasets regardless of different sizes, dimensions and shapes, but also address the above two issues. Several experiments on datasets, including Lung, COIL20, ORL, USPS, Mnist, breast and Vote, show that our algorithm is effective in most cases.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
1.
Zurück zum Zitat Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Morgan Kaufmann, San FranciscoMATH Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Morgan Kaufmann, San FranciscoMATH
2.
Zurück zum Zitat Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM computing surveys (CSUR) 31(3):264–323CrossRef Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM computing surveys (CSUR) 31(3):264–323CrossRef
3.
Zurück zum Zitat Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. Proc Second Int Conf Knowl Discovery Data Min 96(34):226–231 Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. Proc Second Int Conf Knowl Discovery Data Min 96(34):226–231
4.
Zurück zum Zitat Ankerst M, Breunig MM, Kriegel H, Sander J (1999) Optics: ordering points to identify the clustering structure. Proc ACM Sigmod Rec 28(2):49–60CrossRef Ankerst M, Breunig MM, Kriegel H, Sander J (1999) Optics: ordering points to identify the clustering structure. Proc ACM Sigmod Rec 28(2):49–60CrossRef
5.
Zurück zum Zitat Xu XW, Ester M, Kriegel HP, Sander J (1998) A distribution-based clustering algorithm for mining in large spatial databases. In: Proceedings of 14th IEEE international conference data engineering (ICDE),Orlando, Florida, USA, pp 324–331 Xu XW, Ester M, Kriegel HP, Sander J (1998) A distribution-based clustering algorithm for mining in large spatial databases. In: Proceedings of 14th IEEE international conference data engineering (ICDE),Orlando, Florida, USA, pp 324–331
6.
Zurück zum Zitat Wang W, Yang J, Muntz RR (1997) STING: a statistical information grid approach to spatial data mining. In: Proceedings of 23rd international conference on very large data bases(VLDB), Athens, Greece, pp 186–195 Wang W, Yang J, Muntz RR (1997) STING: a statistical information grid approach to spatial data mining. In: Proceedings of 23rd international conference on very large data bases(VLDB), Athens, Greece, pp 186–195
7.
Zurück zum Zitat Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef
8.
Zurück zum Zitat Du MJ, Ding SF, Jia HJ (2016) Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl Based Syst 99:135–145CrossRef Du MJ, Ding SF, Jia HJ (2016) Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl Based Syst 99:135–145CrossRef
9.
Zurück zum Zitat Mehmood R, Zhang GZ, Bie RF, Dawood H, Ahmad H (2016) Clustering by fast search and find of density peaks via heat diffusion. Neurocomputing 208:210–217CrossRef Mehmood R, Zhang GZ, Bie RF, Dawood H, Ahmad H (2016) Clustering by fast search and find of density peaks via heat diffusion. Neurocomputing 208:210–217CrossRef
10.
Zurück zum Zitat Liu YH, Ma ZM, Yu F (2017) Adaptive density peak clustering based on k-nearest neighbors with aggregating strategy. Knowl Based Syst 133:208–220CrossRef Liu YH, Ma ZM, Yu F (2017) Adaptive density peak clustering based on k-nearest neighbors with aggregating strategy. Knowl Based Syst 133:208–220CrossRef
11.
Zurück zum Zitat Du MJ, Ding SF, Xu X, Xue Y (2018) Density peaks clustering using geodesic distances. Int J Mach Learn Cybern 9(8):1335–1349CrossRef Du MJ, Ding SF, Xu X, Xue Y (2018) Density peaks clustering using geodesic distances. Int J Mach Learn Cybern 9(8):1335–1349CrossRef
12.
Zurück zum Zitat Guo ZS, Huang TY, Cai ZL, Zhu W (2018) A new local density for density peak clustering. In: Advances in knowledge discovery and data mining- 22nd Pacific-Asia conference, PAKDD 2018, Melbourne, VIC, Australia. Proceedings, part III (PAKDD ). Lecture notes in computer science, 10939. pp 426–438 Guo ZS, Huang TY, Cai ZL, Zhu W (2018) A new local density for density peak clustering. In: Advances in knowledge discovery and data mining- 22nd Pacific-Asia conference, PAKDD 2018, Melbourne, VIC, Australia. Proceedings, part III (PAKDD ). Lecture notes in computer science, 10939. pp 426–438
13.
Zurück zum Zitat Ding JJ, He XX, Yuan JQ, Jiang B (2018) Automatic clustering based on density peak detection using generalized extreme value distribution. Soft Comput 22(9):2777–2796CrossRef Ding JJ, He XX, Yuan JQ, Jiang B (2018) Automatic clustering based on density peak detection using generalized extreme value distribution. Soft Comput 22(9):2777–2796CrossRef
14.
Zurück zum Zitat Xie JY, Gao HC, Xie WX, Liu XH, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf Sci 354:19–40CrossRef Xie JY, Gao HC, Xie WX, Liu XH, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf Sci 354:19–40CrossRef
15.
Zurück zum Zitat Seyedi AS, Lotfi A, Moradi P, Qader NN (2019) Dynamic graph-based label propagation for density peaks clustering. Expert Syst Appl 115:314–328CrossRef Seyedi AS, Lotfi A, Moradi P, Qader NN (2019) Dynamic graph-based label propagation for density peaks clustering. Expert Syst Appl 115:314–328CrossRef
16.
Zurück zum Zitat Jiang JH, Chen YJ, Meng XQ, Wang LM, Li KQ (2019) A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process. Phys Stat Mech Appl 523(1):702–713CrossRef Jiang JH, Chen YJ, Meng XQ, Wang LM, Li KQ (2019) A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process. Phys Stat Mech Appl 523(1):702–713CrossRef
17.
Zurück zum Zitat Liu R, Wang H, Yu XM (2018) Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Inf Sci 450:200–226MathSciNetCrossRef Liu R, Wang H, Yu XM (2018) Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Inf Sci 450:200–226MathSciNetCrossRef
18.
Zurück zum Zitat Chang H, Yeung D (2008) Robust path-based spectral clustering. Pattern Recognit 41(1):191–203CrossRef Chang H, Yeung D (2008) Robust path-based spectral clustering. Pattern Recognit 41(1):191–203CrossRef
19.
Zurück zum Zitat Jarvis RA, Patrick EA (1973) Clustering using a similarity measure based on shared near neighbors. IEEE Trans Comput 22(11):1025–1034CrossRef Jarvis RA, Patrick EA (1973) Clustering using a similarity measure based on shared near neighbors. IEEE Trans Comput 22(11):1025–1034CrossRef
20.
Zurück zum Zitat Xie B, Li LJ, Mi JS (2016) A novel approach for ranking in interval-valued information systems. J Intell Fuzzy Syst 30(1):523–534CrossRef Xie B, Li LJ, Mi JS (2016) A novel approach for ranking in interval-valued information systems. J Intell Fuzzy Syst 30(1):523–534CrossRef
21.
Zurück zum Zitat Gionis A, Mannila H, Tsaparas P (2005) Clustering aggregation. In: 21st international conference on data engineering (ICDE’05), Tokoyo, Japan, pp 341–352 Gionis A, Mannila H, Tsaparas P (2005) Clustering aggregation. In: 21st international conference on data engineering (ICDE’05), Tokoyo, Japan, pp 341–352
22.
Zurück zum Zitat Hua Q, Bai LJ, Wang XZ, Liu YC (2012) Local similarity and diversity preserving discriminant projection for face and handwriting digits recognition. Neurocomputing 86:150–157CrossRef Hua Q, Bai LJ, Wang XZ, Liu YC (2012) Local similarity and diversity preserving discriminant projection for face and handwriting digits recognition. Neurocomputing 86:150–157CrossRef
23.
Zurück zum Zitat Tan AH, Wei WZ, Tao YZ (2017) On the belief structures and reductions of multigranulation spaces with decisions. Int J Approx Reason 88:39–52MathSciNetCrossRef Tan AH, Wei WZ, Tao YZ (2017) On the belief structures and reductions of multigranulation spaces with decisions. Int J Approx Reason 88:39–52MathSciNetCrossRef
24.
25.
Zurück zum Zitat Zhu W (2009b) Relationship between generalized rough sets based on binary relation and covering. Inf Sci 179(3):210–225MathSciNetCrossRef Zhu W (2009b) Relationship between generalized rough sets based on binary relation and covering. Inf Sci 179(3):210–225MathSciNetCrossRef
26.
Zurück zum Zitat Li RJ, Yang XF, Qin XL, Zhu W (2019) Local gap density for clustering high-dimensional data with varying densities. Knowl Based Syst 184(15):104905–104913CrossRef Li RJ, Yang XF, Qin XL, Zhu W (2019) Local gap density for clustering high-dimensional data with varying densities. Knowl Based Syst 184(15):104905–104913CrossRef
27.
Zurück zum Zitat Singh D, Febbo PG, Ross K et al (2012) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209CrossRef Singh D, Febbo PG, Ross K et al (2012) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209CrossRef
28.
Zurück zum Zitat Bennett KP, Mangasarian OL (1992) Robust linear programming discrimination of two linearly inseparable sets. Optim Method Softw 1(1):23–34CrossRef Bennett KP, Mangasarian OL (1992) Robust linear programming discrimination of two linearly inseparable sets. Optim Method Softw 1(1):23–34CrossRef
29.
Zurück zum Zitat Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554CrossRef Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554CrossRef
30.
Zurück zum Zitat Deng L (2012) The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process Mag 29(6):141–142CrossRef Deng L (2012) The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process Mag 29(6):141–142CrossRef
31.
Zurück zum Zitat Nene SA, Nayar SK, Murase H et al (1996) Columbia object image library (COIL-20) Nene SA, Nayar SK, Murase H et al (1996) Columbia object image library (COIL-20)
32.
Zurück zum Zitat Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. WACV, pp 138–142 Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. WACV, pp 138–142
33.
Zurück zum Zitat Asuncion A, Newman D (2007) Uci machine learning repository Asuncion A, Newman D (2007) Uci machine learning repository
34.
Zurück zum Zitat Li ZJ, Tang YC (2018) comparative density peaks clustering. Expert Syst Appl 95:236–247CrossRef Li ZJ, Tang YC (2018) comparative density peaks clustering. Expert Syst Appl 95:236–247CrossRef
35.
Zurück zum Zitat Ertöz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 3rd SIAM international conference on data mining, pp 47–58 Ertöz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 3rd SIAM international conference on data mining, pp 47–58
36.
Zurück zum Zitat Nie FP, Wang XQ, Jordan MI, Huang H (2016) The constrained Laplacian rank algorithm for graph-based clustering. AAAI, pp 1969–1976 Nie FP, Wang XQ, Jordan MI, Huang H (2016) The constrained Laplacian rank algorithm for graph-based clustering. AAAI, pp 1969–1976
37.
Zurück zum Zitat Shi JB, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905CrossRef Shi JB, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905CrossRef
38.
Zurück zum Zitat Zhang W, Wang XG, Zhao DL, Tang XO (2012) Graph degree linkage: agglomerative clustering on a directed graph. In: Proceedings of the 12th European conference on computer vision, pp 28–441 Zhang W, Wang XG, Zhao DL, Tang XO (2012) Graph degree linkage: agglomerative clustering on a directed graph. In: Proceedings of the 12th European conference on computer vision, pp 28–441
39.
Zurück zum Zitat Zheng X, Cai D, He XF, Ma WY, Lin XY (2004) Locality preserving clustering for image database. In: Proceedings of the 12th ACM international conference on multimedia, New York, NY, USA, pp 885–891 Zheng X, Cai D, He XF, Ma WY, Lin XY (2004) Locality preserving clustering for image database. In: Proceedings of the 12th ACM international conference on multimedia, New York, NY, USA, pp 885–891
40.
Zurück zum Zitat Wu MR, Schölkopf B (2006) A local learning approach for clustering. In: Advances in neural information processing systems 19, proceedings of the twentieth annual conference on neural information processing systems, Vancouver, British Columbia, Canada, pp 1529–1536 Wu MR, Schölkopf B (2006) A local learning approach for clustering. In: Advances in neural information processing systems 19, proceedings of the twentieth annual conference on neural information processing systems, Vancouver, British Columbia, Canada, pp 1529–1536
41.
Zurück zum Zitat Chen WY, Song YQ, Bai HJ et al (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33(3):568–586CrossRef Chen WY, Song YQ, Bai HJ et al (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33(3):568–586CrossRef
Metadaten
Titel
GDPC: generalized density peaks clustering algorithm based on order similarity
verfasst von
Xiaofei Yang
Zhiling Cai
Ruijia Li
William Zhu
Publikationsdatum
20.09.2020
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 3/2021
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01198-0

Weitere Artikel der Ausgabe 3/2021

International Journal of Machine Learning and Cybernetics 3/2021 Zur Ausgabe

Neuer Inhalt