Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 6/2020

13.11.2019 | Original Article

Mk-NNG-DPC: density peaks clustering based on improved mutual K-nearest-neighbor graph

verfasst von: Jian-cong Fan, Pei-ling Jia, Linqiang Ge

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 6/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Clustering by fast search and detection of density peaks (DPC, Density Peaks Clustering) is a relatively novel clustering algorithm published in the Science journal. As a density-based clustering algorithm, DPC produces better clustering results while using less parameters than other relevant algorithms. However, we found that the DPC algorithm does not perform well if clusters with different densities are very close. To address this problem, we propose a new DPC algorithm by incorporating an improved mutual k-nearest-neighbor graph (Mk-NNG) into DPC. Our Mk-NNG-DPC algorithm leverages the distance matrix of data samples to improve the Mk-NNG, and then utilizes DPC to constrain and select cluster centers. The proposed Mk-NNG-DPC algorithm ensures an instance to be allocated to the fittest cluster. Experimental results on synthetic and real world datasets show that our Mk-NNG-DPC algorithm can effectively and efficiently improve clustering performance, even for clusters with arbitrary shapes.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
1.
Zurück zum Zitat Han J, Kamber M, Pei J (2012) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, BurlingtonMATH Han J, Kamber M, Pei J (2012) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, BurlingtonMATH
2.
Zurück zum Zitat Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer, New YorkMATH Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer, New YorkMATH
3.
Zurück zum Zitat Chifu AG, Hristea F, Mothe J, Popescu M (2015) Word sense discrimination in information retrieval: a spectral clustering-based approach. Inf Process Manag 51(2):16–31 Chifu AG, Hristea F, Mothe J, Popescu M (2015) Word sense discrimination in information retrieval: a spectral clustering-based approach. Inf Process Manag 51(2):16–31
4.
Zurück zum Zitat Kaufman L, Rousseeuw P (2009) Finding groups in data: an introduction to cluster analysis. Wiley, HobokenMATH Kaufman L, Rousseeuw P (2009) Finding groups in data: an introduction to cluster analysis. Wiley, HobokenMATH
5.
Zurück zum Zitat Kearns M, Mansour Y, Ng AY (1999) An information-theoretic analysis of hard and soft assignment methods for clustering. In: Jordan MI (ed) Learning in graphical models. MIT Press, Cambridge, pp 495–520 Kearns M, Mansour Y, Ng AY (1999) An information-theoretic analysis of hard and soft assignment methods for clustering. In: Jordan MI (ed) Learning in graphical models. MIT Press, Cambridge, pp 495–520
6.
Zurück zum Zitat Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769 Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769
7.
Zurück zum Zitat Bishnu PS, Bhattacherjee V (2013) A modified K-modes clustering algorithm. pattern recognition and machine intelligence, Volume 8251 of the series Lecture Notes in Computer Science, 2013, pp 60–66 Bishnu PS, Bhattacherjee V (2013) A modified K-modes clustering algorithm. pattern recognition and machine intelligence, Volume 8251 of the series Lecture Notes in Computer Science, 2013, pp 60–66
8.
Zurück zum Zitat Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: Proceedings of ACM SIGMOD international conference on management of data, 1996, pp 103–114 Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: Proceedings of ACM SIGMOD international conference on management of data, 1996, pp 103–114
9.
Zurück zum Zitat Karypis G, Han E, Kumar V (1999) CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput 32(8):68–75 Karypis G, Han E, Kumar V (1999) CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput 32(8):68–75
11.
Zurück zum Zitat Ester M, Kriegel H, Sander J, Xu X, Simoudis E, Han J, Fayyad UM (eds) (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD-96). AAAI Press, pp 226–231 Ester M, Kriegel H, Sander J, Xu X, Simoudis E, Han J, Fayyad UM (eds) (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD-96). AAAI Press, pp 226–231
12.
Zurück zum Zitat Hinneburg A, Gabriel HH (2007) DENCLUE 2.0: fast clustering based on kernel density estimation. In: Proceedings of the 2007 international conference on intelligent data analysis (IDA’07), Ljubljana, Slovenia, 2007, pp 70–80 Hinneburg A, Gabriel HH (2007) DENCLUE 2.0: fast clustering based on kernel density estimation. In: Proceedings of the 2007 international conference on intelligent data analysis (IDA’07), Ljubljana, Slovenia, 2007, pp 70–80
13.
Zurück zum Zitat Banerjee A, Shan H (2010) Model-based clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning, pp 686–689 Banerjee A, Shan H (2010) Model-based clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning, pp 686–689
14.
Zurück zum Zitat Ding S, Zhang N, Zhang J, Xu X, Shi Z (2017) Unsupervised extreme learning machine with representational features. Int J Mach Learn Cybern 8(2):587–595 Ding S, Zhang N, Zhang J, Xu X, Shi Z (2017) Unsupervised extreme learning machine with representational features. Int J Mach Learn Cybern 8(2):587–595
16.
Zurück zum Zitat Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496 Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496
17.
Zurück zum Zitat Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976MathSciNetMATH Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976MathSciNetMATH
18.
Zurück zum Zitat Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619 Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
19.
Zurück zum Zitat Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905 Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
20.
Zurück zum Zitat Arias-Castro E, Chen G, Lerman G (2011) Spectral clustering based on local linear approximations. Electron J Stat 5(1):1537–1587MathSciNetMATH Arias-Castro E, Chen G, Lerman G (2011) Spectral clustering based on local linear approximations. Electron J Stat 5(1):1537–1587MathSciNetMATH
21.
Zurück zum Zitat Székely GJ, Rizzo ML (2005) Hierarchical clustering via Joint between-within distances: extending ward’s minimum variance method. J Classif 22(2):151–183MathSciNetMATH Székely GJ, Rizzo ML (2005) Hierarchical clustering via Joint between-within distances: extending ward’s minimum variance method. J Classif 22(2):151–183MathSciNetMATH
22.
Zurück zum Zitat Figueiredo MAT, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396 Figueiredo MAT, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396
23.
Zurück zum Zitat McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New JerseyMATH McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New JerseyMATH
24.
25.
Zurück zum Zitat Nayak J, Naik B, Behera HS (2014) Fuzzy C-means (FCM) clustering algorithm: a decade review from 2000 to 2014. Computational Intelligence in Data Mining-Volume 2, Volume 32 of the series Smart Innovation, Systems and Technologies, pp 133–149 Nayak J, Naik B, Behera HS (2014) Fuzzy C-means (FCM) clustering algorithm: a decade review from 2000 to 2014. Computational Intelligence in Data Mining-Volume 2, Volume 32 of the series Smart Innovation, Systems and Technologies, pp 133–149
26.
Zurück zum Zitat Brito MR, Chávez EL, Quiroz AJ, Yukich JE (1997) Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Stat Probab Lett 35(1):33–42MathSciNetMATH Brito MR, Chávez EL, Quiroz AJ, Yukich JE (1997) Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Stat Probab Lett 35(1):33–42MathSciNetMATH
27.
Zurück zum Zitat Sardana D, Bhatnagar R (2014) Graph clustering using mutual K-nearest neighbors. Active Media Technology, Volume 8610 of the series Lecture Notes in Computer Science, pp 35–48 Sardana D, Bhatnagar R (2014) Graph clustering using mutual K-nearest neighbors. Active Media Technology, Volume 8610 of the series Lecture Notes in Computer Science, pp 35–48
28.
Zurück zum Zitat Xie J, Gao H, Xie W (2016) K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset. Sci Sin Inf 46(2):258–280 Xie J, Gao H, Xie W (2016) K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset. Sci Sin Inf 46(2):258–280
30.
Zurück zum Zitat Cover TM, Thomas JA (2001) Elements of information theory. Wiley, HobokenMATH Cover TM, Thomas JA (2001) Elements of information theory. Wiley, HobokenMATH
31.
Zurück zum Zitat Fan J, Niu Z, Liang Y, Zhao Z (2016) Probability model selection and parameter evolutionary estimation for clustering imbalanced data without sampling. Neurocomputing 211(10):172–181 Fan J, Niu Z, Liang Y, Zhao Z (2016) Probability model selection and parameter evolutionary estimation for clustering imbalanced data without sampling. Neurocomputing 211(10):172–181
32.
Zurück zum Zitat Wang F, Zhang C (2005) Spectral clustering for time series. In: Proceedings of third international conference on advances in pattern recognition, ICAPR 2005, Bath, UK, August 22–25, 2005, pp 345–354 Wang F, Zhang C (2005) Spectral clustering for time series. In: Proceedings of third international conference on advances in pattern recognition, ICAPR 2005, Bath, UK, August 22–25, 2005, pp 345–354
33.
Zurück zum Zitat Xu X, Ding S, Du M, Xue Y (2018) DPCG: an efficient density peaks clustering algorithm based on grid. Int J Mach Learn Cybern 9(5):743–754 Xu X, Ding S, Du M, Xue Y (2018) DPCG: an efficient density peaks clustering algorithm based on grid. Int J Mach Learn Cybern 9(5):743–754
34.
Zurück zum Zitat Du M, Ding S, Xue Y (2018) A robust density peaks clustering algorithm using fuzzy neighborhood. Int J Mach Learn Cybern 9(7):1131–1140 Du M, Ding S, Xue Y (2018) A robust density peaks clustering algorithm using fuzzy neighborhood. Int J Mach Learn Cybern 9(7):1131–1140
35.
Zurück zum Zitat Bai X, Yang P, Shi X (2017) An overlapping community detection algorithm based on density peaks. Neurocomputing 226(2):7–15 Bai X, Yang P, Shi X (2017) An overlapping community detection algorithm based on density peaks. Neurocomputing 226(2):7–15
36.
Zurück zum Zitat Campello RJGB, Moulavi D, Sander J (2013) Density-based clustering based on hierarchical density estimates. In: Pei J, Tseng VS, Cao L, Motoda H, Xu G (eds) Advances in knowledge discovery and data mining. Lecture notes in computer science. Springer, Berlin Heidelberg, pp 160–172 Campello RJGB, Moulavi D, Sander J (2013) Density-based clustering based on hierarchical density estimates. In: Pei J, Tseng VS, Cao L, Motoda H, Xu G (eds) Advances in knowledge discovery and data mining. Lecture notes in computer science. Springer, Berlin Heidelberg, pp 160–172
37.
Zurück zum Zitat Li J, Huang X, Selke C, Yong J (2007) A fast algorithm for finding correlation clusters in noise data. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining, pp 639–647 Li J, Huang X, Selke C, Yong J (2007) A fast algorithm for finding correlation clusters in noise data. In: Proceedings of the 11th Pacific-Asia conference on knowledge discovery and data mining, pp 639–647
38.
Zurück zum Zitat Zhang T-T, Yuan B (2018) Density-based multiscale analysis for clustering in strong noise settings with varying densities. IEEE Access 6:25861–25873 Zhang T-T, Yuan B (2018) Density-based multiscale analysis for clustering in strong noise settings with varying densities. IEEE Access 6:25861–25873
39.
Zurück zum Zitat Zhang H, Wang S, Xu X, Chow TWS, Wu QMJ (2018) Tree2Vector: learning a vectorial representation for tree-structured data. IEEE Trans Neural Netw Learn Syst 29(11):5304–5318MathSciNet Zhang H, Wang S, Xu X, Chow TWS, Wu QMJ (2018) Tree2Vector: learning a vectorial representation for tree-structured data. IEEE Trans Neural Netw Learn Syst 29(11):5304–5318MathSciNet
40.
Zurück zum Zitat Wang X, Xing H-J, Li Y et al (2015) A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning. IEEE Trans Fuzzy Syst 23(5):1638–1654 Wang X, Xing H-J, Li Y et al (2015) A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning. IEEE Trans Fuzzy Syst 23(5):1638–1654
41.
Zurück zum Zitat Wang R, Wang X, Kwong S, Chen X (2017) Incorporating diversity and informativeness in multiple-instance active learning. IEEE Trans Fuzzy Syst 25(6):1460–1475 Wang R, Wang X, Kwong S, Chen X (2017) Incorporating diversity and informativeness in multiple-instance active learning. IEEE Trans Fuzzy Syst 25(6):1460–1475
42.
Zurück zum Zitat Wang X, Wang R, Chen X (2018) Discovering the relationship between generalization and uncertainty by incorporating complexity of classification. IEEE Trans Cybern 48(2):703–715 Wang X, Wang R, Chen X (2018) Discovering the relationship between generalization and uncertainty by incorporating complexity of classification. IEEE Trans Cybern 48(2):703–715
43.
Zurück zum Zitat Wang X, Zhang T, Wang R (2019) Non-iterative deep learning: incorporating restricted boltzmann machine into multilayer random weight neural networks. IEEE Trans Syst Man Cybern Syst 49(7):1299–1380 Wang X, Zhang T, Wang R (2019) Non-iterative deep learning: incorporating restricted boltzmann machine into multilayer random weight neural networks. IEEE Trans Syst Man Cybern Syst 49(7):1299–1380
44.
Zurück zum Zitat Lin JCW, Yang L, Fournier-Viger P, Hong TP (2018) Mining of skyline patterns by considering both frequent and utility constraints. Eng Appl Artif Intell 77:229–238 Lin JCW, Yang L, Fournier-Viger P, Hong TP (2018) Mining of skyline patterns by considering both frequent and utility constraints. Eng Appl Artif Intell 77:229–238
45.
Zurück zum Zitat Fournier-Viger P, Lin JCW, Kiran RU, Koh YS, Thomas R (2017) A survey of sequential pattern mining. Data Sci Pattern Recognit 1(1):54–77 Fournier-Viger P, Lin JCW, Kiran RU, Koh YS, Thomas R (2017) A survey of sequential pattern mining. Data Sci Pattern Recognit 1(1):54–77
46.
Zurück zum Zitat Chen CM, Xiang B, Liu Y, Wang KH (2019) A secure authentication protocol for internet of vehicles. IEEE ACCESS 7(1):12047–12057 Chen CM, Xiang B, Liu Y, Wang KH (2019) A secure authentication protocol for internet of vehicles. IEEE ACCESS 7(1):12047–12057
47.
Zurück zum Zitat Chen CM, Xiang B, Wang KH, Yeh KH, Wu TY (2018) A robust mutual authentication with a key agreement scheme for session initiation protocol. Appl Sci 8(10):1 Chen CM, Xiang B, Wang KH, Yeh KH, Wu TY (2018) A robust mutual authentication with a key agreement scheme for session initiation protocol. Appl Sci 8(10):1
48.
Zurück zum Zitat Yang C, Huang L, Li F (2018) Exponential synchronization control of discontinuous non-autonomous networks and autonomous coupled networks. Complexity 1:1–10 Yang C, Huang L, Li F (2018) Exponential synchronization control of discontinuous non-autonomous networks and autonomous coupled networks. Complexity 1:1–10
49.
Zurück zum Zitat Lian D, Xianwen F, Chuangxia H (2017) Global exponential convergence in a delayed almost periodic nicholsons blowflies model with discontinuous harvesting. Math Methods Appl Sci 41(5):1954–1965MathSciNetMATH Lian D, Xianwen F, Chuangxia H (2017) Global exponential convergence in a delayed almost periodic nicholsons blowflies model with discontinuous harvesting. Math Methods Appl Sci 41(5):1954–1965MathSciNetMATH
50.
Zurück zum Zitat Lian D, Lihong H, Zhenyuan G (2017) Periodic attractor for reactiondiffusion high-order hopfield neural networks with time-varying delays. Comput Math Appl 73(2):233–245MathSciNetMATH Lian D, Lihong H, Zhenyuan G (2017) Periodic attractor for reactiondiffusion high-order hopfield neural networks with time-varying delays. Comput Math Appl 73(2):233–245MathSciNetMATH
51.
Zurück zum Zitat Huang C, Liu B, Tian X, Yang L, Zhang X (2019) Global convergence on asymptotically almost periodic SICNNs with nonlinear decay functions. Neural Process Lett 49(2):625–641 Huang C, Liu B, Tian X, Yang L, Zhang X (2019) Global convergence on asymptotically almost periodic SICNNs with nonlinear decay functions. Neural Process Lett 49(2):625–641
52.
Zurück zum Zitat Huang C, Zhang H, Huang L (2019) Almost periodicity analysis for a delayed Nicholson’s blowflies model with nonlinear density-dependent mortality term. Commun Pure Appl Anal 18(6):3337–3349MathSciNet Huang C, Zhang H, Huang L (2019) Almost periodicity analysis for a delayed Nicholson’s blowflies model with nonlinear density-dependent mortality term. Commun Pure Appl Anal 18(6):3337–3349MathSciNet
53.
Zurück zum Zitat Huang C, Zhang H (2019) Periodicity of non-autonomous inertial neural networks involving proportional delays and non-reduced order method. Int J Biomath 12(02):1950016MathSciNetMATH Huang C, Zhang H (2019) Periodicity of non-autonomous inertial neural networks involving proportional delays and non-reduced order method. Int J Biomath 12(02):1950016MathSciNetMATH
54.
Zurück zum Zitat Huang C, Cao J, Wen F, Yang X (2016) Stability analysis of SIR model with distributed delay on complex networks. PLoS One 11(8):e0158813 Huang C, Cao J, Wen F, Yang X (2016) Stability analysis of SIR model with distributed delay on complex networks. PLoS One 11(8):e0158813
55.
Zurück zum Zitat Li Y, Fan JC, Pan JS, Mao GH, Wu GK (2019) A novel rough fuzzy clustering algorithm with a new similarity measurement. J Internet Technol 20(4):1 Li Y, Fan JC, Pan JS, Mao GH, Wu GK (2019) A novel rough fuzzy clustering algorithm with a new similarity measurement. J Internet Technol 20(4):1
56.
Zurück zum Zitat Fan J-C, Li Y, Tang Lei-Yu, Geng-Kun W (2018) RoughPSO: rough set-based particle swarm optimisation. Int J Bio-Inspired Comput 12(4):245–253 Fan J-C, Li Y, Tang Lei-Yu, Geng-Kun W (2018) RoughPSO: rough set-based particle swarm optimisation. Int J Bio-Inspired Comput 12(4):245–253
Metadaten
Titel
Mk-NNG-DPC: density peaks clustering based on improved mutual K-nearest-neighbor graph
verfasst von
Jian-cong Fan
Pei-ling Jia
Linqiang Ge
Publikationsdatum
13.11.2019
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 6/2020
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-019-01031-3

Weitere Artikel der Ausgabe 6/2020

International Journal of Machine Learning and Cybernetics 6/2020 Zur Ausgabe

Neuer Inhalt