Published in: Soft Computing 2/2021

25.08.2020 | Methodologies and Application

A methodology for automatic parameter-tuning and center selection in density-peak clustering methods


Abstract

The density-peak clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach. It can recover non-convex clusters and clusters of varying size and density, but it has two limitations: cluster centers must be located visually, and its parameters require tuning. This paper describes an optimization-based methodology for automatic parameter and center selection that applies both to DPC and to algorithms derived from it. The objective function is an internal or external cluster validity index, and the decision variables are the algorithm's parameters and the choice of centers. Internal validation measures yield an automatic parameter-tuning process. External validation measures yield the so-called optimal rules, which give an upper bound on the performance a given algorithm can reach over its set of parameterizations. A numerical experiment with real data, carried out for DPC and for its fuzzy weighted k-nearest neighbor variant (FKNN-DPC), validates the automatic parameter-tuning methodology and demonstrates its efficiency compared with the state of the art.
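To make the idea of tuning by optimizing an internal validity index concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian-kernel density, ranks candidate centers by the product of density and separation, uses the silhouette coefficient as the internal index, and replaces the optimization by a plain grid search over the cutoff distance and the number of centers. The names `dpc`, `silhouette` and `tune_dpc`, as well as the grid values, are hypothetical choices made for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def dpc(D, dc, n_centers):
    """Basic density-peak clustering on a distance matrix D (assumed variant)."""
    n = D.shape[0]
    # Local density with a Gaussian kernel; subtract the self term exp(0) = 1.
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # points by decreasing density
    delta = np.zeros(n)
    parent = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # points treated as denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i], parent[i] = D[i, j], j
    # Rank candidate centers by gamma = rho * delta; always keep the densest point.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # Each remaining point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

def silhouette(D, labels):
    """Mean silhouette coefficient computed from a distance matrix."""
    ks = np.unique(labels)
    if len(ks) < 2:
        return -1.0
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

def tune_dpc(X, dc_quantiles=(0.02, 0.05, 0.1), max_centers=6):
    """Grid search over (dc, number of centers) maximizing the silhouette."""
    D = pairwise_distances(X)
    dists = D[np.triu_indices_from(D, k=1)]
    best = (-np.inf, None, None, None)
    for q in dc_quantiles:
        dc = np.quantile(dists, q)
        for k in range(2, max_centers + 1):
            labels = dpc(D, dc, k)
            score = silhouette(D, labels)
            if score > best[0]:
                best = (score, dc, k, labels)
    return best

# Usage on synthetic data: three well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in ([0, 0], [10, 0], [0, 10])])
score, dc, k, labels = tune_dpc(X)
print(f"selected dc={dc:.3f}, centers={k}, silhouette={score:.3f}")
```

Swapping `silhouette` for an external index computed against known labels would turn the same search into an upper-bound estimate in the spirit of the optimal rules mentioned in the abstract, again under the assumptions stated above.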


Footnotes
1
Accessing Scopus on 27 September 2019 gave 1475 references.
 
Metadata
Title
A methodology for automatic parameter-tuning and center selection in density-peak clustering methods
Publication date
25.08.2020
Published in
Soft Computing / Issue 2/2021
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-020-05244-5
