Published in: Soft Computing 12/2020

24.10.2019 | Methodologies and Application

A cluster validity evaluation method for dynamically determining the near-optimal number of clusters

Authors: Xiangjun Li, Wei Liang, Xinping Zhang, Song Qing, Pei-Chann Chang

Abstract

Cluster validity evaluation is an active topic in clustering algorithm research. To determine the optimal number of clusters, this paper proposes a new cluster validity index, the Ratio of Deviation of Sum-of-squares and Euclid distance (RDSED), and designs an RDSED-based cluster validity evaluation method suited to dynamically determining the near-optimal number of clusters. First, based on an analysis of intra-class and inter-class relationships, the concepts of within-cluster sum-of-squares, between-cluster sum-of-squares, total sum-of-squares, sum of intra-cluster distances, and average distance between clusters are introduced, and the RDSED index is constructed from these concepts. Second, a cluster validity evaluation method based on RDSED is designed for dynamically determining the near-optimal number of clusters. In this method, the RDSED value is calculated for candidate cluster numbers from the largest to the smallest in the search range, and the index value is used to dynamically terminate the validity verification process, yielding the near-optimal number of clusters and the corresponding clustering partition. Experimental results on artificial and real datasets show that, compared with several classical cluster validity evaluation methods, the proposed method obtains, in most cases, the near-optimal number of clusters closest to the true number of clusters and can effectively evaluate clustering partition results.
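The abstract names three sum-of-squares quantities but does not give their formulas or the RDSED formula itself. As a minimal sketch, assuming the standard textbook definitions (squared Euclidean deviations from cluster centroids and from the overall mean), the following pure-Python helper computes the within-cluster, between-cluster, and total sum-of-squares for a labelled partition. The function name `sum_of_squares` and the toy data are illustrative assumptions, not taken from the paper, and the RDSED index is deliberately not reproduced here.

```python
def sum_of_squares(points, labels):
    """Return (SSW, SSB, SST) for a partition of `points` given by `labels`.

    Assumes the standard definitions (not confirmed by the abstract):
      SSW: squared distances of points to their own cluster centroid
      SSB: cluster-size-weighted squared distances of centroids to the overall mean
      SST: squared distances of points to the overall mean
    """
    n = len(points)
    dim = len(points[0])
    overall_mean = [sum(p[j] for p in points) / n for j in range(dim)]

    # Group the points by cluster label.
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)

    ssw = 0.0
    ssb = 0.0
    for members in clusters.values():
        size = len(members)
        centroid = [sum(p[j] for p in members) / size for j in range(dim)]
        ssw += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(dim))
        ssb += size * sum((centroid[j] - overall_mean[j]) ** 2 for j in range(dim))

    sst = sum((p[j] - overall_mean[j]) ** 2 for p in points for j in range(dim))
    return ssw, ssb, sst


# Toy example: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
ssw, ssb, sst = sum_of_squares(points, labels)
print(ssw, ssb, sst)
```

Under these definitions the total decomposes as SST = SSW + SSB, so a compact, well-separated partition shows up as a small SSW relative to SSB. An index built on these quantities could then be evaluated for each candidate number of clusters, from the largest down to the smallest as the abstract describes.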

Metadata
Title
A cluster validity evaluation method for dynamically determining the near-optimal number of clusters
Authors
Xiangjun Li
Wei Liang
Xinping Zhang
Song Qing
Pei-Chann Chang
Publication date
24.10.2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 12/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-04449-7
