Published in: Soft Computing 12/2020

24-10-2019 | Methodologies and Application

A cluster validity evaluation method for dynamically determining the near-optimal number of clusters

Authors: Xiangjun Li, Wei Liang, Xinping Zhang, Song Qing, Pei-Chann Chang



Abstract

Cluster validity evaluation is an active topic in clustering research. To address the problem of determining the optimal number of clusters, this paper proposes a new cluster validity index, the Ratio of Deviation of Sum-of-squares and Euclid distance (RDSED), and designs an RDSED-based cluster validity evaluation method suited to dynamically determining the near-optimal number of clusters. First, from an analysis of intra-class and inter-class relationships, five quantities are defined: the within-cluster sum of squares, the between-cluster sum of squares, the total sum of squares, the sum of intra-cluster distances, and the average distance between clusters. The RDSED index is constructed from these quantities. Second, an RDSED-based evaluation method for dynamically determining the near-optimal number of clusters is designed. In this method, the RDSED value is calculated for candidate cluster numbers in descending order over the search range, and the index value is used to terminate the validity verification process dynamically, yielding the near-optimal number of clusters and the corresponding clustering partition. Experimental results on artificial and real datasets show that, compared with several classical cluster validity evaluation methods, the proposed method in most cases obtains the near-optimal number of clusters closest to the true number of clusters and effectively evaluates clustering partition results.
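The abstract names the quantities that RDSED is built from but does not give the RDSED formula itself, so the following is only a minimal sketch of those building blocks, not the authors' index. It computes the within-cluster, between-cluster, and total sums of squares for a given partition using their standard textbook definitions. The two distance quantities are computed under assumed definitions (sum of Euclidean distances from each point to its own cluster centroid, and mean pairwise Euclidean distance between centroids); the paper may define them differently. The function name `partition_statistics` and the toy data are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_statistics(clusters):
    """Return sum-of-squares and distance statistics for a partition.

    `clusters` is a list of non-empty lists of points (tuples of floats).
    The two distance statistics use ASSUMED definitions, since the
    abstract does not spell them out.
    """
    all_points = [p for c in clusters for p in c]
    grand = centroid(all_points)
    centers = [centroid(c) for c in clusters]

    # Within-cluster sum of squares: points against their own centroid.
    ssw = sum(sq_dist(p, m) for c, m in zip(clusters, centers) for p in c)
    # Between-cluster sum of squares: size-weighted centroids against the grand mean.
    ssb = sum(len(c) * sq_dist(m, grand) for c, m in zip(clusters, centers))
    # Total sum of squares: all points against the grand mean.
    sst = sum(sq_dist(p, grand) for p in all_points)

    # Assumed definition: sum of point-to-own-centroid Euclidean distances.
    intra = sum(math.sqrt(sq_dist(p, m))
                for c, m in zip(clusters, centers) for p in c)
    # Assumed definition: mean pairwise Euclidean distance between centroids.
    pairs = [(i, j) for i in range(len(centers))
             for j in range(i + 1, len(centers))]
    avg_between = (
        sum(math.sqrt(sq_dist(centers[i], centers[j])) for i, j in pairs)
        / len(pairs)
        if pairs else 0.0
    )
    return {"ssw": ssw, "ssb": ssb, "sst": sst,
            "intra": intra, "avg_between": avg_between}


# Toy partition: two clusters of two 2-D points each (hypothetical data).
toy = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]
stats = partition_statistics(toy)
print(stats)
```

For any partition with centroids taken as cluster means, the total sum of squares equals the within-cluster plus the between-cluster sum of squares, which is a useful sanity check when implementing any index built on these quantities.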


Metadata

Title: A cluster validity evaluation method for dynamically determining the near-optimal number of clusters
Authors: Xiangjun Li, Wei Liang, Xinping Zhang, Song Qing, Pei-Chann Chang
Publication date: 24-10-2019
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing, Issue 12/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-019-04449-7
