Published in: Neural Processing Letters, Issue 2/2021

19 January 2021

Estimating the Optimal Number of Clusters Via Internal Validity Index

Authors: Shibing Zhou, Fei Liu, Wei Song

Abstract

Estimating the optimal number of clusters (NC) is pivotal in cluster analysis. This paper designs a novel internal clustering validity index from the viewpoint of sample geometry, termed the between-within cluster (BWC) index, and proposes a method to estimate the optimal NC. The BWC index improves on the well-known Silhouette index. It validates the clustering results produced by a given clustering algorithm (e.g., affinity propagation or hierarchical clustering) and estimates the optimal NC for many kinds of data sets, including synthetic data sets, benchmark data sets, UCI data sets, gene expression data sets, and images. Theoretical analysis and experimental studies demonstrate the effectiveness and high efficiency of the new index and method.
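
The general workflow described in the abstract (cluster the data for several candidate values of NC, score each result with an internal validity index, and keep the best-scoring NC) can be illustrated with a minimal sketch. The BWC formula is not given on this page, so the sketch uses the classical Silhouette index, which the abstract names as the index BWC improves on, as a stand-in. The toy data, the naive average-linkage clustering, and all function names (`make_blobs`, `average_linkage`, `silhouette`, `estimate_nc`) are assumptions made for illustration and are not the authors' method.

```python
import math
import random


def make_blobs(centers, per_cluster=15, spread=0.5, seed=0):
    """Hypothetical toy data: Gaussian blobs around the given 2-D centers."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centers for _ in range(per_cluster)]


def average_linkage(points, k):
    """Naive agglomerative clustering: merge the two clusters with the
    smallest average pairwise distance until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette(points, labels):
    """Mean Silhouette width over all samples (stand-in for BWC)."""
    groups = {}
    for i, c in enumerate(labels):
        groups.setdefault(c, []).append(i)
    total = 0.0
    for i, p in enumerate(points):
        own = groups[labels[i]]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, points[j]) for j in g) / len(g)
                for c, g in groups.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_nc(points, k_max):
    """Score each candidate NC in 2..k_max and return the best one."""
    scores = {k: silhouette(points, average_linkage(points, k))
              for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


points = make_blobs([(0, 0), (10, 0), (0, 10)])
best_k, scores = estimate_nc(points, k_max=6)
print("estimated NC:", best_k)
for k in sorted(scores):
    print(k, round(scores[k], 3))
```

Any internal validity index with the same "higher is better" convention could replace `silhouette` in `estimate_nc`; the clustering step is likewise interchangeable, which matches the abstract's point that the index validates results from an arbitrary clustering algorithm.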

Metadata
Title
Estimating the Optimal Number of Clusters Via Internal Validity Index
Authors
Shibing Zhou
Fei Liu
Wei Song
Publication date
19 January 2021
Publisher
Springer US
Published in
Neural Processing Letters / Issue 2/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-021-10427-8
