Published in: Soft Computing 20/2019

19.10.2018 | Methodologies and Application

An unsupervised and robust validity index for clustering analysis

Authors: Yaru Wang, Shihong Yue, Zhenhua Hao, Mingliang Ding, Jia Li

Abstract

The evaluation of clustering results plays an important role in clustering analysis and is usually carried out with one or more validity indexes. However, existing validity indexes are effectively supervised, since they depend heavily on prior information such as a specified clustering algorithm and an optimal initialization. When this prior information is unavailable, the evaluation results of these supervised validity indexes are no longer guaranteed, which greatly limits their range of application. In this paper, we first propose an estimate of the lower and upper bounds of the number of within-cluster distances in any dataset, and then present an unsupervised validity index that requires neither a clustering algorithm nor an initialization. A group of typical simulated and real datasets with various characteristics is used to validate the proposed index in an unsupervised way. Experimental results demonstrate that the proposed index achieves higher accuracy on most of the tested datasets and has advantages in robustness and runtime compared with other existing validity indexes.
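
To make the phrase "number of within-cluster distances" concrete, the following minimal sketch counts the pairwise distances whose two endpoints fall in the same cluster, and computes the purely combinatorial minimum and maximum of that count for n points split into c non-empty clusters. This is an illustration only: the abstract does not state the authors' estimator, so the bounds here are generic counting bounds and should not be read as the paper's method. The function names within_cluster_pairs and pair_count_bounds are hypothetical.

```python
from math import comb


def within_cluster_pairs(sizes):
    """Count pairwise distances whose two endpoints share a cluster.

    `sizes` is a list of cluster sizes; a cluster of size s contributes
    s * (s - 1) / 2 within-cluster pairs.
    """
    return sum(comb(s, 2) for s in sizes)


def pair_count_bounds(n, c):
    """Generic combinatorial bounds (an assumption, not the paper's estimate).

    For n points in c non-empty clusters, the within-cluster pair count is
    smallest when cluster sizes are as balanced as possible and largest when
    one cluster holds n - c + 1 points and the rest are singletons.
    """
    if not 1 <= c <= n:
        raise ValueError("need 1 <= c <= n")
    q, r = divmod(n, c)  # r clusters of size q + 1, c - r clusters of size q
    lower = r * comb(q + 1, 2) + (c - r) * comb(q, 2)
    upper = comb(n - c + 1, 2)
    return lower, upper


if __name__ == "__main__":
    lower, upper = pair_count_bounds(10, 3)
    print(lower, upper, comb(10, 2))
```

Of the n(n-1)/2 pairwise distances in a dataset, only those counted above are within-cluster; the remainder are between-cluster distances, whichever partition is considered.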

Metadata
Title
An unsupervised and robust validity index for clustering analysis
Authors
Yaru Wang
Shihong Yue
Zhenhua Hao
Mingliang Ding
Jia Li
Publication date
19.10.2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 20/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3582-2