Skip to main content
Erschienen in: Pattern Analysis and Applications 1/2009

01.02.2009 | Theoretical Advances

Development of assessment criteria for clustering algorithms

verfasst von: Sameh A. Salem, Asoke K. Nandi

Erschienen in: Pattern Analysis and Applications | Ausgabe 1/2009

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, new measures—called clustering performance measures (CPMs)—for assessing the reliability of a clustering algorithm are proposed. These CPMs are defined using a validation measure, which determines how well the algorithm works with a given set of parameter values, and a repeatability measure, which is used for studying the stability of the clustering solutions and has the ability to estimate the correct number of clusters in a dataset. These proposed CPMs can be used to evaluate clustering algorithms that have a structure bias to certain types of data distribution as well as those that have no structure biases. Additionally, we propose a novel cluster validity index, V I index, which is able to handle non-spherical clusters. Five clustering algorithms on different types of real-world data and synthetic data are evaluated. The first dataset type refers to a communications signal dataset representing one modulation scheme under a variety of noise conditions, the second represents two breast cancer datasets, while the third type represents different synthetic datasets with arbitrarily shaped clusters. Additionally, comparisons with other methods for estimating the number of clusters indicate the applicability and reliability of the proposed cluster validity V I index and repeatability measure for correct estimation of the number of clusters.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
This paper is an extension of [24] and contains further investigations and experimental results. The current manuscript represents a significant extension.
 
2
Software codes for the CPMs may be available on request from sameh.salem@liverpool.ac.uk
 
Literatur
1.
Zurück zum Zitat Webb AR (2003) Statistical pattern recognition. Wiley, New York Webb AR (2003) Statistical pattern recognition. Wiley, New York
2.
Zurück zum Zitat Theodoridis S, Koutroubas K (2003) Pattern recognition. Academic Press, New York Theodoridis S, Koutroubas K (2003) Pattern recognition. Academic Press, New York
3.
Zurück zum Zitat Halkidi M, Batistakis Y, Vazirgiannis M (2002) Cluster validity methods: Part I, SIGMOD. Record 31(2):40–45 Halkidi M, Batistakis Y, Vazirgiannis M (2002) Cluster validity methods: Part I, SIGMOD. Record 31(2):40–45
4.
Zurück zum Zitat Halkidi M, Batistakis Y, Vazirgiannis M (2002) Cluster validity methods: Part II, SIGMOD. Record 31(3):19–27 Halkidi M, Batistakis Y, Vazirgiannis M (2002) Cluster validity methods: Part II, SIGMOD. Record 31(3):19–27
5.
Zurück zum Zitat Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1:224–227 Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1:224–227
6.
Zurück zum Zitat Dunn JC (1973) A fuzzy relative of ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3:32–57MATHCrossRefMathSciNet Dunn JC (1973) A fuzzy relative of ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3:32–57MATHCrossRefMathSciNet
8.
Zurück zum Zitat Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern Anal Mach Intell 24(12):1650–1654CrossRef Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern Anal Mach Intell 24(12):1650–1654CrossRef
9.
Zurück zum Zitat Milligan GW, Cooper C (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179CrossRef Milligan GW, Cooper C (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179CrossRef
10.
Zurück zum Zitat Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588MATHCrossRef Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588MATHCrossRef
11.
Zurück zum Zitat Bezdek JC, Pal NR (1998) Some new indexes of cluster validity. IEEE Trans Syst Man Cybern Part B 28(3):301–315CrossRef Bezdek JC, Pal NR (1998) Some new indexes of cluster validity. IEEE Trans Syst Man Cybern Part B 28(3):301–315CrossRef
12.
Zurück zum Zitat Xie XL, Beni G (1991) A validity measure for fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 13:841–847CrossRef Xie XL, Beni G (1991) A validity measure for fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 13:841–847CrossRef
13.
Zurück zum Zitat Chou C, Su M, Lai E (2004) A new cluster validity measure and its application to image compression. Pattern Anal Appl 7:205–220MathSciNet Chou C, Su M, Lai E (2004) A new cluster validity measure and its application to image compression. Pattern Anal Appl 7:205–220MathSciNet
14.
Zurück zum Zitat Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters via the gap statistic. J R Stat Soc B 63(2):411–423MATHCrossRefMathSciNet Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters via the gap statistic. J R Stat Soc B 63(2):411–423MATHCrossRefMathSciNet
15.
Zurück zum Zitat Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood CliffsMATH Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood CliffsMATH
16.
Zurück zum Zitat Law MH, Jain AK (2003) Cluster validity by bootstrapping partitions. Technical report MSU-CSE-03-5, Department of Computer Science and Engineering, Michigan State University Law MH, Jain AK (2003) Cluster validity by bootstrapping partitions. Technical report MSU-CSE-03-5, Department of Computer Science and Engineering, Michigan State University
17.
Zurück zum Zitat Lange T, Braun M, Buhmann JM (2004) Stability-based validation of clustering solutions. Neural Comput 16:1299–1323MATHCrossRef Lange T, Braun M, Buhmann JM (2004) Stability-based validation of clustering solutions. Neural Comput 16:1299–1323MATHCrossRef
18.
Zurück zum Zitat Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Pacific symposium on biocomputing. World Scientific, Singapore, pp 6–17 Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Pacific symposium on biocomputing. World Scientific, Singapore, pp 6–17
19.
Zurück zum Zitat Levine E, Domany E (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comput 13:2573–2593MATHCrossRef Levine E, Domany E (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comput 13:2573–2593MATHCrossRef
20.
Zurück zum Zitat Jain A, Morean J (1987) Bootstrap techniques in cluster analysis. Pattern Recognit 20:547–568CrossRef Jain A, Morean J (1987) Bootstrap techniques in cluster analysis. Pattern Recognit 20:547–568CrossRef
21.
Zurück zum Zitat Tibshirani R, Walther G, Botstein D, Brown P (2001) Cluster validation by prediction strength. Technical report, Statistics Department, Stanford University, Stanford, CA Tibshirani R, Walther G, Botstein D, Brown P (2001) Cluster validation by prediction strength. Technical report, Statistics Department, Stanford University, Stanford, CA
23.
Zurück zum Zitat Lange T, Braun M, Roth V, Buhmann JM (2002) Stability-based model selection. Adv Neural Inf Process Syst 15:617–624 Lange T, Braun M, Roth V, Buhmann JM (2002) Stability-based model selection. Adv Neural Inf Process Syst 15:617–624
24.
Zurück zum Zitat Salem SA, Nandi AK (2005) New assessment criteria for clustering algorithms. In: Proceedings of the IEEE international workshop on machine learning for signal processing, Mystic, CT, USA, pp 285–290 Salem SA, Nandi AK (2005) New assessment criteria for clustering algorithms. In: Proceedings of the IEEE international workshop on machine learning for signal processing, Mystic, CT, USA, pp 285–290
25.
Zurück zum Zitat Proakis JG (2001) Digital communications. McGraw-Hill, Boston Proakis JG (2001) Digital communications. McGraw-Hill, Boston
27.
Zurück zum Zitat Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems, vol 14. MIT Press, Cambridge Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems, vol 14. MIT Press, Cambridge
28.
Zurück zum Zitat Cormen TH, Leiserson CE, Rivest LR, Stein C (2001) Introduction to algorithms. ISBN 10:0-262-03293-7. The MIT Press, London Cormen TH, Leiserson CE, Rivest LR, Stein C (2001) Introduction to algorithms. ISBN 10:0-262-03293-7. The MIT Press, London
29.
Zurück zum Zitat Fischer B, Buhmann JM (2003) Path based clustering for grouping smooth curves and texture segmentation. IEEE Trans Pattern Anal Mach Intell 25:1–6CrossRef Fischer B, Buhmann JM (2003) Path based clustering for grouping smooth curves and texture segmentation. IEEE Trans Pattern Anal Mach Intell 25:1–6CrossRef
30.
Zurück zum Zitat Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
31.
Zurück zum Zitat Fonseca JRS, Cardoso MGMS (2007) Mixture-model cluster analysis using information theoretical criteria. Intell Data Anal 11:155–173 Fonseca JRS, Cardoso MGMS (2007) Mixture-model cluster analysis using information theoretical criteria. Intell Data Anal 11:155–173
32.
33.
34.
Zurück zum Zitat Hu X, Xu L (2004) Investigation on several model selection criteria for determining the number of clusters. Neural Inf Process Lett Rev 4:1–10 Hu X, Xu L (2004) Investigation on several model selection criteria for determining the number of clusters. Neural Inf Process Lett Rev 4:1–10
35.
Zurück zum Zitat Jain AK, Murty MN, Flyn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323CrossRef Jain AK, Murty MN, Flyn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323CrossRef
36.
Zurück zum Zitat Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
37.
Zurück zum Zitat Kohonen T (1997) Self-organizing maps. Springer, Heidelberg Kohonen T (1997) Self-organizing maps. Springer, Heidelberg
38.
Zurück zum Zitat Chen T, Chen L-K, MA K-K (1999) Colour image indexing using SOM for region-of-interest retrieval. Pattern Anal Appl 2(2):164–171CrossRefMathSciNet Chen T, Chen L-K, MA K-K (1999) Colour image indexing using SOM for region-of-interest retrieval. Pattern Anal Appl 2(2):164–171CrossRefMathSciNet
39.
Zurück zum Zitat Zhang S, Ganesan R, Xistris GD (1996) Self-organizing neural networks for automated machinery monitoring systems. Mech Syst Signal Process 10(5):517–532CrossRef Zhang S, Ganesan R, Xistris GD (1996) Self-organizing neural networks for automated machinery monitoring systems. Mech Syst Signal Process 10(5):517–532CrossRef
40.
Zurück zum Zitat Chen GW, Luo JB, Parker KJ (1998) Image segmentation via adaptive k means clustering and knowledge-based morphological operations with biomedical operations. IEEE Trans Image Process 7(12):1673–1683CrossRef Chen GW, Luo JB, Parker KJ (1998) Image segmentation via adaptive k means clustering and knowledge-based morphological operations with biomedical operations. IEEE Trans Image Process 7(12):1673–1683CrossRef
41.
Zurück zum Zitat Frigui H (2005) Unsupervised learning of arbitrarily shaped clusters using ensembles of Gaussian models. Pattern Anal Appl 8:32–49CrossRefMathSciNet Frigui H (2005) Unsupervised learning of arbitrarily shaped clusters using ensembles of Gaussian models. Pattern Anal Appl 8:32–49CrossRefMathSciNet
42.
Zurück zum Zitat Pelleg D, Moore AW (2000) X-means: extending K-means with efficient estimation of the number of clusters. In: Seventeenth international conference on machine learning. Morgan Kaufmann, San Francisco, pp 727–734 Pelleg D, Moore AW (2000) X-means: extending K-means with efficient estimation of the number of clusters. In: Seventeenth international conference on machine learning. Morgan Kaufmann, San Francisco, pp 727–734
43.
Zurück zum Zitat Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithm. Plenum Press, New York Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithm. Plenum Press, New York
44.
45.
Zurück zum Zitat Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition. IEEE Trans Syst Man Cybern Part B 29(6):778–801CrossRef Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition. IEEE Trans Syst Man Cybern Part B 29(6):778–801CrossRef
46.
Zurück zum Zitat Xu W, Nandi AK, Zhang J (2003) Novel fuzzy reinforcement learning vector quantization algorithm and its application in image compression. IEEE Proc Vis Image Signal Process 150(5):292–298CrossRef Xu W, Nandi AK, Zhang J (2003) Novel fuzzy reinforcement learning vector quantization algorithm and its application in image compression. IEEE Proc Vis Image Signal Process 150(5):292–298CrossRef
47.
Zurück zum Zitat Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining, (KDD). Portland,OR, USA, pp 226–231 Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining, (KDD). Portland,OR, USA, pp 226–231
48.
Zurück zum Zitat Ankerst M, Breuing M, Kriegel H, Sander J (1996) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the international conference on management of data, (SIGMOD). ACM Press, Philadelphia 28(2):49–60 Ankerst M, Breuing M, Kriegel H, Sander J (1996) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the international conference on management of data, (SIGMOD). ACM Press, Philadelphia 28(2):49–60
49.
Zurück zum Zitat Jack LB, Nandi AK (2004) Microarray data using the self organising oscillator network. In: Proceedings of EUSIPCO 2004, Vienna, Austria, pp 2183–2186 Jack LB, Nandi AK (2004) Microarray data using the self organising oscillator network. In: Proceedings of EUSIPCO 2004, Vienna, Austria, pp 2183–2186
50.
Zurück zum Zitat Von Luxburg U (2006) A tutorial on spectral clustering. Max Planck Institute for Biological Cybernetics. Technical report no. TR-149 Von Luxburg U (2006) A tutorial on spectral clustering. Max Planck Institute for Biological Cybernetics. Technical report no. TR-149
Metadaten
Titel
Development of assessment criteria for clustering algorithms
verfasst von
Sameh A. Salem
Asoke K. Nandi
Publikationsdatum
01.02.2009
Verlag
Springer-Verlag
Erschienen in
Pattern Analysis and Applications / Ausgabe 1/2009
Print ISSN: 1433-7541
Elektronische ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-007-0099-1

Weitere Artikel der Ausgabe 1/2009

Pattern Analysis and Applications 1/2009 Zur Ausgabe