
2012 | OriginalPaper | Book chapter

5. Selecting External Validation Measures for K-means Clustering

Written by: Junjie Wu

Published in: Advances in K-means Clustering

Publisher: Springer Berlin Heidelberg


Abstract

Cluster validity is a long-standing challenge in the clustering literature. Although many evaluation measures have been developed for cluster validity, they often provide inconsistent information about clustering performance, and it remains unclear in practice which measures are most suitable. This chapter addresses that gap with a systematic study of sixteen external validation measures for K-means clustering. Specifically, we first propose a filtering criterion based on the uniform effect of K-means and apply it to identify defective measures.
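As an illustration only, the following minimal Python sketch shows what an external validation measure computes and how an imbalance between class sizes and cluster sizes can be quantified. It assumes the Rand index as a representative external measure and the coefficient of variation (CV) of group sizes as a simple indicator of how uniform the cluster sizes are; the abstract specifies neither choice, and the toy labels and the function names `rand_index` and `size_cv` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def rand_index(true_labels, cluster_labels):
    """Fraction of object pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def size_cv(labels):
    """Coefficient of variation (sample std / mean) of group sizes.

    Requires at least two distinct groups.
    """
    sizes = list(Counter(labels).values())
    mean = sum(sizes) / len(sizes)
    var = sum((s - mean) ** 2 for s in sizes) / (len(sizes) - 1)
    return sqrt(var) / mean


# Hypothetical toy data: one large true class (8 objects) and one small (2).
true_labels = [0] * 8 + [1] * 2
# A hypothetical clustering that splits the objects into two equal-sized clusters.
uniform_clusters = [0] * 5 + [1] * 5

ri = rand_index(true_labels, uniform_clusters)
cv_true = size_cv(true_labels)        # imbalance of the true classes
cv_found = size_cv(uniform_clusters)  # imbalance of the found clusters

print(f"Rand index: {ri:.3f}")
print(f"CV of class sizes: {cv_true:.3f}, CV of cluster sizes: {cv_found:.3f}")
```

In this toy case the clusters are perfectly balanced while the true classes are not, so comparing the score of an external measure against the drop in CV is one way to examine how a measure reacts to overly uniform cluster sizes.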

Metadata
Title
Selecting External Validation Measures for K-means Clustering
Written by
Junjie Wu
Copyright year
2012
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-29807-3_5