2015 | OriginalPaper | Book chapter

Evaluation of Relative Indexes for Multi-objective Clustering

Authors: Tomáš Bartoň, Pavel Kordík

Published in: Hybrid Artificial Intelligent Systems

Publisher: Springer International Publishing

Abstract

One of the biggest challenges in clustering is finding a robust and versatile criterion to evaluate the quality of clustering results. In this paper, we investigate the extent to which unsupervised criteria can be used to obtain clusters that are highly correlated with external labels. We show that the usefulness of these criteria is data-dependent and that, for most data sets, multiple criteria are required in order to identify the best-performing clustering algorithm. We present a multi-objective evolutionary clustering algorithm capable of finding a set of high-quality solutions. For the real-world data sets examined, the Pareto front can offer better clusterings than simply optimizing a single unsupervised criterion.
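
To make the notion of a Pareto front concrete, the following minimal Python sketch selects the non-dominated candidates from a set of clusterings scored on two unsupervised criteria. It is an illustration only, not the algorithm presented in the paper: the function names `dominates` and `pareto_front` are hypothetical, the score values are made-up placeholders rather than results, and both criteria are assumed to be minimized.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every criterion and
    strictly better on at least one (assumption: lower is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up scores for five candidate clusterings on two hypothetical
# unsupervised criteria (criterion_1, criterion_2); not data from the paper.
candidate_scores = [
    (0.40, 0.90),
    (0.55, 0.60),
    (0.70, 0.65),  # dominated by (0.55, 0.60)
    (0.90, 0.30),
    (0.95, 0.35),  # dominated by (0.90, 0.30)
]

front = pareto_front(candidate_scores)
print(front)
```

Each clustering on the front represents a different trade-off between the two criteria, so optimizing either criterion alone would return only one end of the front and discard the others.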

Metadata
Title
Evaluation of Relative Indexes for Multi-objective Clustering
Authors
Tomáš Bartoň
Pavel Kordík
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-19644-2_39