2015 | OriginalPaper | Chapter

Evaluation of Relative Indexes for Multi-objective Clustering

Authors: Tomáš Bartoň, Pavel Kordík

Published in: Hybrid Artificial Intelligent Systems

Publisher: Springer International Publishing

Abstract

One of the biggest challenges in clustering is finding a robust and versatile criterion for evaluating the quality of clustering results. In this paper, we investigate the extent to which unsupervised criteria can be used to obtain clusters that are highly correlated with external labels. We show that the usefulness of these criteria is data-dependent and that, for most data sets, multiple criteria are required to identify the best-performing clustering algorithm. We present a multi-objective evolutionary clustering algorithm capable of finding a set of high-quality solutions. For the real-world data sets examined, the Pareto front can offer better clusterings than optimizing a single unsupervised criterion.
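
To make the notion of a Pareto front over several unsupervised criteria concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes each candidate clustering has already been scored on two criteria where higher is better; the function names `dominates` and `pareto_front` and the score values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every criterion
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (criterion A, criterion B) scores for four candidate
# clusterings; these numbers are invented, not results from the paper.
candidates = [(0.62, 0.40), (0.55, 0.71), (0.50, 0.35), (0.62, 0.38)]
front = pareto_front(candidates)
print(front)
```

In this toy example the first two candidates trade off against each other, so both stay on the front, while the other two are each beaten on both criteria by the first candidate. A criterion that should be minimized can be negated before calling `pareto_front`.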

Metadata
Title
Evaluation of Relative Indexes for Multi-objective Clustering
Authors
Tomáš Bartoň
Pavel Kordík
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-19644-2_39
