Skip to main content
Erschienen in: Soft Computing 3/2016

14.12.2014 | Methodologies and Application

Efficiently finding the optimum number of clusters in a dataset with a new hybrid differential evolution algorithm: DELA

verfasst von: Javier Arellano-Verdejo, Enrique Alba, Salvador Godoy-Calderon

Erschienen in: Soft Computing | Ausgabe 3/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Clustering algorithms, a fundamental base for data mining procedures and learning techniques, suffer from the lack of efficient methods for determining the optimal number of clusters to be found in an arbitrary dataset. The few methods existing in the literature always use some sort of evolutionary algorithm having a cluster validation index as its objective function. In this article, a new evolutionary algorithm, based on a hybrid model of global and local heuristic search, is proposed for the same task, and some experimentation is done with different datasets and indexes. Due to its design, independent of any clustering procedure, it is applicable to virtually any clustering method like the widely used \(k\)-means algorithm. Moreover, the use of non-parametric statistical tests over the experimental results, clearly show the proposed algorithm to be more efficient than other evolutionary algorithms currently used for the same task.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Arabas J, Michalewicz Z, Mulawka J (1994) GAVaPS-a genetic algorithm with varying population size. In: Proceedings of the first IEEE conference on evolutionary computation, IEEE world congress on computational intelligence. IEEE, pp 73–78 Arabas J, Michalewicz Z, Mulawka J (1994) GAVaPS-a genetic algorithm with varying population size. In: Proceedings of the first IEEE conference on evolutionary computation, IEEE world congress on computational intelligence. IEEE, pp 73–78
Zurück zum Zitat Bandyopadhyay S, Maulik U (2002) An evolutionary technique based on k-means algorithm for optimal clustering in rn. Inf Sci 146(1):221–237CrossRefMathSciNetMATH Bandyopadhyay S, Maulik U (2002) An evolutionary technique based on k-means algorithm for optimal clustering in rn. Inf Sci 146(1):221–237CrossRefMathSciNetMATH
Zurück zum Zitat Bandyopadhyay S, Maulik U (2002) Genetic clustering for automatic evolution of clusters and application to image classification. Pattern Recognit 35(6):1197–1208CrossRefMATH Bandyopadhyay S, Maulik U (2002) Genetic clustering for automatic evolution of clusters and application to image classification. Pattern Recognit 35(6):1197–1208CrossRefMATH
Zurück zum Zitat Bellis MA, Jarman I, Downing J, Perkins C, Beynon C, Hughes K, Lisboa P (2012) Using clustering techniques to identify localities with multiple health and social needs. Health Place 18(2):138–143CrossRef Bellis MA, Jarman I, Downing J, Perkins C, Beynon C, Hughes K, Lisboa P (2012) Using clustering techniques to identify localities with multiple health and social needs. Health Place 18(2):138–143CrossRef
Zurück zum Zitat Cao J, Wu Z, Wu J, Liu W (2012) Towards information-theoretic k-means clustering for image indexing. Signal Process 39(2):1–12 Cao J, Wu Z, Wu J, Liu W (2012) Towards information-theoretic k-means clustering for image indexing. Signal Process 39(2):1–12
Zurück zum Zitat Chang L, Duarte MM, Sucar L, Morales EF (2012) A bayesian approach for object classification based on clusters of sift local features. Expert Syst Appl 39(2):1679–1686CrossRef Chang L, Duarte MM, Sucar L, Morales EF (2012) A bayesian approach for object classification based on clusters of sift local features. Expert Syst Appl 39(2):1679–1686CrossRef
Zurück zum Zitat Cortina-Borja M (2012) Handbook of parametric and nonparametric statistical procedures. J R Stat Soc: Ser A (Stat Soc) 175(3):829–829CrossRef Cortina-Borja M (2012) Handbook of parametric and nonparametric statistical procedures. J R Stat Soc: Ser A (Stat Soc) 175(3):829–829CrossRef
Zurück zum Zitat Das S, Abraham A, Konar A (2008) Automatic clustering using an improved differential evolution algorithm. Syst Man Cybern Part A: Syst Hum IEEE Trans 38(1):218–237CrossRef Das S, Abraham A, Konar A (2008) Automatic clustering using an improved differential evolution algorithm. Syst Man Cybern Part A: Syst Hum IEEE Trans 38(1):218–237CrossRef
Zurück zum Zitat Davies David L, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intel 2:224–227CrossRef Davies David L, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intel 2:224–227CrossRef
Zurück zum Zitat Franek L, Abdala D, Vega-Pons S, Jiang X (2011) Image segmentation fusion using general ensemble clustering methods. Comput Vis-ACCV 2010:373–384 Franek L, Abdala D, Vega-Pons S, Jiang X (2011) Image segmentation fusion using general ensemble clustering methods. Comput Vis-ACCV 2010:373–384
Zurück zum Zitat Garcia S, Molina D, Lozano M, Herrera F (2009) A study on the use of non-parametric tests for analyzing the evolutionary algorithms? behaviour: a case study on the cec 2005 special session on real parameter optimization. J Heuristics 15(6):617–644CrossRefMATH Garcia S, Molina D, Lozano M, Herrera F (2009) A study on the use of non-parametric tests for analyzing the evolutionary algorithms? behaviour: a case study on the cec 2005 special session on real parameter optimization. J Heuristics 15(6):617–644CrossRefMATH
Zurück zum Zitat Gordon AD (1999) Classification. Chapman & Hall/CRC Monographs on Statistics & Applied Probability Gordon AD (1999) Classification. Chapman & Hall/CRC Monographs on Statistics & Applied Probability
Zurück zum Zitat Hong Y, Kwong S, Chang Y, Ren Q (2008) Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognit 41(9):2742–2756CrossRefMATH Hong Y, Kwong S, Chang Y, Ren Q (2008) Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognit 41(9):2742–2756CrossRefMATH
Zurück zum Zitat Jarboui B, Cheikh M, Siarry P, Rebai A (2007) Combinatorial particle swarm optimization (cpso) for partitional clustering problem. Appl Math Comput 192(2):337–345CrossRefMathSciNetMATH Jarboui B, Cheikh M, Siarry P, Rebai A (2007) Combinatorial particle swarm optimization (cpso) for partitional clustering problem. Appl Math Comput 192(2):337–345CrossRefMathSciNetMATH
Zurück zum Zitat Kanade PM, Hall LO (2003) Fuzzy ants as a clustering concept. In: Fuzzy Information Processing Society, 2003. NAFIPS 2003. 22nd International Conference of the North American, pp 227–232. IEEE Kanade PM, Hall LO (2003) Fuzzy ants as a clustering concept. In: Fuzzy Information Processing Society, 2003. NAFIPS 2003. 22nd International Conference of the North American, pp 227–232. IEEE
Zurück zum Zitat Kwedlo W (2011) A clustering method combining differential evolution with the \(k\)-means algorithm. Pattern Recognit Lett 32(12):1613–1621 Kwedlo W (2011) A clustering method combining differential evolution with the \(k\)-means algorithm. Pattern Recognit Lett 32(12):1613–1621
Zurück zum Zitat Lee W-P, Chen SW (2010) Automatic clustering with differential evolution using a cluster number oscillation method. Intelligent Systems and Applications pp 218–237 Lee W-P, Chen SW (2010) Automatic clustering with differential evolution using a cluster number oscillation method. Intelligent Systems and Applications pp 218–237
Zurück zum Zitat Lu Y, Lu S, Fotouhi F, Deng Y, Brown SJ (2004) Fgka: a fast genetic k-means clustering algorithm. In: Proceedings of the 2004 ACM symposium on Applied computing, pp 622–623. ACM Lu Y, Lu S, Fotouhi F, Deng Y, Brown SJ (2004) Fgka: a fast genetic k-means clustering algorithm. In: Proceedings of the 2004 ACM symposium on Applied computing, pp 622–623. ACM
Zurück zum Zitat Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern 24(12):1650–1654CrossRef Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern 24(12):1650–1654CrossRef
Zurück zum Zitat Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50:159–179CrossRef Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50:159–179CrossRef
Zurück zum Zitat Omran M, Engelbrecht AP, Salman A (2005) Particle swarm optimization method for image clustering. Int J Pattern Recognit Artif Intel 19(03):297–321CrossRef Omran M, Engelbrecht AP, Salman A (2005) Particle swarm optimization method for image clustering. Int J Pattern Recognit Artif Intel 19(03):297–321CrossRef
Zurück zum Zitat Parsopoulos KE (2009) Cooperative micro-differential evolution for high-dimensional problems. In: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pp 531–538. ACM Parsopoulos KE (2009) Cooperative micro-differential evolution for high-dimensional problems. In: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pp 531–538. ACM
Zurück zum Zitat Saha I, Maulik U, Bandyopadhyay S (2009) A new differential evolution based fuzzy clustering for automatic cluster evolution. Advance Computing Conference, 2009. IACC 2009. IEEE International pp 706–711 Saha I, Maulik U, Bandyopadhyay S (2009) A new differential evolution based fuzzy clustering for automatic cluster evolution. Advance Computing Conference, 2009. IACC 2009. IEEE International pp 706–711
Zurück zum Zitat Storn R, Price K (1997) Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359CrossRefMathSciNetMATH Storn R, Price K (1997) Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359CrossRefMathSciNetMATH
Zurück zum Zitat Villa A, Chanussot J, Benediktsson JA, Jutten C, Dambreville R (2012) Unsupervised methods for the classification of hyperspectral images with low spatial resolution. Pattern Recognit 46(6):1556–1568CrossRef Villa A, Chanussot J, Benediktsson JA, Jutten C, Dambreville R (2012) Unsupervised methods for the classification of hyperspectral images with low spatial resolution. Pattern Recognit 46(6):1556–1568CrossRef
Zurück zum Zitat Xie XL, Beni GA (1991) Validity measure for fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 13(4):841–847CrossRef Xie XL, Beni GA (1991) Validity measure for fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 13(4):841–847CrossRef
Zurück zum Zitat Yan H, Chen K, Liu L, Yi Z (2010) Scale: a scalable framework for efficiently clustering transactional data. Data Min Knowl Discov 20(1):1–27CrossRefMathSciNet Yan H, Chen K, Liu L, Yi Z (2010) Scale: a scalable framework for efficiently clustering transactional data. Data Min Knowl Discov 20(1):1–27CrossRefMathSciNet
Zurück zum Zitat Yang Y, Liao Y (2011) A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis. Expert Syst Appl 38(9):1311–1320MathSciNet Yang Y, Liao Y (2011) A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis. Expert Syst Appl 38(9):1311–1320MathSciNet
Metadaten
Titel
Efficiently finding the optimum number of clusters in a dataset with a new hybrid differential evolution algorithm: DELA
verfasst von
Javier Arellano-Verdejo
Enrique Alba
Salvador Godoy-Calderon
Publikationsdatum
14.12.2014
Verlag
Springer Berlin Heidelberg
Erschienen in
Soft Computing / Ausgabe 3/2016
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-014-1548-6

Weitere Artikel der Ausgabe 3/2016

Soft Computing 3/2016 Zur Ausgabe

Methodologies and Application

A twice face recognition algorithm

Premium Partner