Skip to main content
Top

2018 | OriginalPaper | Chapter

Data Clustering Using Grouping Hyper-heuristics

Authors : Anas Elhag, Ender Özcan

Published in: Evolutionary Computation in Combinatorial Optimization

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Grouping problems represent a class of computationally hard to solve problems requiring optimal partitioning of a given set of items with respect to multiple criteria varying dependent on the domain. A recent work proposed a general-purpose selection hyper-heuristic search framework with reusable components, designed for rapid development of grouping hyper-heuristics to solve grouping problems. The framework was tested only on the graph colouring problem domain. Extending the previous work, this study compares the performance of selection hyper-heuristics implemented using the framework, pairing up various heuristic/operator selection and move acceptance methods for data clustering. The selection hyper-heuristic performs the search processing a single solution at any decision point and controls a fixed set of generic low level heuristics specifically designed for the grouping problems based on a bi-objective formulation. An archive of high quality solutions, capturing the trade-off between the number of clusters and overall error of clustering, is maintained during the search process. The empirical results verify the effectiveness of a successful selection hyper-heuristic, winner of a recent hyper-heuristic challenge for data clustering on a set of benchmark problem instances.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Falkenauer, E.: Genetic Algorithms and Grouping Problems. Wiley, New York (1998)MATH Falkenauer, E.: Genetic Algorithms and Grouping Problems. Wiley, New York (1998)MATH
2.
go back to reference Agustın-Blas, L.E., Salcedo-Sanz, S., Jiménez-Fernández, S., Carro-Calvo, L., Del Ser, J., Portilla-Figueras, J.A.: A new grouping genetic algorithm for clustering problems. Expert Syst. App. 39(10), 9695–9703 (2012)CrossRef Agustın-Blas, L.E., Salcedo-Sanz, S., Jiménez-Fernández, S., Carro-Calvo, L., Del Ser, J., Portilla-Figueras, J.A.: A new grouping genetic algorithm for clustering problems. Expert Syst. App. 39(10), 9695–9703 (2012)CrossRef
3.
go back to reference Mitra, S., Banka, H.: Multi-objective evolutionary biclustering of gene expression data. Pattern Recogn. 39(12), 2464–2477 (2006)CrossRef Mitra, S., Banka, H.: Multi-objective evolutionary biclustering of gene expression data. Pattern Recogn. 39(12), 2464–2477 (2006)CrossRef
4.
go back to reference Park, Y.J., Song, M.S.: A genetic algorithm for clustering problems. In: Koza, J.R., Banzhaf, W., Chellapilla, K., Deb, K., Dorigo, M., Fogel, D.B., Garzon, M.H., Goldberg, D.E., Iba, H., Riolo, R. (eds.) Genetic Programming 1998: Proceedings of the Third Annual Conference, University of Wisconsin, Madison, Wisconsin, USA, 22–25 July, pp. 568–575. Morgan Kaufmann (1998) Park, Y.J., Song, M.S.: A genetic algorithm for clustering problems. In: Koza, J.R., Banzhaf, W., Chellapilla, K., Deb, K., Dorigo, M., Fogel, D.B., Garzon, M.H., Goldberg, D.E., Iba, H., Riolo, R. (eds.) Genetic Programming 1998: Proceedings of the Third Annual Conference, University of Wisconsin, Madison, Wisconsin, USA, 22–25 July, pp. 568–575. Morgan Kaufmann (1998)
5.
go back to reference Burke, E.K., Gendreau, M., Hyde, M.R., Kendall, G., Ochoa, G., Özcan, E., Qu, R.: Hyper-heuristics: a survey of the state of the art. JORS 64(12), 1695–1724 (2013)CrossRef Burke, E.K., Gendreau, M., Hyde, M.R., Kendall, G., Ochoa, G., Özcan, E., Qu, R.: Hyper-heuristics: a survey of the state of the art. JORS 64(12), 1695–1724 (2013)CrossRef
6.
go back to reference Elhag, A., Özcan, E.: A grouping hyper-heuristic framework: application on graph colouring. Expert Syst. App. 42(13), 5491–5507 (2015)CrossRef Elhag, A., Özcan, E.: A grouping hyper-heuristic framework: application on graph colouring. Expert Syst. App. 42(13), 5491–5507 (2015)CrossRef
7.
go back to reference Talbi, E.G., Bessiere, P.: A parallel genetic algorithm for the graph partitioning problem. In: Proceedings of the 5th International Conference on Supercomputing, pp. 312–320. ACM (1991) Talbi, E.G., Bessiere, P.: A parallel genetic algorithm for the graph partitioning problem. In: Proceedings of the 5th International Conference on Supercomputing, pp. 312–320. ACM (1991)
8.
go back to reference Handl, J., Knowles, J.D.: An evolutionary approach to multiobjective clustering. IEEE Trans. Evol. Comput. 11(1), 56–76 (2007)CrossRef Handl, J., Knowles, J.D.: An evolutionary approach to multiobjective clustering. IEEE Trans. Evol. Comput. 11(1), 56–76 (2007)CrossRef
10.
go back to reference Radcliffe, N.J.: Formal analysis and random respectful recombination. In: Proceedings of the 4th International Conference on Genetic Algorithm, pp. 222–229 (1991) Radcliffe, N.J.: Formal analysis and random respectful recombination. In: Proceedings of the 4th International Conference on Genetic Algorithm, pp. 222–229 (1991)
11.
go back to reference Radcliffe, N.J., Surry, P.D.: Fitness variance of formae and performance prediction. In: Whitley, L.D., Vose, M.D. (eds.) FOGA, pp. 51–72. Morgan Kaufmann Publishers Inc. (1994) Radcliffe, N.J., Surry, P.D.: Fitness variance of formae and performance prediction. In: Whitley, L.D., Vose, M.D. (eds.) FOGA, pp. 51–72. Morgan Kaufmann Publishers Inc. (1994)
12.
go back to reference Falkenauer, E.: The grouping genetic algorithms: widening the scope of the GAs. Belg. J. Oper. Res., Stat. Comput. Sci. (JORBEL), 33(1–2), 79–102 (1992) Falkenauer, E.: The grouping genetic algorithms: widening the scope of the GAs. Belg. J. Oper. Res., Stat. Comput. Sci. (JORBEL), 33(1–2), 79–102 (1992)
13.
go back to reference Brown, C.E., Sumichrast, R.T.: Impact of the replacement heuristic in a grouping genetic algorithm. Comput. & OR 30(11), 1575–1593 (2003)CrossRef Brown, C.E., Sumichrast, R.T.: Impact of the replacement heuristic in a grouping genetic algorithm. Comput. & OR 30(11), 1575–1593 (2003)CrossRef
14.
go back to reference Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17, 107–145 (2001)CrossRef Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17, 107–145 (2001)CrossRef
15.
go back to reference Rousseeuw, P.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20(1), 53–65 (1987)CrossRef Rousseeuw, P.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20(1), 53–65 (1987)CrossRef
16.
go back to reference Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1(2), 224–227 (1979)CrossRef Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1(2), 224–227 (1979)CrossRef
17.
go back to reference Rand, W.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)CrossRef Rand, W.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)CrossRef
18.
go back to reference Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall Inc., Upper Saddle River (1988)MATH Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall Inc., Upper Saddle River (1988)MATH
19.
go back to reference Chang, D.X., Zhang, X.D., Zheng, C.W.: A genetic algorithm with gene rearrangement for k-means clustering. Pattern Recogn. 42(7), 1210–1222 (2009)CrossRef Chang, D.X., Zhang, X.D., Zheng, C.W.: A genetic algorithm with gene rearrangement for k-means clustering. Pattern Recogn. 42(7), 1210–1222 (2009)CrossRef
20.
go back to reference MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Number 14 in 1, California, USA, pp. 281–297 (1967) MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Number 14 in 1, California, USA, pp. 281–297 (1967)
21.
go back to reference Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B 39(1), 1–38 (1977)MathSciNetMATH Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B 39(1), 1–38 (1977)MathSciNetMATH
22.
go back to reference Voorhees, E.M.: The effectiveness and efficiency of agglomerative hierarchical clustering in document retrieval. Ph.D. thesis (1985) Voorhees, E.M.: The effectiveness and efficiency of agglomerative hierarchical clustering in document retrieval. Ph.D. thesis (1985)
23.
24.
go back to reference Ankerst, M., Breunig, M.M., Peter Kriegel, H., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: Delis, A., Faloutsos, C., Ghandeharizadeh, S. (eds.) Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, SIGMOD 1999, pp. 49–60. ACM Press (1999) Ankerst, M., Breunig, M.M., Peter Kriegel, H., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: Delis, A., Faloutsos, C., Ghandeharizadeh, S. (eds.) Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, SIGMOD 1999, pp. 49–60. ACM Press (1999)
25.
go back to reference Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform. 1, 24–45 (2004)CrossRef Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform. 1, 24–45 (2004)CrossRef
26.
go back to reference Hong, Y., Kwong, S., Chang, Y., Ren, Q.: Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recogn. 41(9), 2742–2756 (2008)CrossRef Hong, Y., Kwong, S., Chang, Y., Ren, Q.: Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recogn. 41(9), 2742–2756 (2008)CrossRef
27.
go back to reference Özcan, E., Misir, M., Ochoa, G., Burke, E.K.: A reinforcement learning-great-deluge hyper-heuristic for examination timetabling. Int. J. Appl. Metaheuristic Comput. 1(1), 39–59 (2010)CrossRef Özcan, E., Misir, M., Ochoa, G., Burke, E.K.: A reinforcement learning-great-deluge hyper-heuristic for examination timetabling. Int. J. Appl. Metaheuristic Comput. 1(1), 39–59 (2010)CrossRef
28.
29.
go back to reference Dueck, G.: New optimization heuristics: the great deluge algorithm and the record-to-record travel. J. Comput. Phys. 104, 86–92 (1993)CrossRef Dueck, G.: New optimization heuristics: the great deluge algorithm and the record-to-record travel. J. Comput. Phys. 104, 86–92 (1993)CrossRef
30.
go back to reference Misir, M., Verbeeck, K., Causmaecker, P.D., Berghe, G.V.: A new hyper-heuristic as a general problem solver: an implementation in hyflex. J. Sched. 16(3), 291–311 (2013)MathSciNetCrossRef Misir, M., Verbeeck, K., Causmaecker, P.D., Berghe, G.V.: A new hyper-heuristic as a general problem solver: an implementation in hyflex. J. Sched. 16(3), 291–311 (2013)MathSciNetCrossRef
31.
go back to reference Burke, E., Curtois, T., Hyde, M., Kendall, G., Ochoa, G., Petrovic, S., Vazquez-Rodriguez, J.: Hyflex: a flexible framework for the design and analysis of hyper-heuristics. In: Proceedings of the Multidisciplinary International Scheduling Conference (MISTA09), pp. 790–797 (2009) Burke, E., Curtois, T., Hyde, M., Kendall, G., Ochoa, G., Petrovic, S., Vazquez-Rodriguez, J.: Hyflex: a flexible framework for the design and analysis of hyper-heuristics. In: Proceedings of the Multidisciplinary International Scheduling Conference (MISTA09), pp. 790–797 (2009)
32.
go back to reference Bache, K., Lichman, M.: UCI machine learning repository. School of Information and Computer Science, University of California, Irvine (2013) Bache, K., Lichman, M.: UCI machine learning repository. School of Information and Computer Science, University of California, Irvine (2013)
Metadata
Title
Data Clustering Using Grouping Hyper-heuristics
Authors
Anas Elhag
Ender Özcan
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-77449-7_7

Premium Partner