
2017 | OriginalPaper | Chapter

New Representations in Genetic Programming for Feature Construction in k-Means Clustering

Authors: Andrew Lensen, Bing Xue, Mengjie Zhang

Published in: Simulated Evolution and Learning

Publisher: Springer International Publishing


Abstract

k-means is one of the most fundamental and well-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve the performance of data mining algorithms through feature construction: the process of combining multiple attributes (features) of a dataset to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction in order to improve the clustering performance of the k-means algorithm. Our experiments show a significant performance improvement compared to k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.
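To make the idea of feature construction concrete, here is a minimal sketch of how one constructed feature can change what k-means finds. It is not the representation proposed in the chapter: the expression tree `TREE` is written by hand rather than evolved by GP, and the two-ring dataset, the farthest-first seeding, and every function name are assumptions made for illustration only.

```python
import math

def evaluate(tree, row):
    """Evaluate a GP-style expression tree on one data row."""
    if isinstance(tree, int):            # terminal: index of an original feature
        return row[tree]
    op, left, right = tree               # internal node: (operator, child, child)
    a, b = evaluate(left, row), evaluate(right, row)
    return a + b if op == "add" else a * b

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first seeding (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def agreement(pred, truth):
    """Fraction of matching labels for two clusters, ignoring label order."""
    same = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    return max(same, 1 - same)

# Two concentric rings: a shape k-means cannot split using the raw (x, y) features.
n = 40
angles = [2 * math.pi * i / n for i in range(n)]
inner = [(math.cos(a), math.sin(a)) for a in angles]
outer = [(5 * math.cos(a), 5 * math.sin(a)) for a in angles]
data = inner + outer
truth = [0] * n + [1] * n

# Hand-written constructed feature x0*x0 + x1*x1 (the squared radius).
TREE = ("add", ("mul", 0, 0), ("mul", 1, 1))
constructed = [(evaluate(TREE, row),) for row in data]

raw_score = agreement(kmeans(data, 2), truth)
constructed_score = agreement(kmeans(constructed, 2), truth)
print("agreement on raw features:", raw_score)
print("agreement on constructed feature:", constructed_score)
```

With two clusters, k-means divides the raw feature space along a straight line, so it cannot isolate a ring that is enclosed by another. In the constructed one-dimensional feature the two rings fall at clearly separated values, which is the kind of effect feature construction aims for. A GP system would search over such trees automatically instead of relying on a hand-picked one.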

Metadata
Title
New Representations in Genetic Programming for Feature Construction in k-Means Clustering
Authors
Andrew Lensen
Bing Xue
Mengjie Zhang
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-68759-9_44
