Published in: Memetic Computing 1/2016

1 March 2016 | Regular Research Paper

Genetic programming for feature construction and selection in classification on high-dimensional data

Authors: Binh Tran, Bing Xue, Mengjie Zhang


Abstract

Classification on high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study investigating the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. The cases in which overfitting occurred are analysed via the distribution of features. Further analysis is also performed to show why the constructed feature can achieve promising classification performance.
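
To make the idea of tree-based feature construction with implicit feature selection concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function set (`+`, `-`, `*`, protected division), the tuple encoding of trees, the helper names `evaluate` and `selected_features`, and the toy data are all assumptions made for illustration. The point it shows is that evaluating one GP tree yields one constructed feature, while the original features appearing at the tree's leaves form the implicitly selected subset.

```python
import operator


def protected_div(a, b):
    """Division that returns 1.0 when the denominator is zero (assumed convention)."""
    return a / b if b != 0 else 1.0


# Hypothetical function set; the abstract does not state which operators are used.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}


def evaluate(tree, sample):
    """Evaluate a GP tree on one sample (a list of original feature values).

    A tree is either an int (a leaf: the index of an original feature)
    or a tuple (op, left, right) for an internal node.
    """
    if isinstance(tree, int):
        return sample[tree]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, sample), evaluate(right, sample))


def selected_features(tree):
    """Return the sorted indices of the original features used as leaves."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return sorted(set(selected_features(left)) | set(selected_features(right)))


# Toy tree encoding (f0 * f3) - (f3 / f5).
tree = ("-", ("*", 0, 3), ("/", 3, 5))

# Two made-up samples with six original features each.
samples = [
    [2.0, 9.0, 9.0, 4.0, 9.0, 2.0],
    [1.0, 7.0, 7.0, 3.0, 7.0, 0.0],
]

# One new high-level feature value per sample.
constructed = [evaluate(tree, s) for s in samples]

# Implicit feature selection: only features 0, 3 and 5 appear in the tree.
selected = selected_features(tree)
```

In this sketch, a downstream classifier could be trained on `constructed` alone, on the columns listed in `selected`, or on both together, which mirrors the "constructed and/or selected" combinations the abstract mentions.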


Metadata
Title
Genetic programming for feature construction and selection in classification on high-dimensional data
Authors
Binh Tran
Bing Xue
Mengjie Zhang
Publication date
1 March 2016
Publisher
Springer Berlin Heidelberg
Published in
Memetic Computing / Issue 1/2016
Print ISSN: 1865-9284
Electronic ISSN: 1865-9292
DOI
https://doi.org/10.1007/s12293-015-0173-y
