
2017 | OriginalPaper | Chapter

Genetic Programming Representations for Multi-dimensional Feature Learning in Biomedical Classification

Authors: William La Cava, Sara Silva, Leonardo Vanneschi, Lee Spector, Jason Moore

Published in: Applications of Evolutionary Computation

Publisher: Springer International Publishing


Abstract

We present a new classification method that uses genetic programming (GP) to evolve feature transformations for a deterministic, distance-based classifier. This method, called M4GP, differs from common approaches to classifier representation in GP in two ways: it does not enforce arbitrary decision boundaries, and it allows individuals to produce multiple outputs via a stack-based GP system. Compared with typical classification methods, M4GP can be advantageous in that it produces readable models. We conduct a comprehensive study of M4GP, comparing it first to other GP classifiers and then to six common machine learning classifiers, with full hyper-parameter optimization for all methods on a suite of 16 biomedical data sets that range in size and difficulty. The results indicate that M4GP outperforms the other GP methods for classification and that, on most problems, it performs competitively with the other machine learning methods in terms of model accuracy. M4GP also shows a better ability to detect epistatic interactions than the other methods.
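The two ideas named in the abstract, a stack-based program that leaves several outputs behind and a deterministic distance-based classifier applied to those outputs, can be illustrated with a small sketch. This is not the authors' implementation: the instruction set `OPS`, the token format (`"x0"`, `"x1"`), the function names, and the choice of a Euclidean nearest-centroid rule are all assumptions made for illustration, since the abstract does not specify the distance measure or the program encoding.

```python
import numpy as np

# Hypothetical instruction set (illustrative only): token -> (arity, function).
OPS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
}


def run_stack_program(program, X):
    """Evaluate a postfix program on X with shape (n_samples, n_features).

    A token such as "x0" pushes column 0 of X. An operator pops its
    arguments and pushes the result; it is skipped when the stack is too
    shallow. Every array left on the stack becomes one output feature,
    so a single program can yield a multi-dimensional transformation.
    """
    stack = []
    for tok in program:
        if tok in OPS:
            arity, fn = OPS[tok]
            if len(stack) >= arity:
                args = [stack.pop() for _ in range(arity)][::-1]
                stack.append(fn(*args))
        else:
            stack.append(X[:, int(tok[1:])])
    return np.column_stack(stack)


def fit_centroids(Z, y):
    """Compute one centroid per class in the transformed feature space."""
    classes = np.unique(y)
    centroids = np.array([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(Z, classes, centroids):
    """Assign each row of Z to the class with the nearest centroid.

    Euclidean distance is an assumption of this sketch.
    """
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Toy XOR-like problem: the label depends only on the interaction x0 * x1.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
y = np.array([1, 0, 0, 1])

# This program leaves two features on the stack: x0 * x1 and x0.
Z = run_stack_program(["x0", "x1", "*", "x0"], X)
classes, centroids = fit_centroids(Z, y)
predictions = predict(Z, classes, centroids)
```

In this toy case neither raw input separates the classes on its own, but the evolved-style feature `x0 * x1` does, so the nearest-centroid rule recovers the labels without any hand-set decision threshold. The classifier itself has no random component; in a GP setting only the programs would be searched.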


Footnotes
1
Source code available from http://github.com/lacava/ellyn.
 
Metadata
DOI
https://doi.org/10.1007/978-3-319-55849-3_11
