2016 | OriginalPaper | Chapter

Using Genetic Programming for Data Science: Lessons Learned

Authors : Steven Gustafson, Ram Narasimhan, Ravi Palla, Aisha Yousuf

Published in: Genetic Programming Theory and Practice XIII

Publisher: Springer International Publishing

Abstract

In this chapter we present a case study to demonstrate how current state-of-the-art Genetic Programming (GP) fares as a tool for the emerging field of Data Science. Data Science refers to the practice of extracting knowledge from data, often Big Data, to glean insights useful for predicting business, political or societal outcomes. Data Science tools are important to the practice because they allow Data Scientists to be productive and accurate. GP has many features that make it suitable as a tool for Data Science, but it is not yet widely considered a Data Science method. We therefore performed a real-world comparison of GP with a popular Data Science method to understand its strengths and weaknesses. GP found equally strong solutions, leveraged the new Big Data infrastructure, and provided several benefits such as direct feature importance and solution confidence. However, GP lacked the ability to quickly build and test models, required much more intensive computing power, and, due to its lack of commercial maturity, created some challenges for productization as well as for integration with data management and visualization capabilities. The lessons learned lead to several recommendations that provide a path for future research to focus on key areas to improve GP as a Data Science tool.

Metadata
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-34223-8_7
