GPTIPS 2: An Open-Source Software Platform for Symbolic Data Mining

Chapter in Handbook of Genetic Programming Applications

Abstract

GPTIPS is a free, open-source, MATLAB-based software platform for symbolic data mining (SDM). It uses multigene genetic programming (MGGP), a multigene variant of the biologically inspired machine learning method of genetic programming, as the engine that drives the automatic model discovery process. Symbolic data mining is the process of extracting hidden, meaningful relationships from data in the form of symbolic equations. In contrast to other data-mining methods, the structural transparency of the generated predictive equations can give new insights into the physical systems or processes that generated the data. Furthermore, this transparency makes the models very easy to deploy outside of MATLAB.

The rationale behind GPTIPS is to reduce the technical barriers to using, understanding, visualising and deploying GP-based symbolic models of data, whilst at the same time remaining highly customisable and delivering robust numerical performance for power users. In this chapter, notable new features of the latest version of the software, GPTIPS 2, are discussed with these aims in mind. Additionally, a simplified variant of the MGGP high-level gene crossover mechanism is proposed.

It is demonstrated that the new functionality of GPTIPS 2 (a) facilitates the discovery of compact symbolic relationships from data using multiple approaches, e.g. using novel gene-centric visualisation analysis to mitigate horizontal bloat and reduce complexity in multigene symbolic regression models; (b) provides numerous methods for visualising the properties of symbolic models; (c) emphasises the generation of graphically navigable libraries of models that are optimal in terms of the Pareto trade-off surface of model performance and complexity; and (d) expedites real-world applications by the simple, rapid and robust deployment of symbolic models outside the software environment in which they were developed.

Notes

  1. A list of research literature using GPTIPS is maintained at https://sites.google.com/site/gptips4matlab/application-areas.

  2. Currently, the Pareto tournament implementation does not support more than two objectives.

  3. Although RMSE is the default fitness measure, this can be easily changed to, for example, MSE by a very minor edit to the file containing the default fitness function.

Author information

Correspondence to Dominic P. Searson.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Searson, D.P. (2015). GPTIPS 2: An Open-Source Software Platform for Symbolic Data Mining. In: Gandomi, A., Alavi, A., Ryan, C. (eds) Handbook of Genetic Programming Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-20883-1_22

  • DOI: https://doi.org/10.1007/978-3-319-20883-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20882-4

  • Online ISBN: 978-3-319-20883-1

  • eBook Packages: Computer Science, Computer Science (R0)
