2015 | Original Paper | Book Chapter

7. Extremely Accurate Symbolic Regression for Large Feature Problems

Author: Michael F. Korns

Published in: Genetic Programming Theory and Practice XII

Publisher: Springer International Publishing

Abstract

As symbolic regression (SR) has advanced into the early stages of commercial exploitation, the poor accuracy of SR, still plaguing even the most advanced commercial packages, has become an issue for early adopters. Users expect to have the correct formula returned, especially in cases with zero noise and only one basis function with minimally complex grammar depth.
At a minimum, users expect the response surface of the SR tool to be easily understood, so that the user can know a priori on what classes of problems to expect excellent, average, or poor accuracy. Poor or unknown accuracy is a hindrance to greater academic and industrial acceptance of SR tools.
In a previous paper, we published a complex algorithm for modern symbolic regression which is extremely accurate for a large class of symbolic regression problems. The class of problems on which SR is extremely accurate was described in detail. That algorithm was extremely accurate on a single processor for up to 25 features (columns), and a cloud configuration was used to extend the extreme accuracy to as many as 100 features.
While the previous algorithm’s extreme accuracy for deep problems with a small number of features (25–100) was an impressive advance, there are many very important academic and industrial SR problems requiring from 100 to 1000 features.
In this chapter we extend the previous algorithm so that high accuracy is achieved on a wide range of problems, from 25 to 3000 features, using only a single processor. The class of problems on which the enhanced algorithm is highly accurate is described in detail. A definition of extreme accuracy is provided, and an informal argument for the enhanced algorithm's high accuracy is outlined in this chapter.
The new enhanced algorithm is tested on a set of representative problems and is shown to be robust, performing well even when the testing data contain up to 3000 features.
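To make the setting described above concrete, the following minimal sketch generates the kind of test case the abstract refers to: a zero-noise target built from a single basis function of minimal grammar depth, hidden among thousands of candidate features. It is illustrative only; the feature counts, constants, and the chosen operator are assumptions, not values taken from the chapter.

```python
# Illustrative sketch only: a zero-noise, single-basis-function target hidden
# among thousands of candidate features, in the spirit of the problems the
# chapter targets. All names and constants here are hypothetical.
import numpy as np

rng = np.random.default_rng(7)

n_rows, n_features = 5_000, 3000                  # feature counts in the 25-3000 range discussed
X = rng.uniform(-50.0, 50.0, size=(n_rows, n_features))

# The target depends on only one feature (x17) through one unary operator and
# two constants; an extremely accurate SR tool is expected to recover this
# formula exactly, since the data contain zero noise.
y = -9.16 + 3.92 * np.cos(X[:, 17])

print(X.shape, y.shape)                           # (5000, 3000) (5000,)
```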


Footnotes
1
Testing a single regression champion is not cheap. At a minimum, testing a single regression champion requires as many evaluations as there are training examples, as well as performing a simple regression. At a maximum, testing a single regression champion may require performing a much more expensive multiple regression.
 
2
As a reminder, testing a single regression champion is not cheap. At a minimum, testing a single regression champion requires as many evaluations as there are training examples, as well as performing a simple regression. At a maximum, testing a single regression champion may require performing a much more expensive multiple regression.
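As a rough illustration of the cost argument in these footnotes, the hedged sketch below evaluates a candidate basis function on every training row and then fits either a simple or a multiple regression to score the champion. The data shapes, helper names, and use of ordinary least squares are assumptions for illustration, not the chapter's implementation.

```python
# A hedged sketch (not the chapter's implementation) of why testing one
# regression champion is costly: the candidate basis function must be evaluated
# on every training example before a regression can be fit to its output.
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features = 10_000, 100                  # hypothetical training-set size
X = rng.uniform(-10.0, 10.0, size=(n_rows, n_features))
y = 1.57 + 2.13 * np.cos(X[:, 42])                # hypothetical zero-noise target

def score_simple_champion(basis, X, y):
    """Cheapest case: n_rows evaluations of the basis plus a simple regression."""
    b = basis(X)                                   # one evaluation per training row
    A = np.column_stack([np.ones_like(b), b])      # intercept + single basis term
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs, np.sqrt(np.mean((y - A @ coeffs) ** 2))

def score_multiple_champion(bases, X, y):
    """Costlier case: several basis functions require a full multiple regression."""
    B = np.column_stack([np.ones(len(y))] + [f(X) for f in bases])
    coeffs, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coeffs, np.sqrt(np.mean((y - B @ coeffs) ** 2))

# Scoring even one champion touches all n_rows examples; scoring a whole
# population every generation multiplies that cost accordingly.
print(score_simple_champion(lambda X: np.cos(X[:, 42]), X, y))
print(score_multiple_champion([lambda X: np.cos(X[:, 42]),
                               lambda X: X[:, 7] ** 2], X, y))
```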
 
References
Draper NR, Smith H (1981) Applied regression analysis, 2nd edn. Wiley, New York
Hoerl A (1962) Application of ridge analysis to regression problems. Chem Eng Prog 58:54–59
Korns MF (2011) Accuracy in symbolic regression. In: Riolo R, Vladislavleva E, Moore JH (eds) Genetic programming theory and practice IX, genetic and evolutionary computation. Springer, Ann Arbor, chap 8, pp 129–151. doi:10.1007/978-1-4614-1770-5-8
Korns MF (2012) A baseline symbolic regression algorithm. In: Riolo R, Vladislavleva E, Ritchie MD, Moore JH (eds) Genetic programming theory and practice X, genetic and evolutionary computation. Springer, Ann Arbor, chap 9, pp 117–137. doi:10.1007/978-1-4614-6846-2-9
Korns MF (2013) Extreme accuracy in symbolic regression. In: Riolo R, Moore JH, Kotanchek M (eds) Genetic programming theory and practice XI, genetic and evolutionary computation. Springer, Ann Arbor, chap 1, pp 1–30. doi:10.1007/978-1-4939-0375-7-1
Kotanchek M, Smits G, Vladislavleva E (2007) Trustable symbolic regression models: using ensembles, interval arithmetic and pareto fronts to develop robust and trust-aware models. In: Riolo RL, Soule T, Worzel B (eds) Genetic programming theory and practice V, genetic and evolutionary computation. Springer, Ann Arbor, chap 12, pp 201–220. doi:10.1007/978-0-387-76308-8-12
McConaghy T (2011) FFX: fast, scalable, deterministic symbolic regression technology. In: Riolo R, Vladislavleva E, Moore JH (eds) Genetic programming theory and practice IX, genetic and evolutionary computation. Springer, Ann Arbor, chap 13, pp 235–260. doi:10.1007/978-1-4614-1770-5-13, http://trent.st/content/2011-GPTP-FFX-paper.pdf
Nelder J, Wedderburn R (1972) Generalized linear models. J Royal Stat Soc Ser A 135:370–384
Smits G, Kotanchek M (2004) Pareto-front exploitation in symbolic regression. In: O’Reilly UM, Yu T, Riolo RL, Worzel B (eds) Genetic programming theory and practice II. Springer, Ann Arbor, chap 17, pp 283–299. doi:10.1007/0-387-23254-0-17
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Royal Stat Soc Ser B (Methodological) 58:267–288
Metadata
Title
Extremely Accurate Symbolic Regression for Large Feature Problems
Author
Michael F. Korns
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-16030-6_7
