Skip to main content
Top

2013 | Book

Genetic Programming Theory and Practice X

Editors: Rick Riolo, Ekaterina Vladislavleva, Marylyn D Ritchie, Jason H. Moore

Publisher: Springer New York

Book Series : Genetic and Evolutionary Computation

insite
SEARCH

About this book

These contributions, written by the foremost international researchers and practitioners of Genetic Programming (GP), explore the synergy between theoretical and empirical results on real-world problems, producing a comprehensive view of the state of the art in GP.

Topics in this volume include: evolutionary constraints, relaxation of selection mechanisms, diversity preservation strategies, flexing fitness evaluation, evolution in dynamic environments, multi-objective and multi-modal selection, foundations of evolvability, evolvable and adaptive evolutionary operators, foundation of injecting expert knowledge in evolutionary search, analysis of problem difficulty and required GP algorithm complexity, foundations in running GP on the cloud – communication, cooperation, flexible implementation, and ensemble methods. Additional focal points for GP symbolic regression are: (1) The need to guarantee convergence to solutions in the function discovery mode; (2) Issues on model validation; (3) The need for model analysis workflows for insight generation based on generated GP solutions – model exploration, visualization, variable selection, dimensionality analysis; (4) Issues in combining different types of data.

Readers will discover large-scale, real-world applications of GP to a variety of problem domains via in-depth presentations of the latest and most significant results.

Table of Contents

Frontmatter
Chapter 1. Evolving SQL Queries from Examples with Developmental Genetic Programming
Abstract
Large databases are becoming ever more ubiquitous, as are the opportunities for discovering useful knowledge within them. Evolutionary computation methods such as genetic programming have previously been applied to several aspects of the problem of discovering knowledge in databases. The more specific task of producing human-comprehensible SQL queries has several potential applications but has thus far been explored only to a limited extent. In this chapter we show howdevelopmental genetic programming can automatically generate SQL queries from sets of positive and negative examples. We show that a developmental genetic programming system can produce queries that are reasonably accurate while excelling in human comprehensibility relative to the well-known C5.0 decision tree generation system.
Thomas Helmuth, Lee Spector
Chapter 2. A Practical Platform for On-Line Genetic Programming for Robotics
Abstract
There is growing interest in on-line evolution forautonomous robots. On-line learningis critical to achieve high levels of autonomy in the face of dynamic environments, tasks, and other variable elements encountered in real world environments. Although a number of successes have been achieved with on-line evolution, these successes are largely limited to fairly simple learning paradigms, e.g. training small neural networks of relatively few weights and in simulated environments. The shortage of more complex learning paradigms is largely due to the limitations of affordable robotic platforms, which tend to be woefully underpowered for such applications.
In this paper we introduce a simple robotics platform based on Commodity Off The Shelf (COTS) designprinciples that makes on-line genetic programming for robotics practical and affordable. We compare the relative strengths and weaknesses of a number of different build options. As a proof-of-concept we compare three variations of evolutionary learning models for a color-following problem on a robot based on one of the designs: a simple neural network learning framework of the type typically seen in current research, a more extensive learning model that could not be supported by traditional low-cost research robots, and a simple evolutionary algorithm, but using standard tree-based genetic programming representation, which is also beyond the scope of traditional low-cost research robots. Our results show that the more powerful evolutionary models enabled by more powerful robots significantly improves the on-line evolutionary performance and thus that there are practical benefits to the COTS based
Terence Soule, Robert B. Heckendorn
Chapter 3. Cartesian Genetic Programming for Image Processing
Abstract
Combining domain knowledge about both imaging processing and machine learning techniques can expand the abilities of Genetic Programming when used for image processing. We successfully demonstrate our new approach on several different problem domains. We show that the approach is fast, scalable and robust. In addition, by virtue of using off-the-shelf image processing libraries we can generate human readable programs that incorporate sophisticated domain knowledge.
Simon Harding, Jürgen Leitner, Jürgen Schmidhuber
Chapter 4. A New Mutation Paradigm for Genetic Programming
Abstract
Lévy flights are a class of random walksdirectly inspired by observing animal foraging habits, where a power-law distribution of the stride length can be often observed. This implies that, while the vast majority of the strides will be short, on rare occasions, the strides are gigantic. We propose a mutation mechanism in Linear Genetic Programming inspired by this ethological behavior, thus obtaining a self-adaptive mutation rate. We experimentally test this original approach on three different classes of problems: Boolean regression, quadratic polynomial regression, and surface reconstruction. We find that in all cases, our method outperforms the generic, commonly used constant mutation rate of one over the size of the genotype. Moreover, we compare different common values of the power-law exponent to the another self-adaptive mutation mechanism directly inspired by Simulated Annealing. We conclude that our novel method is a viable alternative to constant and self-adaptive mutation rates, especially because it tends to reduce the number of parameters of genetic programming.
Christian Darabos, Mario Giacobini, Ting Hu, Jason H. Moore
Chapter 5. Introducing an Age-Varying Fitness Estimation Function
Abstract
We present a method for estimating fitness functions that are computationally expensive for an exact evaluation. The proposed estimation method applies a number of partial evaluations based on incomplete information or uncertainties. We show how this method can yield results that are close to similar methods where fitness is measured over the entire dataset, but at a fraction of the speed or memory usage, and in a parallelizable manner. We describe our experience in applying this method to a real world application in the form of evolving equity trading strategies.
Babak Hodjat, Hormoz Shahrzad
Chapter 6. EC-Star: A Massive-Scale, Hub and Spoke, Distributed Genetic Programming System
Abstract
We describe a new Genetic Programming systemnamed EC-Star. It is supported by an open infrastructure, commercial-volunteer-client parallelization framework. The framework enables robust and massive-scale evolution and motivates the hub and spoke network topology of EC-Star’s distributed GP model. In this model an Evolution Coordinator occupies the hub and an Evolutionary Engine occupies each spoke. The Evolution Coordinator uses a layered framework to dispatch high performing, partially evaluated candidate solutions for additional fitness-case exposure, genetic mixing, and evolution to its Evolutionary Engines. It operates asynchronously with each Evolutionary Engine and never blocks waiting for results from an Evolutionary Engine.
Una-May O’Reilly, Mark Wagy, Babak Hodjat
Chapter 7. Genetic Analysis of Prostate Cancer Using Computational Evolution, Pareto-Optimization and Post-processing
Abstract
Given infinite time, humans would progress through modeling complex data in a manner that is dependent on prior expert knowledge. The goal of the present study is make extensions and enhancements to a computational evolution system (CES) that has the ultimate objective of tinkering with data as a human would. This is accomplished by providing flexibility in the model-building process and a meta-layer that learns how to generate better models. The key to the CES system is the ability to identify and exploit expert knowledge from biological databases or prior analytical results. Our prior results have demonstrated that CES is capable of efficiently navigating these large and rugged fitness landscapes toward the discovery of biologically meaningful genetic models of disease. Further, we have shown that the efficacy of CES is improved dramatically when the system is provided with statistical or biological expert knowledge. The goal of the present study was to apply CES to the genetic analysis of prostate cancer aggressiveness in a large sample of European Americans. We introduce here the use of Pareto-optimization to help address overfitting in the learning system. We further introduce a post-processing step that uses hierarchical cluster analysis to generate expert knowledge from the landscape of best models and their predictions across patients. We find that the combination of Pareto-optimization and post-processing of results greatly improves the genetic analysis of prostate cancer.
Jason H. Moore, Douglas P. Hill, Arvis Sulovari, La Creis Kidd
Chapter 8. Meta-Dimensional Analysis of Phenotypes Using the Analysis Tool for Heritable and Environmental Network Associations (ATHENA): Challenges with Building Large Networks
Abstract
The search for the underlying heritability ofcomplex traits has led to an explosion of data generation and analysis in the field of human genomics. With these technological advances, we have made some progress in the identification of genes and proteins associated with common, complex human diseases. Still, our understanding of the genetic architecture of complex traits remains limited and additional research is needed to illuminate the genetic and environmental factors important for the disease process, much of which will include looking at variation in DNA, RNA, protein, etc. in ameta-dimensional analysis framework. We have developed amachine learning technique, ATHENA: Analysis Tool for Heritable and Environmental Network Associations, to address this issue of integrating data from multiple “-omics” technologies to identify models that explain or predict the genetic architecture of complex traits. In this chapter, we discuss the challenges in handling meta-dimensional data usinggrammatical evolution neural networks (GENN) which are one modeling component ofATHENA, and a characterization of the models identified in simulation studies to explore the ability of GENN to build complex, meta-dimensional models. Challenges remain to further understand the evolutionary process for GENN, and an explanation of the simplicity of the models. This work highlights potential areas for extension and improvement of the GENN approach within ATHENA.
Marylyn D. Ritchie, Emily R. Holzinger, Scott M. Dudek, Alex T. Frase, Prabhakar Chalise, Brooke Fridley
Chapter 9. A Baseline Symbolic Regression Algorithm
Abstract
Recent advances in symbolic regression (SR) have promoted the field into the early stages of commercial exploitation. This is the expected maturation history for an academic field which is progressing rapidly. The original published symbolic regression algorithms in (Koza 1994) have long since been replaced by techniques such as pareto front, age layered population structures, and even age pareto front optimization. The lack of specific techniques for optimizing embedded real numbers, in the original algorithms, has been replaced with sophisticated techniques for optimizing embedded constants. Symbolic regression is coming of age as a technology.
As the discipline of Symbolic Regression (SR) has matured, the first commercial SR packages have appeared. There is at least one commercial package on the market for several years http://​www.​rmltech.​com/​. There is now at least one well documented commercial symbolic regression package available for Mathmatica www.​evolved-analytics.​com. There is at least one very well done open source symbolic regression package available for free download http://​ccsl.​mae.​cornell.​edu/​eureqa. Yet, even as the sophistication of commercial SR packages increases, there have been glaring issues with SR accuracy even on simple problems (Korns 2011). The depth and breadth of SR adoption in industry and academia will be greatly affected by the demonstrable accuracy of available SR algorithms and tools.
In this chapter we develop a complete public domain algorithm for modern symbolic regression which is reasonably competitive with current commercial SR packages, and calibrate its accuracy on a set of previously published sample problems. This algorithm is designed as a baseline for further public domain research on SR algorithm simplicity and accuracy. No claim is made placing this baseline algorithm on a par with commercial packages – especially as the commercial offerings can be expected to relentlessly improve in the future. However this baseline is a great improvement over the original published algorithms, and is an attempt to consolidate the latest published research into a simplified baseline algorithm of similar speed and accuracy.
The baseline algorithm presented herein is called Age Weighted Pareto Optimization. It is an amalgamation of recent published techniques in pareto front optimization (Kotanchek et al., 2007), age layered population structures (Hornby 2006), age fitness pareto optimization (Schmidt and Hipson 2010), and specialized embedded abstract constant optimization (Korns 2010). The complete pseudo code for the baseline algorithm is presented in this paper. It is developed step by step as enhancements to the original published SR algorithm (Koza 1992) with justifications for each enhancement. Before-after speed and accuracy comparisons are made for each enhancement on a series of previously published sample problems.
Michael F. Korns
Chapter 10. Symbolic Regression Model Comparison Approach Using Transmitted Variation
Abstract
Model evaluation in symbolic regression generated by GP is of critical importance for successful industrial applications. Typically this model evaluation is achieved by a tradeoff between model complexity and R 2. The chapter introduces a model comparison approach based on the transmission of variation from the inputs to the output. The approach is illustrated with three different data sets from real industrial applications.
Flor A. Castillo, Carlos M. Villa, Arthur K. Kordon
Chapter 11. A Framework for the Empirical Analysis of Genetic Programming System Performance
Abstract
This chapter introduces a framework for statistically sound, reproducible empirical research in Genetic Programming (GP). It provides tools to understand GP algorithms and heuristics and their interaction with problems of varying difficulty. Following an approach where scientific claims are broken down to testable statistical hypotheses and GP runs are treated as experiments, the framework helps to achieve statistically verified results of high reproducibility.
Oliver Flasch, Thomas Bartz-Beielstein
Chapter 12. More or Less? Two Approaches to Evolving Game-Playing Strategies
Abstract
We present two opposing approaches to the evolution of game strategies, one wherein a minimal amount of domain expertise is injected into the process, the other infusing the evolutionary setup with expertise in the form of domain heuristics. We show that the first approach works well for several popular board games, while the second produces top-notch solvers for the hard game of FreeCell.
Amit Benbassat, Achiya Elyasaf, Moshe Sipper
Chapter 13. Symbolic Regression Is Not Enough: It Takes a Village to Raise a Model
Abstract
From a real-world perspective, good enough has been achieved in the core representations and evolutionary strategies of genetic programming assuming state-of-the-art algorithms and implementations are being used. What is needed for industrial symbolic regression are tools to (a) explore and refine the data, (b) explore the developed model space and extract insight and guidance from the available sample of the infinite possibilities of model forms and (c) identify appropriate models for deployment as predictors, emulators, etc. This chapter focuses on the approaches used in DataModeler to address the modeling life cycle. A special focus in this chapter is the identification of driving variables and metavariables. Exploiting the diversity of search paths followed during independent evolutions and, then, looking at the distributions of variables and metavariable usage also provides an opportunity to gather key insights. The goal in this framework, however, is not to replace the modeler but, rather, to augment the inclusion of context and collection of insight by removing mechanistic requirements and facilitating the ability to think. We believe that the net result is higher quality and more robust models.
Mark E. Kotanchek, Ekaterina Vladislavleva, Guido Smits
Chapter 14. FlexGP.py: Prototyping Flexibly-Scaled, Flexibly-Factored Genetic Programming for the Cloud
Abstract
Running genetic programming on the cloud presents researchers with great opportunities and challenges. We argue that standard island algorithms do not have the properties of elasticity and robustness required to run well on the cloud. We present a prototyped design for a decentralized, heterogeneous, robust, self-scaling, self-factoring, self-aggregating genetic programming algorithm. We investigate its properties using a software “sandbox”.
James McDermott, Kalyan Veeramachaneni, Una-May O’Reilly
Chapter 15. Representing Communication and Learning in Femtocell Pilot Power Control Algorithms
Abstract
The overall goal of evolving algorithms for femtocells is to create a continuous on-line evolution of the femtocell pilot power control algorithm to optimize their coverage. Two aspects of intelligence are used for increasing the complexity of the input and the behaviour, communication and learning. In this initial study we investigate how to evolve more complex behaviour in decentralized control algorithms by changing the representation of communication and learning. The communication is addressed by allowing the femtocell to identify its neighbours and take the values of its neighbours into account when making decisions regarding the increase or decrease of pilot power. Learning is considered in two variants: the use of input parameters and the implementation of a built-in reinforcement procedure. The reinforcement allows learning during the simulation in addition to the execution of fixed commands. The experiments compare the new representation in the form of different terminal symbols in a grammar. The results show that there are differences between the communication and learning combinations and that the best solution uses both communication and learning.
Erik Hemberg, Lester Ho, Michael O’Neill, Holger Claussen
Backmatter
Metadata
Title
Genetic Programming Theory and Practice X
Editors
Rick Riolo
Ekaterina Vladislavleva
Marylyn D Ritchie
Jason H. Moore
Copyright Year
2013
Publisher
Springer New York
Electronic ISBN
978-1-4614-6846-2
Print ISBN
978-1-4614-6845-5
DOI
https://doi.org/10.1007/978-1-4614-6846-2

Premium Partner