
2018 | Book

Genetic Programming Theory and Practice XIV


About this book

These contributions, written by the foremost international researchers and practitioners of Genetic Programming (GP), explore the synergy between theoretical and empirical results on real-world problems, producing a comprehensive view of the state of the art in GP. Chapters in this volume include:

Similarity-based Analysis of Population Dynamics in GP Performing Symbolic Regression

Hybrid Structural and Behavioral Diversity Methods in GP

Multi-Population Competitive Coevolution for Anticipation of Tax Evasion

Evolving Artificial General Intelligence for Video Game Controllers

A Detailed Analysis of a PushGP Run

Linear Genomes for Structured Programs

Neutrality, Robustness, and Evolvability in GP

Local Search in GP

PRETSL: Distributed Probabilistic Rule Evolution for Time-Series Classification

Relational Structure in Program Synthesis Problems with Analogical Reasoning

An Evolutionary Algorithm for Big Data Multi-Class Classification Problems

A Generic Framework for Building Dispersion Operators in the Semantic Space

Assisting Asset Model Development with Evolutionary Augmentation

Building Blocks of Machine Learning Pipelines for Initialization of a Data Science Automation Tool

Readers will discover large-scale, real-world applications of GP to a variety of problem domains via in-depth presentations of the latest and most significant results.

Table of Contents

Frontmatter
Chapter 1. Similarity-Based Analysis of Population Dynamics in Genetic Programming Performing Symbolic Regression
Abstract
Population diversity plays an important role in the evolutionary dynamics of genetic programming (GP). In this paper we use structural and semantic similarity measures to investigate the evolution of diversity in three GP algorithmic flavors: standard GP, offspring selection GP (OS-GP), and age-layered population structure GP (ALPS-GP). Empirical measurements on two symbolic regression benchmark problems reveal important differences between the dynamics of the tested configurations. In standard GP, after an initial decrease, population diversity remains almost constant until the end of the run. The higher variance of the phenotypic similarity values suggests that small changes on individual genotypes have significant effects on their corresponding phenotypes. By contrast, strict offspring selection within the OS-GP algorithm causes a significantly more pronounced diversity loss at both genotypic and, in particular, phenotypic levels. The pressure for adaptive change increases phenotypic robustness in the face of genotypic perturbations, leading to less genotypic variability on the one hand, and very low phenotypic diversity on the other hand. Finally, the evolution of similarities in ALPS-GP follows a periodic pattern marked by the time interval when the bottom layer is reinitialized with new individuals. This pattern is easily noticed in the lower layers characterized by shorter migration intervals, and becomes less and less noticeable on the upper layers.
Stephan M. Winkler, Michael Affenzeller, Bogdan Burlacu, Gabriel Kronberger, Michael Kommenda, Philipp Fleck
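The chapter's core tool is a pairwise similarity measure computed over the whole population. As a minimal sketch of the phenotypic (semantic) side of this analysis — with function names that are illustrative, not the chapter's actual API — diversity can be tracked as the mean pairwise distance between individuals' output vectors on the fitness cases:

```python
from itertools import combinations
import math

def semantic_distance(out_a, out_b):
    """Euclidean distance between two individuals' output vectors
    on the same fitness cases (a simple semantic measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(out_a, out_b)))

def population_diversity(outputs):
    """Mean pairwise semantic distance over a population; higher
    values indicate a more phenotypically diverse population."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    return sum(semantic_distance(a, b) for a, b in pairs) / len(pairs)
```

Plotting this quantity per generation is what reveals the dynamics the abstract describes: near-constant diversity in standard GP, pronounced loss under offspring selection, and a periodic pattern under ALPS.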
Chapter 2. An Investigation of Hybrid Structural and Behavioral Diversity Methods in Genetic Programming
Abstract
Premature convergence is a serious problem that plagues genetic programming, stifling its search performance. Several genetic diversity maintenance techniques have been proposed for combating premature convergence and improving search efficiency in genetic programming. Recent research has shown that while genetic diversity is important, focusing directly on sustaining behavioral diversity may be more beneficial. These two related areas have received a lot of attention, yet they have often been developed independently. We investigated the feasibility of hybrid genetic and behavioral diversity techniques on a suite of problems.
Armand R. Burks, William F. Punch
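Behavioral diversity methods of the kind surveyed here typically score an individual by how far its behavior (output vector) lies from previously seen behaviors. A common building block — sketched below with an illustrative interface, not the chapter's exact formulation — is the novelty score: the mean distance to the k nearest behaviors in an archive:

```python
def novelty_score(behavior, archive, k=3):
    """Behavioral novelty: mean distance (L1 here, for simplicity) to
    the k nearest behaviors seen so far.  An empty archive means the
    behavior is maximally novel."""
    dists = sorted(
        sum(abs(a - b) for a, b in zip(behavior, other))
        for other in archive
    )
    return sum(dists[:k]) / min(k, len(dists)) if dists else float("inf")
```

A hybrid method can then combine such a behavioral score with a genetic (structural) distinctness criterion when selecting parents.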
Chapter 3. Investigating Multi-Population Competitive Coevolution for Anticipation of Tax Evasion
Abstract
We investigate the application of a version of Genetic Programming with grammars, called Grammatical Evolution, and a multi-population competitive coevolutionary algorithm for anticipating tax evasion in the domain of U.S. Partnership tax regulations. A problem in tax auditing is that as soon as one evasion scheme is detected a new, slightly mutated, variant of that scheme appears. Multi-population competitive coevolutionary algorithms are disposed to explore adversarial problems, such as the arms-race between tax evader and auditor. In addition, we use Genetic Programming and grammars to represent and search the transactions of tax evaders and tax audit policies. Grammars are helpful for representing and biasing the search space. The feasibility of the method is studied with an example of adversarial coevolution in tax evasion. We study the dynamics and the solutions of the competing populations in this scenario, and note that we are able to replicate some of the expected behavior.
Erik Hemberg, Jacob Rosen, Una-May O’Reilly
Chapter 4. Evolving Artificial General Intelligence for Video Game Controllers
Abstract
The General Video Game Playing Competition (GVGAI) defines a challenge of creating controllers for general video game playing, a testbed—as it were—for examining the issue of artificial general intelligence. We develop herein a game controller that mimics human learning behavior, focusing on the ability to generalize from experience and diminish learning time as new games present themselves. We use genetic programming to evolve hyper-heuristic-based general players. Our results show the effectiveness of evolution in meeting the generality challenge.
Itay Azaria, Achiya Elyasaf, Moshe Sipper
Chapter 5. A Detailed Analysis of a PushGP Run
Abstract
In evolutionary computation we potentially have the ability to save and analyze every detail of a run. This data is often thrown away, however, in favor of focusing on the final outcomes, typically captured and presented in the form of summary statistics and performance plots. Here we use graph database tools to store every parent–child relationship in a single genetic programming run, and examine the key ancestries in detail, tracing back from a solution to see how it was evolved over the course of 20 generations. To visualize this genetic programming run, the ancestry graph is extracted, running from the solution(s) in the final generation up to their ancestors in the initial random population. The key instructions in the solution are also identified, and a genetic ancestry graph is constructed, a subgraph of the ancestry graph containing only those individuals that contributed genetic information (or instructions) to the solution. These visualizations and our ability to trace these key instructions throughout the run allow us to identify general inheritance patterns and key evolutionary moments in this run.
Nicholas Freitag McPhee, Mitchell D. Finzel, Maggie M. Casale, Thomas Helmuth, Lee Spector
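The ancestry analysis described above rests on one simple data structure: every parent-child edge recorded during the run. A minimal sketch (the chapter uses graph database tools; this plain-Python version is only illustrative) extracts the ancestry subgraph by walking child-to-parent links backwards from the solution:

```python
def ancestry(edges, solution):
    """Given (parent, child) edges recorded during a run, return the
    set of all ancestors of `solution` -- the nodes of its ancestry
    graph -- by walking child->parent links backwards."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)
    seen, stack = set(), [solution]
    while stack:
        node = stack.pop()
        for p in parents.get(node, []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen
```

Restricting the same walk to edges that actually transmitted instructions to the solution yields the genetic ancestry graph the chapter visualizes.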
Chapter 6. Linear Genomes for Structured Programs
Abstract
In most genetic programming systems, candidate solution programs themselves serve as the genomes upon which variation operators act. However, because of the hierarchical structure of computer programs and the syntactic constraints that they must obey, it is difficult to implement variation operators that affect different parts of programs with uniform probability. This lack of uniformity can have detrimental effects on evolutionary search, such as increases in code bloat. In prior work, structured programs were linearized prior to variation in order to facilitate uniform variation. However, this necessitated syntactic repair after variation, which reintroduced non-uniformities. In this chapter we describe a new approach that uses linear genomes that are translated into hierarchical programs for execution. We present the new approach in detail and show how it facilitates both uniform variation and the evolution of programs with meaningful structure.
Thomas Helmuth, Lee Spector, Nicholas Freitag McPhee, Saul Shanabrook
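The key mechanism is the translation step from a flat genome to a nested program. As a simplified sketch of this style of genome (the gene format and token names here are illustrative, not the chapter's exact encoding), each gene carries a token plus a count of how many open code blocks to close after it, so variation can act uniformly on the flat sequence while execution sees a hierarchy:

```python
def translate(genome):
    """Translate a linear genome into a nested program.  Each gene is
    a (token, closes) pair: `closes` says how many open code blocks to
    close after this token, and the special token 'open' starts a new
    block.  A simplification of Plush-style linear genomes."""
    program, stack = [], []
    current = program
    for token, closes in genome:
        if token == "open":
            block = []
            current.append(block)
            stack.append(current)
            current = block
        else:
            current.append(token)
        for _ in range(closes):
            if stack:
                current = stack.pop()
    return program
```

Because every gene sits at the same "depth" in the flat genome, a mutation or crossover point is equally likely to touch any part of the eventual program — the uniformity property the abstract emphasizes.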
Chapter 7. Neutrality, Robustness, and Evolvability in Genetic Programming
Abstract
Redundant mapping from genotype to phenotype is common in evolutionary algorithms, especially in genetic programming (GP). Such a redundancy leads to neutrality, a situation where mutations to a genotype may not alter its phenotypic outcome. The effects of neutrality can be better understood by quantitatively analyzing its two observed properties, robustness and evolvability. In this chapter, we summarize our previous work on this topic in examining a compact Linear GP algorithm. Due to the choice of this particular system we can characterize its entire genotype, phenotype, and fitness networks, and quantitatively measure robustness and evolvability at the genotypic, phenotypic, and fitness levels. We then investigate the relationship between robustness and evolvability at those different levels. Technically, we use an ensemble of random walkers and hill climbers to study how robustness and evolvability are related to the structure of genotypic, phenotypic, and fitness networks and influence the evolutionary search process.
Ting Hu, Wolfgang Banzhaf
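Genotypic robustness, one of the two quantities this chapter measures, has a simple operational definition: the fraction of a genotype's one-point mutants that leave the phenotype unchanged. A minimal sketch (the genotype encoding and phenotype map below are illustrative stand-ins for the chapter's compact Linear GP system):

```python
def robustness(genotype, phenotype_of, alphabet):
    """Genotypic robustness: the fraction of one-point mutants whose
    phenotype equals the original's (i.e., neutral mutations)."""
    base = phenotype_of(genotype)
    neutral = total = 0
    for i, g in enumerate(genotype):
        for a in alphabet:
            if a == g:
                continue
            total += 1
            mutant = genotype[:i] + (a,) + genotype[i + 1:]
            if phenotype_of(mutant) == base:
                neutral += 1
    return neutral / total if total else 0.0
```

Evolvability is then measured in the complementary direction: how many *distinct* new phenotypes the non-neutral mutants reach.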
Chapter 8. Local Search is Underused in Genetic Programming
Abstract
There are two important limitations of standard tree-based genetic programming (GP). First, GP tends to evolve unnecessarily large programs, which is referred to as bloat. Second, GP uses inefficient search operators that focus on modifying program syntax. The first problem has been studied extensively, with many works proposing bloat control methods. Regarding the second problem, one approach is to use alternative search operators, for instance geometric semantic operators, to improve convergence. In this work, our goal is to experimentally show that both problems can be effectively addressed by incorporating a local search optimizer as an additional search operator. Using real-world problems, we show that this rather simple strategy can improve the convergence and performance of tree-based GP, while also reducing program size. Given these results, a question arises: Why are local search strategies so uncommon in GP? A small survey of popular GP libraries suggests to us that local search is underused in GP systems. We conclude by outlining plausible answers for this question and highlighting future work.
Leonardo Trujillo, Emigdio Z-Flores, Perla S. Juárez-Smith, Pierrick Legrand, Sara Silva, Mauro Castelli, Leonardo Vanneschi, Oliver Schütze, Luis Muñoz
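The "local search as an additional operator" idea amounts to periodically refining an individual's numeric constants while leaving its syntax alone. The sketch below uses a stochastic hill climber as a stand-in for the gradient-based optimizer the chapter actually employs; the interface is illustrative:

```python
import random

def local_search(constants, error_fn, steps=500, sigma=0.3, seed=0):
    """Refine a program's numeric constants by stochastic hill
    climbing: perturb one constant at a time, keep improvements.
    `error_fn` maps a constant vector to the program's error."""
    rng = random.Random(seed)
    best = list(constants)
    best_err = error_fn(best)
    for _ in range(steps):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] += rng.gauss(0.0, sigma)
        err = error_fn(cand)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err
```

Applied to a fraction of the population each generation, such a step lets the syntactic search operators concentrate on structure while the constants are tuned locally — which is also why smaller trees can achieve competitive fitness.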
Chapter 9. PRETSL: Distributed Probabilistic Rule Evolution for Time-Series Classification
Abstract
The EC-Star rule-set representation is extended to allow probabilistic classifiers. This allows the distributed age-layered evolution of probabilistic rule sets. The method is tested on 20 UCI data problems, as well as a larger dataset of arterial blood pressure waveforms. Results show consistent improvement in all cases compared to binary classification rule sets.
Babak Hodjat, Hormoz Shahrzad, Risto Miikkulainen, Lawrence Murray, Chris Holmes
Chapter 10. Discovering Relational Structure in Program Synthesis Problems with Analogical Reasoning
Abstract
Much recent progress in Genetic Programming (GP) can be ascribed to work in semantic GP, which facilitates program induction by considering program behavior on individual fitness cases. It is therefore interesting to consider whether alternative decompositions of fitness cases might also provide useful information. The one we present here is motivated by work in analogical reasoning. So-called proportional analogies (‘gills are to fish as lungs are to mammals’) have a hierarchical relational structure that can be captured using the formalism of Structural Information Theory. We show how proportional analogy problems can be solved with GP and, conversely, how analogical reasoning can be engaged in GP to provide for problem decomposition. The idea is to treat pairs of fitness cases as if they formed a proportional analogy problem, identify relational consistency between them, and use it to inform the search process.
Jerry Swan, Krzysztof Krawiec
Chapter 11. An Evolutionary Algorithm for Big Data Multi-Class Classification Problems
Abstract
As symbolic regression (SR) has advanced into the early stages of commercial exploitation, the poor accuracy of SR still plagues even advanced commercial packages, and has become an issue for industrial users. Users expect a correct formula to be returned, especially in cases with zero noise and only one basis function with minimal complexity. At a minimum, users expect the response surface of the SR tool to be easily understood, so that the user can know a priori for which classes of problems to expect excellent, average, or poor accuracy. Poor or unknown accuracy is a hindrance to greater academic and industrial acceptance of SR tools. In several previous papers, we presented a complex algorithm for modern SR, which is extremely accurate for a large class of SR problems on noiseless data. Further research has shown that these extremely accurate SR algorithms also improve accuracy in noisy circumstances—albeit not extreme accuracy. Armed with these SR successes, we naively thought that achieving extreme accuracy applying GP to symbolic multi-class classification would be an easy goal. However, it seems algorithms having extreme accuracy in SR do not translate directly into symbolic multi-class classification. Furthermore, others have encountered serious issues applying GP to symbolic multi-class classification (Castelli et al. Applications of Evolutionary Computing, EvoApplications 2013: EvoCOMNET, EvoCOMPLEX, EvoENERGY, EvoFIN, EvoGAMES, EvoIASP, EvoINDUSTRY, EvoNUM, EvoPAR, EvoRISK, EvoROBOT, EvoSTOC, vol 7835, pp 334–343. Springer, Vienna, 2013). This is the first paper in a planned series developing the necessary algorithms for extreme accuracy in GP applied to symbolic multi-class classification. We develop an evolutionary algorithm for optimizing a single symbolic multi-class classification candidate. It is designed for big-data situations where the computational effort grows linearly as the number of features and training points increase. The algorithm's behavior is demonstrated on theoretical problems, UCI benchmarks, and industry test cases.
Michael F. Korns
Chapter 12. A Generic Framework for Building Dispersion Operators in the Semantic Space
Abstract
This chapter proposes a generic framework to build geometric dispersion (GD) operators for Geometric Semantic Genetic Programming in the context of symbolic regression, followed by two concrete instantiations of the framework: a multiplicative geometric dispersion operator and an additive geometric dispersion operator. These operators move individuals in the semantic space in order to balance the population around the target output in each dimension, with the objective of expanding the convex hull defined by the population to include the desired output vector. An experimental analysis was conducted in a testbed composed of sixteen datasets showing that dispersion operators can improve GSGP search and that the multiplicative version of the operator is overall better than the additive version.
Luiz Otavio V. B. Oliveira, Fernando E. B. Otero, Gisele L. Pappa
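The quantity these dispersion operators target can be made concrete. A minimal sketch (illustrative, not the chapter's formulation) of the per-dimension balance of a population's semantics around the target output — the thing a GD operator tries to push toward 0.5 so the target falls inside the population's convex hull:

```python
def balance(outputs, target):
    """Per-dimension balance of a population around the target output:
    for each fitness case d, the fraction of individuals whose output
    on d exceeds the target value.  Fractions of 0 or 1 mean every
    individual lies on one side of the target in that dimension, so
    no convex combination of the population can reach it there."""
    n = len(outputs)
    return [sum(o[d] > target[d] for o in outputs) / n
            for d in range(len(target))]
```

A multiplicative GD operator then rescales selected individuals' semantics (and an additive one shifts them) so that unbalanced dimensions move toward 0.5.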
Chapter 13. Assisting Asset Model Development with Evolutionary Augmentation
Abstract
In this chapter, we explore how Genetic Programming can assist and augment the expert-driven process of developing data-driven models. In our use case, modelers must develop hundreds of models that represent individual properties of parts, components, assets, systems and meta-systems like power plants. Each of these models is developed with an objective in mind, like estimating the useful remaining life or detecting anomalies. As such, the modeler uses their expert judgment, as well as available data to select the most appropriate method. In this initial paper, we examine the most basic example of when the experts select a kind of regression modeling approach and develop models from data. We then use that captured domain knowledge from their processes, as well as end models to determine if Genetic Programming can augment, assist and improve their final results. We show that while Genetic Programming can indeed find improved solutions according to an error metric, it is much harder for Genetic Programming to find models that do not increase complexity. Also, we find that one approach in particular shows promise as a way to incorporate domain knowledge.
Steven Gustafson, Arun Subramaniyan, Aisha Yousuf
Chapter 14. Identifying and Harnessing the Building Blocks of Machine Learning Pipelines for Sensible Initialization of a Data Science Automation Tool
Abstract
As data science continues to grow in popularity, there will be an increasing need to make data science tools more scalable, flexible, and accessible. In particular, automated machine learning (AutoML) systems seek to automate the process of designing and optimizing machine learning pipelines. In this chapter, we present a genetic programming-based AutoML system called TPOT that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a supervised classification problem. Further, we analyze a large database of pipelines that were previously used to solve various supervised classification problems and identify 100 short series of machine learning operations that appear the most frequently, which we call the building blocks of machine learning pipelines. We harness these building blocks to initialize TPOT with promising solutions, and find that this sensible initialization method significantly improves TPOT's performance on one benchmark without significantly degrading performance on the others. Thus, sensible initialization with machine learning pipeline building blocks shows promise for GP-based AutoML systems, and should be further refined in future work.
Randal S. Olson, Jason H. Moore
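The building-block mining step described above can be sketched simply: treat each previously evolved pipeline as a sequence of operation names and count the most frequent short subsequences. This is an illustrative reconstruction, not TPOT's actual code, and the operation names are hypothetical:

```python
from collections import Counter

def frequent_blocks(pipelines, length=2, top=3):
    """Mine the most frequent short series of operations ('building
    blocks') from a corpus of previously evolved pipelines, each given
    as a list of operation names.  The result can seed the initial
    population of a GP-based AutoML search."""
    counts = Counter()
    for ops in pipelines:
        for i in range(len(ops) - length + 1):
            counts[tuple(ops[i:i + length])] += 1
    return [block for block, _ in counts.most_common(top)]
```

Seeding the initial population with such blocks gives the GP search a head start over purely random pipeline initialization.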
Backmatter
Metadata
Title
Genetic Programming Theory and Practice XIV
Editors
Rick Riolo
Bill Worzel
Brian Goldman
Bill Tozier
Copyright Year
2018
Electronic ISBN
978-3-319-97088-2
Print ISBN
978-3-319-97087-5
DOI
https://doi.org/10.1007/978-3-319-97088-2
