
2020 | Book

Genetic Programming Theory and Practice XVII

Edited by: Wolfgang Banzhaf, Prof. Dr. Erik Goodman, Leigh Sheneman, Dr. Leonardo Trujillo, Bill Worzel

Publisher: Springer International Publishing

Book series: Genetic and Evolutionary Computation


About this book

These contributions, written by the foremost international researchers and practitioners of Genetic Programming (GP), explore the synergy between theoretical and empirical results on real-world problems, producing a comprehensive view of the state of the art in GP. In this year’s edition, the topics covered include many of the most important issues and research questions in the field, such as: opportune application domains for GP-based methods, game playing and co-evolutionary search, symbolic regression and efficient learning strategies, encodings and representations for GP, schema theorems, and new selection mechanisms. The volume includes several chapters on best practices and lessons learned from hands-on experience. Readers will discover large-scale, real-world applications of GP to a variety of problem domains via in-depth presentations of the latest and most significant results.

Table of Contents

Frontmatter
Chapter 1. Characterizing the Effects of Random Subsampling on Lexicase Selection
Abstract
Lexicase selection is a proven parent-selection algorithm designed for genetic programming problems, especially for uncompromising test-based problems where many distinct test cases must all be passed. Previous work has shown that random subsampling techniques can improve lexicase selection’s problem-solving success; here, we investigate why. We test two types of random subsampling lexicase variants: down-sampled lexicase, which uses a random subset of all training cases each generation; and cohort lexicase, which collects candidate solutions and training cases into small groups for testing, reshuffling those groups each generation. We show that both of these subsampling lexicase variants improve problem-solving success by facilitating deeper evolutionary searches; that is, they allow populations to evolve for more generations (relative to standard lexicase) given a fixed number of test-case evaluations. We also demonstrate that the subsampled variants require less computational effort to find solutions, even though subsampling hinders lexicase’s ability to preserve specialists. Contrary to our expectations, we did not find any evidence of systematic loss of phenotypic diversity maintenance due to subsampling, though we did find evidence that cohort lexicase is significantly better at preserving phylogenetic diversity than down-sampled lexicase.
Austin J. Ferguson, Jose Guadalupe Hernandez, Daniel Junghans, Alexander Lalejini, Emily Dolson, Charles Ofria
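
For readers unfamiliar with the selection schemes compared in this chapter, the following is a minimal Python sketch of standard lexicase selection and the two subsampling variants. The function names and the error interface (an error(individual, case) callable) are illustrative assumptions, not the authors' implementation; in the chapter, the down-sample is drawn once per generation rather than per selection call as shown here.

import random

def lexicase_select(population, cases, error):
    # Standard lexicase: filter candidates through the test cases in a
    # random order, keeping only those with the best error on each case.
    pool = list(population)
    for case in random.sample(cases, len(cases)):
        best = min(error(ind, case) for ind in pool)
        pool = [ind for ind in pool if error(ind, case) == best]
        if len(pool) == 1:
            break
    return random.choice(pool)

def down_sampled_lexicase(population, cases, error, frac=0.1):
    # Down-sampled variant: selection sees only a random subset of the
    # training cases (drawn per generation in the chapter).
    subset = random.sample(cases, max(1, int(frac * len(cases))))
    return lexicase_select(population, subset, error)

def cohort_lexicase(population, cases, error, n_cohorts=4):
    # Cohort variant: shuffle candidates and cases into matched cohorts;
    # selection within a cohort uses only that cohort's cases.
    # Returns one selected parent per cohort.
    inds = random.sample(population, len(population))
    cs = random.sample(cases, len(cases))
    si, sc = len(inds) // n_cohorts, len(cs) // n_cohorts
    return [lexicase_select(inds[i*si:(i+1)*si], cs[i*sc:(i+1)*sc], error)
            for i in range(n_cohorts)]

Both variants spend fewer evaluations per generation than standard lexicase, which is what allows the deeper searches the chapter reports under a fixed evaluation budget.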
Chapter 2. It Is Time for New Perspectives on How to Fight Bloat in GP
Abstract
The present and future of evolutionary algorithms depends on the proper use of modern parallel and distributed computing infrastructures. Although sequential approaches still dominate the landscape, the availability of multi-core, many-core and distributed systems will lead users and researchers to deploy parallel versions of the algorithms more frequently. In such a scenario, new possibilities arise regarding the time saved when individuals are evaluated in parallel, and this time saving is particularly relevant in Genetic Programming. This paper studies how evaluation time influences not only time to solution in parallel/distributed systems, but also the size evolution of individuals in the population, and how it can eventually reduce the bloat phenomenon that GP features. The paper considers time and space as two sides of a single coin when devising a more natural method for fighting bloat. This new perspective shows that new methods for bloat control can be derived, and the first such method is described and tested. Experimental data confirm the strength of the approach: using computing time as a measure of individuals’ complexity makes it possible to control the growth in size of genetic programming individuals.
Francisco Fernández de Vega, Gustavo Olague, Francisco Chávez, Daniel Lanza, Wolfgang Banzhaf, Erik Goodman
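
The abstract gives only the high-level idea. As a rough illustration of using measured evaluation time as a complexity signal, a fitness wrapper might look like the sketch below; the additive penalty form and the weight alpha are assumptions for illustration, not the authors' formula.

import time

def time_penalized_error(individual, evaluate, alpha=0.1):
    # Measure how long evaluation takes and penalize the task error by
    # wall-clock cost, treating time as a proxy for individual complexity.
    start = time.perf_counter()
    raw_error = evaluate(individual)   # task-specific error, lower is better
    elapsed = time.perf_counter() - start
    return raw_error + alpha * elapsed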
Chapter 3. Explorations of the Semantic Learning Machine Neuroevolution Algorithm: Dynamic Training Data Use, Ensemble Construction Methods, and Deep Learning Perspectives
Abstract
The recently proposed Semantic Learning Machine (SLM) neuroevolution algorithm is able to construct Neural Networks (NNs) over unimodal error landscapes in any supervised learning problem where the error is measured as a distance to the known targets. This chapter studies how different methods of dynamically using the training data affect the resulting generalization of the SLM algorithm. Across four real-world binary classification datasets, SLM is shown to outperform the Multi-layer Perceptron, with statistical significance, after parameter tuning is performed in both algorithms. This chapter also studies how different ensemble construction methods influence the resulting generalization. The results show that the stochastic nature of SLM already confers enough diversity to the ensembles such that Bagging and Boosting cannot improve upon a simple averaging ensemble construction method. Finally, some initial results with SLM and Convolutional NNs are presented and future Deep Learning perspectives are discussed.
Ivo Gonçalves, Marta Seca, Mauro Castelli
Chapter 4. Can Genetic Programming Perform Explainable Machine Learning for Bioinformatics?
Abstract
Although proven powerful in making predictions and finding patterns, machine learning algorithms often struggle to provide explanations and translational knowledge when applied to many problems, especially in the biomedical sciences. This often results from the highly complex structures machine learning algorithms employ to represent and model the relationship between the predictors and the response. Prediction accuracy is increased at the cost of a “black-box” model that is not amenable to interpretation. Genetic programming may provide a potential solution to explainable machine learning for bioinformatics, where learned knowledge and patterns can be translated to clinical actions. In this study, we employed a Linear Genetic Programming (LGP) algorithm for a bioinformatics classification problem. We developed feature selection analysis methods aimed at explaining which features are influential in the prediction, and whether such influence arises through individual effects or through synergistic effects when combined with other features.
Ting Hu
Chapter 5. Symbolic Regression by Exhaustive Search: Reducing the Search Space Using Syntactical Constraints and Efficient Semantic Structure Deduplication
Abstract
Symbolic regression is a powerful system identification technique in industrial scenarios where no prior knowledge on model structure is available. Such scenarios often require specific model properties such as interpretability, robustness, trustworthiness and plausibility, which are not easily achievable using standard approaches like genetic programming for symbolic regression. In this chapter we introduce a deterministic symbolic regression algorithm specifically designed to address these issues. The algorithm uses a context-free grammar to produce models that are parameterized by a non-linear least squares local optimization procedure. A finite enumeration of all possible models is guaranteed by structural restrictions as well as a caching mechanism for detecting semantically equivalent solutions. Enumeration order is established via heuristics designed to improve search efficiency. Empirical tests on a comprehensive benchmark suite show that our approach is competitive with genetic programming in many noiseless problems while maintaining desirable properties such as simple, reliable models and reproducibility.
Lukas Kammerer, Gabriel Kronberger, Bogdan Burlacu, Stephan M. Winkler, Michael Kommenda, Michael Affenzeller
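
The chapter's two key ingredients, grammar-restricted enumeration and semantic deduplication, can be sketched as follows. The toy grammar, the probe-input hashing scheme, and all names are illustrative assumptions rather than the authors' algorithm, and the non-linear least squares parameter fitting step is omitted entirely.

import itertools
import numpy as np

X = np.linspace(-1, 1, 64)   # probe inputs used to fingerprint model semantics

def semantic_key(f, decimals=6):
    # Hash a model by its rounded outputs on the probe inputs, so that
    # syntactically different but semantically equivalent models collide.
    with np.errstate(all="ignore"):
        y = np.asarray(f(X), dtype=float)
    return tuple(np.round(y, decimals).tolist())

def enumerate_models(max_depth=2):
    # Enumerate all expressions of a toy grammar E -> x | (E+E) | (E*E) | sin(E)
    # up to max_depth, skipping semantic duplicates via a cache.
    def grow(depth):
        if depth == 0:
            return [("x", lambda x: x)]
        subs = grow(depth - 1)
        out = list(subs)
        for (s1, f1), (s2, f2) in itertools.product(subs, repeat=2):
            out.append((f"({s1}+{s2})", lambda x, f1=f1, f2=f2: f1(x) + f2(x)))
            out.append((f"({s1}*{s2})", lambda x, f1=f1, f2=f2: f1(x) * f2(x)))
        out.extend((f"sin({s})", lambda x, f=f: np.sin(f(x))) for s, f in subs)
        return out

    seen = {}
    for expr, f in grow(max_depth):
        key = semantic_key(f)
        if key not in seen:      # semantic structure deduplication
            seen[key] = expr
            yield expr, f

for expr, _ in enumerate_models():
    print(expr)   # e.g. (x+x) survives but its semantic duplicates are skipped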
Chapter 6. Temporal Memory Sharing in Visual Reinforcement Learning
Abstract
Video games provide a well-defined study ground for the development of behavioural agents that learn through trial-and-error interaction with their environment, or reinforcement learning (RL). They cover a diverse range of environments that are designed to be challenging for humans, all through a high-dimensional visual interface. Tangled Program Graphs (TPG) is a recently proposed genetic programming algorithm that emphasizes emergent modularity (i.e. automatic construction of multi-agent organisms) in order to build successful RL agents more efficiently than state-of-the-art solutions from other sub-fields of artificial intelligence, e.g. deep neural networks. However, TPG organisms represent a direct mapping from input to output with no mechanism to integrate past experience (previous inputs). This is a limitation in environments with partial observability. For example, TPG performed poorly in video games that explicitly require the player to predict the trajectory of a moving object. In order to make these calculations, players must identify, store, and reuse important parts of past experience. In this work, we describe an approach to supporting this type of short-term temporal memory in TPG, and show that shared memory among subsets of agents within the same organism seems particularly important. In addition, we introduce heterogeneous TPG organisms composed of agents with distinct types of representation that collaborate through shared memory. In this study, heterogeneous organisms provide a parsimonious approach to supporting agents with task-specific functionality (image processing capabilities, in the case of this work). Taken together, these extensions allow TPG to discover high-scoring behaviours for the Atari game Breakout, an environment in which it previously failed to make significant progress.
Stephen Kelly, Wolfgang Banzhaf
Chapter 7. The Evolution of Representations in Genetic Programming Trees
Abstract
Artificially intelligent machines have to explore their environment, store information about it, and use this information to improve future decision making. As such, the quest is to either provide these systems with internal models about their environment or to imbue machines with the ability to create their own models—ideally the latter. These models are mental representations of the environment, and we have previously shown that neuroevolution is a powerful method to create artificially intelligent machines (also referred to as agents) that can form said representations. Furthermore, we have shown that one can quantify representations and use that quantity to augment the performance of a genetic algorithm. Instead of just optimizing for performance, one can also positively select for agents that have better representations. The neuroevolutionary approach, which improves performance and lets these agents develop representations, works well for Markov Brains, which are a form of Cartesian Genetic Programming network. Conventional artificial neural networks and their recurrent counterparts, RNNs and LSTMs, are however primarily trained by backpropagation and not evolved, and they behave differently with respect to their ability to form representations. When evolved, RNNs and LSTMs do not form sparse and distinct representations; they “smear” the information about individual concepts of the environment over all nodes in the system. This ultimately makes these systems more brittle and less capable. The question we seek to address now is how we can create systems that evolve to have meaningful representations while preventing them from smearing these representations. We look at genetic programming trees as an interesting computational paradigm, as they can take a lot of information in through their various leaves, but at the same time condense that computation into a single node in the end. We hypothesize that this computational condensation could also prevent the smearing of information. Here, we explore how these tree structures evolve and form representations, and we test to what degree these systems either “smear” or condense information.
Douglas Kirkpatrick, Arend Hintze
Chapter 8. How Competitive Is Genetic Programming in Business Data Science Applications?
Abstract
The paper evaluates GP’s competitiveness in business data science-driven applications and suggests the necessary steps to increase its reach, impact and competitiveness. First, the key business needs for Data Science are identified and discussed, followed by an analysis of the competitive landscape and popularity of Data Science methods. The competitive advantages and weaknesses of GP, as well as its impressive application record, are reviewed. Two business applications with high value creation—inferential sensors and nonlinear business forecasting—are identified and described. The recommended action items to increase the competitive presence of GP in Data Science business applications include: develop a successful marketing strategy toward statistical, machine/deep learning, and business communities; broaden application areas; improve professional development tools; and increase GP visibility and teaching in Data Science classes.
Arthur Kordon, Theresa Kotanchek, Mark Kotanchek
Chapter 9. Using Modularity Metrics as Design Features to Guide Evolution in Genetic Programming
Abstract
Genetic Programming has advanced the state of the art in the field of software synthesis. However, it has still not been able to produce some of the more complex programs routinely written by humans. One of the heuristics human programmers use to build complex software is the organization of code into reusable modules. Ever since the introduction of the concept of Automatically Defined Functions (ADFs) by John Koza in the 1990s, the genetic programming community has also expressed the need to evolve modular programs, but despite this interest and several subsequent innovations, the goal of evolving large-scale software built on reusable modules has not yet been achieved. In this chapter, we first discuss two modularity metrics—Reuse and Repetition—and describe the procedure for calculating them from program code and corresponding execution traces. We then introduce the concept of design features, which can be used alongside error measures to guide evolution. We also demonstrate the use of modularity design features in parent selection.
Anil Kumar Saini, Lee Spector
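
Since the chapter computes Reuse and Repetition from program code and execution traces, here is a toy illustration of trace-based metrics of that flavour. These particular formulas are assumptions made purely for illustration; they are not the chapter's definitions.

from collections import Counter

def trace_modularity_metrics(trace, program_size):
    # trace: list of executed instruction indices; program_size: total
    # number of instructions in the program. Toy metrics: Reuse = fraction
    # of executed instructions that ran more than once; Repetition =
    # average number of extra executions per program instruction.
    counts = Counter(trace)
    reused = [c for c in counts.values() if c > 1]
    reuse = len(reused) / max(1, len(counts))
    repetition = sum(c - 1 for c in counts.values()) / max(1, program_size)
    return reuse, repetition

print(trace_modularity_metrics([0, 1, 2, 1, 2, 1, 2, 3], program_size=4))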
Chapter 10. Evolutionary Computation and AI Safety
Research Problems Impeding Routine and Safe Real-World Application of Evolution
Abstract
Recent developments in artificial intelligence and machine learning have spurred interest in the growing field of AI safety, which studies how to prevent human-harming accidents when deploying AI systems. This paper thus explores the intersection of AI safety with evolutionary computation, to show how safety issues arise in evolutionary computation and how understanding from evolutionary computational and biological evolution can inform the broader study of AI safety.
Joel Lehman
Chapter 11. Genetic Programming Symbolic Regression: What Is the Prior on the Prediction?
Abstract
In the context of Genetic Programming Symbolic Regression, we empirically investigate the prior on the output prediction, that is, the distribution of the output prior to observing data. We distinguish between the prior due to initialisation and due to evolutionary search. We also investigate the effect on the prior of maximum tree depth and the effect of different function sets and different independent variable distributions. We find that priors are highly diffuse and sometimes include support for extreme values. We compare priors to values for dependent variables observed in benchmarks and real-world problems, finding that mismatches occur and can affect algorithm behaviour and performance. As a further application of our results, we investigate the behaviour of mutation operators in semantic space.
Miguel Nicolau, James McDermott
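
The notion of an output prior due to initialisation can be made concrete with a small experiment: sample many freshly initialised trees and inspect the distribution of their outputs. The tree representation, function set, and depth below are illustrative choices, not the chapter's experimental setup.

import math
import random

FUNCS = [("+", lambda a, b: a + b), ("-", lambda a, b: a - b),
         ("*", lambda a, b: a * b), ("sin", lambda a: math.sin(a))]

def random_tree(depth):
    # Grow-style initialisation over terminals {x, ephemeral constant}
    # and the FUNCS set above.
    if depth == 0 or random.random() < 0.3:
        return ("x",) if random.random() < 0.5 else ("const", random.uniform(-1, 1))
    name, f = random.choice(FUNCS)
    arity = f.__code__.co_argcount
    return (name, f, [random_tree(depth - 1) for _ in range(arity)])

def evaluate(tree, x):
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return tree[1]
    _, f, children = tree
    return f(*(evaluate(c, x) for c in children))

# Empirical prior: outputs of many freshly initialised trees at random inputs.
samples = sorted(evaluate(random_tree(4), random.gauss(0, 1)) for _ in range(10000))
print(samples[0], samples[len(samples) // 2], samples[-1])  # min, median, max

Comparing this empirical output distribution against the dependent-variable values of a given benchmark is one way to detect the prior/data mismatches the chapter discusses.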
Chapter 12. Hands-on Artificial Evolution Through Brain Programming
Abstract
This paper is about the evolution of a bio-inspired methodology (http://evovision.cicese.mx) that mimics the cortical visual pathways. The methodology has been extensively tested on problems with different levels of complexity, with outstanding results. After a review of the main works, the problem of classifying digitized art is introduced. An image database of five classes downloaded from the Kaggle website is used as a benchmark for evolutionary learning. A comparison with a convolutional neural network trained from scratch and the well-known AlexNet is provided to illustrate the quality of the proposal relative to the state of the art.
Gustavo Olague, Mariana Chan-Ley
Chapter 13. Comparison of Linear Genome Representations for Software Synthesis
Abstract
In many genetic programming systems, the program variation and execution processes operate on different program representations. The representations on which variation operates are referred to as genomes. Unconstrained linear genome representations can provide a variety of advantages, including reduced complexity of program generation, variation, simplification and serialization operations. The Plush genome representation, which uses epigenetic markers on linear genomes to express nonlinear structures, has supported the production of state-of-the-art results in program synthesis with the PushGP genetic programming system. Here we present a new, simpler, non-epigenetic alternative to Plush, called Plushy, that appears to maintain all of the advantages of Plush while providing additional benefits. These results illustrate the virtues of unconstrained linear genome representations more generally, and may be transferable to genetic programming systems that target different languages for evolved programs.
Edward Pantridge, Thomas Helmuth, Lee Spector
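
As a rough sketch of the general idea behind translating a flat genome into a nested program, consider the following. The token conventions ('close' tokens and an opens table giving how many code blocks each instruction opens) loosely follow the published description of Plushy, but the code itself is an illustrative assumption, not the PushGP implementation.

def plushy_to_push(genome, opens):
    # Translate a flat Plushy genome into a nested Push-style program.
    # opens[token] = number of code blocks that token opens; a 'close'
    # token ends the innermost open block (starting a sibling block if
    # the opener is still owed one). Unclosed blocks simply remain open.
    root, stack = [], []          # stack holds (parent, sibling_blocks_left)
    current = root
    for token in genome:
        if token == "close":
            if not stack:
                continue          # stray close tokens are ignored
            parent, remaining = stack.pop()
            if remaining > 0:
                block = []
                parent.append(block)
                stack.append((parent, remaining - 1))
                current = block
            else:
                current = parent
        else:
            current.append(token)
            k = opens.get(token, 0)
            if k > 0:             # open the first of the token's k blocks
                block = []
                current.append(block)
                stack.append((current, k - 1))
                current = block
    return root

# Prints ['exec_if', ['a'], ['b'], 'c']: exec_if opens two sibling blocks.
print(plushy_to_push(["exec_if", "a", "close", "b", "close", "c"], {"exec_if": 2}))

Because the genome is an unconstrained flat sequence, variation operators such as uniform crossover and deletion never have to repair nested structure, which is one of the advantages the chapter highlights.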
Chapter 14. Enhanced Optimization with Composite Objectives and Novelty Pulsation
Abstract
An important benefit of multi-objective search is that it maintains a diverse population of candidates, which helps in deceptive problems in particular. Not all diversity is useful, however: candidates that optimize only one objective while ignoring others are rarely helpful. A recent solution is to replace the original objectives by their linear combinations, thus focusing the search on the most useful tradeoffs between objectives. To compensate for the loss of diversity, this transformation is accompanied by a selection mechanism that favors novelty. This paper improves this approach further by introducing novelty pulsation, i.e. a systematic method to alternate between novelty selection and local optimization. In the highly deceptive problem of discovering minimal sorting networks, it finds state-of-the-art solutions significantly faster than before. In fact, our method so far has established a new world record for the 20-line sorting network with 91 comparators. In the real-world problem of stock trading, it discovers solutions that generalize significantly better on unseen data. Composite Novelty Pulsation is therefore a promising approach to solving deceptive real-world problems through multi-objective optimization.
Hormoz Shahrzad, Babak Hodjat, Camille Dollé, Andrei Denissov, Simon Lau, Donn Goodhew, Justin Dyer, Risto Miikkulainen
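
The core scheduling idea, alternating novelty selection with local optimization, is simple to express. The loop below is a minimal sketch under assumed interfaces (select_by_novelty, select_by_fitness, and vary callables), not the authors' system.

def novelty_pulsation(population, generations, period,
                      select_by_novelty, select_by_fitness, vary):
    # Alternate selection pressure on a fixed schedule: `period` generations
    # of novelty-driven exploration, then `period` generations of
    # fitness-driven local optimization, and so on.
    for gen in range(generations):
        exploring = (gen // period) % 2 == 0
        select = select_by_novelty if exploring else select_by_fitness
        population = vary(select(population))
    return population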
Chapter 15. New Pathways in Coevolutionary Computation
Abstract
The simultaneous evolution of two or more species with coupled fitness—coevolution—has been put to good use in the field of evolutionary computation. Herein, we present two new forms of coevolutionary algorithms, which we have recently designed and applied with success. OMNIREP is a cooperative coevolutionary algorithm that discovers both a representation and an encoding for solving a particular problem of interest. SAFE is a commensalistic coevolutionary algorithm that maintains two coevolving populations: a population of candidate solutions and a population of candidate objective functions needed to measure solution quality during evolution.
Moshe Sipper, Jason H. Moore, Ryan J. Urbanowicz
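
To make the two-population structure of SAFE concrete, here is a heavily simplified single-generation sketch: each solution is scored by the candidate objective function that rates it best, while the objective population evolves separately. The survival and variation rules shown are illustrative assumptions, not the published algorithm (which, for instance, evolves the objective population using novelty).

import random

def safe_generation(solutions, objectives, apply_objective, mutate_sol, mutate_obj):
    # Commensalistic coevolution: a solution's fitness is the best score
    # that any candidate objective function assigns it.
    def fitness(sol):
        return max(apply_objective(obj, sol) for obj in objectives)

    ranked = sorted(solutions, key=fitness, reverse=True)
    parents = ranked[: max(1, len(ranked) // 2)]           # truncation survival
    solutions = parents + [mutate_sol(random.choice(parents))
                           for _ in range(len(ranked) - len(parents))]
    objectives = [mutate_obj(obj) for obj in objectives]   # objectives vary independently
    return solutions, objectives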
Chapter 16. 2019 Evolutionary Algorithms Review
Abstract
Evolutionary algorithm research and applications began over 50 years ago. Like other artificial intelligence techniques, evolutionary algorithms will likely see increased use and development due to the increased availability of computation, more robust and available open source software libraries, and the increasing demand for artificial intelligence techniques. As these techniques become more adopted and capable, it is the right time to take a perspective on their ability to integrate into society and the human processes they are intended to augment. In this review, we explore a new taxonomy of evolutionary algorithms and resulting classifications that look at five main areas: the ability to manage the control of the environment with limiters, the ability to explain and repeat the search process, the ability to understand input and output causality within a solution, the ability to manage algorithm bias due to data or user design, and lastly, the ability to add corrective measures. These areas are motivated by today’s pressures on industry to conform to both society’s concerns and new government regulatory rules. As many reviews of evolutionary algorithms exist, after motivating this new taxonomy, we briefly classify a broad range of algorithms and identify areas of future research.
Andrew N. Sloss, Steven Gustafson
Chapter 17. Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model
Abstract
Reinforcement learning (RL) tasks have often assumed a Markov decision process, which is to say, state information is ‘complete’, hence there is no need to learn what to learn from. However, recent advances—such as visual reinforcement learning—have enabled the tasks typically addressed using RL to expand to include significant amounts of partial observability. This implies that the representation needs to support multiple forms of memory; thus credit assignment needs to find efficient ways to encode high-dimensional data, as well as to determine under what conditions to save and recall specific pieces of information, and for what purpose. In this work, we assume the tangled program graph (TPG) formulation for genetic programming, which has already demonstrated competitiveness with deep learning solutions on multiple RL tasks (under complete information). TPG is augmented with indexed memory using a probabilistic formulation of a write operation (defining long and short term memory) and an indexed read. Moreover, the register information specific to the programs co-operating within a team is used to provide the low-dimensional encoding of state. We demonstrate that TPG can then successfully evolve a behaviour for a hero bot in the Dota 2 game engine when playing in a single-lane 1-on-1 configuration with the game engine hero bot as the opponent. Specific recommendations are made regarding the design of an appropriate fitness function. We show that TPG without indexed memory completely fails to learn any useful behaviour; only with indexed memory are useful hero behaviours discovered.
Robert J. Smith, Malcolm I. Heywood
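
A toy version of an indexed memory with a probabilistic write operation might look like the sketch below, where columns written with high probability act as short-term memory and rarely written columns act as long-term memory. The memory geometry and probability profile are illustrative assumptions, not the chapter's model.

import random

class ProbabilisticMemory:
    # Indexed memory with a probabilistic write: columns near the centre
    # are overwritten often (short-term memory) while outer columns are
    # rarely touched (long-term memory).
    def __init__(self, columns=8, rows=8):
        self.mem = [[0.0] * rows for _ in range(columns)]
        mid = (columns - 1) / 2
        # write probability decays with distance from the centre column
        self.p_write = [0.5 ** abs(c - mid) for c in range(columns)]

    def write(self, registers):
        # `registers` is a snapshot of a program's registers (length == rows),
        # providing the low-dimensional encoding of state.
        for c, p in enumerate(self.p_write):
            if random.random() < p:
                self.mem[c] = list(registers)

    def read(self, column, row):
        return self.mem[column][row]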
Chapter 18. Modelling Genetic Programming as a Simple Sampling Algorithm
Abstract
This chapter proposes a new model of tree-based Genetic Programming (GP) as a simple sampling algorithm that samples minimal schemata (subsets of the solution space) described by a single concrete node at a single position in the expression tree. We show that GP explores these schemata in the same way across three benchmarks, rapidly converging the population to a specific function at each position throughout the upper layers of the expression tree. This convergence is driven by covariance between membership of a simple schema and rank fitness. We model this process using Price’s theorem and provide empirical evidence to support our model. The chapter closes with an outline of a modification of the standard GP algorithm that reinforces this bias by converging populations to fit schemata in an accelerated way.
David R. White, Benjamin Fowler, Wolfgang Banzhaf, Earl T. Barr
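
For reference, Price's theorem, which the chapter uses to model this convergence, decomposes the between-generation change in the population mean of a trait (here, membership of a simple schema) into a selection term and a transmission term:

\Delta \bar{z} \;=\; \frac{\operatorname{Cov}(w_i, z_i)}{\bar{w}} \;+\; \frac{\mathbb{E}\!\left[ w_i \, \Delta z_i \right]}{\bar{w}}

where z_i is the trait value of parent i (for schema membership, 1 if the individual matches the schema and 0 otherwise), w_i is its number of offspring, \bar{w} is the mean offspring count, and \Delta z_i is the average change in trait value between parent i and its offspring. The first (covariance) term is the selection effect the chapter measures against rank fitness; the second captures transmission bias introduced by the variation operators.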
Chapter 19. An Evolutionary System for Better Automatic Software Repair
Abstract
When a test suite is considered as the specification, the paradigm is called test-suite-based repair (Monperrus, ACM Comput Surv 51(1):17, 2018). The test suite should contain at least one negative (i.e., initially failing) test that triggers the bug to be fixed and a number of positive (i.e., initially passing) tests that define the expected program behavior.
Yuan Yuan, Wolfgang Banzhaf
Backmatter
Metadata
Title
Genetic Programming Theory and Practice XVII
Edited by
Wolfgang Banzhaf
Prof. Dr. Erik Goodman
Leigh Sheneman
Dr. Leonardo Trujillo
Bill Worzel
Copyright year
2020
Electronic ISBN
978-3-030-39958-0
Print ISBN
978-3-030-39957-3
DOI
https://doi.org/10.1007/978-3-030-39958-0
