
2017 | Book

Genetic Programming

20th European Conference, EuroGP 2017, Amsterdam, The Netherlands, April 19-21, 2017, Proceedings

Edited by: James McDermott, Mauro Castelli, Lukas Sekanina, Evert Haasdijk, Pablo García-Sánchez

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this Book

This book constitutes the refereed proceedings of the 20th European Conference on Genetic Programming, EuroGP 2017, held in Amsterdam, The Netherlands, in April 2017, co-located with the Evo* 2017 events, EvoCOP, EvoMUSART, and EvoApplications.
The 14 revised full papers presented together with 8 poster papers were carefully reviewed and selected from 32 submissions. The wide range of topics in this volume reflects the current state of research in the field. Thus, we see topics and applications including program synthesis, genetic improvement, grammatical representations, self-adaptation, multi-objective optimisation, program semantics, search landscapes, mathematical programming, games, operations research, networks, evolvable hardware, and program synthesis benchmarks.

Table of Contents

Frontmatter

Oral Presentations

Frontmatter
Evolutionary Program Sketching
Abstract
Program synthesis can be posed as a satisfiability problem and approached with generic SAT solvers. However, only short programs can be synthesized in this way. Program sketching by Solar-Lezama assumes that a human provides a partial program (sketch), and that synthesis takes place only within the uncompleted parts of that program. This allows synthesizing programs that are overall longer, while maintaining manageable computational effort. In this paper, we propose Evolutionary Program Sketching (EPS), in which the role of sketch provider is handed over to genetic programming (GP). A GP algorithm evolves a population of partial programs, which are completed by a solver during evaluation. We consider several variants of EPS, which vary in the program terminals used for completion (constants, variables, or both) and in the way the completion outcomes are propagated to future generations. When applied to a range of benchmarks, EPS outperforms conventional GP, even when the latter is given a similar time budget.
Iwo Błądek, Krzysztof Krawiec
Exploring Fitness and Edit Distance of Mutated Python Programs
Abstract
Genetic Improvement (GI) is the process of using computational search techniques to improve existing software, e.g. in terms of execution time, power consumption or correctness. As in most heuristic search algorithms, the search is guided by fitness, with GI searching the space of program variants of the original software. The relationship between the program space and fitness is seldom simple and often quite difficult to analyse. This paper makes a preliminary analysis of GI's fitness distance measure on program repair with three small Python programs. Each program undergoes incremental mutations while the change in fitness, measured as the proportion of tests passed, is monitored. We conclude that the fitness of these programs often does not change under single mutations, and we also confirm the inherent discreteness of bug-fixing fitness functions. Although our findings cannot be assumed to generalise to other software, they provide us with interesting directions for further investigation.
Saemundur O. Haraldsson, John R. Woodward, Alexander E. I. Brownlee, David Cairns
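As a quick illustration of the fitness signal discussed in this abstract (not code from the paper), the proportion-of-tests-passed measure can be sketched as follows; the variant under test and the tiny test suite are hypothetical.

```python
# Illustrative sketch: fitness of a program variant as the proportion of tests passed.
# `variant` is any callable under test; the (args, expected) pairs are hypothetical.

def test_pass_fitness(variant, test_cases):
    """Return the fraction of (args, expected) pairs the variant gets right."""
    passed = 0
    for args, expected in test_cases:
        try:
            if variant(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing variant simply fails that test case
    return passed / len(test_cases)

# Example: a buggy absolute-value function passes 2 of 3 cases.
buggy_abs = lambda x: x                      # wrong for negative inputs
cases = [((3,), 3), ((0,), 0), ((-2,), 2)]
print(test_pass_fitness(buggy_abs, cases))   # 0.666...
```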
Differentiable Genetic Programming
Abstract
We introduce the use of high-order automatic differentiation, implemented via the algebra of truncated Taylor polynomials, in genetic programming. Using the Cartesian Genetic Programming encoding we obtain a high-order Taylor representation of the program output that is then used to back-propagate errors during learning. The resulting machine learning framework is called differentiable Cartesian Genetic Programming (dCGP). In the context of symbolic regression, dCGP offers a new approach to the long unsolved problem of constant representation in GP expressions. On several problems of increasing complexity we find that dCGP is able to find the exact form of the symbolic expression as well as the values of the constants. We also demonstrate the use of dCGP to solve a large class of differential equations and to find prime integrals of dynamical systems, presenting, in both cases, results that confirm the efficacy of our approach.
Dario Izzo, Francesco Biscani, Alessio Mereta
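As background for the abstract above, a minimal sketch of truncated-Taylor arithmetic, the algebra underlying high-order automatic differentiation, is shown below (second order, one variable). This is an illustration only, not the authors' dCGP implementation.

```python
# Minimal sketch of truncated Taylor (order-2) arithmetic, the kind of algebra
# that high-order automatic differentiation builds on.  Not the dCGP code itself.

class Taylor2:
    """Carries a value together with its first and second derivatives."""
    def __init__(self, f, f1=0.0, f2=0.0):
        self.f, self.f1, self.f2 = f, f1, f2

    def __add__(self, other):
        other = other if isinstance(other, Taylor2) else Taylor2(other)
        return Taylor2(self.f + other.f, self.f1 + other.f1, self.f2 + other.f2)

    def __mul__(self, other):
        other = other if isinstance(other, Taylor2) else Taylor2(other)
        return Taylor2(
            self.f * other.f,
            self.f1 * other.f + self.f * other.f1,                      # product rule
            self.f2 * other.f + 2 * self.f1 * other.f1 + self.f * other.f2,
        )

def variable(a):
    """A variable expanded around the point a: value a, first derivative 1."""
    return Taylor2(a, 1.0, 0.0)

# p(x) = x*x + 3*x evaluated at x = 2 gives p = 10, p' = 7, p'' = 2.
x = variable(2.0)
p = x * x + x * Taylor2(3.0)
print(p.f, p.f1, p.f2)   # 10.0 7.0 2.0
```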
Evolving Game State Features from Raw Pixels
Abstract
General video game playing is the art of designing artificial intelligence programs that are capable of playing different video games with little domain knowledge. One of the great challenges is how to capture game state features from different video games in a general way. The main contribution of this paper is to apply genetic programming to evolve game state features from raw pixels. A voting method is implemented to determine the actions of the game agent. Three different video games are used to evaluate the effectiveness of the algorithm: Missile Command, Frogger, and Space Invaders. The results show that genetic programming is able to find useful game state features for all three games.
Baozhu Jia, Marc Ebner
Emergent Tangled Graph Representations for Atari Game Playing Agents
Abstract
Organizing code into coherent programs and relating different programs to each other represents an underlying requirement for scaling genetic programming to more difficult task domains. Assuming a model in which policies are defined by teams of programs, in which team and program are represented using independent populations and coevolved, has previously been shown to support the development of variable-sized teams. In this work, we generalize the approach to provide a complete framework for organizing multiple teams into arbitrarily deep/wide structures through a process of continuous evolution; hereafter the Tangled Program Graph (TPG). Benchmarking is conducted using a subset of 20 games from the Arcade Learning Environment (ALE), an Atari 2600 video game emulator. The games considered here correspond to those in which deep learning was unable to reach a threshold of play consistent with that of a human. Information provided to the learning agent is limited to that which a human would experience: screen capture sensory input, Atari joystick actions, and game score. The performance of the proposed approach exceeds that of deep learning in 15 of the 20 games, with 7 of the 15 also exceeding a human level of competence. Moreover, in contrast to solutions from deep learning, solutions discovered by TPG are also very ‘sparse’. Rather than assuming that all of the state space contributes to every decision, each action in TPG is resolved following execution of a subset of an individual’s graph. This results in significantly lower computational requirements for model building than is presently the case for deep learning.
Stephen Kelly, Malcolm I. Heywood
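Purely as a schematic illustration of the graph-traversal idea described above (not the authors' implementation), an action could be resolved by letting programs bid on the state and following the winner, recursing when the winner points at another team; the bid functions and team structure here are hypothetical.

```python
# Schematic sketch of action selection in a graph of program teams.
# Program bid functions and the team structure here are hypothetical stand-ins.

def resolve_action(team, state, visited=None):
    """Follow the highest-bidding program; recurse if it points to another team."""
    visited = set() if visited is None else visited
    visited.add(id(team))
    # Each program is a (bid_function, action) pair; an action is either an
    # atomic action label or a reference to another team (a list of programs).
    eligible = [(bid, act) for bid, act in team
                if not (isinstance(act, list) and id(act) in visited)]
    bid, act = max(eligible, key=lambda prog: prog[0](state))
    if isinstance(act, list):                 # pointer to another team
        return resolve_action(act, state, visited)
    return act                                # atomic action (e.g. a joystick move)

# Tiny hypothetical example: two teams, simple bids over a 2-feature state.
leaf_team = [(lambda s: s[0], "LEFT"), (lambda s: s[1], "RIGHT")]
root_team = [(lambda s: 0.3, "FIRE"), (lambda s: s[0] + s[1], leaf_team)]
print(resolve_action(root_team, [0.9, 0.2]))  # follows leaf_team, prints "LEFT"
```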
A General Feature Engineering Wrapper for Machine Learning Using ε-Lexicase Survival
Abstract
We propose a general wrapper for feature learning that interfaces with other machine learning methods to compose effective data representations. The proposed feature engineering wrapper (FEW) uses genetic programming to represent and evolve individual features tailored to the machine learning method with which it is paired. In order to maintain feature diversity, ε-lexicase survival is introduced, a method based on ε-lexicase selection. This survival method preserves semantically unique individuals in the population based on their ability to solve difficult subsets of training cases, thereby yielding a population of uncorrelated features. We demonstrate FEW with five different off-the-shelf machine learning methods and test it on a set of real-world and synthetic regression problems with dimensions varying across three orders of magnitude. The results show that FEW is able to improve model test predictions across problems for several ML methods. We discuss and test the scalability of FEW in comparison to other feature composition strategies, most notably polynomial feature expansion.
William La Cava, Jason Moore
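For readers unfamiliar with lexicase-style methods, below is a minimal sketch of ε-lexicase selection, on which the survival variant proposed in this paper builds; the error matrix and the MAD-based ε are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of epsilon-lexicase selection; error values and epsilon choice
# (median absolute deviation per case) are illustrative.
import random
import statistics

def median_abs_deviation(values):
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def epsilon_lexicase_select(errors, rng=random):
    """errors[i][j] = error of individual i on training case j; returns an index."""
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)
    for j in cases:
        eps = median_abs_deviation([row[j] for row in errors])  # per-case tolerance
        best = min(errors[i][j] for i in candidates)
        candidates = [i for i in candidates if errors[i][j] <= best + eps]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)

# Three individuals, four cases: individual 2 is elite on most cases.
E = [[0.9, 0.1, 0.5, 0.4],
     [0.2, 0.8, 0.6, 0.3],
     [0.1, 0.2, 0.1, 0.1]]
print(epsilon_lexicase_select(E))
```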
Visualising the Search Landscape of the Triangle Program
Abstract
High-order mutation analysis of a software engineering benchmark, including schema and local optima networks, suggests program improvements may not be as hard to find as is often assumed. (1) Bit-wise genetic building blocks are not deceptive and can lead to all global optima. (2) There are many neutral networks, plateaux and local optima; nevertheless, in most cases there are hill-climbing routes, including neutral moves, from near the human-written C source code to solutions.
William B. Langdon, Nadarajen Veerapen, Gabriela Ochoa
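For context, the Triangle Program referred to above is the classic triangle-classification benchmark (originally C source code); a common Python rendering of the same logic, given only as an illustration, is:

```python
# A common formulation of the triangle classification benchmark referenced above
# (the original benchmark is C source code; this is only an illustrative rendering).

def triangle_type(a, b, c):
    """Classify side lengths as 'invalid', 'scalene', 'isosceles' or 'equilateral'."""
    if a <= 0 or b <= 0 or c <= 0:
        return "invalid"
    # triangle inequality: every side must be shorter than the sum of the others
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

print(triangle_type(3, 4, 5))   # scalene
print(triangle_type(2, 2, 3))   # isosceles
print(triangle_type(1, 2, 4))   # invalid
```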
RANSAC-GP: Dealing with Outliers in Symbolic Regression with Genetic Programming
Abstract
Genetic programming (GP) has been shown to be a powerful tool for automatic modeling and program induction. It is often used to solve difficult symbolic regression tasks, with many examples in real-world domains. However, the robustness of GP-based approaches has not been substantially studied. In particular, the present work deals with the issue of outliers, data in the training set that represent severe errors in the measuring process. In general, a datum is considered an outlier when it sharply deviates from the true behavior of the system of interest. GP practitioners know that such data points usually bias the search and produce inaccurate models. Therefore, this work presents a hybrid methodology based on the RAndom SAmpling Consensus (RANSAC) algorithm and GP, which we call RANSAC-GP. RANSAC is an approach for dealing with outliers in parameter estimation problems, widely used in computer vision and related fields. This work presents the first application of RANSAC to symbolic regression with GP, with impressive results. The proposed algorithm is able to deal with extreme amounts of contamination in the training set, evolving highly accurate models even when the proportion of outliers reaches 90%.
Uriel López, Leonardo Trujillo, Yuliana Martinez, Pierrick Legrand, Enrique Naredo, Sara Silva
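A hedged sketch of the RANSAC idea wrapped around a regression learner is shown below; the stand-in learner, subset size and inlier tolerance are illustrative, and in RANSAC-GP the inner fit would be a GP run rather than the toy mean predictor used here.

```python
# Hedged sketch of a RANSAC-style outer loop around a regression learner.
# `fit_model` stands in for running GP (or any regressor) on a small subset.
import random

def ransac_fit(X, y, fit_model, n_iters=100, subset_size=10, inlier_tol=0.5, rng=random):
    """Repeatedly fit on random subsets and keep the model with most inliers."""
    best_model, best_inliers = None, -1
    indices = list(range(len(X)))
    for _ in range(n_iters):
        sample = rng.sample(indices, min(subset_size, len(indices)))
        model = fit_model([X[i] for i in sample], [y[i] for i in sample])
        inliers = sum(1 for xi, yi in zip(X, y) if abs(model(xi) - yi) <= inlier_tol)
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Toy usage with a trivial "learner" that fits a mean predictor.
fit_mean = lambda Xs, ys: (lambda x: sum(ys) / len(ys))
X = list(range(20)); y = [1.0] * 18 + [50.0, -40.0]     # two gross outliers
model, inliers = ransac_fit(X, y, fit_mean)
print(inliers)  # typically 18: the outliers are ignored by the best model
```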
Symbolic Regression on Network Properties
Abstract
Networks are continuously growing in complexity, which creates challenges for determining their most important characteristics. While analytical bounds are often too conservative, the computational effort of algorithmic approaches does not scale well with network size. This work uses Cartesian Genetic Programming for symbolic regression to evolve mathematical equations that relate network properties directly to the eigenvalues of network adjacency and Laplacian matrices. In particular, we show that these eigenvalues are powerful features to evolve approximate equations for the network diameter and the isoperimetric number, which are hard to compute algorithmically. Our experiments indicate a good performance of the evolved equations for several real-world networks and we demonstrate how the generalization power can be influenced by the selection of training networks and feature sets.
Marcus Märtens, Fernando Kuipers, Piet Van Mieghem
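As an illustration of the kind of feature extraction described above (not the paper's code), the adjacency and Laplacian eigenvalues of a graph can be computed and paired with a target property such as the diameter; the graph model and the chosen spectral features are assumptions made for this example.

```python
# Illustrative feature extraction: eigenvalues of the adjacency and Laplacian
# matrices as candidate inputs for symbolic regression on network properties.
import networkx as nx
import numpy as np

G = nx.barabasi_albert_graph(n=50, m=3, seed=42)   # a connected example graph

A = nx.to_numpy_array(G)                     # adjacency matrix
L = nx.laplacian_matrix(G).toarray()         # Laplacian matrix
adj_eigs = np.sort(np.linalg.eigvalsh(A))    # symmetric matrices: eigvalsh is safe
lap_eigs = np.sort(np.linalg.eigvalsh(L))

# A few spectral features a regressor could be given as terminals.
features = {
    "adj_spectral_radius": adj_eigs[-1],
    "adj_spectral_gap": adj_eigs[-1] - adj_eigs[-2],
    "algebraic_connectivity": lap_eigs[1],   # second-smallest Laplacian eigenvalue
}
target = nx.diameter(G)                      # one property the paper approximates
print(features, target)
```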
Evolving Time-Invariant Dispatching Rules in Job Shop Scheduling with Genetic Programming
Abstract
Genetic Programming (GP) has achieved success in evolving dispatching rules for job shop scheduling problems, particularly in dynamic environments. However, there is still great potential to improve the performance of GP. One challenge that is yet to be addressed is the huge search space. In this paper, we propose a simple yet effective approach to improve the effectiveness and efficiency of GP. The new approach is based on a newly defined time-invariance property of dispatching rules, which is derived from the idea of translational invariance in machine learning. We then develop a new terminal selection scheme to guarantee time-invariance throughout the GP process. The experimental studies show that by considering time-invariance, GP can achieve much better rules in a much shorter time.
Yi Mei, Su Nguyen, Mengjie Zhang
Strategies for Improving the Distribution of Random Function Outputs in GSGP
Abstract
In recent years, different approaches have been proposed to introduce semantic information into genetic programming. In particular, geometric semantic genetic programming (GSGP) and the interesting properties of its evolutionary operators have attracted the attention of the community. This paper is concerned with the use of GSGP to solve symbolic regression problems, where semantics is defined by the output set generated by a given individual when applied to the training cases. In this scenario, both the mutation and crossover operators defined for fitness functions based on Manhattan distance use randomly built functions to generate offspring. However, the outputs of these random functions are not guaranteed to be uniformly distributed in the semantic space, as the functions are generated considering the syntactic space. We hypothesize that the non-uniformity of the semantics of these functions may bias the search, and propose three different standard normalization techniques to improve the distribution of the outputs of these random functions over the semantic space. The results are compared with a popular strategy that uses a logistic function as a wrapper for the outputs, and show that the strategies tested can improve on the results of the previous method. The experimental analysis also indicates that a more uniform distribution of the semantics of these functions does not necessarily imply better results in terms of test error.
Luiz Otavio V. B. Oliveira, Felipe Casadei, Gisele L. Pappa
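A minimal sketch of the contrast discussed above, a logistic wrapper versus one standard normalization (z-score) applied to the semantics of a random function, is given below; the sample output vector is invented for illustration.

```python
# Hedged sketch: rescaling the semantics (output vector) of a randomly generated
# function before it is used inside a geometric semantic operator.
import numpy as np

def logistic_wrap(outputs):
    """Common wrapper: squash outputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-outputs))

def zscore_normalise(outputs, eps=1e-12):
    """One standard normalization technique: zero mean, unit variance."""
    return (outputs - outputs.mean()) / (outputs.std() + eps)

# Semantics of a hypothetical random tree on 6 training cases (badly scaled).
raw = np.array([120.0, -3.0, 0.5, 800.0, -45.0, 2.0])
print(logistic_wrap(raw))      # saturates: most values collapse to 0 or 1
print(zscore_normalise(raw))   # spread more evenly around zero
```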
Synthesis of Mathematical Programming Constraints with Genetic Programming
Abstract
We identify a novel application of Genetic Programming to the automatic synthesis of mathematical programming (MP) models for business processes. Given a set of examples of states of a business process, the proposed Genetic Constraint Synthesis (GenetiCS) method constructs well-formed constraints for an MP model. The form of the synthesized constraints (e.g., linear or polynomial) can be chosen according to the nature of the process and the desired type of MP problem. In the experimental part, we verify the syntactic and semantic fidelity of the synthesized models to actual benchmark models of varying complexity. The obtained symbolic models of constraints can be combined with an objective function of choice, fed into an off-the-shelf MP solver, and optimized.
Tomasz P. Pawlak, Krzysztof Krawiec
Grammatical Evolution of Robust Controller Structures Using Wilson Scoring and Criticality Ranking
Abstract
In process control it is essential that disturbances and parameter uncertainties do not affect the process in a negative way. Simultaneously optimizing an objective function for different scenarios can be solved in theory by evaluating candidate solutions on all scenarios. This is not feasible in real-world applications, where the scenario space often forms a continuum. A traditional approach is to approximate this evaluation using Monte Carlo sampling. To overcome the difficulty of choosing an appropriate sampling count and to reduce evaluations of low-quality solutions, a novel approach using Wilson scoring and criticality ranking within a grammatical evolution framework is presented. A nonlinear spring-mass system from robust control is considered as a benchmark example. The method is tested against Monte Carlo sampling and the results are compared to a backstepping controller. It is shown that the method is capable of outperforming state-of-the-art methods.
Elias Reichensdörfer, Dirk Odenthal, Dirk Wollherr
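For reference, the Wilson score mentioned above is a standard binomial confidence bound; an illustrative lower-bound computation (not the paper's code) is:

```python
# The Wilson score interval: a conservative lower bound on a success probability
# estimated from n trials (standard formula, illustrative code).
from math import sqrt

def wilson_lower_bound(successes, n, z=1.96):
    """Lower bound of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = p_hat + z * z / (2.0 * n)
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    return (centre - margin) / denom

# A controller passing 9/10 sampled scenarios is ranked below one passing 85/100.
print(wilson_lower_bound(9, 10))    # ~0.596
print(wilson_lower_bound(85, 100))  # ~0.767
```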
Using Feature Clustering for GP-Based Feature Construction on High-Dimensional Data
Abstract
Feature construction is a pre-processing technique to create new features with better discriminating ability from the original features. Genetic programming (GP) has been shown to be a prominent technique for this task. However, applying GP to high-dimensional data is still challenging due to the large search space. Feature clustering groups similar features into clusters, which can be used for dimensionality reduction by choosing representative features from each cluster to form the feature subset. Feature clustering has shown promise in feature selection, but has not been investigated in feature construction for classification. This paper presents the first work utilising feature clustering in this area. We propose a cluster-based GP feature construction method called CGPFC which uses feature clustering to improve the performance of GP for feature construction on high-dimensional data. Results on eight high-dimensional datasets with varying difficulties show that the CGPFC constructed features perform better than the original full feature set and features constructed by the standard GP constructor based on the whole feature set.
Binh Tran, Bing Xue, Mengjie Zhang

Posters

Frontmatter
Geometric Semantic Crossover with an Angle-Aware Mating Scheme in Genetic Programming for Symbolic Regression
Abstract
Recent research shows that incorporating semantic knowledge into the genetic programming (GP) evolutionary process can improve its performance. This work proposes an angle-aware mating scheme for geometric semantic crossover in GP for symbolic regression. The angle-awareness guides the crossover to operate on parents that have a large angle between their relative semantics with respect to the target semantics. The proposed idea of angle-awareness has been incorporated into one state-of-the-art geometric crossover, the locally geometric semantic crossover. The experimental results show that, compared with locally geometric semantic crossover and the regular GP crossover, the locally geometric crossover with angle-awareness not only has significantly better learning performance but also a notable generalisation gain on unseen test data. Further analysis has been conducted on the difference between the angle distributions of crossovers with and without angle-awareness, which confirms that angle-awareness changes the original distribution of angles by decreasing the number of parent pairs with a zero-degree angle while increasing those with large angles, leading to better performance.
Qi Chen, Bing Xue, Yi Mei, Mengjie Zhang
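An illustrative computation of the angle that the mating scheme above relies on, the angle between the two parents' relative semantics with respect to the target, might look like this (a sketch, not the authors' implementation):

```python
# Sketch of the angle computation behind an angle-aware mating scheme: the angle
# between (target - parent1) and (target - parent2) output vectors.
import numpy as np

def relative_semantic_angle(parent1_sem, parent2_sem, target_sem):
    """Angle (degrees) between the two parents' relative semantics to the target."""
    r1 = np.asarray(target_sem) - np.asarray(parent1_sem)
    r2 = np.asarray(target_sem) - np.asarray(parent2_sem)
    cos = np.dot(r1, r2) / (np.linalg.norm(r1) * np.linalg.norm(r2) + 1e-12)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Parents on "opposite sides" of the target are attractive mates for geometric crossover.
target = [1.0, 2.0, 3.0]
p1 = [0.0, 1.0, 2.0]           # undershoots the target on every case
p2 = [2.0, 3.0, 4.0]           # overshoots it
print(relative_semantic_angle(p1, p2, target))   # 180 degrees
```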
RECIPE: A Grammar-Based Framework for Automatically Evolving Classification Pipelines
Abstract
Automatic Machine Learning is a growing area of machine learning that has a similar objective to the area of hyper-heuristics: to automatically recommend optimized pipelines, algorithms or appropriate parameters for specific tasks without much dependency on user knowledge. The background knowledge required to solve the task at hand is instead embedded into a search mechanism that builds personalized solutions to the task. Following this idea, this paper proposes RECIPE (REsilient ClassifIcation Pipeline Evolution), a framework based on grammar-based genetic programming that builds customized classification pipelines. The framework is flexible enough to receive different grammars and can be easily extended to other machine learning tasks. RECIPE overcomes the drawbacks of previous evolutionary-based frameworks, such as generating invalid individuals, and organizes a large number of possible suitable data pre-processing and classification methods into a grammar. F-measure results obtained by RECIPE are compared to those of two state-of-the-art methods, and shown to be as good as or better than those previously reported in the literature. RECIPE represents a first step towards a complete framework for dealing with different machine learning tasks with the minimum required human intervention.
Alex G. C. de Sá, Walter José G. S. Pinto, Luiz Otavio V. B. Oliveira, Gisele L. Pappa
A Grammar Design Pattern for Arbitrary Program Synthesis Problems in Genetic Programming
Abstract
Grammar Guided Genetic Programming has been applied to many problem domains. It is well suited to tackle program synthesis, as it has the capability to evolve code in arbitrary languages. Nevertheless, grammars designed to evolve code have always been tailored to specific problems, resulting in bespoke grammars that are difficult to reuse. In this study a more general approach to grammar design in the program synthesis domain is presented. The approach undertaken is to create a grammar for each data type of a language and combine these grammars for the problem at hand, without having to tailor a grammar for every single problem. The approach can be applied to arbitrary problem instances of program synthesis and can be used with any programming language. The approach is also extensible to use libraries available in a given language. The grammars presented can be applied to any grammar-based Genetic Programming approach and make it easy for researchers to rerun experiments or test new problems. The approach is tested on a suite of benchmark problems and compared to PushGP, as it is the only GP system that has presented results on a wide range of benchmark problems. The objective of this study is to match or outperform PushGP on these problems without tuning grammars to solve each specific problem.
Stefan Forstenlechner, David Fagan, Miguel Nicolau, Michael O’Neill
Improving the Tartarus Problem as a Benchmark in Genetic Programming
Abstract
For empirical research on computer algorithms, it is essential to have a set of benchmark problems on which the relative performance of different methods and their applicability can be assessed. In the majority of computational research fields there are established sets of benchmark problems; however, the field of genetic programming lacks a similarly rigorously defined set of benchmarks. There is a strong interest within the genetic programming community to develop a suite of benchmarks. Following recent surveys [7], the desirable characteristics of a benchmark problem are now better defined. In this paper the Tartarus problem is proposed as a tunably difficult benchmark problem for use in Genetic Programming. The justification for this proposal is presented, together with guidance on its usage as a benchmark.
Thomas D. Griffiths, Anikó Ekárt
A New Subgraph Crossover for Cartesian Genetic Programming
Abstract
While tree-based Genetic Programming is often used with crossover, Cartesian Genetic Programming is mostly used with mutation as the only genetic operator. In this paper, a new crossover technique is introduced which recombines subgraphs of two selected graphs. Experiments on symbolic regression, boolean function and image operator design problems indicate that the use of the subgraph crossover improves the search performance of Cartesian Genetic Programming. A preliminary comparison with a previously proposed crossover technique indicates that the subgraph crossover performs better on the problems tested.
Roman Kalkreuth, Günter Rudolph, Andre Droschinsky
A Comparative Study of Different Grammar-Based Genetic Programming Approaches
Abstract
Grammars are useful formalisms to specify constraints, and not surprisingly, they have attracted the attention of Evolutionary Computation (EC) researchers as a means of enforcing problem restrictions. Context-Free-Grammar GP (CFG-GP) established the foundations for the application of grammars in Genetic Programming (GP), whilst Grammatical Evolution (GE) popularised the use of these approaches, becoming one of the most used GP variants. However, studies have shown that GE suffers from issues that have an impact on its performance. To minimise these issues, several extensions have been proposed, which have made the distinction between GE and CFG-GP less noticeable. Another direction was followed by Structured Grammatical Evolution (SGE), which maintains the separation between genotype and phenotype from GE but overcomes most of its issues. Our goal is to perform a comparative study of CFG-GP, GE and SGE to examine their relative performance. The results show that in most of the selected benchmarks, CFG-GP and SGE have similar performance, showing that SGE is a good alternative to GE.
Nuno Lourenço, Joaquim Ferrer, Francisco B. Pereira, Ernesto Costa
A Comparative Analysis of Dynamic Locality and Redundancy in Grammatical Evolution
Abstract
The most salient feature of Grammatical Evolution (GE) is a procedure which maps genotypes to phenotypes using the grammar production rules; however, the search effectiveness of GE may be affected by low locality and high redundancy, which can prevent GE from complying with the basic principle that offspring should inherit some traits from their parents. Indeed, many studies have previously investigated the locality and redundancy of GE as originally proposed in [31]. In this paper, we extend those results by considering redundancy and locality during the evolution, rather than statically, hence trying to understand if and how they are influenced by the selective pressure determined by the fitness. Moreover, we consider not only the original GE formulation, but three other variants proposed later (BGE, πGE, and SGE). We experimentally find that there is an interaction between locality/redundancy and other evolution-related measures, namely diversity and growth of individual size. In particular, the combined action of the crossover operator and the genotype-phenotype mapper makes SGE less redundant at the beginning of the evolution, but highly redundant after some generations, due to low phenotype diversity.
Eric Medvet
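As background for the locality/redundancy discussion, a toy version of the standard GE genotype-to-phenotype mapping (the mod rule) is sketched below with an invented grammar; it also shows how a single-codon change can produce a very different phenotype.

```python
# Illustrative GE genotype-to-phenotype mapping (the standard mod rule) on a toy grammar.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["x"], ["1"]],
    "<op>":   [["+"], ["*"]],
}

def ge_map(genotype, start="<expr>", max_wraps=2):
    """Expand the leftmost non-terminal using codon % number_of_choices."""
    derivation, codons = [start], list(genotype) * (max_wraps + 1)
    pos = 0
    while any(sym in GRAMMAR for sym in derivation) and pos < len(codons):
        i = next(k for k, sym in enumerate(derivation) if sym in GRAMMAR)
        choices = GRAMMAR[derivation[i]]
        derivation[i:i + 1] = choices[codons[pos] % len(choices)]   # the mod rule
        pos += 1
    return "".join(derivation)

# Two genotypes differing in a single codon map to very different phenotypes (low locality).
print(ge_map([0, 1, 2, 0, 1]))   # x+x+x
print(ge_map([2, 1, 2, 0, 1]))   # 1
```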
On Evolutionary Approximation of Sigmoid Function for HW/SW Embedded Systems
Abstract
Providing machine learning capabilities on low-cost electronic devices is a challenging goal, especially in the context of the Internet of Things paradigm. In order to deliver high-performance machine intelligence on low-power devices, suitable hardware accelerators have to be introduced. In this paper, we develop a method for evolving a hardware implementation together with a corresponding software controller for key components of smart embedded systems. The proposed approach is based on a multi-objective design space exploration conducted by means of extended linear genetic programming. The approach was evaluated on the task of approximate sigmoid function design, an important component of hardware implementations of neural networks. During these experiments, we automatically re-discovered some approximate sigmoid functions known from the literature. The method was implemented as an extension of an existing platform supporting concurrent evolution of the hardware and software of embedded systems.
Milos Minarik, Lukas Sekanina
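Purely as an illustration of the design target (not one of the paper's evolved solutions), a well-known piecewise-linear sigmoid approximation and its error against the exact function can be checked as follows:

```python
# Illustrative comparison: a well-known piecewise-linear sigmoid approximation
# of the kind that is cheap to realise in hardware, versus the exact function.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x):
    """Piecewise-linear approximation: a clipped straight line through 0.5."""
    return min(1.0, max(0.0, 0.25 * x + 0.5))

# Maximum absolute error over a sampled input range.
xs = [i / 100.0 for i in range(-800, 801)]
max_err = max(abs(sigmoid(x) - hard_sigmoid(x)) for x in xs)
print(round(max_err, 4))   # ~0.12, reached near x = +/-2
```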
Backmatter
Metadata
Title
Genetic Programming
Edited by
James McDermott
Mauro Castelli
Lukas Sekanina
Evert Haasdijk
Pablo García-Sánchez
Copyright Year
2017
Electronic ISBN
978-3-319-55696-3
Print ISBN
978-3-319-55695-6
DOI
https://doi.org/10.1007/978-3-319-55696-3