nach oben

2008 | Buch

Genetic Programming

11th European Conference, EuroGP 2008, Naples, Italy, March 26-28, 2008. Proceedings

herausgegeben von: Michael O’Neill, Leonardo Vanneschi, Steven Gustafson, Anna Isabel Esparcia Alcázar, Ivanoe De Falco, Antonio Della Cioppa, Ernesto Tarantino

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

Enthalten in: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

Einloggen, um Zugang zu erhalten

Über dieses Buch

The 11th European Conference on Genetic Programming, EuroGP 2008, took place in Naples, Italy from 26 to 28 March in the University of Naples Congress Centre with spectacular views over the Gulf of Naples. This volume contains the papers for the 21 oral presentations and 10 posters that were presented during this time. A diverse array of topics were covered re?ecting the current state of research in the ?eld of Genetic Programming, including the latest work on representations, theory, operators and analysis, evolvable hardware, agents and numerous applications. A rigorous, double-blind peer review process was employed, with each s- mission reviewed by at least three members of the international Program C- mittee. In total 61 papers were submitted this year, making an acceptance rate of 34% for full papers, and an overall acceptance rate of 51% including posters. S- mission of papers and the reviewing process were greatly assisted by the use of the MyReview management software originally developed by Philippe Rigaux, Bertrand Chardon and other colleagues from the Universit´e Paris-Sud Orsay, France. We are especially grateful to Marc Schoenauer from INRIA, France for managing this system. Reviewers were asked to nominate keywords specifying their area of expertise, and these keywords were matched to those selected by the authors of the submitted papers with the assistance of the optimal assignment feature of the conference management software.

Inhaltsverzeichnis

Frontmatter

Oral Presentations

Training Time and Team Composition Robustness in Evolved Multi-agent Systems

Evolutionary algorithms are effective at creating cooperative, multi-agent systems. However, current Island and Team algorithms show subtle but significant weaknesses when it comes to balancing member performance with member cooperation, leading to sub-optimal overall team performance. In this paper we apply a new class of cooperative multi-agent evolutionary algorithms called Orthogonal Evolution of Teams (OET) which produce higher levels of cooperation and specialization than current team algorithms. We also show that sophisticated behavior evolves much sooner using OET algorithms, even when training resources are significantly reduced. Finally, we show that when teams must be reformed, due to agent break down for example, those teams composed of individuals sampled from OET teams perform much better than teams composed of individuals sampled from teams created by other methods.

Russell Thomason, Robert B. Heckendorn, Terence Soule

Winning Ant Wars: Evolving a Human-Competitive Game Strategy Using Fitnessless Selection

We tell the story of

BrilliAnt

, the winner of the Ant Wars contest organized within GECCO’2007, Genetic and Evolutionary Computation Conference. The task for the Ant Wars contestants was to evolve a controller for a virtual ant that collects food in a square toroidal grid environment in the presence of a competing ant. BrilliAnt, submitted to the contest by our team, has been evolved through competitive one-population coevolution using genetic programming and a novel

fitnessless selection

method

In the paper, we detail the evolutionary setup that lead to BrilliAnt’s emergence, assess its human-competitiveness, and describe selected behavioral patterns observed in its strategy.

Wojciech Jaśkowski, Krzysztof Krawiec, Bartosz Wieloch

In Silicon No One Can Hear You Scream: Evolving Fighting Creatures

Virtual creatures operating in a physically realistic 3D environment, as originally introduced by Karl Sims, provide a challenging domain for artificial evolution. However, few coevolutionary experiments have been reported. Here we describe the results of our experiments on the evolution of physical combat among virtual creatures: essentially, we evolve creatures that trade blows with each other. While several authors have involved highly abstract forms of “combat” in their systems, this is (to our knowledge) the first example of realistic physical combat between virtual creatures, based on actual contact and physical damage. This poses the question of apportioning damage in a collision. Our solution is to assign damage proportionally to how much each colliding limb contributed to the occurrence and depth of the collision. Our system successfully evolves a wide range of morphologies and fighting behaviours, which we describe visually and verbally. As with our previous efforts, our source code is publicly available.

Thomas Miconi

Real-Time, Non-intrusive Speech Quality Estimation: A Signal-Based Model

Speech quality estimation, as perceived by humans, is of vital importance to proper functioning of telecommunications networks. Speech quality can be degraded due to various network related problems. In this paper we present a model for speech quality estimation that is a function of various time and frequency domain features of human speech. We have employed a hybrid optimization approach, by using Genetic Programming (GP) to find a suitable structure for the desired model. In order to optimize the coefficients of the model we have employed a traditional GA and a numerical method known as linear scaling. The proposed model outperforms the ITU-T Recommendation P.563 in terms of prediction accuracy, which is the current non-intrusive speech quality estimation model. The proposed model also has a significantly reduced dimensionality. This may reduce the computational requirements of the model.

Adil Raja, Colin Flanagan

Good News: Using News Feeds with Genetic Programming to Predict Stock Prices

This paper introduces a new data set for use in the financial prediction domain, that of quantified News

Sentiment

. This data is automatically generated in real time from the Dow Jones network with news stories being classified as either Positive, Negative or Neutral in relation to a particular market or sector of interest.

We show that with careful consideration to fitness function and data representation, GP can be used effectively to find non-linear solutions for predicting large intraday price jumps on the S&P 500 up to an hour before they occur. The results show that GP was successfully able to predict stock price movement using these news

alone

, that is, without access to even current market price.

Fiacc Larkin, Conor Ryan

A Genetic Programming Approach to Deriving the Spectral Sensitivity of an Optical System

In color image processing, several sensors are used which respond to the light in the red, green and blue parts of the spectrum. When working with color images taken by an optical system it is very important to know the sensitivity of the entire optical system. The optical system consists of the sensor, lens and any filters which may be used. The response characteristics of the lens and filters can be measured inside the laboratory. However, for many digital cameras it is not clear how the sensors contained inside the camera respond to light. This information may not be available from the manufacturer of the camera. Even if we knew the response characteristics of the sensor, it may not be clear what algorithms are employed by the manufacturer before the data is finally stored as an image file. We show how genetic programming may be used to obtain the sensor response functions using a single image from a calibration target as input together with the reflectance data of this calibration target.

Marc Ebner

A SIMD Interpreter for Genetic Programming on GPU Graphics Cards

Mackey-Glass chaotic time series prediction and nuclear protein classification show the feasibility of evaluating genetic programming populations

directly

on parallel consumer gaming graphics processing units. Using a Linux KDE computer equipped with an nVidia GeForce 8800 GTX graphics processing unit card the C++ SPMD interpretter evolves programs at Giga GP operations per second (895 million GPops). We use the RapidMind general processing on GPU (GPGPU) framework to evaluate an entire population of a quarter of a million individual programs on a non-trivial problem in 4 seconds. An efficient reverse polish notation (RPN) tree based GP is given.

W. B. Langdon, Wolfgang Banzhaf

Partitioned Incremental Evolution of Hardware Using Genetic Programming

In an effort to enable evolutionary computation techniques to discover solutions for large and complex hardware systems, techniques have been devised to break the initial problem down into smaller sub-tasks. In particular, a decomposition approach has been described that is based on partitioning of the circuit test vectors, but it has its limitations. In an effort to address this, we have combined the partitioning method with an incrementally evolving genetic programming approach. The result, referred to as Partitioned Incremental Evolution of HARDware (PIE-HARD), exhibits solution-finding performance that is significantly better than that of other approaches.

David Jackson

Population Parallel GP on the G80 GPU

The availability of low cost powerful parallel graphics cards has stimulated a trend to port GP on Graphics Processing Units (GPUs). Previous works on GPUs have shown evaluation phase speedups for large training cases sets. Using the CUDA language on the G80 GPU, we show it is possible to efficiently interpret several GP programs in parallel, thus obtaining speedups also for small training sets starting at less than 100 training cases. Our scheme was embedded in the well-known ECJ library, providing an easy entry point for owners of G80 GPUs.

Denis Robilliard, Virginie Marion-Poty, Cyril Fonlupt

Operator Equalisation and Bloat Free GP

Research has shown that beyond a certain minimum program length the distributions of program functionality and fitness converge to a limit. Before that limit, however, there may be program-length classes with a higher or lower average fitness than that achieved beyond the limit. Ideally, therefore, GP search should be limited to program lengths that are within the limit and that can achieve optimum fitness. This has the dual benefits of providing the simplest/smallest solutions and preventing GP bloat thus shortening run times. Here we introduce a novel and simple technique, which we call

Operator Equalisation

, to control how GP will sample certain length classes. This allows us to finely and freely bias the search towards shorter or longer programs and also to search specific length classes during a GP run. This gives the user total control on the program length distribution, thereby completely freeing GP from bloat. Results show that we can automatically identify potentially optimal solution length classes quickly using small samples and that, for particular classes of problems, simple length biases can significantly improve the best fitness found during a GP run.

Stephen Dignum, Riccardo Poli

Practical Model of Genetic Programming’s Performance on Rational Symbolic Regression Problems

Many theoretical studies on GP are criticized for not being applicable to the real world. Here we present a practical model for the performance of a standard GP system in real problems. The model gives accurate predictions and has a variety of applications, including the assessment of the similarities and differences of different GP systems.

Mario Graff, Riccardo Poli

Semantic Building Blocks in Genetic Programming

We present a new mechanism for studying the impact of subtree crossover in terms of semantic building blocks. This approach allows us to completely and compactly describe the semantic action of crossover, and provide insight into what does (or doesn’t) make crossover effective. Our results make it clear that a very high proportion of crossover events (typically over 75% in our experiments) are guaranteed to perform no immediately useful search in the semantic space. Our findings also indicate a strong correlation between lack of progress and high proportions of fixed contexts. These results then suggest several new, theoretically grounded, research areas.

Nicholas Freitag McPhee, Brian Ohs, Tyler Hutchison

A Simple Powerful Constraint for Genetic Programming

This paper demonstrates the ability of Hereditary Repulsion to perform well on a range of diverse problem domains. Furthermore, we show that HR is practically invulnerable to the effects to overfitting and does not suffer a loss of generalisation, even in the late stages of evolution. We trace the source of this high quality performance to a pleasingly simple constraint at the heart of the HR algorithm. We confirm its effectiveness by incorporating the constraint into one of the benchmark systems, observing substantial improvements in the quality of generalisation in the evolved population.

Gearoid Murphy, Conor Ryan

Crossover, Sampling, Bloat and the Harmful Effects of Size Limits

Recent research [9,2] has enabled the accurate prediction of the limiting distribution of tree sizes for Genetic Programming with standard sub-tree swapping crossover when GP is applied to a flat fitness landscape. In that work, however, tree sizes are measured in terms of number of internal nodes. While the relationship between internal nodes and length is one-to-one for the case of

-ary trees, it is much more complex in the case of mixed arities. So, practically the

length

bias of subtree crossover remains unknown. This paper starts to fill this theoretical gap, by providing accurate estimates of the limiting distribution of lengths approached by tree-based GP with standard crossover in the absence of selection pressure. The resulting models confirm that short programs can be expected to be heavily resampled. Empirical validation shows that this is indeed the case. We also study empirically how the situation is modified by the application of program length limits. Surprisingly, the introduction of such limits further exacerbates the effect. However, this has more profound consequences than one might imagine at first. We analyse these consequences and predict that, in the presence of fitness, size limits may initially speed up bloat, almost completely defeating their original purpose (combating bloat). Indeed, experiments confirm that this is the case for the first 10 or 15 generations. This leads us to suggest a better way of using size limits. Finally, this paper proposes a novel technique to counteract bloat, sampling parsimony, the application of a penalty to resampling.

Stephen Dignum, Riccardo Poli

The Performance of a Selection Architecture for Genetic Programming

Hierarchical decomposition and reuse techniques are seen as making a vital contribution to the scalability of genetic programming systems. Existing techniques either try to identify and encapsulate useful code fragments as they evolve, or else they rely on intelligent prior deconstruction of the problem at hand. The alternative we propose is to base decomposition on a partitioning of the input test cases into subsets or ranges. To effect this, the program architecture of individuals is such that each subset is dealt with in an independently evolved branch, rooted at a selection node handling branch activation. Experimentation reveals that performance of systems employing this architecture is significantly better than that of more conventional systems.

David Jackson

A Comparison of Cartesian Genetic Programming and Linear Genetic Programming

Two prominent genetic programming approaches are the graph-based Cartesian Genetic Programming (CGP) and Linear Genetic Programming (LGP). Recently, a formal algorithm for constructing a directed acyclic graph (DAG) from a classical LGP instruction sequence has been established. Given graph-based LGP and traditional CGP, this paper investigates the similarities and differences between the two implementations, and establishes that the significant difference between them is each algorithm’s means of restricting inter-connectivity of nodes. The work then goes on to compare the performance of two representations each (with varied connectivity) of LGP and CGP to a directed cyclic graph (DCG) GP with no connectivity restrictions on a medical classification and regression benchmark.

Garnett Wilson, Wolfgang Banzhaf

Evolvability Via Modularity-Induced Mutational Focussing

This work postulates a mechanism by which random genotypic variation is directed towards favourable phenotypic variation. Evolvability is a poorly understood concept at present: it is unclear precisely how the genotype-phenotype map aligns random genotypic mutation with favourable phenotypic variation. By static analysis of the distribution of the genotypic representation of functionality, an emergent bias in the representation of the adapted and maladapted is shown. This bias is facilitated by a form of reuse modularity, and it serves to direct phenotypic variation to where there is selective opportunity.

Richard M. Downing

A Linear Estimation-of-Distribution GP System

We present N-gram GP, an estimation of distribution algorithm for the evolution of linear computer programs. The algorithm learns and samples a joint probability distribution of triplets of instructions (or 3-grams) at the same time as it is learning and sampling a program length distribution. We have tested N-gram GP on symbolic regressions problems where the target function is a polynomial of up to degree 12 and lawn-mower problems with lawn sizes of up to 12×12. Results show that the algorithm is effective and scales better on these problems than either linear GP or simple stochastic hill-climbing.

Riccardo Poli, Nicholas Freitag McPhee

Feature Discovery in Reinforcement Learning Using Genetic Programming

The goal of reinforcement learning is to find a policy that maximizes the expected reward accumulated by an agent over time based on its interactions with the environment; to this end, a function of the state of the agent has to be learned. It is often the case that states are better characterized by a set of features. However, finding a “good” set of features is generally a tedious task which requires a good domain knowledge. In this paper, we propose a genetic programming based approach for feature discovery in reinforcement learning. A population of individuals, each representing a set of features, is evolved, and individuals are evaluated by their average performance on short reinforcement learning trials. The results of experiments conducted on several benchmark problems demonstrate that the resulting features allow the agent to learn better policies in a reduced amount of episodes.

Sertan Girgin, Philippe Preux

Hardware Accelerators for Cartesian Genetic Programming

A new class of FPGA-based accelerators is presented for Cartesian Genetic Programming (CGP). The accelerators contain a genetic engine which is reused in all applications. Candidate programs (circuits) are evaluated using application-specific virtual reconfigurable circuit (VRC) and fitness unit. Two types of VRCs are proposed. The first one is devoted for symbolic regression problems over the fixed point representation. The second one is designed for evolution of logic circuits. In both cases a significant speedup of evolution (30–40 times) was obtained in comparison with a highly optimized software implementation of CGP. This speedup can be increased by creating multiple fitness units.

Zdenek Vasicek, Lukas Sekanina

Genetic Programming and Class-Wise Orthogonal Transformation for Dimension Reduction in Classification Problems

This paper describes a new method using genetic programming (GP) in dimension reduction for classification problems. Two issues have been considered: (a) transforming the original feature space to a set of new features (components) that are more useful in classification, (b) finding a ranking measure to select more significant features. The paper presents a new class-wise orthogonal transformation function to construct a variable terminal pool for the proposed GP system. Information entropy over class intervals is used as the ranking measure for the constructed features. The performance measure is the classification accuracy on 12 benchmark problems using constructed features in a decision tree classifier. The new approach is compared with the principle component analysis (PCA) method and the results show that the new approach outperforms the PCA method on most of the problems in terms of final classification performance and dimension reduction.

Kourosh Neshatian, Mengjie Zhang

Posters

Evolving Proactive Aggregation Protocols

We present an approach for the automated synthesis of proactive aggregation protocols using Genetic Programming and discuss major decisions in modeling and simulating distributed aggregation protocols. We develop a genotype, which is an abstract specification form for aggregation protocols. Finally we show the evolution of a distributed average protocol under various conditions to demonstrate the utility of our approach.

Thomas Weise, Michael Zapf, Kurt Geihs

GP Classification under Imbalanced Data sets: Active Sub-sampling and AUC Approximation

The problem of evolving binary classification models under increasingly unbalanced data sets is approached by proposing a strategy consisting of two components: Sub-sampling and ‘robust’ fitness function design. In particular, recent work in the wider machine learning literature has recognized that maintaining the original distribution of exemplars during training is often not appropriate for designing classifiers that are robust to degenerate classifier behavior. To this end we propose a ‘Simple Active Learning Heuristic’ (SALH) in which a subset of exemplars is sampled with uniform probability under a class balance enforcing rule for fitness evaluation. In addition, an efficient estimator for the Area Under the Curve (AUC) performance metric is assumed in the form of a modified Wilcoxon-Mann-Whitney (WMW) statistic. Performance is evaluated in terms of six representative UCI data sets and benchmarked against: canonical GP, SALH based GP, SALH and the modified WMW statistic, and deterministic classifiers (Naive Bayes and C4.5). The resulting SALH-WMW model is demonstrated to be both efficient and effective at providing solutions maximizing performance assessed in terms of AUC.

John Doucette, Malcolm I. Heywood

Exposing a Bias Toward Short-Length Numbers in Grammatical Evolution

Many automatically-synthesized programs have, like their hand-made counterparts, numerical parameters that need to be set properly before they can show an acceptable performance. Hence, any approach to the automatic synthesis of programs needs the ability to tune numerical parameters efficiently.

Grammatical Evolution (GE) is a promising grammar-based genetic programming technique that synthesizes numbers by concatenating digits. In this paper, we show that a naive application of this approach can lead to a serious number length bias that in turn affects efficiency. The root of the problem is the way the context-free grammar used by GE is defined. A simple, yet effective, solution to this problem is proposed.

Marco A. Montes de Oca

Cooperative Problem Decomposition in Pareto Competitive Classifier Models of Coevolution

Pareto competitive models of coevolution have the potential to provide a number of distinct advantages over the canonical approach to training under the Genetic Programming (GP) classifier domain. Recent work has specifically focused on the reformulation of training as a two-population competition, that is learners versus training exemplars. Such a scheme affords, for example, the capacity to decouple the fitness evaluation overhead from the data set size through sub sampling while naturally encouraging ‘teams’ or composite solutions as opposed to solutions based on a single individual alone. One outstanding question with respect to the latter characteristic is with regards to the nature of the team (archive) behavior in terms of pattern coverage. That is to say, which models are used when, and what are the implications for solution modularity as it relates, for example, to the assignment of exemplars to solution participants. The current work investigates two Pareto competitive approaches to classification under GP, with one configured to employ an explicitly cooperative multi-objective cost function based and the other employing the classical (error-based) cost function. We empirically demonstrate a critical distinction between the two with regards to problem decomposition, with the capacity to provide a decomposition into unique behaviors being much more prevalent when co-operative mechanisms are explicitly supported.

Andrew R. McIntyre, Malcolm I. Heywood

Integrating Categorical Variables with Multiobjective Genetic Programming for Classifier Construction

Genetic programming (GP) has proved successful at evolving pattern classifiers and although the paradigm lends itself easily to continuous pattern attributes, incorporating categorical attributes is little studied. Here we construct two synthetic datasets specifically to investigate the use of categorical attributes in GP and consider two possible approaches: indicator variables and integer mapping. We conclude that for ordered attributes, integer mapping yields the lowest errors. For purely nominal attributes, indicator variables give the best misclassification errors.

Khaled Badran, Peter Rockett

The Effects of Constant Neutrality on Performance and Problem Hardness in GP

The neutral theory of molecular evolution and the associated notion of neutrality have interested many researchers in Evolutionary Computation. The hope is that the presence of neutrality can aid evolution. However, despite the vast number of publications on neutrality, there is still a big controversy on its effects. The aim of this paper is to clarify under what circumstances neutrality could aid Genetic Programming using the traditional representation (i.e. tree-like structures) . For this purpose, we use fitness distance correlation as a measure of hardness. In addition we have conducted extensive empirical experimentation to corroborate the fitness distance correlation predictions. This has been done using two test problems with very different landscape features that represent two extreme cases where the different effects of neutrality can be emphasised. Finally, we study the distances between individuals and global optimum to understand how neutrality affects evolution (at least with the one proposed in this paper).

Edgar Galván-López, Stephen Dignum, Riccardo Poli

Applying Cost-Sensitive Multiobjective Genetic Programming to Feature Extraction for Spam E-mail Filtering

In this paper we apply multiobjective genetic programming to the cost-sensitive classification task of labelling spam e-mails. We consider three publicly-available spam corpora and make comparison with both support vector machines and naïve Bayes classifiers, both of which are held to perform well on the spam filtering problem. We find that for the high cost ratios of practical interest, our cost-sensitive multiobjective genetic programming gives the best results across a range of performance measures.

Yang Zhang, HongYu Li, Mahesan Niranjan, Peter Rockett

PlasmidPL: A Plasmid-Inspired Language for Genetic Programming

We present PlasmidPL, a plasmid-inspired programming language designed for Genetic Programming (GP), and based on a chemical metaphor. The basic data structures in PlasmidPL are circular virtual molecules or rings which may contain code and data. Rings may react with each other to perform computations on the rings themselves. A virtual chemical reactor stochastically chooses which reactions should occur and when. Code and data may be rewritten in the process, leading to a system that constantly modifies itself. In order to be closer to chemistry, PlasmidPL relies solely on the data and code stored in molecules.

After describing the language, we show some hand-written sample programs that implement initial program generation, mutation and crossover within self-modifying chemical programs. These programs are then used to solve a typical symbolic regression problem, as a feasibility study. Finally, we discuss future directions into specific application scenarios that can benefit from such a chemical model.

Lidia Yamamoto

Using Genetic Programming for Turing Machine Induction

Turing machines are playing an increasingly significant role in Computer Science domains such as bioinformatics. Instead of directly formulating a solution to a problem, a Turing machine which produces a solution algorithm is generated. The original problem is reduced to that of inducing an acceptor for a recursively enumerable language or a Turing machine transducer. This paper reports on a genetic programming system implemented to evolve Turing machine acceptors and transducers. Each element of the population is represented as a directed graph and graph crossover, mutation and reproduction are used to evolve each generation. The paper also presents a set of five acceptor and five transducer benchmark problems which can be used to test and compare different methodologies for generating Turing machines. The genetic programming system implemented evolved general solutions for all ten problems.

Amashini Naidoo, Nelishia Pillay

Altering Search Rates of the Meta and Solution Grammars in the mGGA

Adopting a meta-Grammar with Grammatical Evolution(GE) allows GE to evolve the grammar that it uses to specify the construction of a syntactically correct solution. The ability to evolve a grammar in the context of GE means that useful bias towards specific structures and solutions can be evolved during a run. This can lead to improved performance over the standard static grammar in terms of adapting to a dynamic environment and improved scalability to larger problem instances. This approach allows the evolution of modularity and reuse both on structural and symbol levels resulting in a compression of the representation of a solution. In this paper an analysis of altering the rate of sampling of the evolved solution grammars is undertaken. It is found that the majority of evolutionary search is currently focused on the generation of the solution grammars to such an extent that the candidate solutions are often hard-coded into them making the solution chromosome effectively redundant. This opens the door to future work in which we can explore how the search can be better balanced between the meta and solution grammars

Erik Hemberg, Michael O’Neill, Anthony Brabazon

Backmatter

Titel: Genetic Programming
herausgegeben von: Michael O’Neill
Leonardo Vanneschi
Steven Gustafson
Anna Isabel Esparcia Alcázar
Ivanoe De Falco
Antonio Della Cioppa
Ernesto Tarantino
Verlag: Springer Berlin Heidelberg
Electronic ISBN: 978-3-540-78671-9
Print ISBN: 978-3-540-78670-2
DOI: https://doi.org/10.1007/978-3-540-78671-9