Skip to main content

2007 | Buch

Linear Genetic Programming

insite
SUCHEN

Über dieses Buch

Linear Genetic Programming presents a variant of genetic programming (GP) that evolves imperative computer programs as linear sequences of instructions, in contrast to the more traditional functional expressions or syntax trees. Primary characteristics of linear program structure are exploited to achieve acceleration of both execution time and evolutionary progress. Online analysis and optimization of program code lead to more efficient techniques and contribute to a better understanding of the method and its parameters. In particular, the reduction of structural variation step size and non-effective variations play a key role in finding higher quality and less complex solutions. Typical GP phenomena, such as non-effective code, neutral variations, and code growth are investigated from the perspective of linear GP.

This book serves as a reference for researchers; it also contains sufficient introductory material for students and those who are new to the field.

Inhaltsverzeichnis

Frontmatter

Introduction

Chapter 1. Introduction
Abstract
Natural evolution has always been a source of inspiration for science and engineering. “Endless forms, most beautiful”, as Darwin put it. Who would not marvel at the unbelievable variety of life, who would not admire its complexity? A process that could bring about such wonders in nature, couldn't we glean from it some tricks that would be useful for our own activities? Couldn't we learn some methods for the design of other systems, for instance, machines and their behaviors? It is along these lines of thought that algorithms where conceived, able to catch aspects of natural evolution.

Fundamental Analysis

Frontmatter
Chapter 2. Basic Concepts of Linear Genetic Programming
Abstract
In this chapter linear genetic programming (LGP) will be explored in further detail. The basis of the specific linear GP variant we want to investigate in this book will be described, in particular the programming language used for evolution, the representation of individuals, and the specific evolutionary algorithm employed. This will form the core of our LGP system, while fundamental concepts of linear GP will also be discussed, including various forms of program execution. Linear GP operates with imperative programs. All discussions and experiments in this book are conducted independently from a special type of programming language or processor architecture. Even though genetic programs are interpreted and partly noted in the high-level language C, the applied programming concepts exist principally in or may be translated into most modern imperative programming languages, down to the level of machine languages.
Chapter 3. Characteristics of the Linear Representation
Abstract
In the first instance linear genetic programming has been introduced for the benefit of shorter execution time. Genetic programs written as binary machine code do not have to pass through a time-consuming interpre- tation step (see Section 2.2). In this chapter we investigate other, more general features of the linear representation.One basic difference to a tree representation is the emergence of unused code parts in linear genetic pro- grams that are independent of program semantics. Another fundamental difference is that the data flow in a linear genetic program has a directed graph structure that is more general than a tree.
Chapter 4. A Comparison with Neural Networks
4.5 Summary and Conclusion
We reported on LGP applied to a number of medical classification tasks. It was demonstrated that, on average, genetic programming performs competitive to RPROP neural networks with respect to the generalization performance.
The runtime performance of genetic programming becomes especially important for time-critical applications or when operating with large data sets from real-world domains like medicine. Two techniques were presented that reduced the computational costs significantly.
First, the elimination of noneffective code from linear genetic programs resulted in an average decrease in runtime of about a factor of 5 here. Second, by using a demetic population in combination with an elitist migration strategy the number of effective generations was reduced by a factor of about 3, without decreasing the performance of the evolutionary algorithm.

Method Design

Frontmatter
Chapter 5. Linear Genetic Operators I — Segment Variations
Abstract
Crossover has been the traditional operator in tree-based GP for varying the content and size of programs. In this chapter we systematically introduce crossover and mutation operators for the linear program representation and compare their influence on prediction performance and the complexity of evolved solutions.
We can distinguish between two different levels of variation done by these operators. Macro variations operate on the instruction level (or macro level). In this perspective, an instruction represents the smallest unit. Micro variations operate on the level of instruction components (micro level) and manipulate registers, operators, and constants. Only macro variations influence program growth. Macro variations may be further divided into segment variations and instruction variations, depending on whether a contiguous subsequence of instructions or only one instruction is subjected to change. Only segment variations will form the subject of this chapter. Other variations will be treated in a subsequent chapter
We will see that the performance of a variation operator strongly depends on its maximum (and average) step size on the symbolic program structure, on its influence on code growth, and on the proportion of effective and neutral variations. Among other things, macro mutations with minimum step size will turn out to be most effective provided that a change of the structurally effective code can be guaranteed. We will also investigate how linear genetic programs can be manipulated more efficiently through respecting their functional structure.
Chapter 6. Linear Genetic Operators II — Instruction Mutations
Abstract
The experimental results from Chapter 5 have confirmed two important assumptions. First, when using recombination best results were obtained with a relatively low limit to segment length. Second, segment recombination has not been found to be more powerful than segment mutation. Both aspects motivate the use of mutations that affect one instruction only. The following considerations try to point out why linear programs in particular are likely to be served better by using only minimal mutation operations.
Chapter 7. Analysis of Control Parameters
Abstract
In the previous two chapters parameters have been analyzed that are closely related to one of the variation operators. In this chapter we analyze influences of more general system parameters that are relevant in linear genetic programming. In particular, the number of registers, the number of constants, the population size, and the maximum program length will be studied. Additionally, we compare different initialization techniques for linear genetic programs. Test problems are again the approximation problem mexican hat and the classification problem spiral introduced in Section 5.8.1.
Chapter 8. A Comparison with Tree-Based Genetic Programming
Abstract
In this chapter a comparison between the linear representation and the traditional tree representation of genetic programming is performed. The comparison examines prediction performance and model size based on two collections of benchmark problems that have been composed of artificial test problems and of real-world applications from bioinformatics, respectively. Both linear GP and tree-based GP use crossover for macro variations. Additionally, we apply the linear GP variant from Section 6.2.3 which works exclusively with (effective) instruction mutations to compare its performance for a larger number of problems. But first of all, we introduce tree-based GP in some further detail.

Advanced Techniques and Phenomena

Frontmatter
Chapter 9. Control of Diversity and Variation Step Size
Abstract
In this chapter we will investigate structural and semantic distance metrics for linear genetic programs. A causal connection between changes of genotypes and of phenotypes form a necessary condition for being able to control differences between genetic programs. The two objectives of this chapter are to show [1] how distance information between individuals can be used to control structural diversity of individuals and [2] how variation distance on effective code can be controlled probabilistically with different linear genetic operators.
Chapter 10. Code Growth and Neutral Variations
Abstract
This chapter brings together theories about neutral variations and code growth in genetic programming. We argue that neutral variations are important for the growth of code in GP runs. Other existing theories about code growth are verified for linear GP and are partly reevaluated from a different perspective.
In evolutionary computation neutral variations are argued to explore flat regions of the fitness landscape while non-neutral variations exploit regions with gradient information. We investigate the influence of different variation effects on growth of code and the prediction quality for different kinds of variation operators. It is well known that a high proportion of neutral code (introns) in genetic programs may increase the probability for variations to be neutral. But which type of variation creates the introns in the first place? For linear GP with minimum mutation, step size results show that neutral variations almost exclusively represent a cause for (rather than a result of) the emergence and growth of intron code. This part of the chapter is a continuation of our earlier studies [25].
We also examine different linear genetic operators regarding an implicit length bias. In contrast to an explicit bias, implicit bias does not result from the dynamics of the operator alone, but requires the existence of a fitness pressure.
We will close this chapter with a discussion about how to control code growth in linear GP. Different approaches are reviewed including variation-based methods and selection-based methods. Both may be applied specifically to effective code and/or to noneffective code.
Chapter 11. Evolution of Program Teams
Abstract
This chapter applies linear GP to the evolution of cooperative teams to several prediction problems. Different linear methods for combining outputs of the team programs are compared. These include hybrid approaches where [1] a neural network is used to optimize the weights of programs in a team for a common decision and [2] a real-numbered vector (the representation of evolution strategies) of weights is evolved in tandem with each team. The cooperative team approach results in an improved training and generalization performance compared to the standard GP method.
Epilogue
Abstract
What have we achieved and where do we go from here?
This book has discussed linear genetic programming, a variant of GP that employs sequences of imperative instructions as genetic material. We focused on properties and behaviors of the linear representation and argued that it has a number of advantages over tree-based GP. We also pointed out extended similarities with its biological counterpart — the DNA sequence — that have so far not been appreciated sufficiently in the literature. For example, the fact that non-coding subsequences can be kept in the code and manipulated silently, i.e., without being activated, is pretty much analogous to what happens in real genomes.
Thinking in terms of data flow and register connections allowed us to accelerate artificial evolution in linear GP by considerable factors, on the basis of both absolute runtime and number of generations. It also allowed us to considerably increase the efficiency of evolutionary search operators. This led to an induction of more powerful solutions while at the same time keeping solution size and code growth under control.
We used a variety of benchmark problems to produce empirical results that were able to shed light on fundamental questions in GP and linear GP, in particular. We ventured to explain non-trivial phenomena and to develop powerful techniques, all with an eye to their implications on the practice of GP. As an empirical text, the book is heavily based on experimental data. These were generated through thousands of GP runs, comprising millions of program evaluations done with billions of CPU cycles.
Naturally, this book cannot have the last word on linear GP. If it has helped to promote the popularity of the approach and has opened avenues for further inquiry, it has served a good purpose. We sincerely hope to have conveyed the message and convinced the reader to give this method a try.
Backmatter
Metadaten
Titel
Linear Genetic Programming
verfasst von
Markus F. Brameier
Wolfgang Banzhaf
Copyright-Jahr
2007
Verlag
Springer US
Electronic ISBN
978-0-387-31030-5
Print ISBN
978-0-387-31029-9
DOI
https://doi.org/10.1007/978-0-387-31030-5