Skip to main content

2015 | Buch

Genetic Programming Theory and Practice XII

insite
SUCHEN

Über dieses Buch

These contributions, written by the foremost international researchers and practitioners of Genetic Programming (GP), explore the synergy between theoretical and empirical results on real-world problems, producing a comprehensive view of the state of the art in GP. Topics in this volume include: gene expression regulation, novel genetic models for glaucoma, inheritable epigenetics, combinators in genetic programming, sequential symbolic regression, system dynamics, sliding window symbolic regression, large feature problems, alignment in the error space, HUMIE winners, Boolean multiplexer function, and highly distributed genetic programming systems. Application areas include chemical process control, circuit design, financial data mining and bioinformatics. Readers will discover large-scale, real-world applications of GP to a variety of problem domains via in-depth presentations of the latest and most significant results.

Inhaltsverzeichnis

Frontmatter
1. Application of Machine-Learning Methods to Understand Gene Expression Regulation
Abstract
With the development and application of high-throughput technologies, an enormous amount of biological data has been produced in the past few years. These large-scale datasets make it possible and necessary to implement machine learning techniques for mining biological insights. In this chapter, we describe several examples to show how machine learning approaches are used to elucidate the mechanism of transcriptional regulation mediated by transcription factors and histone modifications. We demonstrate that machine learning provides powerful tools to quantitatively relate gene expression with transcription factor binding and histone modifications, to identify novel regulatory DNA elements in the genomes, and to predict gene functions. We also discuss the advantages and limitations of genetic programming in analyzing and processing biological data.
Chao Cheng, William P. Worzel
2. Identification of Novel Genetic Models of Glaucoma Using the “EMERGENT” Genetic Programming-Based Artificial Intelligence System
Abstract
The genetic basis for primary open-angle glaucoma (POAG) is not yet understood but is likely the result of many interacting genetic variants that influence risk in the context of our local ecology. The complexity of the genotype to phenotype mapping relationship for common diseases like POAG necessitates analytical approaches that move beyond parametric statistical methods such as logistic regression that assume a particular mathematical model. This is particularly important in the era of big data where it is routine to collect and analyze data sets with hundreds of thousands of measured genetic variants in thousands of human subjects. We introduce here the Exploratory Modeling for Extracting Relationships using Genetic and Evolutionary Navigation Techniques (EMERGENT) algorithm as an artificial intelligence approach to the genetic analysis of common human diseases. EMERGENT builds models of genetic variation from lists of mathematical functions using a form of genetic programming called computational evolution. A key feature of the system is the ability to utilize pre-processed expert knowledge giving it the ability to explore model space much as a human would. We describe this system in detail and then apply it to the genetic analysis of POAG in the Glaucoma Gene Environment Initiative (GLAUGEN) study that included approximately 1,272 subjects with the disease and 1057 healthy controls. A total of 657,366 single-nucleotide polymorphisms (SNPs) from across the human genome were measured in these subjects and available for analysis. Analysis using the EMERGENT framework revealed a best model consisting of six SNPs that map to at least six different genes. Two of these genes have previously been associated with POAG in several studies. The others represent new hypotheses about the genetic basis of POAG. All of the SNPs are involved in non-additive gene-gene interactions. Further, the six genes are all directly or indirectly related through biological interactions to the vascular endothelial growth factor (VEGF) gene that is an actively investigated drug target for POAG. This study demonstrates the routine application of an artificial intelligence-based system for the genetic analysis of complex human diseases.
Jason H. Moore, Casey S. Greene, Douglas P. Hill
3. Inheritable Epigenetics in Genetic Programming
Abstract
Classical genetic programming solves problems by applying the Darwinian concepts of selection, survival and reproduction to a population of computer programs. Here we extend the biological analogy to incorporate epigenetic regulation through both learning and evolution. We begin the chapter with a discussion of Darwinian, Lamarckian, and Baldwinian approaches to evolutionary computation and describe how recent findings in biology differ conceptually from the computational strategies that have been proposed. Using inheritable Lamarckian mechanisms as inspiration, we propose a system that allows for updating of individuals in the population during their lifetime while simultaneously preserving both genotypic and phenotypic traits during reproduction. The implementation is made simple through the use of syntax-free, developmental, linear genetic programming. The representation allows for arbitrarily-ordered genomes to be syntactically valid programs, thereby creating a genetic programming approach upon which quasi-uniform epigenetic updating and inheritance can easily be applied. Generational updates are made using an epigenetic hill climber (EHC), and the epigenetic properties of genes are inherited during crossover and mutation. The addition of epigenetics results in faster convergence, less bloat, and an improved ability to find exact solutions on a number of symbolic regression problems.
William La Cava, Lee Spector
4. SKGP: The Way of the Combinator
Abstract
Genetic Programming (GP) is a machine learning technique that evolves programs using natural selection and populations dynamics. Much of the functionality of GP depends on the representation of programs in the population and how to handle illegal or type incoherent expressions that arise from crossover and mutation within a population of programs. The SKGP is a GP system that uses graphs of combinators to represent functions and a strong type system to inform the crossover and mutation operations during evolution. This produces a powerful, flexible system that has many benefits over more conventional systems. This paper describes the implementation of this system, gives some examples of successful applications constructed using the SKGP and describes future directions that may offer a more powerful GP system capable of producing more complex programs.
William P. Worzel, Duncan MacLean
5. Sequential Symbolic Regression with Genetic Programming
Abstract
This chapter describes the Sequential Symbolic Regression (SSR) method, a new strategy for function approximation in symbolic regression. The SSR method is inspired by the sequential covering strategy from machine learning, but instead of sequentially reducing the size of the problem being solved, it sequentially transforms the original problem into potentially simpler problems. This transformation is performed according to the semantic distances between the desired and obtained outputs and a geometric semantic operator. The rationale behind SSR is that, after generating a suboptimal function f via symbolic regression, the output errors can be approximated by another function, in a subsequent iteration. The method was tested in eight polynomial functions, and compared with canonical genetic programming (GP) and geometric semantic genetic programming (SGP). Results showed that SSR significantly outperforms SGP and presents no statistical difference from GP. More importantly, they show the potential of the proposed approach: an effective way of applying geometric semantic operators to combine different (partial) solutions, and at the same time, avoiding the exponential growth problem arising from the use of semantic operators.
Luiz Otávio V.B. Oliveira, Fernando E.B. Otero, Gisele L. Pappa, Julio Albinati
6. Sliding Window Symbolic Regression for Detecting Changes of System Dynamics
Abstract
In this chapter we discuss sliding window symbolic regression and its ability to systematically detect changing dynamics in data streams. The sliding window defines the portion of the data visible to the algorithm during training and is moved over the data. The window is moved regularly based on the generations or on the current selection pressure when using offspring selection. The sliding window technique has the effect that population has to adapt to the constantly changing environmental conditions.
In the empirical section of this chapter, we focus on detecting change points of analyzed systems’ dynamics. We show its effectiveness on various artificial data sets and discuss the results obtained when the sliding window moved in each generation and when it is moved only when a selection pressure threshold is reached. The results show that sliding window symbolic regression can be used to detect change points in systems dynamics for the considered data sets.
Stephan M. Winkler, Michael Affenzeller, Gabriel Kronberger, Michael Kommenda, Bogdan Burlacu, Stefan Wagner
7. Extremely Accurate Symbolic Regression for Large Feature Problems
Abstract
As symbolic regression (SR) has advanced into the early stages of commercial exploitation, the poor accuracy of SR, still plaguing even the most advanced commercial packages, has become an issue for early adopters. Users expect to have the correct formula returned, especially in cases with zero noise and only one basis function with minimally complex grammar depth.
At a minimum, users expect the response surface of the SR tool to be easily understood, so that the user can know apriori on what classes of problems to expect excellent, average, or poor accuracy. Poor or unknown accuracy is a hinderence to greater academic and industrial acceptance of SR tools.
In a previous paper, we published a complex algorithm for modern symbolic regression which is extremely accurate for a large class of Symbolic Regression problems. The class of problems, on which SR is extremely accurate, was described in detail. This algorithm was extremely accurate, on a single processor, for up to 25 features (columns); and, a cloud configuration was used to extend the extreme accuracy up to as many as 100 features.
While the previous algorithm’s extreme accuracy for deep problems with a small number of features (25–100) was an impressive advance, there are many very important academic and industrial SR problems requiring from 100 to 1000 features.
In this chapter we extend the previous algorithm such that high accuracy is achieved on a wide range of problems, from 25 to 3000 features, using only a single processor. The class of problems, on which the enhanced algorithm is highly accurate, is described in detail. A definition of extreme accuracy is provided, and an informal argument of highly SR accuracy is outlined in this chapter.
The new enhanced algorithm is tested on a set of representative problems. The enhanced algorithm is shown to be robust, performing well even in the face of testing data containing up to 3000 features.
Michael F. Korns
8. How to Exploit Alignment in the Error Space: Two Different GP Models
Abstract
From a recent study, we know that if we are able to find two optimally aligned individuals, then we can reconstruct a globally optimal solution analytically for any regression problem. With this knowledge in mind, the objective of this chapter is to discuss two Genetic Programming (GP) models aimed at finding pairs of optimally aligned individuals. The first one of these models, already introduced in a previous publication, is ESAGP-1. The second model, discussed for the first time here, is called Pair Optimization GP (POGP). The main difference between these two models is that, while ESAGP-1 represents solutions in a traditional way, as single expressions (as in standard GP), in POGP individuals are pairs of expressions, that evolution should “push” towards the optimal alignment. The results we report for both these models are extremely encouraging. In particular, ESAGP-1 outperforms standard GP and geometric semantic GP on two complex real-life applications. At the same time, a preliminary set of results obtained on a set of symbolic regression benchmarks indicate that POGP, although rather new and still in need of improvement, is a very promising model, that deserves future developments and investigation.
Mauro Castelli, Leonardo Vanneschi, Sara Silva, Stefano Ruberto
9. Analyzing a Decade of Human-Competitive (“HUMIE”) Winners: What Can We Learn?
Abstract
Techniques in evolutionary computation (EC) have improved significantly over the years, leading to a substantial increase in the complexity of problems that can be solved by EC-based approaches. The HUMIES awards at the Genetic and Evolutionary Computation Conference are designed to recognize work that has not just solved some problem via techniques from evolutionary computation, but has produced a solution that is demonstrably human-competitive. In this chapter, we take a look across the winners of the past 10 years of the HUMIES awards, and analyze them to determine whether there are specific approaches that consistently show up in the HUMIE winners. We believe that this analysis may lead to interesting insights regarding prospects and strategies for producing further human competitive results.
Karthik Kannappan, Lee Spector, Moshe Sipper, Thomas Helmuth, William La Cava, Jake Wisdom, Omri Bernstein
10. Tackling the Boolean Multiplexer Function Using a Highly Distributed Genetic Programming System
Abstract
We demonstrate the effectiveness and power of the distributed GP platform, EC-Star, by comparing the computational power needed for solving an 11-multiplexer function, both on a single machine using a full-fitness evaluation method, as well as using distributed, age-layered, partial-fitness evaluations and a Pitts-style representation. We study the impact of age-layering and show how the system scales with distribution and tends towards smaller solutions. We also consider the effect of pool size and the choice of fitness function on convergence and total computation.
Hormoz Shahrzad, Babak Hodjat
Backmatter
Metadaten
Titel
Genetic Programming Theory and Practice XII
herausgegeben von
Rick Riolo
William P. Worzel
Mark Kotanchek
Copyright-Jahr
2015
Electronic ISBN
978-3-319-16030-6
Print ISBN
978-3-319-16029-0
DOI
https://doi.org/10.1007/978-3-319-16030-6