About this book

This proposed text appears to be a good introduction to evolutionary computation for use in applied statistics research. The authors draw from a vast base of knowledge about the current literature in both the design of evolutionary algorithms and statistical techniques. Modern statistical research is on the threshold of solving increasingly complex problems in high dimensions, and the generalization of its methodology to parameters whose estimators do not follow mathematically simple distributions is underway. Many of these challenges involve optimizing functions for which analytic solutions are infeasible. Evolutionary algorithms represent a powerful and easily understood means of approximating the optimum value in a variety of settings. The proposed text seeks to guide readers through the crucial issues of optimization problems in statistical settings and the implementation of tailored methods (including both stand-alone evolutionary algorithms and hybrid crosses of these procedures with standard statistical algorithms like Metropolis-Hastings) in a variety of applications. This book would serve as an excellent reference work for statistical researchers at an advanced graduate level or beyond, particularly those with a strong background in computer science.

Table of Contents

Frontmatter

Chapter 1. Introduction

Abstract
A new line of research originated in the 1950s, driven by a growing awareness in the scientific community that natural and biological mechanisms could serve not only as traditional objects of study but could themselves suggest new disciplines, theories and methods. We may cite Rosenblatt's perceptron, one of the first examples of a learning machine, and Box's evolutionary approach to industrial productivity, an early attempt at modeling a process using concepts drawn from the evolutionary behavior of living creatures. This short introduction aims at highlighting the basic ideas that led to an unprecedented enlargement of the guidelines for developing theoretical and applied research in a large body of scientific disciplines. An outline of the present book is given, as far as the impact of the new research technologies on statistics and closely related subjects is concerned, with special emphasis on methods inspired by natural biological evolution.
Roberto Baragona, Francesco Battaglia, Irene Poli

Chapter 2. Evolutionary Computation

Abstract
Evolutionary computation methods are introduced by discussing their origin within the artificial intelligence framework, and the contributions of Darwin's theory of natural evolution and of genetics. We highlight the main features of an evolutionary computation method and briefly describe several variants: evolutionary programming, evolution strategies, genetic algorithms, estimation of distribution algorithms, and differential evolution. The remainder of the chapter is devoted to a closer illustration of genetic algorithms and their more recent advancements, to the problem of convergence, and to their practical use. A final section on the relationship between genetic algorithms and random sampling techniques is included.
Roberto Baragona, Francesco Battaglia, Irene Poli
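
The genetic algorithm outlined in this chapter can be sketched in a few lines. The following toy example (not code from the book; the fitness function, encoding length and operator settings are illustrative choices) evolves a population of binary strings to maximize f(x) = x(1 - x) on [0, 1], combining tournament selection, one-point crossover, bit-flip mutation and elitism:

```python
import random

random.seed(0)

def fitness(bits):
    # Decode the 16-bit string to x in [0, 1] and score f(x) = x * (1 - x),
    # which peaks at x = 0.5 (a deliberately simple illustrative target).
    x = int("".join(map(str, bits)), 2) / (2 ** 16 - 1)
    return x * (1 - x)

def evolve(pop_size=40, n_bits=16, n_gen=60, p_mut=0.02):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(n_gen):
        def pick():
            # Selection: binary tournament between two random individuals.
            a, b = random.sample(pop, 2)
            return a if fitness(a) > fitness(b) else b
        nxt = [max(pop, key=fitness)]          # elitism: keep the current best
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = random.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with small per-gene probability.
            child = [1 - g if random.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = evolve()
x_best = int("".join(map(str, best)), 2) / (2 ** 16 - 1)
print(round(x_best, 3))  # should land near the optimum at 0.5
```

The same selection/crossover/mutation skeleton carries over to the statistical applications of the later chapters; only the encoding and the fitness function change.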

Chapter 3. Evolving Regression Models

Abstract
Regression models are well-established tools in statistical analysis which date back to the eighteenth century. Nonetheless, problems involved in their implementation and application in a wide range of fields are still the object of active research. Prior to regression model estimation there is an identification step, in which the variables of interest are selected, the relationships of interest among them detected, and dependent and independent variables distinguished. Moreover, generalized regression models often have nonlinear and non-convex log-likelihoods, so maximum likelihood estimation requires the optimization of complicated functions. In this chapter evolutionary computation methods are presented that have been developed either to support or to replace analytic tools when problem size and complexity limit their efficiency.
Roberto Baragona, Francesco Battaglia, Irene Poli
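
The identification step described above, choosing which regressors enter the model, is a natural target for a genetic algorithm: a candidate model is a binary string over the candidate variables, and the fitness is an information criterion. The sketch below (an illustrative setup on synthetic data, not an example from the book) evolves subsets of 8 candidate regressors to minimize AIC, with ordinary least squares solved by Gaussian elimination on the normal equations:

```python
import math
import random

random.seed(1)

# Synthetic data: 8 candidate regressors, only the first 3 truly matter.
n, p = 120, 8
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * r[0] - 1.5 * r[1] + r[2] + random.gauss(0, 0.5) for r in X]

def ols_rss(cols):
    # Residual sum of squares of OLS on the selected columns,
    # solving the normal equations by Gaussian elimination.
    k = len(cols)
    if k == 0:
        return sum(v * v for v in y)
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in cols] for a in cols]
    c = [sum(X[i][a] * y[i] for i in range(n)) for a in cols]
    for j in range(k):                       # forward elimination
        for r in range(j + 1, k):
            f = A[r][j] / A[j][j]
            for s in range(j, k):
                A[r][s] -= f * A[j][s]
            c[r] -= f * c[j]
    beta = [0.0] * k
    for j in range(k - 1, -1, -1):           # back substitution
        beta[j] = (c[j] - sum(A[j][s] * beta[s] for s in range(j + 1, k))) / A[j][j]
    return sum((y[i] - sum(beta[t] * X[i][cols[t]] for t in range(k))) ** 2
               for i in range(n))

cache = {}
def aic(subset):
    # Fitness: Akaike's criterion, penalizing model size; lower is better.
    key = tuple(subset)
    if key not in cache:
        cols = [j for j in range(p) if subset[j]]
        cache[key] = n * math.log(ols_rss(cols) / n) + 2 * len(cols)
    return cache[key]

pop = [[random.randint(0, 1) for _ in range(p)] for _ in range(20)]
for _ in range(30):
    pop.sort(key=aic)
    elite = pop[:6]                          # truncation selection with elitism
    children = []
    while len(elite) + len(children) < 20:
        p1, p2 = random.sample(elite, 2)
        cut = random.randrange(1, p)
        child = p1[:cut] + p2[cut:]          # one-point crossover
        if random.random() < 0.3:            # occasional bit-flip mutation
            j = random.randrange(p)
            child[j] = 1 - child[j]
        children.append(child)
    pop = elite + children

best = min(pop, key=aic)
print(best)
```

With only 8 candidates the subsets could be enumerated; the point of the sketch is the encoding, which scales to problem sizes where exhaustive search is infeasible.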

Chapter 4. Time Series Linear and Nonlinear Models

Abstract
Modeling time series involves three steps: identification, parameter estimation and diagnostic checking. As far as linear models are concerned, model building has been extensively studied, and well-established theory and practice allow the user to proceed along reliable guidelines. Ergodicity, stationarity and Gaussianity are generally assumed to ensure that the structure of a stochastic process may be estimated safely enough from an observed time series. In this chapter we limit ourselves to discrete-parameter stochastic processes, that is, collections of random variables indexed by integers that are given the meaning of time. Such a stochastic process may be called a time series, though we shall denote a finite single realization of it as a time series as well. Real time series data are often found that do not conform to these hypotheses. We then have to model nonstationary and non-Gaussian time series, which require special assumptions and procedures to ensure that identification and estimation may be performed, and special statistics for diagnostic checking. Several devices are available that allow such time series to be handled while remaining within the domain of linear models. However, some features prevent us from building linear models able to explain and predict the behavior of a time series correctly. Examples are asymmetric limit cycles, jump phenomena and dependence between amplitude and frequency, none of which can be modeled accurately by linear models. Nonlinear models may account for irregular time series behavior by allowing the parameters of the model to vary with time. This characteristic feature by itself means that the stochastic process is not stationary and cannot be reduced to stationarity by any appropriate transform. As a consequence, the observed time series data have to be used to fit a model with varying parameters.
These parameters may influence either the mean or the variance of the time series, and according to their specification different classes of nonlinear models may be characterized. Linear models are defined by a single structure, while nonlinear models may be specified by a multiplicity of different structures. Classes of nonlinear models have therefore been introduced, each of which may be applied successfully to the real time series data sets commonly observed in well-delimited application fields. Contributions of evolutionary computing techniques are reviewed in this chapter for linear models, as regards the identification stage and subset models, and to a rather larger extent for some classes of nonlinear models, concerning identification and parameter estimation. Beginning with the popular autoregressive moving-average linear models, we outline the relevant applications of evolutionary computing to threshold models, including piecewise linear, exponential and autoregressive conditional heteroscedastic structures, bilinear models and artificial neural networks.
Roberto Baragona, Francesco Battaglia, Irene Poli
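
The threshold models mentioned above let the autoregressive coefficient switch according to the regime of the previous observation. The following minimal sketch (illustrative coefficients and sample size, not an example from the book) simulates a two-regime self-exciting threshold autoregressive (SETAR) series and, taking the threshold as known, recovers each regime's coefficient by conditional least squares:

```python
import random

random.seed(3)

# Simulate a two-regime SETAR series:
#   x_t = 0.6 * x_{t-1} + e_t   when x_{t-1} <= 0,
#   x_t = -0.5 * x_{t-1} + e_t  otherwise,
# with standard Gaussian innovations e_t.
x = [0.0]
for _ in range(400):
    prev = x[-1]
    phi = 0.6 if prev <= 0 else -0.5
    x.append(phi * prev + random.gauss(0, 1))

# With the threshold known, each regime's AR coefficient is estimated by
# least squares restricted to observations whose predecessor lies in it.
phi_hat = {}
for name, in_regime in [("low", lambda v: v <= 0), ("high", lambda v: v > 0)]:
    pairs = [(x[t - 1], x[t]) for t in range(1, len(x)) if in_regime(x[t - 1])]
    num = sum(a * b for a, b in pairs)
    den = sum(a * a for a, b in pairs)
    phi_hat[name] = num / den
print(phi_hat)
```

In practice the threshold value and delay are unknown, and it is this joint discrete-continuous search over threshold location and regime parameters that the chapter's evolutionary methods address.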

Chapter 5. Design of Experiments

Abstract
In several research areas, such as biology, chemistry and materials science, experimentation is complex, very expensive and time consuming, so an efficient plan of experimentation is essential to achieve good results and avoid unnecessary waste of resources. An accurate statistical design of the experiments is also important for tackling the uncertainty in experimental results that derives from the systematic and random errors which frequently obscure the effects under investigation. In this chapter we first present the essentials of designing experiments and then describe the evolutionary approach to design in high dimensional settings.
Roberto Baragona, Francesco Battaglia, Irene Poli

Chapter 6. Outliers

Abstract
Outliers, that is, outlying observations, sometimes also known as aberrant observations, are often studied in the literature in close relation to missing data treatment and validation procedures. An interesting issue concerns the influence of outliers on estimates of the moments of the data distribution, or of indexes relevant for further data analysis and model building. The approach of robust statistics aims in this direction, to ensure that good, reliable estimates may be obtained even in the presence of gross errors or unusual measurements originated by unexpected events. The approach we deal with here aims instead at discovering such outliers and either setting them apart or correcting them according to some properly fitted data model. The very complexity of this problem soon prompted the use of general heuristic methods for outlier detection and size estimation in independent sample data. Owing to the dependence structure, outlier analysis in time series proves much more difficult, as observations have to be checked not only with regard to their distance from the mean but also with regard to the relationships among neighboring observations and the correlation function. For this reason we give a brief account of evolutionary computing applications within the independent-data framework, while a more detailed discussion is devoted to outliers and influential observations in time series analysis.
Roberto Baragona, Francesco Battaglia, Irene Poli
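
Why outliers in time series must be judged against neighboring observations rather than the overall mean can be seen with a simple residual-based diagnostic. The sketch below (a toy illustration only; it is not one of the evolutionary methods the chapter develops, and the series, coefficients and outlier sizes are invented) injects two additive outliers into an AR(1) series and flags points whose one-step prediction residual is unusually large:

```python
import random

random.seed(4)

# AR(1) series x_t = 0.7 * x_{t-1} + e_t with two injected additive outliers.
x = [0.0]
for _ in range(199):
    x.append(0.7 * x[-1] + random.gauss(0, 1))
x[50] += 8.0
x[120] -= 8.0

# Fit the AR coefficient by least squares, then flag observations whose
# one-step residual exceeds three residual standard deviations.
num = sum(x[t - 1] * x[t] for t in range(1, len(x)))
den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
phi = num / den
resid = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
sd = (sum(r * r for r in resid) / len(resid)) ** 0.5
flagged = [t for t in range(1, len(x)) if abs(resid[t - 1]) > 3 * sd]
print(flagged)
```

Note that an additive outlier typically corrupts two consecutive residuals (the outlying point and its successor, which is predicted from it), which is one reason joint detection of outlier patterns in dependent data becomes a hard combinatorial problem.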

Chapter 7. Cluster Analysis

Abstract
Metaheuristic methods have often been applied to partitioning problems. On the one hand, this stems from the fact that heuristic methods have been applied to such problems since their earliest formulations. On the other hand, metaheuristics found a promising field of application because cluster analysis has two characteristic features that make it especially suitable for designing algorithms in this framework: the solution space is large, and grows fast with the problem dimension; and the solutions form a discrete set that cannot be explored by gradient-based methods or any other method grounded in the properties of analytic functions. A large number of algorithms based on evolutionary computation have been proposed and have proved excellent solvers of partitioning problems. In this chapter we recall the usual classification of clustering algorithms and explain which classes may be successfully handled by evolutionary computation techniques. While most of the chapter is devoted to the crisp partition problem, the fuzzy partition problem is discussed as well. Then, the theoretical framework offered by mixture distributions is examined in relation to evolutionary computing estimation techniques. We also account for the genetic-algorithm-based approach to the CART technique for classification. Applications of genetic algorithms for clustering time series are described. Finally, multiobjective clustering and its implementation in the genetic algorithms framework are outlined. Some examples and comparisons illustrate the evolutionary computing methods for cluster analysis.
Roberto Baragona, Francesco Battaglia, Irene Poli
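
The crisp partition problem lends itself directly to a genetic encoding: a candidate solution is a label string assigning each observation to a cluster, and the fitness is a partitioning criterion. The toy sketch below (illustrative data and operator settings, not code from the book) evolves such label strings to minimize the within-cluster sum of squares on two well-separated one-dimensional groups:

```python
import random

random.seed(2)

# Two well-separated 1-D groups of four points each (toy data).
data = [0.0, 0.2, 0.4, 0.1, 9.8, 10.0, 10.3, 9.9]
K = 2

def wcss(labels):
    # Within-cluster sum of squares: the criterion the GA minimizes.
    total = 0.0
    for k in range(K):
        pts = [v for v, g in zip(data, labels) if g == k]
        if pts:
            m = sum(pts) / len(pts)
            total += sum((v - m) ** 2 for v in pts)
    return total

n = len(data)
pop = [[random.randrange(K) for _ in range(n)] for _ in range(20)]
for _ in range(60):
    pop.sort(key=wcss)
    elite = pop[:5]                          # keep the best partitions
    children = []
    while len(elite) + len(children) < 20:
        p1, p2 = random.sample(elite, 2)
        # Uniform crossover on the label string.
        child = [random.choice(pair) for pair in zip(p1, p2)]
        if random.random() < 0.3:            # mutation: relabel one point
            child[random.randrange(n)] = random.randrange(K)
        children.append(child)
    pop = elite + children

best = min(pop, key=wcss)
print(best, round(wcss(best), 3))
```

The label encoding is redundant (relabeling the clusters gives an equivalent partition), and handling this symmetry well is one of the design issues the chapter's algorithms address.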

Backmatter
