Test data generation with a Kalman filter-based adaptive genetic algorithm

https://doi.org/10.1016/j.jss.2014.11.035

Highlights

  • We introduce a new approach for generating test data, based on adaptive optimisation.

  • The adaptive optimisation framework uses feedback from the optimisation process to adjust parameter values of a genetic algorithm during the search.

  • Our approach is compared to a state-of-the-art test data optimisation algorithm that does not adapt parameter values online, and to a representative adaptive optimisation algorithm, outperforming both methods on a wide range of problems.

Abstract

Software testing is a crucial part of software development. It enables assurance of quality attributes such as correctness, completeness and reliability of software systems. Current state-of-the-art software testing techniques employ search-based optimisation methods, such as genetic algorithms, to handle the difficult and laborious task of test data generation. Despite their general applicability, genetic algorithms have to be parameterised in order to produce results of high quality. Different parameter values may be optimal for different problems, and even for different instances of the same problem. In this work, we introduce a new approach for generating test data, based on adaptive optimisation. The adaptive optimisation framework uses feedback from the optimisation process to adjust parameter values of a genetic algorithm during the search. Our approach is compared to a state-of-the-art test data optimisation algorithm that does not adapt parameter values online, and to a representative adaptive optimisation algorithm, outperforming both methods on a wide range of problems.

Introduction

Search-based optimisation methods, such as genetic algorithms, ant colony optimisation and simulated annealing, have successfully been applied to a wide range of software testing problems (Ahmed and Hermadi, 2008; Ferrer et al., 2012; Harman et al., 2007; Harman and McMinn, 2010; Mairhofer et al., 2011; McMinn, 2004). What these approaches have in common is that they aim to optimise the quality of a given test data set. Quality can be measured with respect to certain coverage criteria, e.g. branch, statement and method coverage for source code, or metrics such as transition coverage for model-based testing. Besides these function-oriented approaches, a variety of methods focus on non-functional aspects of test data optimisation, such as searching for input situations that break memory or storage requirements, automatic detection of memory leaks, stress testing and security testing. A systematic literature review of search-based testing for non-functional system properties by Afzal et al. (2009) presents a comprehensive overview of these methods.
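
To make the coverage-based notion of test data quality concrete, the following minimal Python sketch scores a test suite by branch coverage. The `run_and_trace` helper and the branch identifiers are hypothetical stand-ins for instrumented execution of the program under test; this illustrates the criterion only and is not code from the paper.

```python
def branch_coverage(test_suite, all_branches, run_and_trace):
    """Fitness of a test suite: fraction of branches covered.

    `run_and_trace` is a hypothetical helper that executes one test
    against the instrumented program under test and returns the set
    of branch identifiers the execution covered.
    """
    covered = set()
    for test in test_suite:
        covered |= run_and_trace(test)
    return len(covered & set(all_branches)) / len(all_branches)


# Toy example: three tests covering 4 of 6 branches -> fitness 4/6.
traces = {"t1": {"b1", "b2"}, "t2": {"b2", "b3"}, "t3": {"b6"}}
print(branch_coverage(["t1", "t2", "t3"],
                      ["b1", "b2", "b3", "b4", "b5", "b6"],
                      lambda t: traces[t]))
```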

Genetic algorithms (GAs) are among the most frequently applied search-based optimisation methods in test data generation (Aleti et al., 2013). GAs provide approximate results where exact approaches cannot be devised and optimal solutions are hard to find. They can be used by practitioners as a ‘black box’, without mastering advanced theory, whereas more sophisticated exact approaches are tailored to the specific mathematical structure of the problem at hand and can become completely inapplicable if small aspects of the problem change. Another advantage of GAs over traditional methods, such as gradient descent-based algorithms, is their stochastic element, which helps them escape local optima.

In recent years, it has been acknowledged that the performance of GAs depends on numerous parameters, such as crossover rate, mutation rate and population size (Aleti, 2012; Eiben et al., 2007; Eiben and Smit, 2011). Algorithm parameters determine how the search space is explored. They are responsible for the flexibility and efficiency of the search procedure and, when configured appropriately, produce high-quality results regardless of the difficulty of the search space. Because of the influence of parameter values on algorithm performance, poor algorithm parameterisation hinders the discovery of high-quality solutions (Eiben et al., 2007; Eiben and Smit, 2011; Lobo, 2011). It has also been acknowledged that the parameters required for optimal algorithm performance are not only domain-specific, but also have to be chosen based on the problem instance at hand (Eiben and Schut, 2008).

The configuration of the algorithm is usually the responsibility of the practitioner, who often is not an expert in search-based algorithms. General guidelines for successful parameter values give conflicting recommendations: e.g. De Jong (1995) recommends a crossover rate of 0.6, Grefenstette (1986) suggests 0.95, whereas Schaffer et al. (1989) recommend any value in the range [0.75, 0.95]. These guidelines conflict with each other because each one reports the best parameter configuration of a GA for a particular class of problems. For a newly arising problem, the search space is typically unknown, hence it is hard to formulate general principles about algorithm parameter configurations that are universally applicable (Mitchell, 1996). In these circumstances, algorithm parameterisation presents itself as an optimisation problem in its own right (Eiben and Smit, 2011). To save time, practitioners tend to configure the GA based on a few preliminary runs, where different parameter values are tried in an attempt to fine-tune the algorithm to a particular problem (Eiben and Smit, 2011). Depending on the number of parameters and their possible values, such investigative trials can themselves amount to solving a combinatorially complex problem. Even if parameters are tuned to their optimal values, there is no guarantee that the performance of the algorithm will be optimal throughout the optimisation process. In fact, it has been demonstrated both empirically and theoretically that different parameter settings are required at different stages of the optimisation process (Bäck, 1992; Bäck et al., 2000; Cervantes and Stephens, 2009; Hesser and Manner, 1991; Smith and Fogarty, 1996; Stephens et al., 1998; Thierens, 2002). This means that tuning parameter values before the optimisation process does not guarantee optimal GA performance. This problem has been tackled by many researchers in the optimisation community, who propose to set GA parameter values during the run using feedback from the search, an approach known as adaptive parameter control (Corne et al., 2002; Davis, 1989; Eiben et al., 2007; Fialho et al., 2010; Goldberg, 1989; Hong et al., 2000; Igel and Kreutz, 2003; Julstrom, 1995; Lobo and Goldberg, 1997; Schlierkamp-Voosen and Mühlenbein, 1994; Tuson and Ross, 1998). The intuitive motivation comes from the way the optimisation process unfolds: from a diffused global search, requiring parameter values that favour the exploration of unseen areas of the search space, to a more focused local optimisation, requiring parameter values that help the algorithm converge.

In essence, adaptive parameter control methods monitor the behaviour of a GA run and use the performance of the algorithm to adapt parameter values, such that successful parameter values are propagated to the next generation. The majority of adaptive parameter control methods found in the literature belong to the class of probability matching techniques (Corne et al., 2002; Davis, 1989; Hong et al., 2000; Igel and Kreutz, 2003; Julstrom, 1995), in which the probability of applying a parameter value is proportional to the quality of that parameter value. The earliest approaches (Hong et al., 2000; Igel and Kreutz, 2003) stopped exploring parameter values whose feedback from the algorithm was not in their favour in the initial phases of the process. Later work introduced a minimum selection probability (Igel and Kreutz, 2003), to ensure that under-performing parameter values do not disappear during the optimisation, in case they are beneficial in the later stages of the search. One of the more recent and mature examples of probability matching is the method of Igel and Kreutz (2003), where the selection probability for each parameter value incorporates such a minimum probability of selection.
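
As an illustration of probability matching with a minimum selection probability, the following Python sketch assigns each candidate parameter value a selection probability proportional to its observed quality, floored at `p_min`. The quality estimates and the floor value are assumed inputs for the illustration; the exact formulation in Igel and Kreutz (2003) may differ in detail.

```python
import random

def probability_matching(qualities, p_min=0.05):
    """Selection probabilities proportional to quality, with a floor.

    Each of the m candidate parameter values keeps at least p_min
    selection probability, so values that perform poorly early on can
    still be explored in later stages of the search.
    """
    m = len(qualities)
    assert m * p_min < 1.0, "floor too large for this many values"
    total = sum(qualities)
    if total == 0.0:                 # no feedback yet: choose uniformly
        return [1.0 / m] * m
    return [p_min + (1.0 - m * p_min) * q / total for q in qualities]

def sample_value(values, qualities, p_min=0.05):
    """Draw the next parameter value according to the matched probabilities."""
    return random.choices(values,
                          weights=probability_matching(qualities, p_min))[0]

# Four candidate mutation rates with observed quality estimates.
rates = [0.01, 0.05, 0.1, 0.2]
print(probability_matching([0.0, 0.3, 0.6, 0.1]))   # third value is favoured
print(sample_value(rates, [0.0, 0.3, 0.6, 0.1]))
```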

These adaptive parameter control methods assume that the improvement in the quality of the solutions is directly related to the use of certain parameter values. However, GAs are stochastic systems that may produce different results for the same parameter values (DeJong, 2007). Ideally, the randomness induced by the stochastic behaviour of GAs should be accounted for by the parameter control strategy. To address this issue, we introduce an adaptive genetic algorithm, which adjusts algorithm parameters during the optimisation process using a Kalman filter. The Kalman filter-based genetic algorithm (KFGA) redefines parameter values repeatedly based on a learning process that receives its feedback from the optimisation algorithm. The Kalman filter is employed to reduce the noise arising from the stochastic behaviour of GAs when identifying the effect of parameter values on the performance of the algorithm. KFGA outperforms a GA with pre-tuned parameter values and the probability matching technique by Igel and Kreutz (2003) when solving the problem of test data generation. The proposed method consistently finds test suites with better branch coverage than the state-of-the-art, as shown by the experiments performed on a set of open-source problem instances.
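
The role of the Kalman filter can be illustrated with a one-dimensional filter that treats the true effect of a parameter value as a slowly drifting hidden state and each observed fitness improvement as a noisy measurement. The process and measurement variances below are illustrative assumptions, not values from the paper.

```python
class ScalarKalman:
    """One-dimensional Kalman filter for a noisy effect estimate.

    Model: x_t = x_{t-1} + process noise (variance q)
           z_t = x_t     + measurement noise (variance r)
    """
    def __init__(self, x0=0.0, p0=1.0, q=1e-3, r=0.1):
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def update(self, z):
        p_pred = self.p + self.q            # predict: the effect may drift
        k = p_pred / (p_pred + self.r)      # Kalman gain
        self.x = self.x + k * (z - self.x)  # correct towards the observation
        self.p = (1.0 - k) * p_pred
        return self.x

# Noisy observations of one parameter value's effect on fitness gain.
kf = ScalarKalman()
for z in [0.9, 0.2, 0.7, 0.4, 0.6]:
    print(round(kf.update(z), 3))           # smoothed effect estimate
```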

Section snippets

Test data generation

Software testing is used to estimate the quality of a software system. Quality may refer to functional attributes, which describe how the system should behave, or to non-functional requirements as defined in IEEE Standard 1471-2000, ‘Systems and software engineering – Recommended practice for architectural description of software-intensive systems’ (ISO/IEC, 2000), such as reliability, efficiency, portability, maintainability, compatibility and usability.

A software test consists of

Optimisation model for test data generation

GAs evolve a population of solutions using the genetic operators: mutation, crossover, selection and replacement. The aim is to optimise some quality function(s) by searching the space of potential solutions. GAs start the optimisation process by randomly generating a set of solutions. In certain cases, this initial set of solutions is created by applying heuristic rules (e.g. a greedy algorithm), or obtained using expert knowledge. Usually, the genetic operators are applied based on
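
As a minimal sketch of the GA loop just described, the following Python snippet evolves bit strings for a toy objective; the operators, rates and replacement scheme are generic illustrations rather than the configuration used in this work.

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=50,
                      crossover_rate=0.9, mutation_rate=0.05):
    # Random initial population (could also be seeded heuristically,
    # e.g. by a greedy algorithm or expert knowledge).
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]

    def select():
        # Binary tournament selection.
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if random.random() < crossover_rate:        # one-point crossover
                cut = random.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):                      # bit-flip mutation
                children.append([1 - g if random.random() < mutation_rate
                                 else g for g in child])
        pop = children[:pop_size]                       # generational replacement
    return max(pop, key=fitness)

# Toy objective: maximise the number of ones in the bit string.
print(sum(genetic_algorithm(sum)))
```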

Adjusting parameters of genetic algorithms

Genetic algorithms (GAs) have successfully been applied to Whole Test Suite Generation (Fraser and Arcuri, 2013) and many other optimisation problems in software engineering (Aleti et al., 2009; Grunske, 2006; Räihä, 2010). The performance of GAs is affected by the configuration of their parameter values, such as crossover rate, mutation rate and population size (Aleti and Moser, 2013; Bäck and Schütz, 1996; Deb and Beyer, 2001; Farmani and Wright, 2003). Parameter values can be

Effect assessment of parameter values using the Kalman filter

Formally, given a set $\{\upsilon_1, \ldots, \upsilon_n\}$ of $n$ algorithm parameters, where each parameter $\upsilon_i$ has values $\{\upsilon_{i1}, \ldots, \upsilon_{im}\}$ that can be discrete numbers or intervals of continuous numbers, parameter control derives the optimal next value $\upsilon_{ij}$ to optimise the effect of $\upsilon_i$ on the performance of the algorithm (Aleti, 2012). As an example, when the mutation rate $\upsilon_1$ is dynamically adjusted by considering 4 intervals ($m = 4$), $\upsilon_{12}$ stands for a mutation rate sampled from the second interval. In the discrete case of
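
Under the notation above, a filtered effect estimate per value interval of a parameter could be maintained as in the Python sketch below; the interval boundaries, noise variances and the greedy selection rule are illustrative assumptions, not the paper's actual control strategy.

```python
import random

# Mutation rate v_1 discretised into m = 4 intervals; v_12 then denotes
# a rate sampled from the second interval. Boundaries are assumptions.
INTERVALS = [(0.0, 0.025), (0.025, 0.05), (0.05, 0.1), (0.1, 0.2)]

Q, R = 1e-3, 0.1                    # assumed process/measurement variances
estimates = [[0.0, 1.0] for _ in INTERVALS]   # one (mean, variance) per value

def observe(j, fitness_gain):
    """Fold a noisy observation of interval j's effect into its estimate
    via a scalar Kalman update."""
    x, p = estimates[j]
    p += Q                          # predict: the effect may drift
    k = p / (p + R)                 # Kalman gain
    estimates[j] = [x + k * (fitness_gain - x), (1.0 - k) * p]

def next_mutation_rate(epsilon=0.1):
    """Pick the interval with the best filtered estimate (with a little
    epsilon-greedy exploration), then sample a concrete rate from it."""
    if random.random() < epsilon:
        j = random.randrange(len(INTERVALS))
    else:
        j = max(range(len(INTERVALS)), key=lambda i: estimates[i][0])
    lo, hi = INTERVALS[j]
    return j, random.uniform(lo, hi)

j, rate = next_mutation_rate()
observe(j, fitness_gain=0.4)        # feedback from the generation just run
```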

Experimental evaluation

The new adaptive control method is applied to adjust the mutation and crossover rate of a GA during the optimisation process. The experiments were conducted using EvoSuite (Fraser and Arcuri, 2013), described in Section 3. The Kalman filter-based adaptive parameter control was compared against pre-tuned parameter settings and the probability matching technique
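
Schematically, the adaptive loop evaluated here interleaves parameter selection, a GA generation, and feedback to the controller. The outline below is a hypothetical Python rendering of that shape (EvoSuite itself is a Java tool); all three callables are illustrative stand-ins, not the paper's implementation.

```python
import random

def adaptive_ga(run_generation, select_rates, feed_back, generations=100):
    """Shape of the adaptive loop: per generation, ask the controller for
    mutation/crossover rates, run one GA generation with them, and report
    the observed fitness improvement back to the controller."""
    best = None
    for _ in range(generations):
        mutation_rate, crossover_rate = select_rates()
        new_best = run_generation(mutation_rate, crossover_rate)
        gain = 0.0 if best is None else max(0.0, new_best - best)
        feed_back(mutation_rate, crossover_rate, gain)
        best = new_best if best is None else max(best, new_best)
    return best

# Minimal stand-ins so the outline runs end to end.
print(adaptive_ga(
    run_generation=lambda m, c: random.random(),
    select_rates=lambda: (random.choice([0.01, 0.05, 0.1]),
                          random.choice([0.6, 0.8, 0.95])),
    feed_back=lambda m, c, gain: None))
```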

Conclusion

We presented a Kalman filter-based genetic algorithm (KFGA) for solving the problem of test case generation in software testing. The method uses a Kalman filter to reduce the effect of the stochastic behaviour of GAs when estimating the appropriate parameter values to use in each iteration of the optimisation process. KFGA was implemented in EvoSuite, a framework for whole test suite generation. A set of experiments were conducted using open-source libraries and programs with different

Acknowledgements

We would like to thank Gordon Fraser and the rest of the team who developed EvoSuite for making the source code available. We wish to acknowledge Monash University for the use of their Nimrod software in this work. The Nimrod project has been funded by the Australian Research Council and a number of Australian Government agencies, and was initially developed by the Distributed Systems Technology Centre (DSTC). This research was supported under Australian Research Council’s Discovery Projects funding scheme


References (58)

  • Bäck, T.: The interaction of mutation rate, selection, and self-adaptation within a genetic algorithm. Parallel Problem Solving from Nature 2, PPSN-II (1992)
  • Bäck, T. et al.: An empirical study on GAs without parameters. Parallel Problem Solving from Nature – PPSN VI (6th PPSN’2000) (2000)
  • Bäck, T. et al.: Intelligent mutation rate control in canonical genetic algorithms. Lect. Notes Comput. Sci. (1996)
  • Cervantes, J. et al.: Limitations of existing mutation rate heuristics and how a rank GA overcomes them. IEEE Trans. Evol. Comput. (2009)
  • Corne, D. et al.: On fitness distributions and expected fitness gain of mutation rates in parallel evolutionary algorithms. Parallel Problem Solving from Nature – PPSN VII (7th PPSN’02) (2002)
  • Davis, L.: Adapting operator probabilities in genetic algorithms. Proceedings of the Third International Conference on Genetic Algorithms (1989)
  • De Jong, K.A.: An Analysis of the Behavior of a Class of Genetic Adaptive Systems (1995)
  • Deb, K. et al.: Self-adaptive genetic algorithms with simulated binary crossover. Evol. Comput. (2001)
  • DeJong, K.: Parameter setting in EAs: a 30 year perspective (2007)
  • Eiben, A.E. et al.: Parameter control in evolutionary algorithms. IEEE Trans. Evol. Comput. (2007)
  • Eiben, A.E. et al.: New ways to calibrate evolutionary algorithms (2008)
  • Farmani, R. et al.: Self-adaptive fitness formulation for constrained optimization. IEEE Trans. Evol. Comput. (2003)
  • Ferrer, J. et al.: Evolutionary algorithms for the multi-objective test data generation problem. Softw. Pract. Exper. (2012)
  • Fialho, Á. et al.: Analyzing bandit-based adaptive operator selection mechanisms. Ann. Math. Artif. Intell. (2010)
  • Fraser, G. et al.: Whole test suite generation. IEEE Trans. Softw. Eng. (2013)
  • Goldberg, A. et al.: Applications of feasible path analysis to program testing. Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (1994)
  • Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning (1989)
  • Gong, D. et al.: Generating test data for both path coverage and fault detection using genetic algorithms. Front. Comput. Sci. (2013)
  • Grefenstette, J.J.: Optimization of control parameters for genetic algorithms. IEEE Trans. Syst. Man Cybern. (1986)

    Aldeida Aleti works in the area of optimisation, with a main focus in combinatorial optimisation problems. She is a lecturer and a DECRA research fellow at the Faculty of Information Technology, Monash University, Australia. Her research interests include optimisation of combinatorial problems, analysis of fitness landscapes, adaptive optimisation, robust optimisation, software quality and design, and software testing. Her main work lies in the intersection of software engineering and artificial intelligence. For more details, please check her research profile at users.monash.edu.au/~aldeidaa.

    Lars Grunske is currently a Professor at the University of Stuttgart, Germany. He received his PhD degree in computer science from the University of Potsdam (Hasso-Plattner-Institute for Software Systems Engineering) in 2004. He was Junior Professor at the University of Kaiserslautern, Boeing Postdoctoral Research Fellow at the University of Queensland from 2004 to 2007, and lecturer at the Swinburne University of Technology, Australia, from 2008 to 2011. He has active research interests in the areas of modelling and verification of systems and software. His main focus is on automated analysis, mainly probabilistic and timed model checking, and model-based dependability evaluation of complex software-intensive systems.
