Test data generation with a Kalman filter-based adaptive genetic algorithm
Introduction
Search-based optimisation methods, such as genetic algorithms, ant colony optimisation and simulated annealing, have successfully been applied to a wide range of software testing problems (Ahmed and Hermadi, 2008; Ferrer et al., 2012; Harman et al., 2007; Harman and McMinn, 2010; Mairhofer et al., 2011; McMinn, 2004). What these approaches have in common is that they aim to optimise the quality of a given test data set. Quality can be measured with respect to certain coverage criteria, e.g. branch, statement and method coverage for source code, or metrics such as transition coverage for model-based testing. Besides these function-oriented approaches, a variety of methods focus on non-functional aspects of test data optimisation, such as searching for input situations that break memory or storage requirements, automatic detection of memory leaks, stress testing and security testing. A systematic literature review of search-based testing for non-functional system properties by Afzal et al. (2009) gives a comprehensive overview of these methods.
Genetic algorithms (GAs) are among the most frequently applied search-based optimisation methods in test data generation (Aleti et al., 2013). GAs provide approximate results where exact approaches cannot be devised and optimal solutions are hard to find. Practitioners can use them as a ‘black box’, without mastering advanced theory, whereas more sophisticated exact approaches are tailored to the specific mathematical structure of the problem at hand and can become completely inapplicable if small aspects of the problem change. Another advantage of GAs over traditional methods, such as gradient descent-based algorithms, is their stochastic element, which helps them escape local optima.
In recent years, it has been acknowledged that the performance of GAs depends on numerous parameters, such as crossover rate, mutation rate and population size (Aleti, 2012; Eiben et al., 2007; Eiben and Smit, 2011). Algorithm parameters determine how the search space is explored. They are responsible for the flexibility and efficiency of the search procedure and, when configured appropriately, produce high-quality results regardless of the difficulty of the search space. Because of this influence on algorithm performance, poor parameterisation hinders the discovery of high-quality solutions (Eiben et al., 2007; Eiben and Smit, 2011; Lobo, 2011). It has also been acknowledged that the parameters required for optimal algorithm performance are not only domain-specific, but also have to be chosen based on the problem instance at hand (Eiben and Schut, 2008).
The configuration of the algorithm is usually the responsibility of the practitioner, who often is not an expert in search-based algorithms. General guidelines for successful parameter values give conflicting recommendations: e.g. De Jong (1995) recommends 0.6 for the crossover rate, Grefenstette (1986) suggests 0.95, whereas Schaffer et al. (1989) recommend any value in the range [0.75, 0.95]. These guidelines conflict because each one reports the best parameter configuration of a GA for a particular class of problems. For a newly arising problem, the search space is typically unknown, hence it is hard to formulate universally applicable principles for algorithm parameter configuration (Mitchell, 1996). In these circumstances, algorithm parameterisation presents itself as an optimisation problem in its own right (Eiben and Smit, 2011). To save time, practitioners tend to configure the GA based on a few preliminary runs, where different parameter values are tried in an attempt to fine-tune the algorithm to a particular problem (Eiben and Smit, 2011). Depending on the number of parameters and their possible values, such investigative trials can themselves amount to solving a combinatorially complex problem. Even if parameters are tuned to their optimal values, there is no guarantee that the performance of the algorithm will be optimal throughout the optimisation process. In fact, it has been demonstrated both empirically and theoretically that different parameter settings are required at different stages of the optimisation process (Bäck, 1992; Bäck et al., 2000; Cervantes and Stephens, 2009; Hesser and Manner, 1991; Smith and Fogarty, 1996; Stephens et al., 1998; Thierens, 2002). This means that tuning parameter values before the optimisation process does not guarantee optimal GA performance.
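To make the combinatorial cost of such preliminary trials concrete, the following sketch counts the runs needed for a naive grid search over three parameters. The candidate values are illustrative, loosely based on the conflicting recommendations above, and the per-configuration repetition count is an assumption:

```python
from itertools import product

# Illustrative cost of tuning by preliminary runs: the number of trial
# configurations grows multiplicatively with each additional parameter.
crossover_rates = [0.6, 0.75, 0.95]    # spans the conflicting published advice
mutation_rates = [0.001, 0.01, 0.1]
population_sizes = [50, 100, 200]
runs_per_config = 10                   # GAs are stochastic, so runs must be repeated

configs = list(product(crossover_rates, mutation_rates, population_sizes))
print(len(configs))                    # 27 configurations
print(len(configs) * runs_per_config)  # 270 preliminary GA runs before the real one
```

Even this tiny grid already requires hundreds of GA executions, and the tuned values would still be fixed for the whole run.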
This problem has been tackled by many researchers in the optimisation community, who propose to set GA parameter values during the run using feedback from the search, known as adaptive parameter control (Corne et al., 2002; Davis, 1989; Eiben et al., 2007; Fialho et al., 2010; Goldberg, 1989; Hong et al., 2000; Igel and Kreutz, 2003; Julstrom, 1995; Lobo and Goldberg, 1997; Schlierkamp-Voosen and Mühlenbein, 1994; Tuson and Ross, 1998). The intuitive motivation comes from the way the optimisation process unfolds: from a more diffused global search, requiring parameter values responsible for the exploration of unseen areas of the search space, to a more focused local optimisation process, requiring parameter values which help the algorithm converge.
In essence, adaptive parameter control methods monitor the behaviour of a GA run and use the performance of the algorithm to adapt parameter values, such that successful parameter values are propagated to the next generation. The majority of adaptive parameter control methods found in the literature belong to the class of probability matching techniques (Corne et al., 2002; Davis, 1989; Hong et al., 2000; Igel and Kreutz, 2003; Julstrom, 1995), in which the probability of applying a parameter value is proportional to the quality of that parameter value. The earliest approaches (Hong et al., 2000; Igel and Kreutz, 2003) stopped exploring parameter values if the feedback from the algorithm was not in their favour in the initial phases of the process. In later work, a minimum selection probability was introduced (Igel and Kreutz, 2003) to ensure that under-performing parameter values did not disappear during the optimisation, in case they were beneficial in the later stages of the search. One of the more recent and mature examples of probability matching is the work by Igel and Kreutz (2003), where the selection probability for each parameter value incorporates a minimum probability of selection.
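A minimal sketch of probability matching with a minimum selection probability might look as follows. The update rule and names are our illustration of the general scheme, not the exact formulation of Igel and Kreutz (2003):

```python
import random

def update_probabilities(probs, rewards, p_min=0.1):
    """Probability matching: make each value's selection probability
    proportional to its observed quality, with a floor p_min so that
    no parameter value ever disappears from the pool."""
    total = sum(rewards.values())
    k = len(probs)
    new_probs = {}
    for v in probs:
        # Quality share of this value; uniform if no reward was observed yet.
        share = rewards[v] / total if total > 0 else 1.0 / k
        # Reserve k * p_min of the probability mass as a floor, then
        # distribute the remainder proportionally to quality.
        new_probs[v] = p_min + (1 - k * p_min) * share
    return new_probs

def select_value(probs):
    """Draw the next parameter value according to the current probabilities."""
    values, weights = zip(*probs.items())
    return random.choices(values, weights=weights)[0]

# Example: three candidate crossover rates with observed rewards.
probs = {0.6: 1 / 3, 0.8: 1 / 3, 0.95: 1 / 3}
rewards = {0.6: 5.0, 0.8: 1.0, 0.95: 0.0}
probs = update_probabilities(probs, rewards, p_min=0.1)
```

Note that the value 0.95, despite earning no reward, keeps a selection probability of 0.1 and can still be tried later in the search.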
These adaptive parameter control methods assume that the improvement in the quality of the solutions is directly related to the use of certain parameter values. However, GAs are stochastic systems that may produce different results for the same parameter values (DeJong, 2007). Ideally, the randomness induced by the stochastic behaviour of GAs should be accounted for by the parameter control strategy. To address this issue, we introduce an adaptive genetic algorithm which adjusts algorithm parameters during the optimisation process using a Kalman filter. The Kalman filter-based genetic algorithm (KFGA) redefines parameter values repeatedly based on a learning process that receives its feedback from the optimisation algorithm. The Kalman filter is employed to reduce the noise arising from the stochastic behaviour of GAs when identifying the effect of parameter values on the performance of the algorithm. KFGA outperforms a GA with pre-tuned parameter values and the probability matching technique by Igel and Kreutz (2003) when solving the problem of test data generation. The proposed method consistently finds test suites with better branch coverage than the state of the art, as shown by experiments performed on a set of open-source problem instances.
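The noise-reduction idea can be illustrated with a scalar Kalman filter that tracks the latent effect of a single parameter value on algorithm performance. This is a simplified sketch of the general idea, with illustrative noise constants, not the paper's exact equations:

```python
class KalmanEffectEstimator:
    """Scalar Kalman filter tracking the latent effect of one parameter
    value on GA performance. q models drift of the true effect over the
    run; r models the noise of a single stochastic observation."""

    def __init__(self, q=0.01, r=1.0):
        self.x = 0.0  # estimated effect
        self.p = 1.0  # uncertainty of the estimate
        self.q = q    # process noise: the true effect changes as search progresses
        self.r = r    # measurement noise: a single GA generation is stochastic

    def update(self, observed_improvement):
        # Predict: the effect may have drifted since the last generation.
        self.p += self.q
        # Correct: blend the prediction with the noisy observation.
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (observed_improvement - self.x)
        self.p *= (1 - k)
        return self.x
```

With a large measurement noise `r`, a single lucky (or unlucky) generation moves the estimate only slightly, which is exactly the robustness to stochastic feedback that plain probability matching lacks.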
Section snippets
Test data generation
Software testing is used to estimate the quality of a software system. Quality may refer to functional attributes, which describe how the system should behave, or non-functional requirements as defined in ‘IEEE International Standard 1471 2000 – Systems and software engineering – Recommended practice for architectural description of software-intensive systems’ (ISO/IEC, 2000), such as reliability, efficiency, portability, maintainability, compatibility and usability.
A software test consists of
Optimisation model for test data generation
GAs evolve a population of solutions using the genetic operators: mutation, crossover, selection, and replacement procedure. The aim is to optimise some quality function(s) by searching the space of potential solutions. GAs start the optimisation process by randomly generating a set of solutions. In certain cases, this initial set of solutions is created by applying heuristic rules (e.g. greedy algorithm), or obtained using expert knowledge. Usually, the genetic operators are applied based on
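The loop described above can be sketched as follows. This is a generic GA skeleton with illustrative operator signatures and default rates; EvoSuite's actual implementation differs:

```python
import random

def genetic_algorithm(fitness, random_solution, crossover, mutate,
                      pop_size=50, generations=100,
                      crossover_rate=0.75, mutation_rate=0.05):
    """Generic GA skeleton: maximise `fitness` over solutions produced
    by the supplied variation operators."""
    # Start from a randomly generated set of solutions.
    population = [random_solution() for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(random.sample(population, 2), key=fitness)
            p2 = max(random.sample(population, 2), key=fitness)
            # Apply the operators with the configured rates.
            child = crossover(p1, p2) if random.random() < crossover_rate else p1
            if random.random() < mutation_rate:
                child = mutate(child)
            offspring.append(child)
        # Elitist replacement: keep the best of old and new solutions.
        population = sorted(population + offspring, key=fitness)[-pop_size:]
    return max(population, key=fitness)
```

The crossover and mutation rates appear here as fixed arguments; adaptive parameter control replaces them with values chosen anew each generation from search feedback.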
Adjusting parameters of genetic algorithms
Genetic algorithms (GAs) have successfully been applied to Whole Test Suite Generation (Fraser and Arcuri, 2013) and many other optimisation problems in Software Engineering (Aleti, Grunske, Meedeniya, Moser, 2009, Grunske, 2006, Räihä, 2010). The performance of GAs is affected by the configuration of its parameter values, such as crossover rate, mutation rate and population size (Aleti, Moser, 2013, Bäck, Schütz, 1996, Deb, Beyer, 2001, Farmani, Wright, 2003). Parameter values can be
Effect assessment of parameter values using the Kalman filter
Formally, given a set of n algorithm parameters, where each parameter vi has values vi1, …, vim that can be discrete numbers or intervals of continuous numbers, parameter control derives the optimal next value vij to optimise the effect of vij on the performance of the algorithm (Aleti, 2012). As an example, when the mutation rate vi is dynamically adjusted by considering 4 intervals (m = 4), vi2 stands for a mutation rate sampled from the second interval. In the discrete case of
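A hypothetical controller for an interval-encoded mutation rate could look like the sketch below. The interval boundaries, the epsilon-greedy selection rule and the smoothed effect update are all our assumptions for illustration; in KFGA the effect estimate would come from the Kalman filter instead:

```python
import random

# Mutation rate discretised into m = 4 intervals (boundaries are illustrative).
intervals = [(0.00, 0.05), (0.05, 0.10), (0.10, 0.20), (0.20, 0.40)]
effect = [0.0] * len(intervals)  # estimated effect of each interval
epsilon = 0.2                    # keep exploring under-used intervals

def next_mutation_rate():
    """Pick an interval, then sample a concrete rate from within it."""
    if random.random() < epsilon:
        i = random.randrange(len(intervals))                     # explore
    else:
        i = max(range(len(intervals)), key=effect.__getitem__)   # exploit
    lo, hi = intervals[i]
    return i, random.uniform(lo, hi)

def report(i, improvement, alpha=0.3):
    """Feed back the observed fitness improvement for interval i.
    Exponential smoothing stands in for the Kalman filter update here."""
    effect[i] += alpha * (improvement - effect[i])
```

Each generation the GA asks `next_mutation_rate()` for a value, applies it, and calls `report()` with the resulting improvement, so well-performing intervals are sampled more often as the run progresses.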
Experimental evaluation
The new adaptive control method is applied to adjust the mutation and crossover rates of a GA during the optimisation process. The experiments were conducted using EvoSuite (Fraser and Arcuri, 2013), described in Section 3. The Kalman filter-based adaptive parameter control was compared against pre-tuned parameter settings and the probability matching technique
Conclusion
We presented a Kalman filter-based genetic algorithm (KFGA) for solving the problem of test case generation in software testing. The method uses a Kalman filter to reduce the effect of the stochastic behaviour of GAs when estimating the appropriate parameter values to use in each iteration of the optimisation process. KFGA was implemented in EvoSuite, a framework for whole test suite generation. A set of experiments was conducted using open-source libraries and programs with different
Acknowledgements
We would like to thank Gordon Fraser and the rest of the team who developed EvoSuite for making the source code available. We wish to acknowledge Monash University for the use of their Nimrod software in this work. The Nimrod project has been funded by the Australian Research Council and a number of Australian Government agencies, and was initially developed by the Distributed Systems Technology. This research was supported under Australian Research Council’s Discovery Projects funding scheme
References (58)
- Afzal et al. A systematic review of search-based testing for non-functional system properties. Inf. Softw. Technol. (2009)
- Ahmed and Hermadi. GA-based multiple paths test data generator. Comput. OR (2008)
- Eiben and Smit. Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm Evol. Comput. (2011)
- Igel and Kreutz. Operator adaptation in evolutionary computation and its application to structure optimization of neural networks. Neurocomputing (2003)
- Räihä. A survey on search-based software design. Comput. Sci. Rev. (2010)
- Aleti. An Adaptive Approach to Controlling Parameters of Evolutionary Algorithms (2012)
- Aleti et al. Software architecture optimization methods: a systematic literature review. IEEE Trans. Softw. Eng. (2013)
- Aleti et al. Let the ants deploy your software – an ACO based deployment optimisation strategy. ASE (2009)
- Aleti and Moser. Entropy-based adaptive range parameter control for evolutionary algorithms. Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation (2013)
- Arcuri and Fraser. Parameter tuning or default values? An empirical investigation in search-based software engineering. Empirical Softw. Eng. (2013)
- Bäck. The interaction of mutation rate, selection, and self-adaptation within a genetic algorithm. Parallel Problem Solving from Nature 2, PPSN-II (1992)
- Bäck, Eiben and van der Vaart. An empirical study on GAs without parameters. Parallel Problem Solving from Nature – PPSN VI (2000)
- Bäck and Schütz. Intelligent mutation rate control in canonical genetic algorithms. Lect. Notes Comput. Sci. (1996)
- Cervantes and Stephens. Limitations of existing mutation rate heuristics and how a rank GA overcomes them. IEEE Trans. Evol. Comput. (2009)
- On fitness distributions and expected fitness gain of mutation rates in parallel evolutionary algorithms. Parallel Problem Solving from Nature – PPSN VII (2002)
- Davis. Adapting operator probabilities in genetic algorithms. Proceedings of the Third International Conference on Genetic Algorithms (1989)
- De Jong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems
- Deb and Beyer. Self-adaptive genetic algorithms with simulated binary crossover. Evol. Comput. (2001)
- DeJong. Parameter setting in EAs: a 30 year perspective (2007)
- Eiben, Hinterding and Michalewicz. Parameter control in evolutionary algorithms. IEEE Trans. Evol. Comput.
- Eiben and Schut. New ways to calibrate evolutionary algorithms (2008)
- Farmani and Wright. Self-adaptive fitness formulation for constrained optimization. IEEE Trans. Evol. Comput. (2003)
- Ferrer, Chicano and Alba. Evolutionary algorithms for the multi-objective test data generation problem. Softw. Pract. Exper. (2012)
- Fialho, Costa, Schoenauer and Sebag. Analyzing bandit-based adaptive operator selection mechanisms. Ann. Math. Artif. Intell. (2010)
- Fraser and Arcuri. Whole test suite generation. IEEE Trans. Softw. Eng. (2013)
- Applications of feasible path analysis to program testing. ACM SIGSOFT International Symposium on Software Testing and Analysis, Proceedings
- Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning (1989)
- Generating test data for both path coverage and fault detection using genetic algorithms. Front. Comput. Sci.
- Grefenstette. Optimization of control parameters for genetic algorithms. IEEE Trans. Syst. Man Cybern. (1986)
Cited by (32)
- RGA: A lightweight and effective regeneration genetic algorithm for coverage-oriented software test data generation. Information and Software Technology (2016). Citation excerpt: “And also some research results manifest that the recommended default parameter settings are usually more effective by statistical analysis [30]. Online adaptive technique seems attractive to have positive effects for improving the efficiency and effectiveness of GAs automatically and dynamically [31,32]. For example, a new adaptive optimization framework for generating test data is proposed in [32] to adjust parameter values of a genetic algorithm during the search by using feedback from the optimization process.”
- Stress Testing Control Loops in Cyber-physical Systems. ACM Transactions on Software Engineering and Methodology (2023)
- Testing Abstractions for Cyber-Physical Control Systems. ACM Transactions on Software Engineering and Methodology (2023)
- Competitive Learning and Dynamic Genetic Algorithms for Robust Layout Designs Under Uncertainties. Lecture Notes in Computer Science (2023)
- Optimization Model of Path Selection for Software Testing and Its Evolution-based Solution. Ruan Jian Xue Bao/Journal of Software (2022)
Aldeida Aleti works in the area of optimisation, with a main focus in combinatorial optimisation problems. She is a lecturer and a DECRA research fellow at the Faculty of Information Technology, Monash University, Australia. Her research interests include optimisation of combinatorial problems, analysis of fitness landscapes, adaptive optimisation, robust optimisation, software quality and design, and software testing. Her main work lies in the intersection of software engineering and artificial intelligence. For more details, please check her research profile at users.monash.edu.au/~aldeidaa.
Lars Grunske is currently a Professor at the University of Stuttgart, Germany. He received his PhD degree in computer science from the University of Potsdam (Hasso-Plattner-Institute for Software Systems Engineering) in 2004. He was Junior Professor at the University of Kaiserslautern, Boeing Postdoctoral Research Fellow at the University of Queensland from 2004 to 2007 and lecturer at the Swinburne University of Technology, Australia from 2008 to 2011. He has active research interests in the areas modelling and verification of systems and software. His main focus is on automated analysis, mainly probabilistic and timed model checking and model-based dependability evaluation of complex software intensive systems.