Empirical comparison of regression test selection algorithms

doi:10.1016/S0164-1212(00)00119-9

Journal of Systems and Software

Volume 57, Issue 1, 27 April 2001, Pages 79-90

https://doi.org/10.1016/S0164-1212(00)00119-9 Get rights and content

Abstract

In the maintenance phase, the regression test selection problem refers to selecting test cases from the initial suite of test cases used in the development phase. In this paper, we empirically compare five representative regression test selection algorithms, which include: Simulated Annealing, Reduction, Slicing, Dataflow, and Firewall algorithms. The comparison is based on eight quantitative and qualitative criteria. These criteria are: number of selected test cases, execution time, precision, inclusiveness, preprocessing requirements, type of maintenance, level of testing, and type of approach. The empirical results show that the five algorithms can be used for different requirements of regression testing. For example the Simulated Annealing algorithm can be used for emergency non-safety-critical maintenance situations with a large number of small modifications.

Introduction

Software maintenance involves modifying programs as a result of errors, or changes in the user requirements (Bennett, 1991). During such modifications, new errors may be introduced, causing unintended adverse side effects in the software. Regression testing aims to provide confidence that the modifications are correct and have not adversely affected other parts of the program. Regression testing can be progressive or corrective (Hartmann and Robson, 1988). The former involves retesting major changes to the program's specifications. The latter is performed on specifications that essentially remain unchanged, so that only minor modifications, which do not affect the overall program structure, require retesting.

For regression testing, it might be costly to repeat the whole set of test cases used in the initial development of the program and unreliable to choose a random subset of these test cases. Therefore, it may be useful to select a suitable subset of test cases that accomplishes the objectives of regression testing. This subproblem of regression testing is referred to as the regression test selection problem.

The selection of suitable test cases can be made in different ways, and a number of regression testing approaches and algorithms have been proposed. A strategy based on input partitioning and cause-effect graphing of the program specification has been described in Yau and Kishimoto (1987). Leung and White (1989) have suggested classifying the initial test cases as reusable, retestable, obsolete, and adding new-structural and new-specification test cases. Then test cases can be selected from some of or all five classes. They have also extended this approach to cover programs with global variables (Leung and White, 1990b) and proposed methods to limit integration regression testing to a small set of modules based on the “firewall” concept (Leung and White, 1990a, Leung and White, 1992). An analogous concept of class firewall has been proposed for regression testing object-oriented software systems (Hsia et al., 1997).

A slicing algorithm presented in Gallagher and Lyle (1988) decomposes a program and selects the test cases that detect “linkage” between the modified and unmodified code. Another slicing strategy, which is based on data flow testing and incremental data flow analysis, has been described in Harrold and Soffa (1988) and Gupta et al. (1996). A safe algorithm, which uses a module's dependence graph and selects test cases that will cause the modified module to produce different output than the original module, has been proposed in Rothermel and Harrold, 1997, Rothermel and Harrold, 1998. An incremental slicing algorithm has been proposed in Agrawal et al. (1993) which selects test cases whose outputs may be affected by the modifications made.

Path expressions are used in Benedusi et al. (1988) to represent the program and then algebraic operations are used to determine the paths affected by the modifications. This facilitates the selection of the initial test cases needed for retesting. A similar approach based on semantic differencing has been proposed in Binkley (1997). Also, textual differencing has been used in Vokolos and Frankl (1998). An optimal regression testing approach based on a 0–1 integer programming problem formulation has been proposed by Fischer et al. (1981) and extended in Hartmann and Robson (1990) and solved by natural optimization algorithms in Mansour and El-Fakih (1999). Further, some software tools that are based on regression testing techniques have been constructed (Chen et al., 1994; Vokolos and Frankl, 1997; White et al., 1993; von Mayrhauser and Zhang, 1999).

Despite this amount of research on regression testing algorithms, insufficient work has been done on effectiveness evaluation and comparison of these algorithms. A simple cost model has been suggested early in Leung and White (1991). Another coverage-based cost-effectiveness prediction model has been proposed in Rosenblum and Weyuker (1997). A theoretical analysis framework has been suggested in Rothermel and Harrold (1996). In particular, empirical work has been insufficient. We have reported some initial experimental comparative results in Baradhi and Mansour (1997). Wong et al. (1998) directly examines the costs and benefits of only minimizing the test suite size. Rothermel et al. (1998) compares the costs and benefits of minimizing the test suite to show that the fault-detection capabilities of the selected subset can be compromised by minimization. Graves et al. (1998) empirically compare three algorithms with random and retest-all techniques. The three algorithms are: edge-coverage-adequate minimization, safe, and dataflow-coverage-based. The comparison focuses on the abilities to reduce regression testing effort and uncover faults in modified programs.

In this paper, we describe a comparison framework and present an empirical comparative study of five representative regression test selection algorithms, which include: Simulated Annealing, Reduction, Slicing, Dataflow, and Firewall algorithms. The comparison framework is composed of the following eight criteria: the number of test cases selected for regression testing, execution time, precision, inclusiveness, preprocessing requirements, type of maintenance, level of testing, and type of approach. This study shows that the five algorithms have diverse properties for different criteria. This enables software engineers and project managers to make appropriate decisions and select the regression testing algorithm that best fits the requirements of their application and development process.

This paper is organized as follows. Section 2 describes the regression test selection problem and the program models used. Section 3 gives a brief description of the five implemented algorithms. Section 4 describes the comparison criteria. Section 5 presents and briefly discusses the experimental results. Section 6 summarizes these results and discusses the limitations of the empirical study. Section 7 contains conclusions.

Section snippets

Program modeling and regression test selection problem

We assume that a program is modeled by a control flow graph with M nodes. Each node represents a program segment, which corresponds to a control statement or to a contiguous sequence of assignment statements. The graph edges represent control flow. Also, the data flow information is kept for relevant nodes, which includes the definitions and uses of the variables in the program statements within segments. Uses are classified as either computation uses (c-uses) or predicate uses (p-uses)

Implemented algorithms

In this section, we briefly present the five algorithms that we have implemented for our comparative study.

Comparison criteria

In this section, we present eight criteria for evaluating and comparing the five regression test selection algorithms. These criteria are quantitative and qualitative. The quantitative criteria are:

(i) Number of test cases in R selected by an algorithm, from T, to be rerun for regression testing, which is denoted as #R. This number will be represented as a percentage value with respect to N, the cardinality of T.

(ii) Execution time of the program that implements a regression test selection

Empirical results and discussion

In this section, we experimentally and qualitatively compare the properties of the five regression test selection algorithms using the eight criteria. For the four quantitative criteria, the experiments have been done on a PC running a Pentium II 233 MHz CPU.

Fifteen subject program modules (M1–M15) are used, for which the flowgraphs and the initial suite of test cases, T, have been manually generated. We assume that the flowgraphs and the associated information are generated off-line so that

Summary of the results and limitations

Table 7 summarizes the comparison results for the eight quantitative and qualitative criteria. The terms used to describe the properties of the algorithms, such as slow, fast, high, etc…, are based on comparing the five algorithms with each other. However, we note the following threats and weaknesses in this empirical study:

(i) we are doing an assessment using only a limited number of test modules;

(ii) the subject modules used are not random and are small- to medium-size modules;

(iii) the test

Conclusions and further work

Table 7 shows that different algorithms have different favorable properties. Thus, choosing an algorithm from the five algorithms is dependent on the regression tester's particular requirements and the system under consideration. However, we make the following recommendations:

(i) SA for emergency non-safety-critical situations, where we have a large number of small modifications and that maintenance has to be done rapidly. SA is also recommended where initial test cases have different

Acknowledgements

We thank the anonymous referees for their suggestions, which improved the paper significantly. We also thank Nidal Araby for implementing the reduction algorithm.

Nashat Mansour is an Associate Professor of Computer Science at the Lebanese American University, Lebanon. He received his BE and MS degrees in Electrical Engineering from the University of New South Wales, Australia, and MS in Computer Engineering and Ph.D. in Computer Science from Syracuse University, New York. His research interests include software testing and natural computation algorithms. He is a member of IEEE-CS and is on the executive committee of the Lebanese Association for the

References (33)

K.H Bennett
The software maintenance of large software systems: management, methods and tools
Reliability Engineering and Systems Safety
(1991)
Agrawal, H., Horgan, J.R., Krauser, E.W., 1993. Incremental regression testing. In: Proceedings of the Conference on...
Baradhi, G., Mansour, N., 1997. A comparative study of five regression testing algorithms. In: Proceedings of the...
B Beizer
Software Testing Techniques
(1990)
Benedusi, P., Cimitile, A., De Carlini, U., 1988. Post-maintenance testing based on path change analysis. In:...
D Binkley
Semantics guided regression test cost reduction
IEEE Trans. Software Eng.
(1997)
Chen, Y., Rosenblum, D.S., Vo, K., 1994. Testtube: a system for selective regression testing. In: Proceedings of the...
Fischer, K., Raji, F., Chruscicki, A., 1981. A methodology for retesting modified software. In: Proceedings of the...
Gallagher, K.B., Lyle, J.R., 1988. Using program decomposition to guide modifications. In: Proceedings of the...
Graves, T.L., Harrold, M.J., Kim, J.M., Porter, A., Rothermel, G., 1998. An empirical study of regression test...

R Gupta et al.

Program slicing-based regression testing techniques

Software Testing, Verification and Reliability

(1996)

Harrold, M.J., Gupta, R., Soffa, M.L., 1993. A methodology for controlling the size of a test suite. ACM Trans....

Harrold, M.J., Soffa, M.L., 1988. An incremental approach to unit testing during maintenance. In: Proceedings of the...

Hartmann, J., Robson, D.J., 1988. Approaches to regression testing. In: Proceedings of the Conference on Software...

Hartmann, J., Robson, D.J., 1990. Techniques for selective revalidation. IEEE Software, January,...

P Hsia et al.

A technique for the selective revalidation of OO software

J. Software Maintenance

(1997)

Cited by (31)

A systematic review on search based mutation testing
2017, Information and Software Technology
Citation Excerpt :
According to the authors, if the trend continues, over 1700 SBST papers will have been published before the end of this decade. Some issues addressed by SBST are test data generation [7–11], test case selection [12–15], test case prioritization [14,16–19], functional testing [20] and non-functional testing [21–23]. Mutation testing is the most widely known criterion of the fault injection testing technique [24].
Search Based Software Testing refers to the use of meta-heuristics for the optimization of a task in the context of software testing. Meta-heuristics can solve complex problems in which an optimum solution must be found among a large amount of possibilities. The use of meta-heuristics in testing activities is promising because of the high number of inputs that should be tested. Previous studies on search based software testing have focused on the application of meta-heuristics for the optimization of structural and functional criteria. Recently, some researchers have proposed the use of SBST for mutation testing and explored solutions for the cost of application of this testing criterion.
The objective is to identify how SBST has been explored in the context of mutation testing, how fitness functions are defined and the challenges and research opportunities in the application of meta-heuristic search techniques.
A systematic review involving 263 papers published between 1996 and 2014 examined the studies on the use of meta-heuristic search techniques for the optimization of mutation testing.
The results show meta-heuristic search techniques have been applied for the optimization of test data generation, mutant generation and selection of effective mutation operators. Five meta-heuristic techniques, namely Genetic Algorithm, Ant Colony, Bacteriological Algorithm, Hill Climbing and Simulated Annealing have been used in search based mutation testing. The review addressed different fitness functions used to guide the search.
Search based mutation testing is a field of interest, however, some issues remain unexplored. For instance, the use of meta-heuristics for the selection of effective mutation operators was identified in only one study. The results have pointed a range of possibilities for new studies to be developed, i.e., identification of equivalent mutants, experimental studies and application to different domains, such as concurrent programs.
Recovering test-to-code traceability using slicing and textual analysis
2014, Journal of Systems and Software
Citation Excerpt :
Specifically, running some test suites can take hours, even days, so developers cannot exercise the system instantly or in reasonable time (Runeson, 2006). Past research has proposed several test case selection methods aimed at reducing the cost of regression testing (Engström et al., 2010; Mansour et al., 2001; Graves et al., 2001; Binkley, 1997; Hurdugaci and Zaidman, 2012). However, in industry there is still no systematic approach for test case selection where test case selection often relies on the developers expertise and judgment (Runeson, 2006; Causevic et al., 2010).
Test suites are a valuable source of up-to-date documentation as developers continuously modify them to reflect changes in the production code and preserve an effective regression suite. While maintaining traceability links between unit test and the classes under test can be useful to selectively retest code after a change, the value of having traceability links goes far beyond this potential savings. One key use is to help developers better comprehend the dependencies between tests and classes and help maintain consistency during refactoring. Despite its importance, test-to-code traceability is not common in software development and, when needed, traceability information has to be recovered during software development and evolution. We propose an advanced approach, named SCOTCH+ (Source code and COncept based Test to Code traceability Hunter), to support the developer during the identification of links between unit tests and tested classes. Given a test class, represented by a JUnit class, the approach first exploits dynamic slicing to identify a set of candidate tested classes. Then, external and internal textual information associated with the classes retrieved by slicing is analyzed to refine this set of classes and identify the final set of candidate tested classes. The external information is derived from the analysis of the class name, while internal information is derived from identifiers and comments. The approach is evaluated on five software systems. The results indicate that the accuracy of the proposed approach far exceeds the leading techniques found in the literature.
A systematic review on regression test selection techniques
2010, Information and Software Technology
Citation Excerpt :
This review is focused on test selection techniques for regression testing. Although techniques for regression test selection have been evaluated in previous work [3,15,36,65], no general solution has been put forward since no technique could possibly respond adequately to the complexity of the problem and the great diversity in requirements and preconditions in software systems and development organizations. Neither does any single study evaluate every aspect of the problem; e.g. Kim et al. [27] evaluate the effects of regression test application frequency, Elbaum et al. [11] investigate the impact that different modifications have on regression test selection techniques, several studies examine the ability to reduce regression testing effort [3,11,15,27,36,65,66] and to reveal faults [11,15,27,49].
Regression testing is verifying that previously functioning software remains after a change. With the goal of finding a basis for further research in a joint industry-academia research project, we conducted a systematic review of empirical evaluations of regression test selection techniques. We identified 27 papers reporting 36 empirical studies, 21 experiments and 15 case studies. In total 28 techniques for regression test selection are evaluated. We present a qualitative analysis of the findings, an overview of techniques for regression test selection and related empirical evidence. No technique was found clearly superior since the results depend on many varying factors. We identified a need for empirical studies where concepts are evaluated rather than small variations in technical implementations.
Reduction-based methods and metrics for selective regression testing
2002, Information and Software Technology
Citation Excerpt :
In this section, we use the proposed STM and RTM metrics to evaluate the test-selection adequacy for two coverage-based regression test-selection algorithms on the 60 test problems. The algorithms are: Gupta et al.'s DF algorithms [13], and Leung and White's SFW algorithm [1,7]. DF and SFW define all the definition-use pairs that may be affected by the modification and select test cases accordingly.
In corrective maintenance, modified software is regression tested using selected test cases in order to ensure that the modifications have not caused adverse effects. This activity of selective regression testing involves regression test selection, which refers to selecting test cases from the previously run test suite, and test-coverage identification. In this paper, we propose three test-selection methods and two coverage identification metrics. The three methods aim to reduce the number of selected test cases for retesting the modified software. The first method, referred to as modification-based reduction version 1 (MBR1), selects a reduced number of test cases based on the modification made and its effects in the software. The second method, referred to as modification-based reduction version 2 (MBR2) improves MBR1 by further omitting tests that do not cover the modification. The third method, referred to as precise reduction (PR), reduces the number of test cases selected by omitting non-modification-revealing tests from the initial test suite. The two coverage metrics are McCabe-based regression test metrics, which are referred to as the Reachability regression Test selection McCabe-based metric (RTM), and data-flow Slices regression Test McCabe-based metric (STM). These metrics aim to assist the regression tester in monitoring test-coverage adequacy, reveal any shortage or redundancy in the test suite, and assist in identifying, where additional tests may be required for regression testing.
We empirically compare MBR1, MBR2, and PR with three reduction and precision-oriented methods on 60 test-problems. The results show that PR selects the least number of test cases most of the time and omits non-modification-revealing tests. We also demonstrate the applicability of our proposed methods to object-oriented regression testing at the class level. Further, we illustrate typical application of the RTM and STM metrics using the 60 test-problems and two coverage-oriented selective regression-testing methods.
Test Case Selection through Novel Methodologies for Software Application Developments
2023, Symmetry
Software testing
2019, Handbook of Software Engineering

View all citing articles on Scopus

Rami Bahsoon received his BS and MS degrees in Computer Science from the Lebanese American University in 1997 and 2000, respectively. He is presently pursuing a Ph.D. in Computer Science at the University College of London. His research interest is in software architecture and testing methods.

Ghinwa Baradhi received her BS and MS degrees in Computer Science from the Lebanese American University in 1994 and 1997, respectively. Currently, she is a project manager at Emirates Bank.

View full text