Multi-objective test case prioritization in highly configurable systems: A case study
Introduction
Highly-Configurable Systems (HCSs) provide a common core functionality and a set of optional features to tailor variants of the system according to a given set of requirements (Cohen, Dwyer, Shi, 2008, von Rhein, Grebhahn, Apel, Siegmund, Beyer, Berger, 2015). For instance, operating systems such as Linux or eCos are examples of HCSs where functionality is added or removed by installing and uninstalling packages, e.g. Debian Wheezy offers more than 37,000 available packages (Debian, 2013). Content management systems are also examples of HCSs where configuration is managed in terms of modules, e.g. the e-commerce platform Prestashop has more than 3500 modules and visual templates (Segura et al., 2014). More recently, cloud applications have also been presented as configurable systems, e.g. the Amazon Elastic Compute Cloud (EC2) service offers 1758 different possible configurations (García-Galán et al., 2013).
HCSs are usually represented in terms of features. A feature depicts a choice to include a certain functionality in a system configuration (von Rhein et al., 2015). It is common that not all combinations of features are allowed or meaningful. In this case, additional constraints are defined between them, normally using a variability model, such as a feature model. A feature model represents all the possible configurations of the HCS in terms of features and constraints among them (Kang et al., 1990). A configuration is a valid composition of features satisfying all the constraints. Fig. 1 depicts a feature model representing a simplified family of mobile phones. The model illustrates how features and relationships among them are used to specify the commonalities and variabilities of the mobile phones. The following set of features represents a valid configuration of the model: {Mobile Phone, Calls, Screen, HD, GPS, Media, Camera}.
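To make the notion of a valid configuration concrete, the following sketch encodes a handful of rules in the spirit of Fig. 1 and checks the configuration listed above. The concrete rules (e.g. Camera requiring HD) are assumptions for illustration only; the actual constraints are those depicted in the figure.

```python
# Minimal sketch of configuration validity checking for a toy
# mobile-phone feature model. The specific rules are illustrative
# assumptions, not the exact constraints of Fig. 1.

def is_valid(config):
    """Check a set of features against mandatory relationships,
    child-implies-parent edges and cross-tree constraints."""
    rules = [
        lambda c: "Mobile Phone" in c,               # root feature is always present
        lambda c: "Calls" in c,                      # mandatory child of the root
        lambda c: "Screen" in c,                     # mandatory child of the root
        lambda c: "HD" not in c or "Screen" in c,    # a child implies its parent
        lambda c: "Camera" not in c or "Media" in c, # a child implies its parent
        lambda c: "Camera" not in c or "HD" in c,    # cross-tree: Camera requires HD
    ]
    return all(rule(config) for rule in rules)

config = {"Mobile Phone", "Calls", "Screen", "HD", "GPS", "Media", "Camera"}
print(is_valid(config))  # True: the configuration satisfies every rule above
```

In a real setting the rules are derived automatically from the feature model (often via a SAT encoding) rather than hand-written.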
HCS testing is about deriving a set of configurations and testing each configuration (Perrouin et al., 2011). In this context, a test case is defined as a configuration of the HCS under test (i.e. a set of features) and a test suite is a set of test cases (Perrouin et al., 2011). Henceforth, the terms test case and configuration are used interchangeably. Testing HCSs is extremely challenging due to the potentially huge number of configurations under test. As an example, Eclipse (Debian, 2013) has more than 1650 plugins that can be combined (with restrictions) to form millions of different configurations of the development environment. This makes exhaustive testing of HCSs infeasible, that is, testing every single configuration is too expensive in general. Also, even when a manageable set of configurations is available, testing is irremediably limited by time and budget constraints, which require making tough decisions with the goal of finding as many faults as possible.
Typical approaches for HCS testing use a model-based approach, that is, they take an input feature model representing the HCS and return a valid set of feature configurations to be tested, i.e. a test suite. In particular, two main strategies have been adopted: test case selection and test case prioritization. Test case selection reduces the test space by selecting an effective and manageable subset of configurations to be tested (Devroey, Perrouin, Schobbens, 2014, Henard, Papadakis, Perrouin, Klein, Traon, 2013, Marijan, Gotlieb, Sen, Hervieu, 2013). Test case prioritization schedules test cases for execution in an order that attempts to increase their effectiveness at meeting some performance goal, typically detecting faults as soon as possible (Al-Hajjaji, Thum, Meinicke, Lochau, Saake, 2014, Lopez-Herrejon, Ferrer, Chicano, Haslinger, Egyed, Alba, 2014, Wang, Buchmann, Ali, Gotlieb, Pradhan, Liaaen, 2014). Both strategies are complementary and are often combined.
Test case prioritization in HCSs can be driven by different functional and non–functional objectives. Functional prioritization objectives are those based on the functional features of the system and their interactions. Some examples are those based on combinatorial interaction testing (Wang et al., 2014b), configuration dissimilarity (Al-Hajjaji, Thum, Meinicke, Lochau, Saake, 2014, Henard, Papadakis, Perrouin, Klein, Heymans, Traon, 2014, Sánchez, Segura, Ruiz-Cortés, 2014) or feature model complexity metrics (Sánchez, Segura, Parejo, Ruiz-Cortés, 2015, Sánchez, Segura, Ruiz-Cortés, 2014). Non–functional prioritization objectives consider extra–functional information such as user preferences (Ensan, Bagheri, Asadi, Gasevic, Biletskiy, 2011, Johansen, Haugen, Fleurey, Eldegard, Syversen, 2012), cost (Wang et al., 2014b), memory consumption (Lopez-Herrejon et al., 2014) or execution probability (Devroey et al., 2014a) to find the best ordering for test cases. In a previous work (Sánchez et al., 2015b), we performed a preliminary evaluation comparing the effectiveness of several functional and non–functional prioritization objectives in accelerating the detection of faults in an HCS. Results suggested that non–functional properties such as the number of changes or the number of defects in a previous version of the system were among the most effective prioritization criteria.
Challenges. Current approaches for test case prioritization in HCSs follow a single objective approach (Al-Hajjaji, Thum, Meinicke, Lochau, Saake, 2014, Johansen, Haugen, Fleurey, Eldegard, Syversen, 2012, Devroey, Perrouin, Cordy, Schobbens, Legay, Heymans, 2014, Ensan, Bagheri, Asadi, Gasevic, Biletskiy, 2011, Henard, Papadakis, Perrouin, Klein, Heymans, Traon, 2014, Lopez-Herrejon, Ferrer, Chicano, Haslinger, Egyed, Alba, 2014, Sánchez, Segura, Parejo, Ruiz-Cortés, 2015), that is, they aim to maximize or minimize a single objective (e.g. feature coverage or suite size) but not several at the same time. Other works (Wang, Ali, Gotlieb, 2013, Wang, Buchmann, Ali, Gotlieb, Pradhan, Liaaen, 2014) combine several objectives into a single function by assigning them weights proportional to their relative importance. While this may be acceptable in certain scenarios, it may be unrealistic in others where users may wish to study the trade-offs among several objectives (Lopez-Herrejon et al., 2014). Thus, the potential benefits of optimizing multiple prioritization objectives simultaneously, both functional and non–functional, remain unexplored.
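The difference between the two strategies can be sketched in a few lines: a weighted sum collapses all objectives into one score (and thus one "best" ordering), whereas Pareto dominance leaves trade-off solutions incomparable, so the user can inspect them. The objective values below are hypothetical.

```python
# Weighted-sum scalarization vs. Pareto dominance for objective
# vectors to be maximized. Values are hypothetical.

def weighted_sum(objectives, weights):
    """Single-objective combination: collapses the trade-off."""
    return sum(w * o for w, o in zip(weights, objectives))

def dominates(a, b):
    """Pareto dominance: a is at least as good as b on every
    objective and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

# Two candidate suites: s1 covers more feature pairs, s2 touches
# more previously faulty features (hypothetical scores).
s1, s2 = (10, 3), (6, 8)
print(dominates(s1, s2), dominates(s2, s1))  # False False -> incomparable
print(weighted_sum(s1, (0.5, 0.5)))          # 6.5 -> a single forced ranking
```

A multi-objective algorithm returns the whole set of non-dominated solutions instead of forcing a ranking through weights.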
A further challenge is related to the lack of HCSs with available code, variability models and fault reports that can be used to assess the effectiveness of testing approaches. As a result, authors typically evaluate their contributions in terms of performance (e.g. execution time) using synthetic feature models and data (Al-Hajjaji, Thum, Meinicke, Lochau, Saake, 2014, Henard, Papadakis, Perrouin, Klein, Traon, 2013, Qu, Cohen, Rothermel, 2008, Xu, Cohen, Motycka, Rothermel, 2013). This introduces significant threats to validity, limits the scope of their conclusions and, more importantly, raises questions regarding the fault–detection effectiveness of the different algorithms and prioritization objectives.
Contributions. In this paper, we present a case study on multi–objective test case prioritization in HCSs. In particular, we model test case prioritization in HCSs as a multi–objective optimization problem, and we present a search–based algorithm to solve it based on the classical NSGA-II evolutionary algorithm. Additionally, we present seven objective functions based on both functional and non–functional properties of the HCS under test. Then, we report a comparison of 63 different combinations of up to three objectives in accelerating the detection of faults in the Drupal framework. Drupal is a highly modular open source web content management system for which we have mined a feature model and extracted real data from its issue tracking system and Git repository (Sánchez et al., 2015b). Results reveal that non–functional properties, such as the number of defects in previous versions of the system, accelerate the detection of faults more effectively than functional properties extracted from the feature model. Results also suggest that multi-objective prioritization is more effective at accelerating the detection of faults than mono-objective prioritization.
The rest of the paper is structured as follows: Section 2 introduces the concepts of feature models and multi-objective evolutionary algorithms. Section 3 presents the Drupal case study used to perform this work. Section 4 presents an overview and definition of our approach, and Section 5 describes the proposed multi-objective optimization algorithm. Section 6 defines seven objective functions for HCSs based on functional and non-functional goals. The evaluation of our approach is described in Section 7. Section 8 presents the threats to validity of our work. The related work is discussed in Section 9. Finally, we summarize our conclusions and outline our future work in Section 10.
Section snippets
Feature models
A feature model defines all the possible configurations of a system or family of related systems (Benavides, Segura, Ruiz-Cortés, 2010, Kang, Cohen, Hess, Novak, Peterson, 1990). A feature model is visually represented as a tree–like structure in which nodes represent features, and edges denote the relationships among them. A feature can be defined as any increment in the functionality of the system (Batory, 2005). A configuration of the system is composed of a set of features satisfying all
The drupal case study
In this section, we present the Drupal case study fully reported by the authors in a previous work (Sánchez et al., 2015b). Drupal is a highly modular open source web content management framework written in PHP (Buytaert, 2015, Tomlinson, VanDyk, 2010). This tool can be used to build a variety of websites including internet portals, e-commerce applications and online newspapers (Tomlinson and VanDyk, 2010). Drupal has more than 30,000 modules that can be composed to form valid configurations of
Approach overview
In this section, we define the problem addressed and our approach illustrating it with an example.
Multi-objective optimization algorithm
We used a MOEA to solve the multi–objective test case prioritization problem in HCSs. In particular, we adapted NSGA-II due to its popularity and good performance for many multi-objective optimization problems. In short, the algorithm receives an attributed feature model as input and returns a set of prioritized test suites optimizing the target objectives. In the following, we describe the specific adaptations performed to NSGA-II to solve the multi–objective test case prioritization problem
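The adaptation details are elided in this snippet. As an illustration of what a permutation-based adaptation of NSGA-II typically requires, the hypothetical sketch below shows variation operators that keep each individual a valid ordering of the test suite; with naive one-point crossover, offspring would duplicate and drop test cases. The operator names and parameters are assumptions, not the paper's exact design.

```python
import random

# Hypothetical sketch: in a prioritization MOEA, an individual is a
# permutation of test-case indices, so the variation operators must
# preserve the permutation property.

def swap_mutation(order, rate=0.1):
    """With some probability, swap two positions; the result is
    always a valid permutation of the input."""
    order = list(order)
    if random.random() < rate:
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order

def order_crossover(p1, p2):
    """OX-style crossover: copy a random slice from the first parent,
    then fill the remaining slots in the second parent's order."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    fill = [t for t in p2 if t not in child[a:b]]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

parent1, parent2 = [0, 1, 2, 3, 4], [4, 3, 2, 1, 0]
child = order_crossover(parent1, parent2)
print(sorted(child) == parent1)  # True: still a permutation of the suite
```

NSGA-II then ranks the resulting population by non-dominated sorting and crowding distance, exactly as in the unmodified algorithm.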
Objective functions
In this section, we propose and formalize different objective functions for test case prioritization in HCSs. All the functions receive an attributed feature model representing the HCS under test (fm) and a test suite (ts) as inputs and return an integer value measuring the quality of the suite with respect to the optimization goal. Note that the following functions will be later combined to form multi-objective goals (see Section 7). To illustrate each function, we use the feature model in
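As an illustration of this shape, the sketch below scores an ordered suite by how early it covers features with past faults. The function name, the representation of the attributed model as a plain fault-count map, and the position weighting are all hypothetical; the paper's seven functions are defined in this section itself.

```python
# Hedged sketch of an objective function with the signature described
# above: it takes the attributed model (here simplified to a map from
# feature to fault count in a previous release) and an ordered test
# suite, and returns an integer score. All names are illustrative.

def faulty_feature_coverage(fm, ts):
    """Reward orderings that cover previously faulty features early:
    a feature first covered at position p contributes len(ts)-p+1."""
    covered, score = set(), 0
    for position, test_case in enumerate(ts, start=1):
        new = {f for f in test_case if fm.get(f, 0) > 0} - covered
        covered |= new
        score += len(new) * (len(ts) - position + 1)
    return score

fm = {"Calls": 2, "GPS": 0, "Camera": 5, "Media": 1}       # hypothetical fault counts
suite = [{"Calls", "GPS"}, {"Camera", "Media"}]
print(faulty_feature_coverage(fm, suite))  # 4
```

Reversing this suite yields a higher score (5), since it covers two previously faulty features in the first position instead of one.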
Evaluation
This section explains the experiments conducted to explore the effectiveness of multi–objective test case prioritization in Drupal. First, we introduce the target research questions and the general experimental setup. Second, the results of the different experiments and the statistical results are reported.
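The snippet introduces the experiments without detailing the effectiveness measure. A metric commonly used in prioritization studies to quantify how fast an ordering detects faults is APFD (Average Percentage of Faults Detected); it is shown here as an illustrative computation, not necessarily the exact measure used in this study.

```python
# Illustrative APFD computation. order is the execution order of
# test-case ids; faults_revealed maps each test id to the set of
# fault ids it detects (hypothetical data below).

def apfd(order, faults_revealed):
    n = len(order)
    all_faults = set().union(*faults_revealed.values())
    m = len(all_faults)
    first_pos = {}  # fault id -> position of the first test revealing it
    for pos, tc in enumerate(order, start=1):
        for f in faults_revealed.get(tc, set()):
            first_pos.setdefault(f, pos)
    return 1 - sum(first_pos[f] for f in all_faults) / (n * m) + 1 / (2 * n)

reveals = {"t1": {"f1"}, "t2": {"f1", "f2"}, "t3": set()}
print(apfd(["t2", "t1", "t3"], reveals))  # 0.8333...: both faults found first
print(apfd(["t3", "t1", "t2"], reveals))  # 0.3333...: faults found late
```

Higher APFD means the ordering reveals the faults earlier in the run, which is exactly the property the prioritization objectives try to optimize.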
Threats to validity
The factors that could have influenced our case study are summarized in the following internal and external validity threats.
Internal validity. This refers to whether there is sufficient evidence to support the conclusions and the sources of bias that could compromise those conclusions. Inadequate parameter setting is a common internal validity threat. In this paper, we used standard parameter values for the NSGA-II algorithm (Deb et al., 2002). Furthermore, to consider the effect of
Related work
In this section we summarize the work that most closely relates to ours. We divide it into HCSs testing and general software testing.
HCSs testing. Within the context of HCSs, there has been a strong and recent interest in the area of Software Product Lines (SPLs) testing as evidenced by several systematic mapping studies (e.g. da Mota Silveira Neto, do Carmo Machado, McGregor, de Almeida, de Lemos Meira, 2011, do Carmo Machado, McGregor, Cavalcanti, de Almeida, 2014, Engström, Runeson,
Conclusions
This article presented a real–world case study on multi–objective test case prioritization in Drupal, a highly configurable web framework. In particular, we adapted the NSGA-II evolutionary algorithm to solve the multi-objective prioritization problem in HCSs. Our algorithm uses seven novel objective functions based on functional and non-functional properties of the HCS under test. We performed several experiments comparing the effectiveness of 63 different combinations of up to three of these
Material
For the sake of replicability, the source code of our algorithm, the Drupal attributed feature model, experimental results and statistical analysis scripts in R are publicly available at http://exemplar.us.es/demo/SanchezJSS2016 (100 MB).
Acknowledgments
This work has been partially supported by the European Commission (FEDER) and Spanish Government under CICYT projects TAPAS (TIN2012-32273) and BELI (TIN2015-70560-R), and the excellence network SEBASE (TIN2015-71841-REDT), the Andalusian Government project COPAS (P12-TIC-1867) and the Austrian Science Fund (FWF) projects P25513-N15.
References (82)
- Automated analysis of feature models 20 years later: a literature review. Inf. Syst. (2010)
- Multi-objective optimization
- Abstract test case generation for behavioural testing of software product lines. Software Product Line Conference (2014)
- Software product line testing - a systematic mapping study. Inf. Softw. Technol. (2011)
- Evolutionary search-based test generation for software product line feature models. Conference on Advanced Information Systems Engineering (CAiSE’12) (2012)
- Presence-condition simplification in highly configurable systems. International Conference on Software Engineering (2015)
- Multi-objective test prioritization in software product line testing: an industrial case study. Software Product Line Conference (2014)
- Continuous test suite augmentation in software product lines. Software Product Line Conference (2013)
- 42 variability bugs in the Linux kernel: a qualitative analysis
- Similarity-based prioritization in software product-line testing. Software Product Line Conference (2014)
- A Hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Softw. Test. Verif. Rel.
- Assessing the maintainability of software product line feature models using structural metrics. Softw. Qual. Control
- Feature models, grammars, and propositional formulas. Software Product Lines Conference (SPLC)
- On the complexity of computing the hypervolume indicator. IEEE Trans. Evol. Comput.
- Drupal Framework
- Constructing interaction test suites for highly-configurable systems in the presence of constraints: a greedy approach. IEEE Trans. Softw. Eng.
- A systematic mapping study of software product lines testing. Inf. Softw. Technol.
- A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput.
- Debian 7.0 Wheezy Released
- A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput.
- Towards statistical prioritization for software product lines testing. Proceedings of the Eighth International Workshop on Variability Modelling of Software-Intensive Systems (VaMoS)
- On strategies for testing software product lines: a systematic literature review. Inf. Softw. Technol.
- jMetal: a Java framework for multi-objective optimization. Adv. Eng. Softw.
- Selecting a cost-effective test case prioritization technique. Softw. Qual. J.
- Goal-oriented test case selection and prioritization for product line feature models. Conference Information Technology: New Generations
- Empirical evaluation of Pareto efficient multi-objective regression test case prioritisation. International Symposium on Software Testing and Analysis
- Evolutionary algorithms for the multi-objective test data generation problem. Softw. Pract. Exp.
- Migrating to the cloud: a software product line based analysis. 3rd International Conference on Cloud Computing and Services Science (CLOSER’13)
- Evaluating improvements to a meta-heuristic search for constrained interaction testing. Empir. Softw. Eng.
- An improved meta-heuristic search for constrained interaction testing. 1st International Symposium on Search Based Software Engineering (2009)
- Predicting Fault Incidence Using Software Change History. Technical Report
- Search-based software engineering: trends, techniques and applications. ACM Comput. Surv.
- Bypassing the Combinatorial Explosion: Using Similarity to Generate and Prioritize T-wise Test Suites for Large Software Product Lines. Technical Report
- Bypassing the combinatorial explosion: using similarity to generate and prioritize t-wise test configurations for software product lines. IEEE Trans. Softw. Eng.
- Multi-objective test generation for software product lines. International Software Product Line Conference (SPLC)
- A simple sequentially rejective multiple test procedure. Scand. J. Statist.
- Design and analysis of cost-cognizant test case prioritization using genetic algorithm with test history. Computer Software and Applications Conference
- A multi-objective technique to prioritize test cases based on latent semantic indexing
- Properties of realistic feature models make combinatorial testing of product lines feasible. MODELS
- Generating better partial covering arrays by modeling weights on sub-product lines. International Conference MODELS
Jose A. Parejo received his Ph.D. (with honors) from University of Sevilla in 2013, where he currently works as senior lecturer of software engineering. He has worked in the industry as developer, architect and project manager from 2001 to 2007. His research interests include metaheuristic optimization and software engineering, focusing mainly on search-based software engineering. He serves regularly as reviewer for international journals and conferences.
Ana B. Sánchez received her Ph.D. (with honors) from University of Seville in May 2016. She has worked as a research assistant in the Applied Software Engineering research group (ISA, www.isa.us.es) at the University of Seville in the area of automated testing of highly configurable systems. Contact her at [email protected].
Sergio Segura received his Ph.D. in software engineering (with honours) from Seville University, where he currently works as a senior lecturer. His research interests include software testing, software variability and search-based software engineering. He has co-authored some highly cited papers as well as tools used by universities and companies in various countries. He also serves regularly as a reviewer for international journals and conferences.
Antonio Ruiz-Cortés is an accredited full professor and Head of the Applied Software Engineering research group at the University of Seville, in which he received his Ph.D. (with honours) and M.Sc. in Computer Science. His research interests are in the areas of service oriented computing, software variability, software testing, and business process management. Further information about his publications, research projects and activities can be found at www.isa.us.es.
Roberto Erick Lopez-Herrejon is an associate professor at the Department of Software Engineering and Information Technology of the École de Technologie Supérieure of the University of Quebec in Montreal, Canada. Prior he was a senior postdoctoral researcher at the Johannes Kepler University in Linz, Austria. He was an Austrian Science Fund (FWF) Lise Meitner Fellow (2012–2014) at the same institution. From 2008 to 2014 he was an External Lecturer at the Software Engineering Masters Programme of the University of Oxford, England. From 2010 to 2012 he held an FP7 Intra-European Marie Curie Fellowship sponsored by the European Commission. He obtained his Ph.D. from the University of Texas at Austin in 2006, funded in part by a Fulbright Fellowship. From 2005 to 2008, he was a Career Development Fellow at the Software Engineering Centre of the University of Oxford. His main expertise is in software customization, software product lines, and search based software engineering. [email protected] Department of Software Engineering and IT. École de Technologie Supérieure, (ÉTS), Notre-Dame Street Ouest. 1100, H3C 1K3. Montreal, Canada.
Alexander Egyed heads the Institute for Software Engineering and Automation at the Johannes Kepler University, Austria. He is also an adjunct assistant professor at the University of Southern California, USA. Before joining the JKU, he worked as a research scientist for Teknowledge Corporation, USA (2000–2007) and then as a Research Fellow at the University College London, UK (2007–2008). He received a doctorate degree in 2000 and a master of science degree in 1996, both in computer science, from the University of Southern California, USA under the mentorship of Dr. Barry Boehm. His research interests include software design modeling, requirements engineering, consistency checking and resolution, traceability, and change impact analysis. He is a member of ACM, ACM SigSoft, IEEE, and IEEE Computer Society.