Journal of Systems and Software

Volume 122, December 2016, Pages 287-310

Multi-objective test case prioritization in highly configurable systems: A case study

https://doi.org/10.1016/j.jss.2016.09.045

Highlights

  • A multi-objective test case prioritization real-world case study is presented.

  • Seven objective functions based on functional and non-functional data are proposed.

  • Comparison of the effectiveness of 63 combinations of up to three objectives.

  • NSGA-II evolutionary algorithm to solve the multi-objective prioritization problem.

  • Multi-objective prioritization is more effective than mono-objective approaches.

Abstract

Test case prioritization schedules test cases for execution in an order that attempts to accelerate the detection of faults. The order of test cases is determined by prioritization objectives such as covering code or critical components as rapidly as possible. The importance of this technique has been recognized in the context of Highly-Configurable Systems (HCSs), where the potentially huge number of configurations makes testing extremely challenging. However, current approaches for test case prioritization in HCSs suffer from two main limitations. First, the prioritization is usually driven by a single objective which neglects the potential benefits of combining multiple criteria to guide the detection of faults. Second, instead of using industry-strength case studies, evaluations are conducted using synthetic data, which provides no information about the effectiveness of different prioritization objectives. In this paper, we address both limitations by studying 63 combinations of up to three prioritization objectives in accelerating the detection of faults in the Drupal framework. Results show that non-functional properties such as the number of changes in the features are more effective than functional metrics extracted from the configuration model. Results also suggest that multi-objective prioritization typically results in faster fault detection than mono-objective prioritization.

Introduction

Highly-Configurable Systems (HCSs) provide a common core functionality and a set of optional features to tailor variants of the system according to a given set of requirements (Cohen et al., 2008; von Rhein et al., 2015). For instance, operating systems such as Linux or eCos are examples of HCSs where functionality is added or removed by installing and uninstalling packages, e.g. Debian Wheezy offers more than 37,000 available packages (Debian, 2013). Content management systems are also examples of HCSs where configuration is managed in terms of modules, e.g. the e-commerce platform Prestashop has more than 3500 modules and visual templates (Segura et al., 2014). More recently, cloud applications have also been presented as configurable systems, e.g. the Amazon Elastic Compute Cloud (EC2) service offers 1758 different possible configurations (García-Galán et al., 2013).

HCSs are usually represented in terms of features. A feature depicts a choice to include a certain functionality in a system configuration (von Rhein et al., 2015). It is common that not all combinations of features are allowed or meaningful. In this case, additional constraints are defined between them, normally using a variability model, such as a feature model. A feature model represents all the possible configurations of the HCS in terms of features and constraints among them (Kang et al., 1990). A configuration is a valid composition of features satisfying all the constraints. Fig. 1 depicts a feature model representing a simplified family of mobile phones. The model illustrates how features and relationships among them are used to specify the commonalities and variabilities of the mobile phones. The following set of features represents a valid configuration of the model: {Mobile Phone, Calls, Screen, HD, GPS, Media, Camera}.
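To make these notions concrete, the following sketch (in Python) encodes a toy model loosely based on the mobile-phone example and checks a configuration for validity. The constraint details (e.g. an alternative group between a Basic and an HD screen, and Camera as an optional child of Media) are our assumptions for illustration, since the figure itself is not reproduced here.

    # Illustrative sketch, not the paper's implementation: a feature model as a
    # feature set plus constraints, with a validity check for configurations.
    FEATURES = {"Mobile Phone", "Calls", "Screen", "Basic", "HD", "GPS",
                "Media", "Camera", "MP3"}

    def is_valid(config):
        """Check a configuration against the (assumed) constraints of the toy model."""
        if not {"Mobile Phone", "Calls", "Screen"} <= config:
            return False                      # root and mandatory features
        if len(config & {"Basic", "HD"}) != 1:
            return False                      # Screen variants form an xor group
        if (config & {"Camera", "MP3"}) and "Media" not in config:
            return False                      # Camera/MP3 require their parent Media
        return config <= FEATURES             # no unknown features

    # The valid configuration mentioned in the text:
    print(is_valid({"Mobile Phone", "Calls", "Screen", "HD", "GPS",
                    "Media", "Camera"}))      # True

In a real tool this check is typically delegated to a solver over a propositional encoding of the model, as done by analysis frameworks such as FaMa (cited in the references).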

HCS testing is about deriving a set of configurations and testing each configuration (Perrouin et al., 2011). In this context, a test case is defined as a configuration of the HCS under test (i.e. a set of features) and a test suite is a set of test cases (Perrouin et al., 2011). Henceforth, the terms test case and configuration are used interchangeably. Testing HCSs is extremely challenging due to the potentially huge number of configurations under test. As an example, Eclipse (Debian, 2013) has more than 1650 plugins that can be combined (with restrictions) to form millions of different configurations of the development environment. This makes exhaustive testing of HCSs infeasible, that is, testing every single configuration is generally too expensive. Moreover, even when a manageable set of configurations is available, testing is inevitably limited by time and budget constraints, which forces tough decisions aimed at finding as many faults as possible.
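The scale of the problem is easy to appreciate with a back-of-the-envelope calculation (ours, for illustration): with n independent optional features there are 2^n configurations, and constraints prune but do not tame this growth.

    # Rough illustration of the combinatorial explosion: n unconstrained
    # optional features yield 2**n distinct configurations.
    for n in (10, 50, 1650):
        digits = len(str(2 ** n))
        print(f"{n} optional features -> 2**{n} configurations (~{digits} decimal digits)")
    # 1650 features (the Eclipse plugin count above) gives a number with
    # roughly 497 digits; even heavily constrained spaces remain enormous.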

Typical approaches for HCS testing use a model-based approach, that is, they take an input feature model representing the HCS and return a valid set of feature configurations to be tested, i.e. a test suite. In particular, two main strategies have been adopted: test case selection and test case prioritization. Test case selection reduces the test space by selecting an effective and manageable subset of configurations to be tested (Devroey et al., 2014; Henard et al., 2013; Marijan et al., 2013). Test case prioritization schedules test cases for execution in an order that attempts to increase their effectiveness at meeting some performance goal, typically detecting faults as soon as possible (Al-Hajjaji et al., 2014; Lopez-Herrejon et al., 2014; Wang et al., 2014). Both strategies are complementary and are often combined.

Test case prioritization in HCSs can be driven by different functional and non-functional objectives. Functional prioritization objectives are those based on the functional features of the system and their interactions. Some examples are those based on combinatorial interaction testing (Wang et al., 2014b), configuration dissimilarity (Al-Hajjaji et al., 2014; Henard et al., 2014; Sánchez et al., 2014) or feature model complexity metrics (Sánchez et al., 2015a; Sánchez et al., 2014). Non-functional prioritization objectives consider extra-functional information such as user preferences (Ensan et al., 2011; Johansen et al., 2012), cost (Wang et al., 2014b), memory consumption (Lopez-Herrejon et al., 2014) or execution probability (Devroey et al., 2014a) to find the best ordering for test cases. In a previous work (Sánchez et al., 2015b), we performed a preliminary evaluation comparing the effectiveness of several functional and non-functional prioritization objectives in accelerating the detection of faults in an HCS. Results suggested that non-functional properties such as the number of changes or the number of defects in a previous version of the system were among the most effective prioritization criteria.

Challenges. Current approaches for test case prioritization in HCSs follow a single-objective approach (Al-Hajjaji et al., 2014; Johansen et al., 2012; Devroey et al., 2014; Ensan et al., 2011; Henard et al., 2014; Lopez-Herrejon et al., 2014; Sánchez et al., 2015a), that is, they maximize or minimize one objective (e.g. feature coverage) or another (e.g. suite size), but not both at the same time. Other works (Wang et al., 2013; Wang et al., 2014) combine several objectives into a single function by assigning them weights proportional to their relative importance. While this may be acceptable in certain scenarios, it may be unrealistic in others where users wish to study the trade-offs among several objectives (Lopez-Herrejon et al., 2014). Thus, the potential benefits of optimizing multiple prioritization objectives simultaneously, both functional and non-functional, remain unexplored.
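The difference between the two formulations can be seen in a small sketch (ours, with made-up objective values): a weighted sum collapses objectives into one score and forces a total order, whereas Pareto dominance leaves genuine trade-offs incomparable.

    # Weighted-sum scalarization vs. Pareto dominance (both objectives minimized).
    a, b = (0.2, 9.0), (0.6, 1.0)         # e.g. (uncovered fraction, cost)

    def weighted_sum(obj, w=(0.5, 0.5)):
        return sum(wi * oi for wi, oi in zip(w, obj))

    def dominates(x, y):
        """x dominates y iff it is no worse in every objective and better in one."""
        return (all(xi <= yi for xi, yi in zip(x, y))
                and any(xi < yi for xi, yi in zip(x, y)))

    print(weighted_sum(a), weighted_sum(b))   # 4.6 vs 0.8: b wins under these weights
    print(dominates(a, b), dominates(b, a))   # False False: a and b are incomparable

A Pareto-based algorithm such as NSGA-II keeps both a and b in its front, letting the tester choose the trade-off afterwards.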

A further challenge is the lack of HCSs with available code, variability models and fault reports that can be used to assess the effectiveness of testing approaches. As a result, authors typically evaluate their contributions in terms of performance (e.g. execution time) using synthetic feature models and data (Al-Hajjaji et al., 2014; Henard et al., 2013; Qu et al., 2008; Xu et al., 2013). This introduces significant threats to validity, limits the scope of their conclusions and, more importantly, raises questions regarding the fault-detection effectiveness of the different algorithms and prioritization objectives.

Contributions. In this paper, we present a case study on multi-objective test case prioritization in HCSs. In particular, we model test case prioritization in HCSs as a multi-objective optimization problem, and we present a search-based algorithm to solve it based on the classical NSGA-II evolutionary algorithm. Additionally, we present seven objective functions based on both functional and non-functional properties of the HCS under test. Then, we report a comparison of 63 different combinations of up to three objectives in accelerating the detection of faults in the Drupal framework. Drupal is a highly modular open source web content management system for which we have mined a feature model and extracted real data from its issue tracking system and Git repository (Sánchez et al., 2015b). Results reveal that non-functional properties, such as the number of defects in previous versions of the system, accelerate the detection of faults more effectively than functional properties extracted from the feature model. Results also suggest that multi-objective prioritization is more effective at accelerating the detection of faults than mono-objective prioritization.

The rest of the paper is structured as follows: Section 2 introduces the concepts of feature models and multi-objective evolutionary algorithms. Section 3 presents the Drupal case study used in this work. Sections 4 and 5 describe our approach and the proposed multi-objective optimization algorithm, respectively. Section 6 defines seven objective functions for HCSs based on functional and non-functional goals. The evaluation of our approach is described in Section 7. Section 8 presents the threats to validity of our work. Related work is discussed in Section 9. Finally, we summarize our conclusions and outline future work in Section 10.

Section snippets

Feature models

A feature model defines all the possible configurations of a system or family of related systems (Benavides et al., 2010; Kang et al., 1990). A feature model is visually represented as a tree-like structure in which nodes represent features, and edges denote the relationships among them. A feature can be defined as any increment in the functionality of the system (Batory, 2005). A configuration of the system is composed of a set of features satisfying all the constraints of the model.

The Drupal case study

In this section, we present the Drupal case study fully reported by the authors in a previous work (Sánchez et al., 2015b). Drupal is a highly modular open source web content management framework written in PHP (Buytaert, 2015; Tomlinson and VanDyk, 2010). It can be used to build a variety of websites including internet portals, e-commerce applications and online newspapers (Tomlinson and VanDyk, 2010). Drupal has more than 30,000 modules that can be composed to form valid configurations of the framework.

Approach overview

In this section, we define the problem addressed and describe our approach, illustrating it with an example.

Multi-objective optimization algorithm

We used a Multi-Objective Evolutionary Algorithm (MOEA) to solve the multi-objective test case prioritization problem in HCSs. In particular, we adapted NSGA-II due to its popularity and good performance on many multi-objective optimization problems. In short, the algorithm receives an attributed feature model as input and returns a set of prioritized test suites optimizing the target objectives. In the following, we describe the specific adaptations performed to NSGA-II to solve the multi-objective test case prioritization problem in HCSs.
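As a rough illustration of what such an adaptation involves (our sketch in Python, not the authors' code; jMetal, a Java framework cited in the references, is the kind of library such an implementation typically builds on): individuals encode a prioritized suite as a permutation of test cases, and the variation operators must preserve permutation validity.

    import random

    # Problem-specific pieces of a permutation-encoded NSGA-II for test case
    # prioritization; the surrounding loop (non-dominated sorting, crowding
    # distance, tournament selection) is the standard NSGA-II machinery.

    def swap_mutation(order, rate=0.1):
        """Mutate a prioritized suite by randomly swapping test-case positions."""
        order = order[:]
        for i in range(len(order)):
            if random.random() < rate:
                j = random.randrange(len(order))
                order[i], order[j] = order[j], order[i]
        return order

    def order_crossover(p1, p2):
        """Order crossover (OX): keep a slice of p1, fill the rest in p2's order.
        The child remains a valid permutation (each test case appears once)."""
        a, b = sorted(random.sample(range(len(p1)), 2))
        kept = p1[a:b]
        rest = [t for t in p2 if t not in kept]
        return rest[:a] + kept + rest[a:]

    def evaluate(order, objective_fns):
        """Fitness is a vector: one value per objective function under study."""
        return tuple(f(order) for f in objective_fns)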

Objective functions

In this section, we propose and formalize different objective functions for test case prioritization in HCSs. All the functions receive an attributed feature model representing the HCS under test (fm) and a test suite (ts) as inputs and return an integer value measuring the quality of the suite with respect to the optimization goal. Note that the following functions will be later combined to form multi-objective goals (see Section 7). To illustrate each function, we use the feature model in
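As one concrete example of the kind of function this section formalizes (our illustration, not necessarily one of the paper's seven exact definitions), a dissimilarity-based objective rewards orderings whose consecutive configurations differ as much as possible, here via Jaccard distance over feature sets (shown returning a real value for simplicity, whereas the paper's functions are defined to return integers):

    # Illustrative dissimilarity objective over a prioritized suite ts, where
    # each test case is a set of selected features.
    def jaccard_distance(c1, c2):
        return 1.0 - len(c1 & c2) / len(c1 | c2)

    def dissimilarity(ts):
        """Sum of distances between consecutive test cases; higher is better."""
        return sum(jaccard_distance(ts[i], ts[i + 1]) for i in range(len(ts) - 1))

    suite = [{"Calls", "Screen", "HD"},
             {"Calls", "Screen", "Basic", "GPS"},
             {"Calls", "Screen", "HD", "Media", "Camera"}]
    print(dissimilarity(suite))   # ~1.31 for this ordering

Objectives based on non-functional attributes (e.g. changes or past faults per feature) follow the same shape, reading attribute values from the attributed feature model instead.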

Evaluation

This section explains the experiments conducted to explore the effectiveness of multi-objective test case prioritization in Drupal. First, we introduce the target research questions and the general experimental setup. Second, we report the results of the different experiments together with their statistical analysis.
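For context, the standard measure of how quickly an ordering exposes faults in prioritization studies is the Average Percentage of Faults Detected (APFD). The sketch below computes it in its usual textbook form; we show it as general background, not as a claim about the paper's exact experimental setup.

    def apfd(order, detects):
        """APFD = 1 - (sum of first-detection positions) / (n * m) + 1 / (2n),
        for n test cases and m faults, assuming every fault is detected by
        at least one test case in the suite.

        detects maps each test case to the set of faults it reveals."""
        n = len(order)
        m = len(set().union(*(detects[t] for t in order)))
        first_pos = {}
        for pos, t in enumerate(order, start=1):
            for fault in detects[t]:
                first_pos.setdefault(fault, pos)
        return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)

    detects = {"t1": {"f1", "f2"}, "t2": set(), "t3": {"f2"}}
    print(apfd(["t1", "t2", "t3"], detects))   # 0.833...: faults found early
    print(apfd(["t2", "t3", "t1"], detects))   # 0.333...: faults found late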

Threats to validity

The factors that could have influenced our case study are summarized below as internal and external validity threats.

Internal validity. This refers to whether there is sufficient evidence to support the conclusions, and to the sources of bias that could compromise them. Inadequate parameter setting is a common internal validity threat. In this paper, we used standard parameter values for the NSGA-II algorithm (Deb et al., 2002). Furthermore, to consider the effect of

Related work

In this section we summarize the work that most closely relates to ours. We divide it into HCS testing and general software testing.

HCS testing. Within the context of HCSs, there has been a strong and recent interest in the area of Software Product Line (SPL) testing, as evidenced by several systematic mapping studies (e.g. da Mota Silveira Neto et al., 2011; do Carmo Machado et al., 2014; Engström, Runeson,

Conclusions

This article presented a real-world case study on multi-objective test case prioritization in Drupal, a highly configurable web framework. In particular, we adapted the NSGA-II evolutionary algorithm to solve the multi-objective prioritization problem in HCSs. Our algorithm uses seven novel objective functions based on functional and non-functional properties of the HCS under test. We performed several experiments comparing the effectiveness of 63 different combinations of up to three of these objectives.

Material

For the sake of replicability, the source code of our algorithm, the Drupal attributed feature model, experimental results and statistical analysis scripts in R are publicly available at http://exemplar.us.es/demo/SanchezJSS2016 (100 MB).

Acknowledgments

This work has been partially supported by the European Commission (FEDER) and the Spanish Government under CICYT projects TAPAS (TIN2012-32273) and BELI (TIN2015-70560-R), the excellence network SEBASE (TIN2015-71841-REDT), the Andalusian Government project COPAS (P12-TIC-1867) and the Austrian Science Fund (FWF) project P25513-N15.

References (82)

  • A. Arcuri et al.

    A Hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering

    Softw. Test. Verif. Rel.

    (2014)
  • E. Bagheri et al.

    Assessing the maintainability of software product line feature models using structural metrics

    Softw. Qual. Control

    (2011)
  • D. Batory

    Feature models, grammars, and propositional formulas

    Software Product Lines Conference (SPLC)

    (2005)
  • N. Beume et al.

    On the complexity of computing the hypervolume indicator

    IEEE Trans. Evol. Comput.

    (2009)
  • D. Buytaert

    Drupal Framework

    (2015)
  • M.B. Cohen et al.

    Constructing interaction test suites for highly-configurable systems in the presence of constraints: a greedy approach

IEEE Trans. Softw. Eng.

    (2008)
  • P.A. da Mota Silveira Neto et al.

    A systematic mapping study of software product lines testing

    Inf. Softw. Technol.

    (2011)
  • K. Deb et al.

    A fast and elitist multiobjective genetic algorithm: NSGA-II

IEEE Trans. Evol. Comput.

    (2002)
  • Debian

    Debian 7.0 Wheezy Released

    (2013)
  • J. Derrac et al.

    A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms

    Swarm Evol. Comput.

    (2011)
  • X. Devroey et al.

    Towards statistical prioritization for software product lines testing

    Proceedings of the Eighth International Workshop on Variability Modelling of Software-Intensive (VAMOS)

    (2014)
  • I. do Carmo Machado et al.

    On strategies for testing software product lines: a systematic literature review

    Inf. Softw. Technol.

    (2014)
  • J.J. Durillo et al.

    jMetal: a java framework for multi-objective optimization

    Adv. Eng. Softw.

    (2011)
  • S. Elbaum et al.

    Selecting a cost-effective test case prioritization technique

Softw. Qual. J.

    (2004)
  • A. Ensan et al.

    Goal-oriented test case selection and prioritization for product line feature models

    Conference Information Technology: New Generations

    (2011)
  • M.G. Epitropakis et al.

Empirical evaluation of Pareto efficient multi-objective regression test case prioritisation

    International Symposium on Software Testing and Analysis

    (2015)
  • FaMa Tool Suite, http://www.isa.us.es/fama/, Accessed November...
  • J. Ferrer et al.

    Evolutionary algorithms for the multi-objective test data generation problem

Softw. Pract. Exp.

    (2012)
  • J. García-Galán et al.

    Migrating to the cloud: a software product line based analysis

    3rd International Conference on Cloud Computing and Services Science (CLOSER’13)

    (2013)
  • B. Garvin et al.

    Evaluating improvements to a meta-heuristic search for constrained interaction testing

    Empir. Softw. Eng.

    (2011)
  • B.J. Garvin et al.

    An improved meta-heuristic search for constrained interaction testing

    Search Based Software Engineering, 2009 1st International Symposium on

    (2009)
  • T.L. Graves et al.

    Predicting Fault Incidence Using Software Change History

    Technical Report

    (1998)
  • M. Harman et al.

    Search-based software engineering: trends, techniques and applications

    ACM Comput. Surv.

    (2012)
  • C. Henard et al.

    Bypassing the Combinatorial Explosion: Using Similarity to Generate and Prioritize T-wise Test Suites for Large Software Product Lines

    Technical Report

    (2012)
  • C. Henard et al.

    Bypassing the combinatorial explosion: using similarity to generate and prioritize t-wise test configurations for software product lines

    IEEE Trans. Softw. Eng.

    (2014)
  • C. Henard et al.

    Multi-objective test generation for software product lines

    International Software Product Line Conference (SPLC)

    (2013)
  • S. Holm

    A simple sequentially rejective multiple test procedure

    Scand. J. Statist.

    (1979)
  • Y.-C. Huang et al.

    Design and analysis of cost-cognizant test case prioritization using genetic algorithm with test history

    Computer Software and Applications Conference

    (2010)
  • M. Islam et al.

    A multi-objective technique to prioritize test cases based on latent semantic indexing

  • M.F. Johansen et al.

    Properties of realistic feature models make combinatorial testing of product lines feasible

    MODELS

    (2011)
  • M.F. Johansen et al.

    Generating better partial covering arrays by modeling weights on sub-product lines

    International Conference MODELS

    (2012)

Jose A. Parejo received his Ph.D. (with honors) from the University of Sevilla in 2013, where he currently works as a senior lecturer in software engineering. He worked in industry as a developer, architect and project manager from 2001 to 2007. His research interests include metaheuristic optimization and software engineering, focusing mainly on search-based software engineering. He serves regularly as a reviewer for international journals and conferences.

Ana B. Sánchez received her Ph.D. (with honors) from the University of Seville in May 2016. She has worked as a research assistant in the Applied Software Engineering research group (ISA, www.isa.us.es) at the University of Seville in the area of automated testing of highly configurable systems. Contact her at [email protected].

Sergio Segura received his Ph.D. in software engineering (with honours) from the University of Seville, where he currently works as a senior lecturer. His research interests include software testing, software variability and search-based software engineering. He has co-authored some highly cited papers as well as tools used by universities and companies in various countries. He also serves regularly as a reviewer for international journals and conferences.

Antonio Ruiz-Cortés is an accredited full professor and Head of the Applied Software Engineering research group at the University of Seville, where he received his Ph.D. (with honours) and M.Sc. in Computer Science. His research interests are in the areas of service oriented computing, software variability, software testing, and business process management. Further information about his publications, research projects and activities can be found at www.isa.us.es.

Roberto Erick Lopez-Herrejon is an associate professor at the Department of Software Engineering and Information Technology of the École de Technologie Supérieure (ÉTS) of the University of Quebec in Montreal, Canada. Previously he was a senior postdoctoral researcher at the Johannes Kepler University in Linz, Austria, where he was an Austrian Science Fund (FWF) Lise Meitner Fellow (2012–2014). From 2008 to 2014 he was an External Lecturer at the Software Engineering Masters Programme of the University of Oxford, England. From 2010 to 2012 he held an FP7 Intra-European Marie Curie Fellowship sponsored by the European Commission. He obtained his Ph.D. from the University of Texas at Austin in 2006, funded in part by a Fulbright Fellowship. From 2005 to 2008, he was a Career Development Fellow at the Software Engineering Centre of the University of Oxford. His main expertise is in software customization, software product lines, and search-based software engineering. Contact: [email protected], Department of Software Engineering and IT, École de Technologie Supérieure (ÉTS), 1100 Notre-Dame Street Ouest, H3C 1K3, Montreal, Canada.

    Alexander Egyed heads the Institute for Software Engineering and Automation at the Johannes Kepler University, Austria. He is also an adjunct assistant professor at the University of Southern California, USA. Before joining the JKU, he worked as a research scientist for Teknowledge Corporation, USA (2000–2007) and then as a Research Fellow at the University College London, UK (2007–2008). He received a doctorate degree in 2000 and a master of science degree in 1996, both in computer science, from the University of Southern California, USA under the mentorship of Dr. Barry Boehm. His research interests include software design modeling, requirements engineering, consistency checking and resolution, traceability, and change impact analysis. He is a member of ACM, ACM SigSoft, IEEE, and IEEE Computer Society.
