Unified integration of many-objective optimization algorithm based on temporary offspring for software defects prediction

https://doi.org/10.1016/j.swevo.2021.100871

Abstract

Software defects prediction technology is closely related to the security and quality of software products and provides guidance for software testing. To address the class imbalance of datasets in software defects prediction and the parameter selection of the support vector machine (SVM) simultaneously, a high dimension software defects prediction model (HD-SDP) based on SVM is proposed. The model includes four objectives: the false positive rate of defects, the probability of detection, the F-metric, and the Balance value. A unified integration of many-objective optimization algorithm based on temporary offspring (UIMaOTO) is designed for this model to select the SVM parameters and the non-defective modules simultaneously. UIMaOTO adopts a temporary offspring strategy to generate the formal offspring and then applies a unified integration strategy to enhance the selection pressure of the algorithm. UIMaOTO is compared with other state-of-the-art algorithms in experiments on the well-known DTLZ test suite. The results show that the proposed algorithm has better all-around performance and is competitive for many-objective optimization problems. In addition, UIMaOTO is used to solve the HD-SDP model, and the performance is improved by 14.27% compared with other algorithms.

Introduction

With the development of science and the acceleration of computerization, software applications have become inseparable from our lives and have penetrated fields such as the military industry, aerospace, industrial manufacturing, finance, and energy. Software security has attracted wide attention because software is widely applied in cutting-edge areas. The systematization, growing complexity, and integration of software have become an inevitable trend. Accidents caused by information security problems and software defects can be serious, even catastrophic. Therefore, it is necessary to avoid software defects and enhance software security. At present, most researchers believe that the main factor affecting software quality is software defects, that is, faulty modules with hidden functional defects that can cause the software system to crash. Timely prediction and correction of software defects is therefore a prerequisite for producing high-quality software products, and software defects prediction technology has emerged to meet this need.

Software defects prediction [1] is a meaningful way to guide and evaluate software testing. Accurately predicting the distribution of software defects is essential for software testing, because it enables testers to discover more defective modules with less effort and time. Fig. 1 shows the basic procedure of software defect prediction. A prediction model is constructed by analyzing and training on historical data, and it is then used to determine whether unknown data contain defective modules. The advantage of this technology is that it makes reasonable use of historical data to predict defective modules more accurately, so that testers can devote more test resources to the modules that are more likely to be defective. Software defects prediction technology therefore helps guarantee software quality, guides software testing, and reduces labor and time costs, which substantially improves the efficiency of software testing.

Many scholars have studied software defects prediction from various angles. Liu et al. [2] propose a software defects prediction model based on principal component analysis (PCA), in which the parameters of the support vector machine (SVM) are optimized by a chaotic particle swarm optimization (PSO) algorithm. Wang et al. [3] use a genetic algorithm to reduce the adverse effects of redundant attributes in data samples and establish an efficient software defects prediction model using SVM, but the SVM parameters are not optimized. Cai et al. [4] propose an under-sampled software defects prediction model with a multi-objective cuckoo search algorithm and SVM (HMOCS-US-SVM), which deals with both class imbalance in the datasets and parameter selection. The SVM parameters involve the penalty coefficient C and the parameter σ of the radial basis function (RBF) kernel, which is given in Eq. (1):

$$K(x_i, x_j) = \exp\left\{-\frac{|x_i - x_j|^2}{\sigma^2}\right\} \tag{1}$$

where $x_i$ and $x_j$ are samples from the training set and σ is the bandwidth of the RBF kernel.
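The role of the bandwidth σ in Eq. (1) can be illustrated with a minimal sketch. It assumes the standard Gaussian form of the kernel, with a negative exponent and the squared Euclidean distance divided by σ²; the function name `rbf_kernel` and the sample vectors are hypothetical and chosen only for illustration.

```python
import math

def rbf_kernel(x_i, x_j, sigma):
    """RBF kernel, assumed form: exp(-|x_i - x_j|^2 / sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Squared Euclidean distance between the two samples.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / sigma ** 2)

# Hypothetical two-feature samples.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma=0.5)  # identical samples
near = rbf_kernel([1.0, 2.0], [1.5, 2.0], sigma=1.0)  # squared distance 0.25
far = rbf_kernel([1.0, 2.0], [4.0, 6.0], sigma=1.0)   # squared distance 25

# A larger sigma makes the same pair of samples look more similar.
near_wide = rbf_kernel([1.0, 2.0], [1.5, 2.0], sigma=2.0)
```

Under this form, the kernel equals 1 for identical samples and decays toward 0 as the samples move apart, which is why the choice of σ (together with C) strongly affects the SVM decision boundary.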

In summary, SVM parameter selection and dataset class imbalance are two critical factors in software defects prediction. The literature above shows that both swarm intelligence optimization algorithms [5, 6] and multi-objective algorithms [7] can be used to optimize the software defects prediction model. PSO [8, 9] simulates the foraging behavior of birds, while cuckoo search (CS) [10] simulates the brood parasitism of cuckoos. The multi-objective cuckoo search algorithm has been used to solve software defect prediction problems involving the false positive rate of defects and the probability of detection. Many other indicators can also be used to assess software defects prediction, such as precision, harmonic mean, and error rate. Therefore, it is necessary to consider many-objective optimization problems (MaOPs) [11] with four or more objectives to enhance the prediction accuracy. A many-objective optimization problem can be defined as follows:

$$\min f(x) = [f_1(x), f_2(x), \ldots, f_i(x), \ldots, f_m(x)]^T \quad \text{subject to } x \in \Omega$$

where $x = (x_1, x_2, \ldots, x_i, \ldots, x_n)$ is a vector of n decision variables in the decision space Ω, and m is the number of objectives.

  • Multi-objective optimization problem: the number of objectives m = 2 or m = 3.

  • Many-objective optimization problem: the number of objectives m ≥ 4.
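Solutions of such problems are usually compared through Pareto dominance. The following sketch uses the standard textbook definition for minimization, which is not spelled out in the text above; the function names and the four-objective sample vectors are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(points):
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical population with m = 4 objectives per solution.
population = [
    (0.10, 0.40, 0.30, 0.20),
    (0.20, 0.50, 0.40, 0.30),  # worse than the first solution in every objective
    (0.05, 0.60, 0.30, 0.25),
    (0.30, 0.10, 0.50, 0.40),
]
front = non_dominated(population)
```

Here three of the four solutions are mutually non-dominated. As the number of objectives grows, a larger share of a population tends to become non-dominated, which weakens dominance-based selection and motivates strategies that restore selection pressure.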

Deb et al. [12] improve NSGA-II by introducing a set of predefined reference points that effectively maintain diversity in high-dimensional problems; the resulting NSGA-III is suitable for problems with 3 to 15 objectives. Li et al. [13] propose a many-objective evolutionary algorithm based on dominance and decomposition (MOEA/DD), which is used to solve unconstrained optimization problems. Yuan et al. [14] propose a many-objective optimization algorithm based on an ensemble fitness ranking strategy (EFR-RR) to address the problem that aggregation functions cannot maintain diversity. Yang et al. [15] propose a grid-based evolutionary algorithm (GrEA) that considers the grid level, crowding distance, and coordinate point distance during mating and environmental selection to trade off the convergence and diversity of the population. Moreover, Bader et al. propose a hypervolume estimation algorithm (HypE) [16], which relies on the ranking of solutions induced by the hypervolume indicator rather than the actual indicator values and uses Monte Carlo simulation to handle problems with many objectives. Ensuring that an algorithm both converges well and keeps the population well distributed is therefore crucially important.
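To make the idea of predefined reference points concrete, the sketch below generates evenly spaced points on the unit simplex with the Das and Dennis construction, the systematic layout commonly associated with NSGA-III. This is an illustrative assumption about how such points can be placed, not a description of any algorithm in this paper; the function name `das_dennis` and its parameters are hypothetical.

```python
from itertools import combinations

def das_dennis(n_obj, divisions):
    """All points on the unit simplex whose coordinates are multiples of 1/divisions."""
    points = []
    slots = divisions + n_obj - 1
    # Stars and bars: place n_obj - 1 bars among the slots; the stars
    # between consecutive bars give each coordinate's integer share.
    for bars in combinations(range(slots), n_obj - 1):
        counts = []
        prev = -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(slots - 1 - prev)
        points.append(tuple(c / divisions for c in counts))
    return points

# Four objectives with three divisions per axis.
ref_points = das_dennis(n_obj=4, divisions=3)
```

The number of points equals the binomial coefficient C(divisions + n_obj - 1, n_obj - 1), so it grows quickly with the number of objectives, which is one reason reference-point layouts need care in many-objective settings.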

The original conference paper [17] takes the false positive rate (pf) and the probability of detection (pd) as two objective functions to construct a multi-objective software defect prediction model, and designs a multi-objective bat algorithm (MOBA) to solve it. Building on that work, and in order to describe the software defect prediction problem from more aspects and improve prediction accuracy, this paper extends the model to a high dimension software defects prediction model (HD-SDP) based on SVM. Intelligent optimization algorithms have contributed substantially to solving many-objective problems in recent years. In this paper, we design a many-objective optimization algorithm with better performance to optimize the software defects prediction model, an approach not found in the existing literature. The purpose is to improve predictive performance effectively. The detailed contributions are as follows:

  • (1) A new software defects prediction model with four objectives is proposed. It considers the probability of detection pd, the false positive rate of defects pf, the harmonic mean F-metric, and the overall evaluation indicator Balance, while simultaneously addressing the class imbalance of software defects datasets and SVM parameter selection.

  • (2) Since the initial population has a great impact on the algorithm, we design a new framework of many-objective optimization algorithm based on temporary offspring. The formal offspring is generated by the temporary offspring strategy, and the parents of the population are combined with the formal offspring to form the new initial population.

  • (3) The temporary offspring strategy balances convergence and diversity by computing the temporary offspring value. The achievement scalar function and the penalty angle function are adopted as the formal offspring strategy to generate formal offspring. In addition, a unified integration strategy is proposed to guide the population to evolve in a better direction.
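The four objectives named in contribution (1) can be sketched from a confusion matrix. The formulas below are the definitions commonly used in the software defect prediction literature and are an assumption here, since this excerpt does not state them; in particular, Balance is taken as one minus the normalized Euclidean distance from the ideal point (pf = 0, pd = 1). The function name and the counts are hypothetical.

```python
import math

def defect_metrics(tp, fp, tn, fn):
    """pd, pf, F-metric and Balance from confusion-matrix counts (assumed definitions)."""
    pd_rate = tp / (tp + fn) if tp + fn else 0.0    # probability of detection (recall)
    pf_rate = fp / (fp + tn) if fp + tn else 0.0    # false positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + pd_rate
    f_metric = 2 * precision * pd_rate / denom if denom else 0.0
    # Distance to the ideal point (pf = 0, pd = 1), scaled into [0, 1].
    balance = 1 - math.sqrt(pf_rate ** 2 + (1 - pd_rate) ** 2) / math.sqrt(2)
    return {"pd": pd_rate, "pf": pf_rate, "f_metric": f_metric, "balance": balance}

# Hypothetical counts: 50 defective modules and 100 non-defective modules.
metrics = defect_metrics(tp=40, fp=10, tn=90, fn=10)
```

Of these four values, pf is to be minimized while pd, F-metric, and Balance are to be maximized, so a many-objective optimizer would typically negate or complement the latter three to obtain a pure minimization problem.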

The rest of this paper is organized as follows. Section 2 reviews related work on software defects prediction. Section 3 introduces the proposed model, which is used to predict defective modules. To solve this model effectively, a new many-objective optimization algorithm is designed in Section 4. The comparison experiments and detailed analyses are presented in Section 5. Finally, Section 6 gives the conclusion.

Section snippets

Related work

Software defects prediction technology has been developing since the 1970s. In recent years, many software defects prediction methods have been proposed as software scale and testing costs have increased. Software defects prediction technology is divided into two categories: static and dynamic. Dynamic software defects prediction technology links time to defective modules. In contrast, static software defects prediction technology predicts the potential defects by exploiting the

The Proposed Model for Software Defects Prediction (HD-SDP)

First, eight datasets from the National Aeronautics and Space Administration (NASA) MDP test suite are introduced. Table 1 gives detailed information on the eight datasets after removing duplicate modules, including the total number of modules, the number of defective modules, the defect rate, and a description.

The software defects prediction model is tested on a test set that is completely different from the training set. Given a complexity metric data set for a software

The proposed UIMaOTO

In this section, we first introduce the main framework of the algorithm. The temporary offspring strategy and the formal offspring strategy are introduced in Sections 4.2 and 4.3, respectively. Finally, the unified integration strategy is proposed in Section 4.4 to retain individuals with good performance.

Experiments and Analyses

The common test suites DTLZ [33] and MaOP [34] are used to test the performance of the algorithm. Section 5.1 gives the test results of UIMaOTO on the DTLZ and MaOP test suites, and Section 5.2 applies the algorithm to the HD-SDP model. The experiments are run on MATLAB R2018a, with an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz and 16 GB of RAM.

Conclusions

This paper proposes a high dimension software defect prediction model (HD-SDP), which involves four different objectives: the false positive rate of defects, the probability of detection, the F-metric, and the Balance value. Meanwhile, we propose a unified integration of many-objective optimization algorithm (UIMaOTO) for this model. This algorithm uses a temporary offspring strategy to generate formal offspring so as to obtain a better initial population. Moreover, the model with the proposed algorithm

Author Statement

Xingjuan Cai: Conceptualization, Methodology. Shaojin Geng: Writing-original draft, review and editing. Di Wu: Data curation, Visualization. Jinjun Chen: Software, Validation.

Declaration of Competing Interest

The authors declare no conflict of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61806138 and 61772478, the Key R&D Program of Shanxi Province (High Technology) under Grant No. 201903D121119, the Natural Science Foundation of Shanxi Province under Grant No. 201801D121127, and the Postgraduate Education Innovation Project of Shanxi Province (Shaojin Geng) under Grant No. 2020SY436.

References (37)

  • M.S. Mohamed et al.

    Flight control system design using adaptive pigeon-inspired optimisation

    Int. J. Bio-Inspired Comput.

    (2020)
  • C.H. Xu et al.

    A privacy-preserving recommendation method based on multi-objective optimisation for mobile users

    Int. J. Bio-Inspired Comput.

    (2020)
  • N. Nedjah et al.

    Inspiration-wise swarm intelligence meta-heuristics for continuous optimisation: a survey - part I

    Int. J. Bio-Inspired Comput.

    (2020)
  • X.J. Cai et al.

    A sharding scheme based many-objective optimization algorithm for enhancing security in blockchain-enabled industrial Internet of Things

    IEEE Trans. Ind. Inf.

    (2020)
  • K. Deb et al.

    An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: Solving problems with box constraints

    IEEE Trans. Evol. Comput.

    (2014)
  • K. Li et al.

    An evolutionary many-objective optimization algorithm based on dominance and decomposition

    IEEE Trans. Evol. Comput.

    (2015)
  • Y. Yuan et al.

    Balancing convergence and diversity in decomposition-based many-objective optimizers

    IEEE Trans. Evol. Comput.

    (2016)
  • S.X. Yang et al.

    A grid-based evolutionary algorithm for many-objective optimization

    IEEE Trans. Evol. Comput.

    (2013)