Unified integration of many-objective optimization algorithm based on temporary offspring for software defects prediction
Introduction
With the development of science and the acceleration of computerization, software applications have become inseparable from daily life and have penetrated fields such as the military industry, aerospace, industrial manufacturing, finance, and energy. Because software is so widely applied in cutting-edge areas, software security has attracted considerable attention. The trend toward systematized, complex, and integrated software is inevitable. Accidents caused by information security failures and software defects can be serious, even catastrophic. Therefore, it is necessary to avoid software defects and enhance software security. At present, most researchers believe that the main factor affecting software quality is software defects, that is, faulty modules with hidden functional errors that can cause the software system to crash. Timely prediction and correction of software defects is thus a prerequisite for producing high-quality software products, and this need gave rise to software defects prediction technology.
Software defects prediction [1] is a meaningful way to guide and evaluate software testing. Accurately predicting the distribution of software defects is essential for software testing because it enables testers to discover more defective modules with less effort and time. Fig. 1 shows the basic procedure of software defect prediction: a prediction model is constructed by analyzing and training on historical data, and it is then used to determine whether unknown data contain defective modules. The advantage of this technology is that it makes reasonable use of historical data to predict defective modules more accurately, so testers can devote more test resources to the modules that are more likely to be defective. Software defects prediction technology can therefore help to guarantee software quality, guide software testing, and reduce labor and time costs, which substantially improves the efficiency of software testing.
Many scholars have studied software defects prediction from different aspects. Liu et al. [2] propose a software defects prediction model based on principal component analysis (PCA) and a support vector machine (SVM) whose parameters are optimized by a chaotic particle swarm optimization (PSO) algorithm. Wang et al. [3] use a genetic algorithm to reduce the adverse effects of redundant attributes in the data samples and establish an efficient software defects prediction model using SVM, but the SVM parameters are not optimized. Cai et al. [4] propose an under-sampled software defects prediction model that combines a hybrid multi-objective cuckoo search algorithm with SVM (HMOCS-US-SVM) to deal with class imbalance in the datasets and with parameter selection. The SVM parameters involved are the penalty coefficient $C$ and the parameter $\sigma$ of the radial basis function (RBF) kernel, shown in Eq. (1):
$$K(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right) \quad (1)$$
where $x_i$ is a sample from the training set and $\sigma$ is the bandwidth of the radial basis kernel function.
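As a rough illustration of the kernel that the bandwidth parameter controls, the following sketch evaluates a Gaussian RBF kernel in plain Python. It assumes the common form exp(-||x - x_i||^2 / (2 sigma^2)); the exact scaling used in [4] is not recoverable from this text, and the function name `rbf_kernel` is hypothetical.

```python
import math


def rbf_kernel(x, x_i, sigma):
    """Gaussian RBF kernel value between two feature vectors.

    Assumes the common form exp(-||x - x_i||^2 / (2 * sigma^2)).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

A small bandwidth makes the kernel value fall off quickly with distance, while a large bandwidth makes distant samples look similar, which is one reason the bandwidth is usually tuned together with the penalty coefficient.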
In summary, SVM parameter selection and dataset class imbalance are two critical factors in solving the software defects prediction problem. The literature above shows that both swarm intelligence optimization algorithms [5, 6] and multi-objective algorithms [7] can be used to optimize the software defects prediction model. PSO [8, 9] simulates the foraging behavior of birds, while cuckoo search (CS) [10] simulates the brood parasitism of cuckoos. The multi-objective cuckoo search algorithm has been used to solve software defect prediction problems with two objectives, the false positive rate of defects and the probability of detection. However, many other indicators can also be used to evaluate software defect prediction, such as precision, harmonic mean, and error rate. Therefore, it is necessary to formulate the task as a many-objective optimization problem (MaOP) [11] with four or more objectives to enhance the prediction accuracy. A many-objective optimization problem can be defined as follows:
$$\min F(x) = (f_1(x), f_2(x), \ldots, f_M(x))^T, \quad \text{s.t. } x \in \Omega$$
where $x = (x_1, x_2, \ldots, x_n)$ is a vector of $n$-dimensional decision variables in the decision space $\Omega$, and $M$ represents the number of objectives.
- Multi-objective optimization problem: the number of objectives $2 \le M \le 3$.
- Many-objective optimization problem: the number of objectives $M \ge 4$.
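To make the comparison of objective vectors concrete, here is a minimal sketch of Pareto dominance and a non-dominated filter. It assumes every objective is minimized; the function names `dominates` and `non_dominated` are hypothetical and are not taken from the paper.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With four or more objectives, a growing share of candidate solutions tends to be mutually non-dominated, so dominance alone gives weak selection pressure. This is commonly cited as the reason many-objective algorithms add reference points, decomposition, or indicator-based ranking.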
Deb et al. [12] improve NSGA-II by introducing a set of predefined reference points, which effectively maintains diversity in high-dimensional objective spaces; the resulting NSGA-III is suitable for problems with 3 to 15 objectives. Li et al. [13] propose a many-objective evolutionary algorithm based on dominance and decomposition (MOEA/DD) for unconstrained optimization problems. Yuan et al. [14] propose a many-objective optimization algorithm based on an ensemble fitness ranking strategy (EFRRR) to address the inability of aggregation functions to maintain diversity. Considering the grid level, crowding distance, and coordinate point distance during mating and environmental selection, Yang et al. propose a grid-based evolutionary algorithm (GrEA) [15] to balance the convergence and diversity of the population. Moreover, Bader et al. propose a hypervolume estimation algorithm (HypE) [16], which relies on the ranking of solutions induced by the hypervolume indicator rather than on the actual indicator values, and uses Monte Carlo simulation to handle problems with many objectives. How to ensure that an algorithm both converges well and yields a well-distributed solution set is therefore crucially important.
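The Monte Carlo idea mentioned for HypE can be illustrated with a small sketch that estimates the hypervolume of a solution set by sampling. This is not HypE itself, which uses sampling to estimate hypervolume-based fitness values for ranking; the function `mc_hypervolume`, its parameters, and the sampling box are assumptions for illustration, with all objectives minimized.

```python
import random


def mc_hypervolume(front, ref, lower, n_samples=20000, seed=0):
    """Estimate the hypervolume dominated by `front` inside the box [lower, ref].

    A sample counts as a hit when at least one solution in `front` is
    no worse than the sample in every objective (minimization).
    """
    rng = random.Random(seed)
    box_volume = 1.0
    for lo, hi in zip(lower, ref):
        box_volume *= (hi - lo)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.uniform(lo, hi) for lo, hi in zip(lower, ref)]
        if any(all(p_k <= s_k for p_k, s_k in zip(p, sample)) for p in front):
            hits += 1
    return box_volume * hits / n_samples
```

For a single point (0.5, 0.5) with reference point (1, 1) and lower corner (0, 0), the exact hypervolume is 0.25, and the estimate approaches that value as the number of samples grows. The appeal of sampling is that its cost grows mildly with the number of objectives, whereas exact hypervolume computation becomes expensive.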
The original conference paper [17] considers the false positive rate (pf) and the probability of detection (pd) as two objective functions to construct a multi-objective software defect prediction model, and designs a multi-objective bat algorithm (MOBA) to solve it. Building on that work, and in order to describe the software defect prediction problem from more aspects and improve prediction accuracy, this paper extends it to a high dimension software defects prediction model (HD-SDP) based on SVM. Intelligent optimization algorithms have contributed substantially to solving many-objective problems in recent years. In this paper, we design a many-objective optimization algorithm with better performance to optimize the software defects prediction model, an approach we have not found in the existing literature. The purpose is to improve predictive performance effectively. The detailed contributions are as follows:
- (1) A new software defects prediction model with four objectives is proposed. It considers the probability of detection (pd), the false positive rate of defects (pf), the harmonic mean (F-metric), and the overall evaluation indicator (Balance), while simultaneously addressing class imbalance in software defects datasets and SVM parameter selection.
- (2) Since the initial population has a great impact on the algorithm, we design a new framework for many-objective optimization based on temporary offspring. The temporary offspring strategy generates the formal offspring, and the parents of the population are combined with the formal offspring to form the new initial population.
- (3) The temporary offspring strategy balances convergence and diversity by computing the temporary offspring value. The achievement scalar function and the penalty angle function are adopted as the formal offspring strategy to generate formal offspring. In addition, a unified integration strategy is proposed to guide the population to evolve in a better direction.
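The four objectives named in contribution (1), which the conclusion refers to as the false positive rate, the probability of detection, the F-metric, and the Balance value, can be sketched from confusion-matrix counts as follows. The formulas are the conventional definitions from the defect prediction literature and are an assumption here, since this excerpt does not state them; the function names are hypothetical.

```python
import math


def defect_objectives(tp, fn, fp, tn):
    """Return (pd, pf, f_metric, balance) from confusion-matrix counts.

    Conventional definitions, assumed rather than taken from the paper.
    """
    pd = tp / (tp + fn) if (tp + fn) else 0.0           # probability of detection (recall)
    pf = fp / (fp + tn) if (fp + tn) else 0.0           # false positive rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Harmonic mean of precision and pd.
    f_metric = 2 * precision * pd / (precision + pd) if (precision + pd) else 0.0
    # Normalized distance from the ideal point (pf = 0, pd = 1).
    balance = 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
    return pd, pf, f_metric, balance


def as_minimization(pd, pf, f_metric, balance):
    """Turn the four indicators into a vector where smaller is better."""
    return (1.0 - pd, pf, 1.0 - f_metric, 1.0 - balance)
```

Only pf is naturally minimized, so one plausible way to feed these indicators to a minimizing optimizer is to negate or complement the other three, as `as_minimization` does. Whether the paper uses this exact transformation is not stated in the excerpt.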
The rest of this paper is organized as follows: Section 2 reviews related work on software defects prediction. Section 3 introduces the proposed model, which is used to predict defective modules. To solve this model effectively, a new many-objective optimization algorithm is designed in Section 4. The comparison experiments and detailed analyses are presented in Section 5. Finally, Section 6 gives the conclusion.
Related work
Software defects prediction technology has been developing since the 1970s. In recent years, many software defects prediction methods have been proposed in response to increasing software scale and testing cost. Software defects prediction technology is divided into two categories: static and dynamic. Dynamic software defects prediction relates defective modules to time. In contrast, static software defects prediction technology predicts the potential defects by exploiting the
The Proposed Model for Software Defects Prediction (HD-SDP)
First, eight datasets from the National Aeronautics and Space Administration (NASA) MDP test suite are introduced. Table 1 gives detailed information on the eight datasets after duplicate modules are removed, including the total number of modules, the number of defective modules, the defect rate, and the description.
The software defects prediction model is tested on a test set that is completely separate from the training set. Given a complexity metric data set for a software
The proposed UIMaOTO
In this section, we first introduce the main framework of the algorithm. Temporary offspring strategy and formal offspring strategy are introduced in Sections 4.2 and 4.3. Finally, the unified integration strategy is proposed in Section 4.4 to save individuals with good performance.
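For readers unfamiliar with the achievement scalar function named among the contributions, the sketch below shows one widely used form, the weighted Chebyshev-style achievement scalarizing function. It is an assumption that the paper uses this variant; the penalty angle function is not covered because its definition is not given in this excerpt, and the name `achievement_scalarizing` and the `eps` guard are hypothetical.

```python
def achievement_scalarizing(f, weights, ideal, eps=1e-6):
    """Weighted achievement scalarizing value of objective vector f (minimization).

    Computes max_k (f_k - ideal_k) / w_k, with zero weights replaced by eps
    to avoid division by zero.
    """
    return max(
        (f_k - z_k) / max(w_k, eps)
        for f_k, w_k, z_k in zip(f, weights, ideal)
    )
```

A smaller value means the solution lies closer to the ideal point along the direction given by the weight vector, so the function is often used as a convergence measure when comparing candidates associated with the same direction.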
Experiments and Analyses
Common test suites for evaluating algorithm performance include DTLZ [33] and MaOP [34]. Section 5.1 gives the test results of UIMaOTO on the DTLZ and MaOP test suites, and Section 5.2 applies the algorithm to the HD-SDP model. The experiments are run in MATLAB R2018a on an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz with 16 GB of RAM.
Conclusions
This paper proposes a high dimension software defect prediction model (HD-SDP), which involves four objectives: the false positive rate of defects, the probability of detection, the F-metric, and the Balance value. Meanwhile, we propose a unified integration many-objective optimization algorithm based on temporary offspring (UIMaOTO) for this model. The algorithm uses a temporary offspring strategy to generate formal offspring so as to obtain a better initial population. Moreover, the model with the proposed algorithm
Author Statement
Xingjuan Cai: Conceptualization, Methodology. Shaojin Geng: Writing-original draft, review and editing. Di Wu: Data curation, Visualization. Jinjun Chen: Software, Validation.
Declaration of Competing Interest
The authors declare no conflict of interest.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant Nos. 61806138 and 61772478, the Key R&D Program of Shanxi Province (High Technology) under Grant No. 201903D121119, the Natural Science Foundation of Shanxi Province under Grant No. 201801D121127, and the Postgraduate Education Innovation Project of Shanxi Province (Shaojin Geng) under Grant No. 2020SY436.
References (37)
- et al. Bacterial foraging optimization algorithm in robotic cells with sequence-dependent setup times. Knowl.-Based Syst. (2019)
- et al. Ensemble Particle Swarm Optimizer. Appl. Soft Comput. (2017)
- et al. A dynamic neighborhood learning based particle swarm optimizer for global numerical optimization. Inf. Sci. (2012)
- et al. A neural network approach for early detection of program modules having high risk in the maintenance phase. J. Syst. Softw. (1995)
- et al. Comparison between MOEA/D and NSGA-III on a set of novel many and multi-objective benchmark problems with challenging difficulties. Swarm Evol. Comput. (2019)
- et al. Hybrid many-objective particle swarm optimization algorithm for green coal production problem. Inf. Sci. (2020)
- et al. A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. (2012)
- et al. Software defect prediction model based on PCA-ISVM. Comput. Simul. (2014)
- et al. The application of genetic algorithm support vector machine in software defect predection. Electron. Measur. Technol. (2012)
- et al. An under-sampled software defect prediction method based on hybrid multi-objective cuckoo search. Concurr. Comput.: Pract. Exp. (2019)
- Flight control system design using adaptive pigeon-inspired optimisation. Int. J. Bio-Inspired Comput.
- A privacy-preserving recommendation method based on multi-objective optimisation for mobile users. Int. J. Bio-Inspired Comput.
- Inspiration-wise swarm intelligence meta-heuristics for continuous optimisation: a survey - part I. Int. J. Bio-Inspired Comput.
- A sharding scheme based many-objective optimization algorithm for enhancing security in blockchain-enabled industrial Internet of Things. IEEE Trans. Ind. Inf.
- An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: Solving problems with box constraints. IEEE Trans. Evol. Comput.
- An evolutionary many-objective optimization algorithm based on dominance and decomposition. IEEE Trans. Evol. Comput.
- Balancing convergence and diversity in decomposition-based many-objective optimizers. IEEE Trans. Evol. Comput.
- A grid-based evolutionary algorithm for many-objective optimization. IEEE Trans. Evol. Comput.