Skip to main content
Top
Published in: Innovations in Systems and Software Engineering 4/2015

01-12-2015 | Original Paper

A hybrid one-class rule learning approach based on swarm intelligence for software fault prediction

Authors: Yousef Abdi, Saeed Parsa, Yousef Seyfari

Published in: Innovations in Systems and Software Engineering | Issue 4/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Software testing is a fundamental activity in the software development process aimed to determine the quality of software. To reduce the effort and cost of this process, defect prediction methods can be used to determine fault-prone software modules through software metrics to focus testing activities on them. Because of model interpretation and easily used by programmers and testers some recent studies presented classification rules to make prediction models. This study presents a rule-based prediction approach based on kernel k-means clustering algorithm and Distance based Multi-objective Particle Swarm Optimization (DSMOPSO). Because of discrete search space, we modified this algorithm and named it DSMOPSO-D. We prevent best global rules to dominate local rules by dividing the search space with kernel k-means algorithm and by taking different approaches for imbalanced and balanced clusters, we solved imbalanced data set problem. The presented model performance was evaluated by four publicly available data sets from the PROMISE repository and compared with other machine learning and rule learning algorithms. The obtained results demonstrate that our model presents very good performance, especially in large data sets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Arisholm E, Briand, LC, Johannessen E (2008) Data mining techniques, candidate measures and evaluation methods for building practically useful fault-proneness prediction models. Dissertation, University of Oslo Arisholm E, Briand, LC, Johannessen E (2008) Data mining techniques, candidate measures and evaluation methods for building practically useful fault-proneness prediction models. Dissertation, University of Oslo
2.
go back to reference Anil KJ (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666CrossRef Anil KJ (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666CrossRef
3.
go back to reference de Carvalho AB, Pozo A, Vergilio SR (2010) A symbolic fault prediction model based on multiobjective particle swarm optimization. J Syst Softw 83(5):868–882CrossRef de Carvalho AB, Pozo A, Vergilio SR (2010) A symbolic fault prediction model based on multiobjective particle swarm optimization. J Syst Softw 83(5):868–882CrossRef
4.
go back to reference Catal C (2011) Software fault prediction: a literature review and current trends. Expert Syst Appl 38(4):4626–4636CrossRef Catal C (2011) Software fault prediction: a literature review and current trends. Expert Syst Appl 38(4):4626–4636CrossRef
5.
go back to reference Catal C, Diri B (2009) A systematic review of software fault predictions studies. Expert Syst Appl 36(4):7346–7354CrossRef Catal C, Diri B (2009) A systematic review of software fault predictions studies. Expert Syst Appl 36(4):7346–7354CrossRef
6.
go back to reference Chulani S, Ray B, Santhanam P, Leszkowicz R (2003) Metrics for managing customer view of software quality. In: Proceedings of 9th IEEE international conference on software metrics symposium, pp 189–198 Chulani S, Ray B, Santhanam P, Leszkowicz R (2003) Metrics for managing customer view of software quality. In: Proceedings of 9th IEEE international conference on software metrics symposium, pp 189–198
7.
go back to reference Coello CA, Pulido GT, Lechuga MS (2004) Handling multiple objectives with particle swarm optimization. IEEE Trans Evol Comput 8(3):256–279CrossRef Coello CA, Pulido GT, Lechuga MS (2004) Handling multiple objectives with particle swarm optimization. IEEE Trans Evol Comput 8(3):256–279CrossRef
8.
go back to reference Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MATHMathSciNet Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30MATHMathSciNet
9.
go back to reference Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. J Syst Softw 81(5):649–660CrossRef Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. J Syst Softw 81(5):649–660CrossRef
10.
go back to reference Fenton N, Neil M, Marsh W, Hearty P, Marquez D, Krause P, Mishra R (2007) Predicting software defects in varying development lifecycles using bayesian nets. Inf Softw Technol 49(1):32–43CrossRef Fenton N, Neil M, Marsh W, Hearty P, Marquez D, Krause P, Mishra R (2007) Predicting software defects in varying development lifecycles using bayesian nets. Inf Softw Technol 49(1):32–43CrossRef
11.
go back to reference Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for IEEE clustering. Pattern Recogn 41(1):176–190MATHCrossRef Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for IEEE clustering. Pattern Recogn 41(1):176–190MATHCrossRef
12.
go back to reference Freitas AA (2008) A review of evolutionary algorithms for data mining. In: Maimon O, Rockach L (eds) Soft computing for knowledge discovery and data mining, 2nd edn. Springer, New York, pp 79–111CrossRef Freitas AA (2008) A review of evolutionary algorithms for data mining. In: Maimon O, Rockach L (eds) Soft computing for knowledge discovery and data mining, 2nd edn. Springer, New York, pp 79–111CrossRef
13.
go back to reference He H (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284CrossRef He H (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284CrossRef
14.
go back to reference Hu X, Eberhart R (2002) Multiobjective optimization using dynamic neighborhood paricle swarm optimization. In: Proceeding of second international conference on evolutionary computation, pp 1677–1681 Hu X, Eberhart R (2002) Multiobjective optimization using dynamic neighborhood paricle swarm optimization. In: Proceeding of second international conference on evolutionary computation, pp 1677–1681
15.
go back to reference Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceeding of IEEE international conference on neural networks, pp 1942–1948 Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceeding of IEEE international conference on neural networks, pp 1942–1948
16.
go back to reference Kennedy J, Spears W (1998) Matching algorithms to problems: an experimental test of the particle swarm and some genetic algorithms on the multimodal problem generator. In: Proceeding of IEEE international conference on computational intelligence, pp 74–77 Kennedy J, Spears W (1998) Matching algorithms to problems: an experimental test of the particle swarm and some genetic algorithms on the multimodal problem generator. In: Proceeding of IEEE international conference on computational intelligence, pp 74–77
17.
go back to reference Kim DW, Lee KY, Lee D, Lee KH (2005) Evaluation of the performance of clustering algorithms in kernel-induced feature space. Pattern Recogn 38(4):607–611CrossRef Kim DW, Lee KY, Lee D, Lee KH (2005) Evaluation of the performance of clustering algorithms in kernel-induced feature space. Pattern Recogn 38(4):607–611CrossRef
18.
go back to reference Khoshgoftaar TM, Gao K, Seliya N (2010) Attribute selection and imbalanced data: problems in software defect prediction. In: Proceedings of 22nd IEEE international conference on tools with artificial intelligence, pp 137–144 Khoshgoftaar TM, Gao K, Seliya N (2010) Attribute selection and imbalanced data: problems in software defect prediction. In: Proceedings of 22nd IEEE international conference on tools with artificial intelligence, pp 137–144
19.
go back to reference Koru G, Liu H (2005) Building effective defect prediction models in practice. IEEE Softw 22(6):23–29CrossRef Koru G, Liu H (2005) Building effective defect prediction models in practice. IEEE Softw 22(6):23–29CrossRef
20.
go back to reference Kwedlo W, Iwanowicz P (2010) Using genetic algorithm for selection of initial cluster centers for the k-means method. In: Proceeding of 10th international conference on artifical intelligence and soft computing, pp 165–172 Kwedlo W, Iwanowicz P (2010) Using genetic algorithm for selection of initial cluster centers for the k-means method. In: Proceeding of 10th international conference on artifical intelligence and soft computing, pp 165–172
21.
go back to reference Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496CrossRef Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496CrossRef
22.
go back to reference Lletı R, Ortiz MC, Sarabia LA, Sánchez MS (2004) Selecting variables for k-means cluster analysis by using a genetic algorithm that optimises the silhouettes. Anal Chim Acta 515(1):87–100CrossRef Lletı R, Ortiz MC, Sarabia LA, Sánchez MS (2004) Selecting variables for k-means cluster analysis by using a genetic algorithm that optimises the silhouettes. Anal Chim Acta 515(1):87–100CrossRef
23.
go back to reference Lounis H, Ait-Mehedine L (2004) Machine-learning techniques for software product quality assessment. In: Proceeding of 4th IEEE international conference on quality software, pp 102–109 Lounis H, Ait-Mehedine L (2004) Machine-learning techniques for software product quality assessment. In: Proceeding of 4th IEEE international conference on quality software, pp 102–109
25.
go back to reference Mahanti R, Antony J (2005) Confluence of six sigma, simulation and software development. Manag Audit J 20(7):739–762CrossRef Mahanti R, Antony J (2005) Confluence of six sigma, simulation and software development. Manag Audit J 20(7):739–762CrossRef
26.
go back to reference Mahaweerawat A, Sophatsathit P, Lursinsap C, Musilek P (2004) Fault prediction in object-oriented software using neural network techniques. In: Proceeding in Tech Conference on, pp 27–34 Mahaweerawat A, Sophatsathit P, Lursinsap C, Musilek P (2004) Fault prediction in object-oriented software using neural network techniques. In: Proceeding in Tech Conference on, pp 27–34
27.
go back to reference Mardia K, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, London Mardia K, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, London
28.
go back to reference Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13CrossRef Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13CrossRef
29.
go back to reference Michalewicz Z (1994) Genetic algorithms + data structures = evolution programs. Springer, New YorkMATHCrossRef Michalewicz Z (1994) Genetic algorithms + data structures = evolution programs. Springer, New YorkMATHCrossRef
30.
go back to reference Mostaghim S, Teich J (2003) Strategies for finding good local guides in multiobjective particle swarm optimization. In: Proceeding fo third IEEE international conference on Swarm intelligence, pp 26–33 Mostaghim S, Teich J (2003) Strategies for finding good local guides in multiobjective particle swarm optimization. In: Proceeding fo third IEEE international conference on Swarm intelligence, pp 26–33
31.
go back to reference Müller KR, Mika S, Rätsch G, Tsuda K, Schölkopf B (2001) An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 12(2):181–202CrossRef Müller KR, Mika S, Rätsch G, Tsuda K, Schölkopf B (2001) An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 12(2):181–202CrossRef
32.
go back to reference Pai GJ, Dugan JB (2007) Empirical analysis of software fault content and fault proneness using bayesian methods. IEEE Trans Softw Eng 33(10):675–686CrossRef Pai GJ, Dugan JB (2007) Empirical analysis of software fault content and fault proneness using bayesian methods. IEEE Trans Softw Eng 33(10):675–686CrossRef
33.
go back to reference Prez-Miana E, Gras J-J (2006) Improving fault prediction using bayesian networks for the development of embedded software applications: research articles. Softw Test Verification Reliab 16(3):157–174CrossRef Prez-Miana E, Gras J-J (2006) Improving fault prediction using bayesian networks for the development of embedded software applications: research articles. Softw Test Verification Reliab 16(3):157–174CrossRef
34.
go back to reference Provost F, Fawcett T (2001) Robust classification for imprecise environments. Mach Learn 42(3):203–231MATHCrossRef Provost F, Fawcett T (2001) Robust classification for imprecise environments. Mach Learn 42(3):203–231MATHCrossRef
35.
go back to reference Rodríguez D, Ruiz R, Riquelme JC, Aguilar-Ruiz JS (2012) Searching for rules to detect defective modules: a subgroup discovery approach. Inf Sci 191:14–30CrossRef Rodríguez D, Ruiz R, Riquelme JC, Aguilar-Ruiz JS (2012) Searching for rules to detect defective modules: a subgroup discovery approach. Inf Sci 191:14–30CrossRef
36.
go back to reference Riquelme JC, Ruiz R, Rodríguez D, Moreno J (2008) Finding defective modules from highly unbalanced datasets. Actas de los Talleres de las Jornadas de Ingeniería del Software y Bases de Datos 2(1):67–74 Riquelme JC, Ruiz R, Rodríguez D, Moreno J (2008) Finding defective modules from highly unbalanced datasets. Actas de los Talleres de las Jornadas de Ingeniería del Software y Bases de Datos 2(1):67–74
37.
go back to reference Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65MATHCrossRef Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65MATHCrossRef
38.
go back to reference Shayeghi H, Mahdavi M, Bagheri A (2010) An improved DPSO with mutation based on similarity algorithm for optimization of transmission lines loading. Energy Convers Manag 51(12):2715–2723CrossRef Shayeghi H, Mahdavi M, Bagheri A (2010) An improved DPSO with mutation based on similarity algorithm for optimization of transmission lines loading. Energy Convers Manag 51(12):2715–2723CrossRef
39.
go back to reference Seiffert C, Khoshgoftaar TM, Hulse JV, Folleco A (2007) An empirical study of the classification performance of learners on imbalanced and noisy software quality data. In: Proceeding of IEEE international conference on information reuse and integration, pp 651–658 Seiffert C, Khoshgoftaar TM, Hulse JV, Folleco A (2007) An empirical study of the classification performance of learners on imbalanced and noisy software quality data. In: Proceeding of IEEE international conference on information reuse and integration, pp 651–658
40.
go back to reference Singh Y, Kaur A, Malhotra R (2009) Software fault pronennes prediction using support vector machines. In: Proceeding of IEEE international conference on engineering Singh Y, Kaur A, Malhotra R (2009) Software fault pronennes prediction using support vector machines. In: Proceeding of IEEE international conference on engineering
41.
42.
go back to reference Tan KC, Yu Q, Ang JH (2006) A dual-objective evolutionary algorithm for rules extraction in data mining. Comput Optim Appl 34(2):273–294MATHMathSciNetCrossRef Tan KC, Yu Q, Ang JH (2006) A dual-objective evolutionary algorithm for rules extraction in data mining. Comput Optim Appl 34(2):273–294MATHMathSciNetCrossRef
43.
go back to reference Tax DMJ, Duin RPW (2002) Uniform object generation for optimizing one-class classifiers. J Mach Learn Res 2:155–173MATH Tax DMJ, Duin RPW (2002) Uniform object generation for optimizing one-class classifiers. J Mach Learn Res 2:155–173MATH
44.
go back to reference Wang S, Yao X (2013) Using class imbalance learning for software defect prediction. IEEE Trans Reliab 62(2):434–443CrossRef Wang S, Yao X (2013) Using class imbalance learning for software defect prediction. IEEE Trans Reliab 62(2):434–443CrossRef
45.
go back to reference Xing F, Guo P, Lyu MR (2005) A novel method for early software quality prediction sbased on support vector machine. In: Proceeding of 16th IEEE international conference on software reliability engineering, pp 213–222 Xing F, Guo P, Lyu MR (2005) A novel method for early software quality prediction sbased on support vector machine. In: Proceeding of 16th IEEE international conference on software reliability engineering, pp 213–222
46.
go back to reference Zhongkai L, Zhencai Z, Shanzeng L (2010) A distance sorting based multi-objective particle swarm optimizer and its applications. Life Syst Model Intell Comput 98:30–36CrossRef Zhongkai L, Zhencai Z, Shanzeng L (2010) A distance sorting based multi-objective particle swarm optimizer and its applications. Life Syst Model Intell Comput 98:30–36CrossRef
Metadata
Title
A hybrid one-class rule learning approach based on swarm intelligence for software fault prediction
Authors
Yousef Abdi
Saeed Parsa
Yousef Seyfari
Publication date
01-12-2015
Publisher
Springer London
Published in
Innovations in Systems and Software Engineering / Issue 4/2015
Print ISSN: 1614-5046
Electronic ISSN: 1614-5054
DOI
https://doi.org/10.1007/s11334-015-0258-2

Other articles of this Issue 4/2015

Innovations in Systems and Software Engineering 4/2015 Go to the issue

Premium Partner