Published in: Soft Computing 20/2019

25 September 2018 | Methodologies and Application

Binary teaching–learning-based optimization algorithm with a new update mechanism for sample subset optimization in software defect prediction

Authors: Thanh Tung Khuat, My Hanh Le

Abstract

Software defect prediction has gained considerable attention in recent years. A broad range of computational methods has been developed to predict faulty modules accurately from code and design metrics. One challenge in training classifiers is the highly imbalanced class distribution in the available datasets, which biases prediction performance against the minority class. Data sampling is a widely used technique for tackling this problem. However, traditional sampling methods, which depend mainly on random resampling from a given dataset, do not take advantage of useful information available in the training set, such as sample quality and representative instances. To cope with this limitation, evolutionary undersampling methods are commonly used to identify an optimal sample subset of the training dataset. This paper proposes a binary teaching–learning-based optimization algorithm employing a distribution-based solution update rule, named BTLBOd, to generate a balanced subset of highly valuable examples. This subset is then used to train a classifier for reliable prediction of potentially defective modules in a software system. Each individual in BTLBOd comprises two vectors: a real-valued vector generated by the distribution-based update mechanism, and a binary vector produced from the corresponding real vector by a proposed mapping function. Empirical results showed that the optimized sample subset produced by BTLBOd can improve the classification accuracy of the predictor on highly imbalanced software defect data. The results also demonstrated the superior performance of the proposed sampling method compared with other popular sampling techniques.
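
The two-vector representation described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not specify the mapping function, so a sigmoid transfer function (a common choice in binary metaheuristics) stands in for it here. The function names `to_binary`, `select_subset`, and `class_counts` are hypothetical, and the sketch assumes one bit per training sample, where a 1 keeps the sample in the subset.

```python
import math
import random
from collections import Counter


def to_binary(real_vector, rng):
    """Map a real-valued vector to a 0/1 selection mask.

    Assumption: a sigmoid transfer function replaces the paper's
    mapping function, which the abstract does not describe.
    """
    mask = []
    for x in real_vector:
        prob = 1.0 / (1.0 + math.exp(-x))  # chance of keeping this sample
        mask.append(1 if rng.random() < prob else 0)
    return mask


def select_subset(samples, labels, mask):
    """Return the samples and labels whose mask bit is 1."""
    kept = [(s, y) for s, y, bit in zip(samples, labels, mask) if bit == 1]
    subset_samples = [s for s, _ in kept]
    subset_labels = [y for _, y in kept]
    return subset_samples, subset_labels


def class_counts(labels):
    """Count samples per class, e.g. to check how balanced a subset is."""
    return dict(Counter(labels))
```

In a full evolutionary undersampling loop, each individual's binary mask would be scored by training a classifier on the selected subset, while the update rule operates on the real-valued vector. That loop is omitted here because the abstract gives no details of the fitness function or the distribution-based update.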


Metadata
Title
Binary teaching–learning-based optimization algorithm with a new update mechanism for sample subset optimization in software defect prediction
Authors
Thanh Tung Khuat
My Hanh Le
Publication date
25 September 2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 20/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3546-6
