
2019 | OriginalPaper | Book chapter

Cost-Sensitive Learner on Hybrid SMOTE-Ensemble Approach to Predict Software Defects

Written by: Inas Abuqaddom, Amjad Hudaib

Published in: Computational and Statistical Methods in Intelligent Systems

Publisher: Springer International Publishing


Abstract

A software defect is a mistake in a computer program or system that causes it to produce incorrect or unexpected results, or to behave in unintended ways. Machine learning methods are helpful in software defect prediction, despite the challenge of imbalanced software defect distributions, in which non-defective modules far outnumber defective modules. In this paper we introduce an enhancement to the most recent hybrid SMOTE-Ensemble approach to the software defect problem, utilizing a Cost-Sensitive Learner (CSL) to improve the handling of imbalanced distributions. The paper uses four publicly available software defect datasets with different imbalance ratios and provides a comparative performance analysis against the most recent hybrid SMOTE-Ensemble approach for predicting software defects. Experimental results show that utilizing multiple machine learning techniques to cope with imbalanced datasets improves the prediction of software defects. The results also reveal that the cost-sensitive learner performs better on highly imbalanced datasets than on less imbalanced ones.
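The two ideas the abstract combines, SMOTE oversampling and cost-sensitive decisions, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the cost matrix, the base classifiers, or how the CSL is applied, so the function names, the cost values (`cost_fp=1.0`, `cost_fn=5.0`), and the threshold-based formulation below are all assumptions chosen for illustration.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic minority point (SMOTE-style) by interpolating
    between a random minority point and one of its k nearest minority
    neighbours. `minority` is a list of equal-length numeric tuples."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Rank the remaining minority points by squared Euclidean distance.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))


def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'defective' has the lower expected
    cost, assuming correct predictions cost nothing:
    (1 - p) * cost_fp <= p * cost_fn  <=>  p >= cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)


def predict_defective(p_defect, cost_fp=1.0, cost_fn=5.0):
    """Hypothetical costs: missing a defect is assumed five times as costly
    as a false alarm. These values are illustrative, not from the paper."""
    return p_defect >= cost_sensitive_threshold(cost_fp, cost_fn)


if __name__ == "__main__":
    defective_modules = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # toy feature vectors
    print(smote_sample(defective_modules, rng=random.Random(1)))
    print(predict_defective(0.2))  # above the 1/6 threshold under the assumed costs
```

Under the assumed costs, a module with an estimated defect probability of only 0.2 is still flagged, which is how a cost-sensitive learner shifts decisions toward the rare defective class.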


Metadata
Title
Cost-Sensitive Learner on Hybrid SMOTE-Ensemble Approach to Predict Software Defects
Written by
Inas Abuqaddom
Amjad Hudaib
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-00211-4_2
