2021 | OriginalPaper | Chapter

Tackling the Imbalanced Data in Software Maintainability Prediction Using Ensembles for Class Imbalance Problem

Authors: Ruchika Malhotra, Kusum Lata

Published in: Advances in Interdisciplinary Research in Engineering and Business Management

Publisher: Springer Nature Singapore

Abstract

Early prediction of software maintainability is a potential way to reduce the cost of software maintenance, which accounts for almost 60–70% of the software development cost. Various machine learning (ML) techniques have been investigated in the literature to predict software maintainability. The performance of well-established ML techniques degrades considerably if the training dataset contains irregularities in the form of data skewness or class imbalance. This study empirically investigates the performance of two types of ensembles for the class imbalance problem, namely Bagging-based ensembles and Boosting-based ensembles. The study also compares the predictive performance of the proposed ensembles with the classic ensembles Bagging and AdaBoost. The results of the empirical analysis favor the use of the proposed ensembles to develop effective Software Maintainability Prediction (SMP) models from imbalanced datasets.
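The abstract does not say which specific ensembles the study evaluates, so the following is only a minimal sketch of one common bagging-based idea for imbalanced data, often called UnderBagging: each base learner is trained on a bag that draws equally many samples from the minority and majority classes. Everything in it is an assumption made for illustration, not the authors' method: the function names, the one-feature decision stump used as the base learner, and the toy data, where the single feature stands in for some hypothetical code metric and class 1 is the rare class.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a one-feature threshold rule: predict class 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in samples}):
        # Count training samples that this threshold would misclassify.
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def under_bagging(samples, n_estimators=11, seed=0):
    """Train stumps on class-balanced bags (illustrative UnderBagging sketch)."""
    rng = random.Random(seed)
    by_class = {
        0: [s for s in samples if s[1] == 0],
        1: [s for s in samples if s[1] == 1],
    }
    # Every bag draws as many samples per class as the smaller class holds.
    n_min = min(len(by_class[0]), len(by_class[1]))
    thresholds = []
    for _ in range(n_estimators):
        bag = [rng.choice(by_class[c]) for c in (0, 1) for _ in range(n_min)]
        thresholds.append(train_stump(bag))
    return thresholds


def predict(thresholds, x):
    """Majority vote over the stumps; an odd ensemble size avoids ties."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 50 majority-class samples and 5 minority-class samples.
data = [(x, 0) for x in range(50)] + [(x, 1) for x in (60, 70, 80, 90, 100)]
model = under_bagging(data)
print(predict(model, 10), predict(model, 200))
```

Because each bag is balanced, no single base learner can reach a low error simply by always predicting the majority class, which is the failure mode that class imbalance tends to produce. Boosting-based variants apply a similar resampling step inside each boosting round rather than inside each bag.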

Title: Tackling the Imbalanced Data in Software Maintainability Prediction Using Ensembles for Class Imbalance Problem
Authors: Ruchika Malhotra, Kusum Lata
Publisher: Springer Nature Singapore
Book: Advances in Interdisciplinary Research in Engineering and Business Management
Print ISBN: 978-981-16-0036-4
Electronic ISBN: 978-981-16-0037-1

Copyright Year: 2021
DOI: https://doi.org/10.1007/978-981-16-0037-1_31
