2021 | OriginalPaper | Chapter

Tackling the Imbalanced Data in Software Maintainability Prediction Using Ensembles for Class Imbalance Problem

Authors: Ruchika Malhotra, Kusum Lata

Published in: Advances in Interdisciplinary Research in Engineering and Business Management

Publisher: Springer Nature Singapore

Abstract

Early prediction of software maintainability can help reduce the cost of software maintenance, which accounts for almost 60–70% of the software development cost. Various machine learning (ML) techniques have been investigated in the literature to predict software maintainability. The performance of well-established ML techniques degrades considerably if the training dataset contains irregularities in the form of data skewness or class imbalance. This study empirically investigates the performance of two types of ensembles for the class imbalance problem, namely Bagging-based ensembles and Boosting-based ensembles. The study also compares the predictive performance of the proposed ensembles with that of the classic ensembles Bagging and AdaBoost. The results of the empirical analysis favor the use of the proposed ensembles to develop effective Software Maintainability Prediction (SMP) models from imbalanced datasets.
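
To make the idea of a Bagging-based ensemble for class imbalance concrete, the following is a minimal Python sketch of one common variant, in which every bag keeps all minority-class rows and randomly undersamples the majority class to the same size (often called UnderBagging in the literature). It is an illustration under stated assumptions, not necessarily one of the ensembles evaluated in the chapter: the synthetic one-feature dataset, the threshold "stump" base learner, and the function names train_stump, under_bagging, and predict are all hypothetical.

```python
import random

def train_stump(rows):
    """Fit a one-feature threshold rule: predict class 1 when x >= threshold.

    Candidate thresholds are the observed feature values; the one with the
    fewest training errors wins (first one found on ties).
    """
    best_threshold, best_errors = None, None
    for candidate, _ in rows:
        errors = sum((x >= candidate) != bool(y) for x, y in rows)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold

def under_bagging(rows, n_models=11, seed=0):
    """Train one stump per balanced bag.

    Each bag holds every minority row (label 1) plus an equally sized
    random sample, without replacement, of majority rows (label 0).
    """
    rng = random.Random(seed)
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    models = []
    for _ in range(n_models):
        bag = minority + rng.sample(majority, len(minority))
        models.append(train_stump(bag))
    return models

def predict(models, x):
    """Majority vote over the stumps' individual predictions."""
    votes = sum(x >= threshold for threshold in models)
    return 1 if 2 * votes > len(models) else 0

# Hypothetical imbalanced data: 90 majority rows (x = 0..89, label 0)
# and 10 minority rows (x = 80, 82, ..., 98, label 1).
rows = [(i, 0) for i in range(90)] + [(80 + 2 * j, 1) for j in range(10)]

single_threshold = train_stump(rows)   # one stump fitted on the skewed data
models = under_bagging(rows)           # 11 stumps fitted on balanced bags

print("single stump threshold:", single_threshold)
print("ensemble thresholds:", sorted(models))
print("ensemble prediction for x=95:", predict(models, 95))
```

On this toy data the single stump is pulled toward the majority class and misses the minority rows that overlap with it, whereas each balanced bag weights the two classes equally during training. Boosting-based variants follow the same resampling idea but apply it inside each boosting round.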

Metadata
Title
Tackling the Imbalanced Data in Software Maintainability Prediction Using Ensembles for Class Imbalance Problem
Authors
Ruchika Malhotra
Kusum Lata
Copyright Year
2021
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-16-0037-1_31