Published in: Empirical Software Engineering 6/2017

05-01-2017

An empirical study for software change prediction using imbalanced data

Authors: Ruchika Malhotra, Megha Khanna

Abstract

Software change prediction is crucial for planning resource allocation efficiently during the testing and maintenance phases of software. Moreover, correctly identifying change-prone classes in the early phases of the software development life cycle helps in developing cost-effective, good-quality, and maintainable software. An effective software change prediction model should recognize change-prone and not change-prone classes equally well, with high accuracy. However, this is often not the case, because software practitioners frequently have to deal with imbalanced data sets in which instances of one type of class far outnumber instances of the other. In such a scenario, the minority class is not predicted accurately, leading to strategic losses. This study evaluates a number of techniques for handling imbalanced data sets, using various data sampling methods and MetaCost learners on six open-source data sets. The results of the study advocate the use of the resample with replacement sampling method for effective imbalanced learning.
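
To make the idea of resampling with replacement concrete, here is a minimal sketch in Python. It is not the study's implementation: the function name `resample_with_replacement`, the equal per-class share, and the toy data are all assumptions made for illustration. The sketch draws the same number of instances from each class, with replacement, so that the output is about the same size as the input but class-balanced.

```python
import random
from collections import Counter, defaultdict


def resample_with_replacement(rows, labels, seed=0):
    """Return a class-balanced sample drawn with replacement.

    Hypothetical helper: each class contributes an equal share of the
    output, sampled (with repeats allowed) from its original instances.
    """
    rng = random.Random(seed)

    # Group the instances by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Assumed policy: split the original size evenly across classes.
    per_class = len(rows) // len(by_class)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # random.choices samples with replacement.
        out_rows.extend(rng.choices(members, k=per_class))
        out_labels.extend([label] * per_class)
    return out_rows, out_labels


# Hypothetical toy data: 8 not change-prone instances, 2 change-prone.
rows = [[i] for i in range(10)]
labels = ["not change-prone"] * 8 + ["change-prone"] * 2

new_rows, new_labels = resample_with_replacement(rows, labels)
print(Counter(new_labels))
```

On this toy input, both classes end up with five instances each. The minority instances necessarily repeat, since only two distinct ones exist, while some majority instances are left out.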

Metadata
Title: An empirical study for software change prediction using imbalanced data
Authors: Ruchika Malhotra, Megha Khanna
Publication date: 05-01-2017
Publisher: Springer US
Published in: Empirical Software Engineering, Issue 6/2017
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-016-9488-7
