Published in: Empirical Software Engineering 2/2023

01-03-2023

The impact of class imbalance techniques on crashing fault residence prediction models

Authors: Kunsong Zhao, Zhou Xu, Meng Yan, Tao Zhang, Lei Xue, Ming Fan, Jacky Keung

Abstract

Software crashes occur when a program executes incorrectly or is forcibly interrupted, which degrades the user experience. Because stack traces carry exception-related information about crashes, researchers have used features extracted from the stack trace to automatically predict whether the crashing fault resides in the stack trace, with the aim of accelerating crash localization. A recent study conducted the first large-scale empirical investigation of how feature selection methods affect the performance of classification models for this task. However, crash data are intrinsically class-imbalanced, i.e., the number of crash instances whose fault lies inside the stack trace differs greatly from the number whose fault lies outside it, and that study ignored this characteristic. To fill this gap, we conduct a large-scale empirical study of how different imbalanced learning techniques affect the performance of crashing fault residence prediction models, using a benchmark dataset comprising seven software projects and four evaluation indicators. Our experimental results show that two imbalanced variants of the bagging classifier outperform the other compared techniques in both the normal and cross-project settings, and consistently achieve excellent prediction performance even as the imbalance level changes.
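
To make the idea of an imbalance-aware bagging classifier concrete, here is a minimal Python sketch of one common variant: each ensemble member is trained on a bootstrap sample in which the majority class is randomly under-sampled to the size of the minority class. This is an illustration only and is not taken from the paper, because the abstract does not name the two bagging variants it found best. The function names, the labels "inside" and "outside", the toy one-feature data, and the nearest-centroid base learner are all hypothetical stand-ins.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def balanced_bootstrap(X, y, rng):
    """Draw, with replacement, the same number of rows from every class.

    The per-class sample size equals the minority-class size, so the
    majority class is effectively under-sampled (an assumed strategy).
    """
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n = min(len(rows) for rows in by_class.values())
    sample_X, sample_y = [], []
    for label, rows in by_class.items():
        for row in rng.choices(rows, k=n):
            sample_X.append(row)
            sample_y.append(label)
    return sample_X, sample_y


def fit_centroids(X, y):
    """Stand-in base learner: store the mean feature vector of each class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], x)),
    )


def fit_balanced_bagging(X, y, n_estimators=11, seed=0):
    """Train one base learner per balanced bootstrap sample."""
    rng = random.Random(seed)
    return [fit_centroids(*balanced_bootstrap(X, y, rng)) for _ in range(n_estimators)]


def predict_bagging(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroid(model, x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one numeric feature per crash, 8 vs. 2 instances.
# Which class is the minority here is an arbitrary choice for illustration.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [5.0], [5.2]]
y = ["outside"] * 8 + ["inside"] * 2

models = fit_balanced_bagging(X, y)
print(imbalance_ratio(y))
print(predict_bagging(models, [4.9]), predict_bagging(models, [0.2]))
```

Because every base learner sees both classes in equal proportion, no single member is dominated by the majority class, and the vote averages out the randomness that under-sampling introduces. A real study would swap in a stronger base learner (for example a decision tree) and evaluate with indicators suited to imbalanced data.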

Metadata
Title
The impact of class imbalance techniques on crashing fault residence prediction models
Authors
Kunsong Zhao
Zhou Xu
Meng Yan
Tao Zhang
Lei Xue
Ming Fan
Jacky Keung
Publication date
01-03-2023
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2023
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-023-10294-y
