Published in: Empirical Software Engineering 2/2024

01 March 2024

When less is more: on the value of “co-training” for semi-supervised software defect predictors

Authors: Suvodeep Majumder, Joymallya Chakraborty, Tim Menzies


Abstract

Labeling a module as defective or non-defective is an expensive task. Hence, there are often limits on how much labeled data is available for training. Semi-supervised classifiers use far fewer labels for training models. However, there are numerous semi-supervised methods, including self-labeling, co-training, maximal-margin, and graph-based methods, to name a few. Only a handful of these methods have been tested in SE for (e.g.) predicting defects, and even there, those methods have been tested on just a handful of projects. This paper applies a wide range of 55 semi-supervised learners to over 714 projects. We find that semi-supervised “co-training methods” work significantly better than other approaches. Specifically, after labeling just 2.5% of the data, they make predictions that are competitive with those using 100% of the data. That said, co-training needs to be used cautiously, since the specific co-training method must be carefully selected based on a user’s specific goals. Also, we warn that a commonly used co-training method (“multi-view”, where different learners get different sets of columns) does not improve predictions, while adding substantially to run-time costs (11 hours vs. 1.8 hours). It is an open question, worthy of future work, to test if these reductions can be seen in other areas of software analytics. To assist with exploring other areas, all the code used is available at https://github.com/ai-se/Semi-Supervised.


Footnotes
2
Some projects (27) had inconstant values for selected metrics and were removed from the selected project list
 
3
The keywords used are: bug, fix, error, issue, crash, problem, fail, defect, and patch. These keywords are used by Rosen et al. in their commit_guru paper (Rosen et al. 2015).
 
4
From this point onwards, we denote commits that contain bugs as “bug-inducing” commits.
 
6
MDS calculates the distances between each pair of points in the original high-dimensional space and then maps the points to a lower-dimensional space while preserving those pairwise distances as well as possible.
 
7
When performing such re-balancing, it is a methodological error to re-balance both the training and test sets, since learned models should be tested on data with the naturally occurring class frequencies. We re-balance only the training data and not the test data.
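
The train-only rule in footnote 7 can be illustrated with a minimal sketch. Random oversampling is swapped in here as a simple stand-in, since this excerpt does not say which re-balancing method the paper uses; the helper `oversample_minority` and the toy data are hypothetical.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra copies (with replacement) to reach the target class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data, made up for illustration: 1 = defective, 0 = non-defective.
train_x = [[1], [2], [3], [4], [5], [6]]
train_y = [0, 0, 0, 0, 0, 1]
test_x = [[7], [8], [9], [10]]
test_y = [0, 0, 0, 1]

# Re-balance the training split only; the test split keeps its natural class frequencies.
bal_x, bal_y = oversample_minority(train_x, train_y)
```

A model would then be fitted on `bal_x`, `bal_y` and scored on the untouched `test_x`, `test_y`.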
 
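
As a rough illustration of the keyword list in footnote 3, the sketch below flags commit messages that mention any of the listed keywords. It assumes the keywords are matched against commit messages, case-insensitively and at the start of a word, none of which the footnote spells out; `mentions_fix` is a hypothetical helper, not part of commit_guru.

```python
import re

# Keywords listed in footnote 3.
KEYWORDS = ("bug", "fix", "error", "issue", "crash", "problem", "fail", "defect", "patch")

# Assumption: match at a word start, so "fixed" and "failure" count but "prefix" does not.
_PATTERN = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")", re.IGNORECASE)


def mentions_fix(message: str) -> bool:
    """Return True if the commit message contains any of the keywords."""
    return _PATTERN.search(message) is not None
```

For example, `mentions_fix("Fixed crash on startup")` is true, while `mentions_fix("Update README")` is false.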
References
Abaei G, Selamat A, Fujita H (2015) An empirical study based on semi-supervised hybrid self-organizing map for software fault prediction. Knowl-Based Syst 74:28–39
Abney S (2002) Bootstrapping. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, Philadelphia, pp 360–367
Agrawal A, Menzies T (2017) “better data” is better than “better data miners” (benefits of tuning SMOTE for defect prediction). CoRR abs/1705.03697
Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: 2011 33rd international conference on software engineering (ICSE). IEEE, pp 1–10
Bair E (2013) Semi-supervised clustering methods. Wiley Interdiscip Rev Comput Stat 5(5):349–361
Balcan MF, Blum A, Yang K (2004) Co-training and expansion: Towards bridging theory and practice. Adv Neural Inf Process Syst 17:2–5
Balogun AO, Bajeh AO, Orie VA, Asaju WAY (2018) Software defect prediction using ensemble learning: an anp based evaluation method. FUOYE J Eng Technol 3(2):50–55
Bell RM, Ostrand TJ, Weyuker EJ (2013) The limited impact of individual developer data on software defect prediction. Empir Softw Eng 18(3):478–505
Bennett KP, Demiriz A, Maclin R (2002) Exploiting unlabeled data in ensemble methods. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, New York, pp 289–296
Bennin KE, Toda K, Kamei Y et al (2016) Empirical evaluation of cross-release effort-aware defect prediction models. In: 2016 IEEE International conference on software quality, reliability and security (QRS). IEEE, pp 214–221
Bird C, Nagappan N, Gall H et al (2009) Putting it all together: using socio-technical networks to predict failures. In: Proceedings of the 20th IEEE International conference on software reliability engineering (ISSRE’09). IEEE Press, Bengaluru-Mysuru, pp 109–119
Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Proceedings of the eleventh annual conference on Computational learning theory. Association for Computing Machinery, New York, pp 92–100
Briand LC, Basili V, Hetmanski CJ (1993) Developing interpretable models with optimized set reduction for identifying high-risk software components. IEEE Trans Softw Eng 19(11):1028–1044
Cao Y, Ding Z, Xue F et al (2018) An improved twin support vector machine based on multi-objective cuckoo search for software defect prediction. Int J Bio-Inspired Comput 11(4):282–291
Catolino G (2017a) Just-in-time bug prediction in mobile applications: the domain matters! In: 2017 IEEE/ACM 4th international conference on mobile software engineering and systems (MOBILESoft). IEEE, pp 201–202
Chapelle O, Zien A (2005) Semi-supervised classification by low density separation. In: International workshop on artificial intelligence and statistics. PMLR, pp 57–64
Chawla NV, Bowyer KW, Hall LO et al (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Chen D, Stolee KT, Menzies T (2019) Replication can improve prior results: a github study of pull request acceptance. In: 2019 IEEE/ACM 27th international conference on program comprehension (ICPC). IEEE, pp 179–190
Cox MA, Cox TF (2008) Multidimensional scaling. In: Handbook of data visualization. Springer, pp 315–347
Demiriz A, Bennett KP, Embrechts MJ (1999) Semi-supervised clustering using genetic algorithms. Artif Neural Netw Eng (ANNIE-99) 809–814
Du J, Ling CX, Zhou ZH (2010) When does cotraining work in real data? IEEE Trans Knowl Data Eng 23(5):788–799
Gayatri N, Nickolas S, Reddy A et al (2010) Feature selection using decision tree induction in class level metrics dataset for software defect predictions. In: Proceedings of the world congress on engineering and computer science. pp 124–129
Ghotra B, McIntosh S, Hassan AE (2015a) Revisiting the impact of classification techniques on the performance of defect prediction models. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering. pp 789–800. https://doi.org/10.1109/ICSE.2015.91
Ghotra B, McIntosh S, Hassan AE (2015b) Revisiting the impact of classification techniques on the performance of defect prediction models. In: 37th ICSE-vol 1. IEEE Press, pp 789–800
Ghotra B, McIntosh S, Hassan AE (2017) A large-scale study of the impact of feature selection techniques on defect classification models. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR). IEEE, pp 146–157
Goldberg A, Zhu X, Singh A et al (2009) Multi-manifold semi-supervised learning. In: Artificial intelligence and statistics. PMLR, pp 169–176
Goldman S, Zhou Y (2000) Enhancing supervised learning with unlabeled data. In: ICML. Citeseer, pp 327–334
Gong L, Jiang S, Wang R et al (2019) Empirical evaluation of the impact of class overlap on software defect prediction. In: 2019 34th IEEE/ACM International conference on automated software engineering (ASE). IEEE, pp 698–709
Goyal S (2022) Handling class-imbalance with knn (neighbourhood) under-sampling for software defect prediction. Artif Intell Rev 55(3):2023–2064
Hamill M, Goseva-Popstojanova K (2009) Common trends in software fault and failure data. IEEE Trans Softw Eng 35(4):484–496
He Q, Shen B, Chen Y (2016) Software defect prediction using semi-supervised learning with change burst information. In: 2016 IEEE 40th annual computer software and applications conference (COMPSAC). IEEE, pp 113–122
He Z, Shu F, Yang Y et al (2012) An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng 19(2):167–199
Hindle A, German DM, Holt R (2008) What do large commits tell us? a taxonomical study of large commits. In: Proceedings of the 2008 international working conference on mining software repositories. Association for Computing Machinery, New York, pp 99–108
Hindle A, Barr ET, Su Z et al (2012) On the naturalness of software. In: 2012 34th ICSE (ICSE). IEEE, pp 837–847
Hosseini S, Turhan B, Mäntylä M (2018) A benchmark study on the effectiveness of search-based data selection and feature selection for cross project defect prediction. Inf Softw Technol 95:296–312
Huang Q, Xia X, Lo D (2017) Supervised vs unsupervised models: a holistic look at effort-aware just-in-time defect prediction. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). pp 159–170. https://doi.org/10.1109/ICSME.2017.51
Hussain SF, Bashir S (2016) Co-clustering of multi-view datasets. Knowl Inf Syst 47(3):545–570
Ibrahim DR, Ghnemat R, Hudaib A (2017) Software defect prediction using feature selection and random forest algorithm. In: 2017 International conference on new trends in computing sciences (ICTCS). IEEE, pp 252–257
Iglesias EL, Vieira AS, Diz LB (2016) An hmm-based multi-view co-training framework for single-view text corpora. In: Hybrid artificial intelligent systems: 11th international conference, HAIS 2016, Seville, Spain, April 18-20, 2016, Proceedings 11. Springer, pp 66–78
Iqbal A, Aftab S, Ali U et al (2019) Performance analysis of machine learning techniques on software defect prediction using nasa datasets. Int J Adv Comput Sci Appl 10(5):301–307
Jacob SG et al (2015) Improved random forest algorithm for software defect prediction through data mining techniques. Int J Comput Appl 117(23):19–21
Jebara T, Wang J, Chang SF (2009) Graph construction and b-matching for semi-supervised learning. In: Proceedings of the 26th annual international conference on machine learning. Association for Computing Machinery, New York, pp 3–18
Kamei Y, Shihab E, Adams B et al (2012) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773
Kang HJ, Aw KL, Lo D (2022) Detecting false alarms from automatic static analysis tools: How far are we? In: Proceedings of the 44th international conference on software engineering (ICSE’22). Association for Computing Machinery, New York, pp 698–709. https://doi.org/10.1145/3510003.3510214
Kim M, Cai D, Kim S (2011) An empirical investigation into the role of api-level refactorings during software evolution. In: Proceedings of the 33rd ICSE. ACM, pp 151–160
Kim M, Nam J, Yeon J et al (2015) Remi: defect prediction for efficient api testing. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. Association for Computing Machinery, New York, pp 990–993
Kim S, Whitehead EJ Jr, Zhang Y (2008) Classifying software changes: clean or buggy? IEEE Trans Softw Eng 34(2):181–196
Koru AG, Liu H (2005) Building effective defect-prediction models in practice. IEEE Softw 22(6):23–29
Koru AG, Zhang D, El Emam K, Liu H (2008) An investigation into the functional form of the size-defect relationship for software modules. IEEE Trans Softw Eng 35(2):293–304
Lee D, Lee J (2007) Equilibrium-based support vector machine for semisupervised classification. IEEE Trans Neural Netw 18(2):578–583
Li M, Zhou ZH (2007) Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Trans Syst Man Cybern A Syst Hum 37(6):1088–1098
Li M, Zhang H, Wu R et al (2012) Sample-based software defect prediction with active and semi-supervised learning. Autom Softw Eng 19(2):201–230
Li W, Zhang W, Jia X et al (2020) Effort-aware semi-supervised just-in-time defect prediction. Inf Softw Technol 126:106364
Li YF, Zhou ZH (2014) Towards making unlabeled data never hurt. IEEE Trans Pattern Anal Mach Intell 37(1):175–188
Liu S, Li F, Li F et al (2013) Adaptive co-training svm for sentiment classification on tweets. In: Proceedings of the 22nd ACM international conference on information & knowledge management. Association for Computing Machinery, New York, pp 2079–2088
Lu H, Cukic B, Culp M (2012) Software defect prediction using semi-supervised learning with dimension reduction. In: 2012 proceedings of the 27th IEEE/ACM international conference on automated software engineering. IEEE, pp 314–317
Mabayoje MA, Balogun AO, Jibril HA et al (2019) Parameter tuning in knn for software defect prediction: an empirical analysis
Majumder S, Mody P, Menzies T (2022) Revisiting process versus product metrics: a large scale analysis. Empir Softw Eng 27(3):1–42
Mallapragada PK, Jin R, Jain AK et al (2008) Semiboost: boosting for semi-supervised learning. IEEE Trans Pattern Anal Mach Intell 31(11):2000–2014
Matsumoto S, Kamei Y, Monden A et al (2010) An analysis of developer metrics for fault prediction. In: Proceedings of the 6th international conference on predictive models in software engineering. Association for Computing Machinery, New York
Menzies T, Milton Z, Turhan B, Cukic B, Jiang Y, Bener A (2010) Defect prediction from static code features: current results, limitations, new approaches. Autom Softw Eng 17(4):375–407
Misirli AT, Bener A, Kale R (2011) Ai-based software defect predictors: applications and benefits in a case study. AI Mag 32(2):57–68
Mittas N, Angelis L (2013) Ranking and clustering software cost estimation models through a multiple comparisons algorithm. IEEE Trans Softw Eng 39(4):537–551
Nagappan N, Zeller A, Zimmermann T et al (2010) Change bursts as defect predictors. In: 2010 IEEE 21st international symposium on software reliability engineering. IEEE, pp 309–318
Nam J, Pan SJ, Kim S (2013) Transfer defect learning. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 382–391
Nam J, Fu W, Kim S et al (2018) Heterogeneous defect prediction. IEEE Trans Softw Eng 44(9):874–896
Okutan A, Yıldız OT (2014) Software defect prediction using bayesian networks. Empir Softw Eng 19(1):154–181
Ostrand TJ, Weyuker EJ, Bell RM (2004) Where the bugs are. ACM SIGSOFT Softw Eng Notes 29(4):86–96
Pan SJ, Tsang IW, Kwok JT et al (2010) Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 22(2):199–210
Pelayo L, Dick S (2007) Applying novel resampling strategies to software defect prediction. In: NAFIPS 2007-2007 annual meeting of the North American fuzzy information processing society. IEEE, pp 69–72
Rahman F, Devanbu P (2013a) How, and why, process metrics are better. In: Proceedings of the 2013 ICSE. IEEE Press, pp 432–441
Rahman F, Devanbu P (2013b) How, and why, process metrics are better. In: Proceedings of the 2013 ICSE, ICSE’13. IEEE Press, pp 432–441
Rahman F, Khatri S, Barr ET et al (2014) Comparing static bug finders and statistical prediction. In: Proceedings of the 36th ICSE. ACM, pp 424–434
Ray B, Hellendoorn V, Godhane S et al (2016) On the “naturalness” of buggy code. In: Proceedings of the 38th international conference on software engineering. Association for Computing Machinery, New York, pp 428–439
Rosen C, Grawi B, Shihab E (2015b) Commit guru: analytics and risk prediction of software commits. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 966–969
Ryu D, Choi O, Baik J (2016) Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empir Softw Eng 21(1):43–71
Scudder H (1965) Probability of error of some adaptive pattern-recognition machines. IEEE Trans Inf Theory 11(3):363–371
Seiffert C, Khoshgoftaar TM, Van Hulse J et al (2014) An empirical study of the classification performance of learners on imbalanced and noisy software quality data. Inf Sci 259:571–595
Seliya N, Khoshgoftaar TM, Van Hulse J (2010) Predicting faults in high assurance software. In: 2010 IEEE 12th International Symposium on High Assurance Systems Engineering. IEEE, pp 26–34
Singh PD, Chug A (2017) Software defect prediction analysis using machine learning algorithms. In: 2017 7th international conference on cloud computing, data science & engineering-confluence. IEEE, pp 775–781
Sucholutsky I, Schonlau M (2021) ‘Less than one’-shot learning: learning n classes from m < n samples. In: Proceedings of the AAAI conference on artificial intelligence. pp 9739–9746
Sun Z, Song Q, Zhu X (2012) Using coding-based ensemble learning to improve software defect prediction. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(6):1806–1817
Zurück zum Zitat Tanha J, Van Someren M, Afsarmanesh H (2017) Semi-supervised self-training for decision tree classifiers. Int J Mach Learn Cybern 8(1):355–370CrossRef Tanha J, Van Someren M, Afsarmanesh H (2017) Semi-supervised self-training for decision tree classifiers. Int J Mach Learn Cybern 8(1):355–370CrossRef
Zurück zum Zitat Tantithamthavorn C, McIntosh S, Hassan AE, et al (2016a) Automated parameter optimization of classification techniques for defect prediction models. In: ICSE 2016. ACM, pp 321–332 Tantithamthavorn C, McIntosh S, Hassan AE, et al (2016a) Automated parameter optimization of classification techniques for defect prediction models. In: ICSE 2016. ACM, pp 321–332
Zurück zum Zitat Tantithamthavorn C, McIntosh S, Hassan AE et al (2016) An empirical comparison of model validation techniques for defect prediction models. IEEE Trans Softw Eng 43(1):1–18CrossRef Tantithamthavorn C, McIntosh S, Hassan AE et al (2016) An empirical comparison of model validation techniques for defect prediction models. IEEE Trans Softw Eng 43(1):1–18CrossRef
Zurück zum Zitat Thota MK, Shajin FH, Rajesh P et al (2020) Survey on software defect prediction techniques. Int J Appl Sci Eng 17(4):331–344 Thota MK, Shajin FH, Rajesh P et al (2020) Survey on software defect prediction techniques. Int J Appl Sci Eng 17(4):331–344
Zurück zum Zitat Tomar D, Agarwal S (2015) A comparison on multi-class classification methods based on least squares twin support vector machine. Knowl Based Syst 81:131–147CrossRef Tomar D, Agarwal S (2015) A comparison on multi-class classification methods based on least squares twin support vector machine. Knowl Based Syst 81:131–147CrossRef
Zurück zum Zitat Tosun A, Bener A, Kale R (2010) Ai-based software defect predictors: applications and benefits in a case study. In: Twenty-second IAAI conference on artificial intelligence, vol 24, pp 1748–1755 Tosun A, Bener A, Kale R (2010) Ai-based software defect predictors: applications and benefits in a case study. In: Twenty-second IAAI conference on artificial intelligence, vol 24, pp 1748–1755
Zurück zum Zitat Tu H, Menzies T (2021) FRUGAL: unlocking SSL for software analytics. ASE CoRR, abs/2108.09 Tu H, Menzies T (2021) FRUGAL: unlocking SSL for software analytics. ASE CoRR, abs/2108.09
Zurück zum Zitat Tu H, Yu Z, Menzies T (2020) Better data labelling with emblem (and how that impacts defect prediction). IEEE Trans Softw Eng 48:278–294CrossRef Tu H, Yu Z, Menzies T (2020) Better data labelling with emblem (and how that impacts defect prediction). IEEE Trans Softw Eng 48:278–294CrossRef
Zurück zum Zitat Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Netw 10(5):988–999CrossRef Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Netw 10(5):988–999CrossRef
Zurück zum Zitat Vasilescu B (2018) Personnel communication at fse’18. Found Softw Eng Vasilescu B (2018) Personnel communication at fse’18. Found Softw Eng
Zurück zum Zitat Vasilescu B, Yu Y, Wang H et al (2015) Quality and productivity outcomes relating to continuous integration in github. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 805–816 Vasilescu B, Yu Y, Wang H et al (2015) Quality and productivity outcomes relating to continuous integration in github. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 805–816
Zurück zum Zitat Wahono RS, Herman NS, Ahmad S (2014) A comparison framework of classification models for software defect prediction. Adv Sci Lett 20(10–11):1945–1950CrossRef Wahono RS, Herman NS, Ahmad S (2014) A comparison framework of classification models for software defect prediction. Adv Sci Lett 20(10–11):1945–1950CrossRef
Zurück zum Zitat Wan Z, Xia X, Hassan AE et al (2018) Perceptions, expectations, and challenges in defect prediction. IEEE Trans Softw Eng 46(11):1241–1266CrossRef Wan Z, Xia X, Hassan AE et al (2018) Perceptions, expectations, and challenges in defect prediction. IEEE Trans Softw Eng 46(11):1241–1266CrossRef
Zurück zum Zitat Wang J, Shen B, Chen Y (2012) Compressed c4. 5 models for software defect prediction. In: 2012 12th international conference on quality software. IEEE, pp 13–16 Wang J, Shen B, Chen Y (2012) Compressed c4. 5 models for software defect prediction. In: 2012 12th international conference on quality software. IEEE, pp 13–16
Zurück zum Zitat Wang S, Yao X (2013) Using class imbalance learning for software defect prediction. IEEE Trans Reliab 62(2):434–443CrossRef Wang S, Yao X (2013) Using class imbalance learning for software defect prediction. IEEE Trans Reliab 62(2):434–443CrossRef
Zurück zum Zitat Williams C, Spacco J (2008) Szz revisited: verifying when changes induce fixes. In: Proceedings of the 2008 workshop on Defects in large software systems. ACM, pp 32–36 Williams C, Spacco J (2008) Szz revisited: verifying when changes induce fixes. In: Proceedings of the 2008 workshop on Defects in large software systems. ACM, pp 32–36
Zurück zum Zitat Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemometr Intell Lab Syst 2(1–3):37–52CrossRef Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemometr Intell Lab Syst 2(1–3):37–52CrossRef
Zurück zum Zitat Xia X, Lo D, Wang X et al (2016) Collective personalized change classification with multiobjective search. IEEE Trans Reliab 65(4):1810–1829CrossRef Xia X, Lo D, Wang X et al (2016) Collective personalized change classification with multiobjective search. IEEE Trans Reliab 65(4):1810–1829CrossRef
Zurück zum Zitat Xie J, Szymanski BK (2011) Community detection using a neighborhood strength driven label propagation algorithm. In: 2011 IEEE Network Science Workshop. IEEE, pp 188–195 Xie J, Szymanski BK (2011) Community detection using a neighborhood strength driven label propagation algorithm. In: 2011 IEEE Network Science Workshop. IEEE, pp 188–195
Zurück zum Zitat Xu Z, Liu J, Yang Z, et al (2016) The impact of feature selection on defect prediction performance: an empirical comparison. In: 2016 IEEE 27th international symposium on software reliability engineering (ISSRE). IEEE, pp 309–320 Xu Z, Liu J, Yang Z, et al (2016) The impact of feature selection on defect prediction performance: an empirical comparison. In: 2016 IEEE 27th international symposium on software reliability engineering (ISSRE). IEEE, pp 309–320
Zurück zum Zitat Yang X, Lo D, Xia X, et al (2015) Deep learning for just-in-time defect prediction. In: 2015 IEEE International Conference on Software Quality, Reliability and Security. IEEE, pp 17–26 Yang X, Lo D, Xia X, et al (2015) Deep learning for just-in-time defect prediction. In: 2015 IEEE International Conference on Software Quality, Reliability and Security. IEEE, pp 17–26
Zurück zum Zitat Yang X, Lo D, Xia X et al (2017) Tlel: a two-layer ensemble learning approach for just-in-time defect prediction. Inf Softw Technol 87:206–220CrossRef Yang X, Lo D, Xia X et al (2017) Tlel: a two-layer ensemble learning approach for just-in-time defect prediction. Inf Softw Technol 87:206–220CrossRef
Zurück zum Zitat Yang Y, Zhou Y, Liu J, et al (2016) Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering. Association for Computing Machinery, New York, pp 157–168 Yang Y, Zhou Y, Liu J, et al (2016) Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering. Association for Computing Machinery, New York, pp 157–168
Zurück zum Zitat Yarowsky D (1995) Unsupervised word sense disambiguation rivaling supervised methods. In: 33rd annual meeting of the association for computational linguistics. Association for Computational Linguistics, Cambridge, pp 189–196 Yarowsky D (1995) Unsupervised word sense disambiguation rivaling supervised methods. In: 33rd annual meeting of the association for computational linguistics. Association for Computational Linguistics, Cambridge, pp 189–196
Zurück zum Zitat Yu Z, Su L, Li L et al (2010) Question classification based on co-training style semi-supervised learning. Pattern Recogn Lett 31(13):1975–1980CrossRef Yu Z, Su L, Li L et al (2010) Question classification based on co-training style semi-supervised learning. Pattern Recogn Lett 31(13):1975–1980CrossRef
Zurück zum Zitat Yu Z, Theisen C, Williams L et al (2019) Improving vulnerability inspection efficiency using active learning. IEEE Trans Softw Eng 47(11):2401–2420CrossRef Yu Z, Theisen C, Williams L et al (2019) Improving vulnerability inspection efficiency using active learning. IEEE Trans Softw Eng 47(11):2401–2420CrossRef
Zurück zum Zitat Yu Z, Fahid FM, Tu H, Menzies T (2022) Identifying self-admitted technical debts with jitterbug: a two-step approach. IEEE Trans Softw Eng 48(5):1676–1691CrossRef Yu Z, Fahid FM, Tu H, Menzies T (2022) Identifying self-admitted technical debts with jitterbug: a two-step approach. IEEE Trans Softw Eng 48(5):1676–1691CrossRef
Zurück zum Zitat Zhang F, Zheng Q, Zou Y, et al (2016b) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: 2016 IEEE/ACM 38th ICSE (ICSE). IEEE, pp 309–320 Zhang F, Zheng Q, Zou Y, et al (2016b) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: 2016 IEEE/ACM 38th ICSE (ICSE). IEEE, pp 309–320
Zurück zum Zitat Zhang F, Keivanloo I, Zou Y (2017) Data transformation in cross-project defect prediction. Empir Softw Eng 22(6):3186–3218CrossRef Zhang F, Keivanloo I, Zou Y (2017) Data transformation in cross-project defect prediction. Empir Softw Eng 22(6):3186–3218CrossRef
Zurück zum Zitat Zhang H, Zhang X, Gu M (2007) Predicting defective software components from code complexity measures. In: 13th Pacific RIM international symposium on dependable computing (PRDC 2007). IEEE, pp 93–96 Zhang H, Zhang X, Gu M (2007) Predicting defective software components from code complexity measures. In: 13th Pacific RIM international symposium on dependable computing (PRDC 2007). IEEE, pp 93–96
Zurück zum Zitat Zhang Q, Wang J, Gulzar MA et al (2020) Bigfuzz: Efficient fuzz testing for data analytics using framework abstraction. In: Proceedings of the 35th IEEE/ACM international conference on automated software engineering. pp 722–733 Zhang Q, Wang J, Gulzar MA et al (2020) Bigfuzz: Efficient fuzz testing for data analytics using framework abstraction. In: Proceedings of the 35th IEEE/ACM international conference on automated software engineering. pp 722–733
Zurück zum Zitat Zhang W, Li W, Jia X (2019) Effort-aware tri-training for semi-supervised just-in-time defect prediction. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 293–304 Zhang W, Li W, Jia X (2019) Effort-aware tri-training for semi-supervised just-in-time defect prediction. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 293–304
Zurück zum Zitat Zhang ZW, Jing XY, Wang TJ (2017) Label propagation based semi-supervised learning for software defect prediction. Autom Softw Eng 24(1):47–69CrossRef Zhang ZW, Jing XY, Wang TJ (2017) Label propagation based semi-supervised learning for software defect prediction. Autom Softw Eng 24(1):47–69CrossRef
Zurück zum Zitat Zhong S, Khoshgoftaar TM, Seliya N (2004) Unsupervised learning for expert-based software quality estimation. In: HASE. Citeseer, pp 149–155 Zhong S, Khoshgoftaar TM, Seliya N (2004) Unsupervised learning for expert-based software quality estimation. In: HASE. Citeseer, pp 149–155
Zurück zum Zitat Zhou D, Bousquet O, Lal T, et al (2003) Learning with local and global consistency. Adv Neural Inf Process Syst 16 Zhou D, Bousquet O, Lal T, et al (2003) Learning with local and global consistency. Adv Neural Inf Process Syst 16
Zurück zum Zitat Zhou Y, Goldman S (2004) Democratic co-learning. In: 16th IEEE international conference on tools with artificial intelligence. IEEE, pp 594–602 Zhou Y, Goldman S (2004) Democratic co-learning. In: 16th IEEE international conference on tools with artificial intelligence. IEEE, pp 594–602
Zurück zum Zitat Zhou Y, Xu B, Leung H (2010) On the ability of complexity metrics to predict fault-prone classes in object-oriented systems. J Syst Softw 83(4):660–674CrossRef Zhou Y, Xu B, Leung H (2010) On the ability of complexity metrics to predict fault-prone classes in object-oriented systems. J Syst Softw 83(4):660–674CrossRef
Zurück zum Zitat Zhou ZH (2012) Ensemble methods: foundations and algorithms. CRC Press Zhou ZH (2012) Ensemble methods: foundations and algorithms. CRC Press
Zurück zum Zitat Zhou ZH, Li M (2005) Tri-training: exploiting unlabeled data using three classifiers. IEEE Trans Knowl Data Eng 17(11):1529–1541CrossRef Zhou ZH, Li M (2005) Tri-training: exploiting unlabeled data using three classifiers. IEEE Trans Knowl Data Eng 17(11):1529–1541CrossRef
Zurück zum Zitat Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation. ProQuest Number: INFORMATION TO ALL USERS Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation. ProQuest Number: INFORMATION TO ALL USERS
Zurück zum Zitat Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Synth Lect Artif Intell Mach Learn 3(1):1–130 Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Synth Lect Artif Intell Mach Learn 3(1):1–130
Zurück zum Zitat Zhu XJ (2005) Semi-supervised learning literature survey Zhu XJ (2005) Semi-supervised learning literature survey
Zurück zum Zitat Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for eclipse. In: Proceedings of the Third international workshop on predictor models in software engineering. IEEE Computer Society, p 9 Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for eclipse. In: Proceedings of the Third international workshop on predictor models in software engineering. IEEE Computer Society, p 9
Zurück zum Zitat Zimmermann T, Nagappan N, Gall H et al (2009b) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. Association for Computing Machinery, New York, pp 91–100 Zimmermann T, Nagappan N, Gall H et al (2009b) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. Association for Computing Machinery, New York, pp 91–100
Metadata
Title
When less is more: on the value of “co-training” for semi-supervised software defect predictors
Authors
Suvodeep Majumder
Joymallya Chakraborty
Tim Menzies
Publication date
1 March 2024
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 2/2024
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-023-10418-4
