Published in: Empirical Software Engineering, Issue 3/2022

1 May 2022

Revisiting process versus product metrics: a large scale analysis

Written by: Suvodeep Majumder, Pranav Mody, Tim Menzies

Abstract

Numerous methods can build predictive models from software data. However, what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 Github projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors for defects than product metrics (best process/product-based learners respectively achieve recalls of 98%/44% and AUCs of 95%/54%, median values). That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models (since single model predictions can become confused and exhibit a high variance).

Footnotes
1
For examples of such papers, see Table 3, later in this paper.
 
2
Note that when we refer to analytics in-the-small and analytics in-the-large, we are not comparing findings from a local versus a global approach. Rather, we compare results and findings summarized from analyzing a small number of projects with those summarized from analyzing a large number of projects.
 
3
232 and 179 citations respectively in Google Scholar, as of Sept 28, 2020.
 
4
This is because process metrics can be calculated using the change history of a file. To calculate product metrics, by contrast, the tool needs to download the specific version of the file and then go through the actual code to gather the statistics needed for the metrics.
 
5
https://www.acm.org/publications/policies/artifact-review-and-badging-current
 
6
To be clear: technically speaking, this paper is a partial reproduction of Rahman et al. or Kamei et al. When we tried their methodology, we found that in some cases our results needed a slightly different approach (see Section 3.4).
 
7
Note to reviewers: Our data is so large that we cannot place it in the Github repo. Zenodo.org will host our data. https://github.com/Suvodeep90/Revisit_process_product only contains a sample of our data. We will link that repository to the data stored at Zenodo.org.
 
8
The keywords used are: bug, fix, error, issue, crash, problem, fail, defect and patch. These are the keywords used by Rosen et al. in their commit_guru paper (Rosen et al. 2015).
 
9
From this point onwards, we will refer to a commit that contains bugs as a “bug-inducing” commit.
 
10
http://www.scitools.com/
 
11
https://scikit-learn.org/stable/index.html
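
As an illustration of the keyword list in footnote 8, the following is a minimal Python sketch of how a commit message might be flagged as bug-related. The function name `is_bug_fix` and the matching rule (case-insensitive, keyword at the start of a word) are assumptions made for this example; the footnote lists only the keywords and does not say how they are matched.

```python
import re

# Keywords listed in footnote 8.
BUG_KEYWORDS = ["bug", "fix", "error", "issue", "crash",
                "problem", "fail", "defect", "patch"]

# Assumption: a keyword counts only at the start of a word, ignoring case,
# so "fixed" and "fails" match but "prefix" and "dispatch" do not.
_PATTERN = re.compile(r"\b(?:" + "|".join(BUG_KEYWORDS) + r")", re.IGNORECASE)


def is_bug_fix(message: str) -> bool:
    """Return True if the commit message contains any of the keywords."""
    return _PATTERN.search(message) is not None
```

A stricter or looser rule (whole-word only, or plain substring matching) would label a different set of commits, so the choice should be checked against the tooling actually used.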
 
References
Agrawal A, Menzies T (2018) Is better data better than better data miners?: on the benefits of tuning smote for defect prediction. In: IST. ACM
Agrawal A, Fu W, Menzies T (2018) What is wrong with topic modeling? and how to fix it using search-based software engineering. Information and Software Technology 98:74–88
Agrawal A, Menzies T (2017) Better data is better than better data miners (benefits of tuning SMOTE for defect prediction). arXiv:1705.03697
Agrawal A, Rahman A, Krishna R, Sobran A, Menzies T (2018) We don’t need another hero? the impact of heroes on software development. In: Proceedings of the 40th international conference on software engineering: software engineering in practice. pp 245–253
Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: 2011 33rd international conference on software engineering (ICSE). IEEE, pp 1–10
Arisholm E, Briand LC (2006) Predicting fault-prone components in a java legacy system. In: ESEM. ACM
Arisholm E, Briand LC, Johannessen EB (2010) A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. Journal of Systems and Software 83(1):2–17
Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering 22(10):751–761
Bird C, Nagappan N, Gall H, Murphy B, Devanbu P (2009) Putting it all together: Using socio-technical networks to predict failures. In: ISSRE
Bird C, Nagappan N, Devanbu P, Gall H, Murphy B (2009) Does distributed development affect software quality? an empirical case study of windows vista. In: 2009 IEEE 31st international conference on software engineering. IEEE, pp 518–528
Bird C, Nagappan N, Murphy B, Gall H, Devanbu P (2011) Don’t touch my code! examining the effects of ownership on software quality. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. pp 4–14
Briand LC, Basili VR, Hetmanski CJ (1993) Developing interpretable models with optimized set reduction for identifying high-risk software components. IEEE Transactions on Software Engineering 19(11):1028–1044
Cao Y, Ding Z, Xue F, Rong X (2018) An improved twin support vector machine based on multi-objective cuckoo search for software defect prediction. International Journal of Bio-Inspired Computation 11(4):282–291
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321–357
Chen D, Fu W, Krishna R, Menzies T (2018) Applications of psychological science for actionable analytics. FSE’19
Chen D, Stolee KT, Menzies T (2019) Replication can improve prior results: A github study of pull request acceptance. In: Proceedings of the 27th international conference on program comprehension, ICPC ’19. IEEE Press, pp 179–190
Choudhary GR, Kumar S, Kumar K, Mishra A, Catal C (2018) Empirical analysis of change metrics for software fault prediction. Computers & Electrical Engineering 67:15–24
D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: 2010 7th IEEE working conference on mining software repositories (MSR 2010). IEEE, pp 31–41
Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. Mono Stat Appl Probab, London
Fenton NE, Neil M (2000) Software metrics: roadmap. In: Proceedings of the conference on the future of software engineering. pp 357–370
Fu W, Menzies T, Shen X (2016) Tuning for software analytics: Is it really necessary? Information and Software Technology 76:135–146
Gao K, Khoshgoftaar TM, Wang H, Seliya N (2011) Choosing software metrics for defect prediction: an investigation on feature selection techniques. Software: Practice and Experience 41(5):579–606
Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: 2015 37th ICSE
Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: 37th ICSE, vol 1. IEEE Press, pp 789–800
Giger E, D’Ambros M, Pinzger M, Gall HC (2012) Method-level bug prediction. In: Proceedings of the 2012 ACM-IEEE international symposium on empirical software engineering and measurement. IEEE, pp 171–180
Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. TSE
He Z, Shu F, Yang Y, Li M, Wang Q (2012) An investigation on the feasibility of cross-project defect prediction. Automated Software Engineering 19(2):167–199
Herbsleb J (2014) Socio-technical coordination (keynote). In: Companion Proceedings of the 36th international conference on software engineering, ICSE Companion 2014. Association for Computing Machinery, New York, NY, USA, p 1
Huang Q, Xia X, Lo D (2017) Supervised vs unsupervised models: A holistic look at effort-aware just-in-time defect prediction. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 159–170
Ibrahim DR, Ghnemat R, Hudaib A (2017) Software defect prediction using feature selection and random forest algorithm. In: 2017 International conference on new trends in computing sciences (ICTCS). IEEE, pp 252–257
Jacob SG, et al. (2015) Improved random forest algorithm for software defect prediction through data mining techniques. Int J Comput Appl 117(23)
Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining github. In: Proceedings of the 11th working conference on mining software repositories, MSR 2014. ACM, New York, NY, USA, pp 92–101
Kamei Y, Matsumoto S, Monden A, Matsumoto K, Adams B, Hassan AE (2010) Revisiting common bug prediction findings using effort-aware models. In: 2010 IEEE international conference on software maintenance. pp 1–10
Kamei Y, Matsumoto S, Monden A, Matsumoto K-I, Adams B, Hassan AE (2010) Revisiting common bug prediction findings using effort-aware models. In: 2010 IEEE International Conference on Software Maintenance. IEEE, pp 1–10
Kamei Y, Monden A, Matsumoto S, Kakimoto T, Matsumoto K-I (2007) The effects of over and under sampling on fault-prone module detection. In: First international symposium on empirical software engineering and measurement (ESEM 2007). IEEE, pp 196–204
Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2012) A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering 39(6):757–773
Kochhar PS, Xia X, Lo D, Li S (2016) Practitioners’ expectations on automated fault localization. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 165–176
Kondo M, German DM, Mizuno O, Choi E-H (2020) The impact of context metrics on just-in-time defect prediction. Empirical Software Engineering 25(1):890–939
Krishna R, Menzies T (2018) Bellwethers: A baseline method for transfer learning. IEEE Trans Softw Eng
Li Z, Jing X-Y, Zhu X (2018) Progress on approaches to software defect prediction. IET Software 12(3):161–175
Lumpe M, Vasa R, Menzies T, Rush R, Turhan B (2012) Learning better inspection optimization policies. International Journal of Software Engineering and Knowledge Engineering 22(5):621–644
Madeyski L (2006) Is external code quality correlated with programming experience or feelgood factor? In: International conference on extreme programming and agile processes in software engineering. Springer, pp 65–74
Madeyski L, Jureczko M (2015) Which process metrics can significantly improve defect prediction models? an empirical study. Software Quality Journal 23(3):393–422
Mathew G, Agrawal A, Menzies T (2017) Trends in topics at se conferences (1993-2013). In: 2017 IEEE/ACM 39th international conference on software engineering companion (ICSE-C). IEEE, pp 397–398
Matsumoto S, Kamei Y, Monden A, Matsumoto K, Nakamura M (2010) An analysis of developer metrics for fault prediction. In: 6th PROMISE
Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. TSE
Menzies T, Milton Z, Turhan B, Cukic B, Jiang Y, Bener A (2010) Defect prediction from static code features: Current results, limitations, new approaches. ASE
Menzies T, Greenwald J, Frank A (2006) Data mining static code attributes to learn defect predictors. IEEE Transactions on Software Engineering 33(1):2–13
Menzies T, Majumder S, Balaji N, Brey K, Fu W (2018) 500+ times faster than deep learning:(a case study exploring faster methods for text mining stackoverflow). In: 2018 IEEE/ACM 15th international conference on mining software repositories (MSR). IEEE, pp 554–563
Menzies T, Turhan B, Bener A, Gay G, Cukic B, Jiang Y (2008) Implications of ceiling effects in defect predictors. In: Proceedings of the 4th international workshop on Predictor models in software engineering. ACM, pp 47–54
Mittas N, Angelis L (2013) Ranking and clustering software cost estimation models through a multiple comparisons algorithm. IEEE Transactions on Software Engineering 39(4):537–551
Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th international conference on software engineering, ICSE ’08. Association for Computing Machinery, New York, NY, USA, pp 181–190
Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th International conference on software engineering. ACM, pp 181–190
Munaiah N, Kroh S, Cabrey C, Nagappan M (2017) Curating github for engineered software projects. Empirical Software Engineering 22(6):3219–3253
Nagappan N, Ball T (2007) Using software dependencies and churn metrics to predict field failures: An empirical case study. In: First international symposium on empirical software engineering and measurement (ESEM 2007). IEEE, pp 364–373
Nagappan N, Ball T, Zeller A (2006) Mining metrics to predict component failures. In: Proceedings of the 28th international conference on software engineering. ACM, pp 452–461
Nagappan N, Zeller A, Zimmermann T, Herzig K, Murphy B (2010) Change bursts as defect predictors. In: 2010 IEEE 21st international symposium on software reliability engineering. IEEE, pp 309–318
Nam J, Fu W, Kim S, Menzies T, Tan L (2018) Heterogeneous defect prediction. IEEE TSE
Nam J, Pan SJ, Kim S (2013) Transfer defect learning. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 382–391
Nayrolles M, Hamou-Lhadj A (2018) Clever: combining code metrics with clone detection for just-in-time fault prevention and resolution in large industrial projects. In: Proceedings of the 15th international conference on mining software repositories. pp 153–164
Onan A, Korukoğlu S, Bulut H (2016) A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification. Expert Systems with Applications 62:1–16
Ostrand TJ, Weyuker EJ, Bell RM (2004) Where the bugs are. In: ISSTA ’04: Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis. ACM, New York, NY, USA, pp 86–96
Pan SJ, Tsang IW, Kwok JT, Yang Q (2010) Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks 22(2):199–210
Parnin C, Orso A (2011) Are automated debugging techniques actually helping programmers? In: Proceedings of the 2011 international symposium on software testing and analysis. ACM, pp 199–209
Pascarella L, Palomba F, Bacchelli A (2019) Fine-grained just-in-time defect prediction. Journal of Systems and Software 150:22–36
Pascarella L, Palomba F, Bacchelli A (2020) On the performance of method-level bug prediction: A negative result. Journal of Systems and Software 161:110493
Radjenović D, Heričko M, Torkar R, Živkovič A (2013) Software fault prediction metrics: A systematic literature review. Information and Software Technology 55(8):1397–1418
Rahman F, Devanbu P (2011) Ownership, experience and defects: a fine-grained study of authorship. In: Proceedings of the 33rd international conference on software engineering. pp 491–500
Rahman F, Devanbu P (2013) How, and why, process metrics are better. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 432–441
Rahman F, Devanbu P (2013) How, and why, process metrics are better. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 432–441
Rahman F, Khatri S, Barr ET, Devanbu P (2014a) Comparing static bug finders and statistical prediction. In: Proceedings of the 36th international conference on software engineering, ICSE 2014. Association for Computing Machinery, New York, NY, USA, pp 424–434
Rahman F, Khatri S, Barr ET, Devanbu P (2014b) Comparing static bug finders and statistical prediction. In: Proceedings of the 36th international conference on software engineering. ACM, pp 424–434
Rahman F, Posnett D, Herraiz I, Devanbu P (2013) Sample size vs. bias in defect prediction. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. pp 147–157
Rahman F, Posnett D, Hindle A, Barr E, Devanbu P (2011) Bugcache for inspections: hit or miss? In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. pp 322–331
Rosen C, Grawi B, Shihab E (2015) Commit guru: Analytics and risk prediction of software commits. ESEC/FSE 2015
Rosen C, Grawi B, Shihab E (2015) Commit guru: analytics and risk prediction of software commits. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ACM, pp 966–969
Ryu D, Choi O, Baik J (2016) Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empirical Software Engineering 21(1):43–71
Zurück zum Zitat Seiffert C, Khoshgoftaar TM, Van Hulse J, Folleco A (2014) An empirical study of the classification performance of learners on imbalanced and noisy software quality data. Information Sciences 259:571–595CrossRef Seiffert C, Khoshgoftaar TM, Van Hulse J, Folleco A (2014) An empirical study of the classification performance of learners on imbalanced and noisy software quality data. Information Sciences 259:571–595CrossRef
Zurück zum Zitat Seliya N, Khoshgoftaar TM, Van Hulse J (2010) Predicting faults in high assurance software. In: 2010 IEEE 12th international symposium on high assurance systems engineering. IEEE, pp 26–34 Seliya N, Khoshgoftaar TM, Van Hulse J (2010) Predicting faults in high assurance software. In: 2010 IEEE 12th international symposium on high assurance systems engineering. IEEE, pp 26–34
Zurück zum Zitat Shin Y, Williams L (2013) Can traditional fault prediction models be used for vulnerability prediction? EMSE Shin Y, Williams L (2013) Can traditional fault prediction models be used for vulnerability prediction? EMSE
Zurück zum Zitat Storn R, Price K (1997) Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4):341–359MathSciNetCrossRefMATH Storn R, Price K (1997) Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4):341–359MathSciNetCrossRefMATH
Zurück zum Zitat Subramanyam R, Krishnan MS (2003) Empirical analysis of ck metrics for object-oriented design complexity: Implications for software defects. IEEE Transactions on Software Engineering 29(4):297–310CrossRef Subramanyam R, Krishnan MS (2003) Empirical analysis of ck metrics for object-oriented design complexity: Implications for software defects. IEEE Transactions on Software Engineering 29(4):297–310CrossRef
Zurück zum Zitat Sun Z, Song Q, Zhu X (2012) Using coding-based ensemble learning to improve software defect prediction. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(6):1806–1817CrossRef Sun Z, Song Q, Zhu X (2012) Using coding-based ensemble learning to improve software defect prediction. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(6):1806–1817CrossRef
Zurück zum Zitat Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2018) The impact of automated parameter optimization on defect prediction models. IEEE Transactions on Software Engineering pp 1–1 Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2018) The impact of automated parameter optimization on defect prediction models. IEEE Transactions on Software Engineering pp 1–1
Zurück zum Zitat Tantithamthavorn C, McIntosh S, Hassan AE, Ihara A, Matsumoto K (2015) The impact of mislabelling on the performance and interpretation of defect prediction models. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering, vol 1. IEEE, pp 812–823 Tantithamthavorn C, McIntosh S, Hassan AE, Ihara A, Matsumoto K (2015) The impact of mislabelling on the performance and interpretation of defect prediction models. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering, vol 1. IEEE, pp 812–823
Zurück zum Zitat Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: ICSE 2016. ACM, pp 321–332 Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: ICSE 2016. ACM, pp 321–332
Zurück zum Zitat Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2018) The impact of automated parameter optimization on defect prediction models. IEEE Transactions on Software Engineering 45(7):683–711CrossRef Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2018) The impact of automated parameter optimization on defect prediction models. IEEE Transactions on Software Engineering 45(7):683–711CrossRef
Zurück zum Zitat Tomar D, Agarwal S (2015) A comparison on multi-class classification methods based on least squares twin support vector machine. Knowledge-Based Systems 81:131–147CrossRef Tomar D, Agarwal S (2015) A comparison on multi-class classification methods based on least squares twin support vector machine. Knowledge-Based Systems 81:131–147CrossRef
Zurück zum Zitat Tu H, Nair V (2018) While tuning is good, no tuner is best. In: FSE SWAN Tu H, Nair V (2018) While tuning is good, no tuner is best. In: FSE SWAN
Zurück zum Zitat Tu H, Yu Z, Menzies T (2020) Better data labelling with emblem (and how that impacts defect prediction). IEEE Trans Softw Eng Tu H, Yu Z, Menzies T (2020) Better data labelling with emblem (and how that impacts defect prediction). IEEE Trans Softw Eng
Zurück zum Zitat Turhan B, Menzies T, Bener AB, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering 14(5):540–578CrossRef Turhan B, Menzies T, Bener AB, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering 14(5):540–578CrossRef
Zurück zum Zitat Wang S, Yao X (2013) Using class imbalance learning for software defect prediction. IEEE Transactions on Reliability 62(2):434–443CrossRef Wang S, Yao X (2013) Using class imbalance learning for software defect prediction. IEEE Transactions on Reliability 62(2):434–443CrossRef
Zurück zum Zitat Weyuker EJ, Ostrand TJ, Bell RM (2008) Do too many cooks spoil the broth? using the number of developers to enhance defect prediction models. Empirical Software Engineering 13(5):539–559CrossRef Weyuker EJ, Ostrand TJ, Bell RM (2008) Do too many cooks spoil the broth? using the number of developers to enhance defect prediction models. Empirical Software Engineering 13(5):539–559CrossRef
Zurück zum Zitat Williams C, Spacco J (2008) Szz revisited: verifying when changes induce fixes. In: Proceedings of the 2008 workshop on Defects in large software systems. ACM, pp 32–36 Williams C, Spacco J (2008) Szz revisited: verifying when changes induce fixes. In: Proceedings of the 2008 workshop on Defects in large software systems. ACM, pp 32–36
Zurück zum Zitat Xia T, Krishna R, Chen J, Mathew G, Shen X, Menzies T (2018) Hyperparameter optimization for effort estimation. arXiv:1805.00336 Xia T, Krishna R, Chen J, Mathew G, Shen X, Menzies T (2018) Hyperparameter optimization for effort estimation. arXiv:1805.00336
Zurück zum Zitat Xia X, Bao L, Lo D, Li S (2016) Automated debugging considered harmful considered harmful: A user study revisiting the usefulness of spectra-based fault localization techniques with professionals using real bugs from large systems. In: 2016 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 267–278 Xia X, Bao L, Lo D, Li S (2016) Automated debugging considered harmful considered harmful: A user study revisiting the usefulness of spectra-based fault localization techniques with professionals using real bugs from large systems. In: 2016 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 267–278
Zurück zum Zitat Xia X, Lo D, Pan SJ, Nagappan N, Wang X (2016) Hydra: Massively compositional model for cross-project defect prediction. IEEE Transactions on Software Engineering 42(10):977–998CrossRef Xia X, Lo D, Pan SJ, Nagappan N, Wang X (2016) Hydra: Massively compositional model for cross-project defect prediction. IEEE Transactions on Software Engineering 42(10):977–998CrossRef
Zurück zum Zitat Xia X, Lo D, Wang X, Yang X (2016) Collective personalized change classification with multiobjective search. IEEE Transactions on Reliability 65(4):1810–1829CrossRef Xia X, Lo D, Wang X, Yang X (2016) Collective personalized change classification with multiobjective search. IEEE Transactions on Reliability 65(4):1810–1829CrossRef
Zurück zum Zitat Yang X, Lo D, Xia X, Sun Jianling (2017) Tlel: A two-layer ensemble learning approach for just-in-time defect prediction. Information and Software Technology 87:206–220CrossRef Yang X, Lo D, Xia X, Sun Jianling (2017) Tlel: A two-layer ensemble learning approach for just-in-time defect prediction. Information and Software Technology 87:206–220CrossRef
Zurück zum Zitat Yang X, Lo D, Xia X, Zhang Y, Sun J (2015) Deep learning for just-in-time defect prediction. In: 2015 IEEE international conference on software quality, reliability and security. IEEE, pp 17–26 Yang X, Lo D, Xia X, Zhang Y, Sun J (2015) Deep learning for just-in-time defect prediction. In: 2015 IEEE international conference on software quality, reliability and security. IEEE, pp 17–26
Zurück zum Zitat Yang Y, Zhou Y, Liu J, Zhao Y, Lu H, Xu L, Xu B, Leung H (2016) Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering. ACM, pp 157–168 Yang Y, Zhou Y, Liu J, Zhao Y, Lu H, Xu L, Xu B, Leung H (2016) Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering. ACM, pp 157–168
Zurück zum Zitat Ye X, Bunescu R, Liu C (2014) Learning to rank relevant files for bug reports using domain knowledge. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering. pp 689–699 Ye X, Bunescu R, Liu C (2014) Learning to rank relevant files for bug reports using domain knowledge. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering. pp 689–699
Zurück zum Zitat Zhang F, Keivanloo I, Zou Y (2017) Data transformation in cross-project defect prediction. Empirical Software Engineering 22(6):3186–3218CrossRef Zhang F, Keivanloo I, Zou Y (2017) Data transformation in cross-project defect prediction. Empirical Software Engineering 22(6):3186–3218CrossRef
Zurück zum Zitat Zhang F, Zheng Q, Zou Y, Hassan AE (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: 2016 IEEE/ACM 38th international conference on software engineering (ICSE). IEEE, pp 309–320 Zhang F, Zheng Q, Zou Y, Hassan AE (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: 2016 IEEE/ACM 38th international conference on software engineering (ICSE). IEEE, pp 309–320
Zurück zum Zitat Zhang H (2009) An investigation of the relationships between lines of code and defects. In: 2009 IEEE international conference on software maintenance. IEEE, pp 274–283 Zhang H (2009) An investigation of the relationships between lines of code and defects. In: 2009 IEEE international conference on software maintenance. IEEE, pp 274–283
Zurück zum Zitat Zhang H, Zhang X, Gu M (2007) Predicting defective software components from code complexity measures. In: 13th Pacific Rim international symposium on dependable computing (PRDC 2007). IEEE, pp 93–96 Zhang H, Zhang X, Gu M (2007) Predicting defective software components from code complexity measures. In: 13th Pacific Rim international symposium on dependable computing (PRDC 2007). IEEE, pp 93–96
Zurück zum Zitat Zhou Y, Leung H (2006) Empirical analysis of object-oriented design metrics for predicting high and low severity faults. IEEE Transactions on Software Engineering 32(10):771–789CrossRef Zhou Y, Leung H (2006) Empirical analysis of object-oriented design metrics for predicting high and low severity faults. IEEE Transactions on Software Engineering 32(10):771–789CrossRef
Zurück zum Zitat Zhou Y, Xu B, Leung H (2010) On the ability of complexity metrics to predict fault-prone classes in object-oriented systems. Journal of Systems and Software 83(4):660–674CrossRef Zhou Y, Xu B, Leung H (2010) On the ability of complexity metrics to predict fault-prone classes in object-oriented systems. Journal of Systems and Software 83(4):660–674CrossRef
Zurück zum Zitat Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering. ACM, pp 91–100 Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering. ACM, pp 91–100
Zurück zum Zitat Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for eclipse. In: Proceedings of the Third international workshop on predictor models in software engineering. IEEE Computer Society, p 9 Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for eclipse. In: Proceedings of the Third international workshop on predictor models in software engineering. IEEE Computer Society, p 9
Metadata
Title
Revisiting process versus product metrics: a large scale analysis
Authors
Suvodeep Majumder
Pranav Mody
Tim Menzies
Publication date
1 May 2022
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 3/2022
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-021-10068-4
