Published in: Empirical Software Engineering 7/2022

01-12-2022

FENSE: A feature-based ensemble modeling approach to cross-project just-in-time defect prediction

Authors: Tanghaoran Zhang, Yue Yu, Xinjun Mao, Yao Lu, Zhixing Li, Huaimin Wang



Abstract

Context:

Just-in-time defect prediction (JITDP) uses modern machine learning models to predict the defect-proneness of commits. Such models require adequate training data, which is unavailable in projects with short histories. To address this problem, cross-project methods reuse the data or models of other projects to make predictions, based on the assumption that the projects share similar defect-related features. However, these shared features have been overlooked, which leads to unsatisfactory model performance.
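The cross-project setting described above can be illustrated with a minimal sketch. A classifier is trained on the labeled commits of one source project and then reused to score the unlabeled commits of a target project. Several parts of the sketch are illustrative assumptions and do not reflect the setup used in the study: the logistic-regression learner, the per-project z-score normalization, the two change metrics (lines added, files touched), the sample values, and every function name.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

# One commit: (change metrics, label), label 1 = defect-inducing (hypothetical format).
Commit = Tuple[List[float], int]


def zscore(rows: Sequence[List[float]]) -> List[List[float]]:
    """Standardize each metric using the mean/std of this project only (assumed preprocessing)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero on constant metrics
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs: List[List[float]], ys: List[int],
                   epochs: int = 500, lr: float = 0.1) -> List[float]:
    """Logistic regression fitted by batch gradient descent; the last weight is the bias."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            xb = x + [1.0]  # append a constant input for the bias
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) - y
            for j, xj in enumerate(xb):
                grad[j] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def predict_proba(w: List[float], x: List[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x + [1.0])))


def cross_project_predict(source: List[Commit], target: List[List[float]]) -> List[float]:
    """Train on the source project's labeled commits, then score the target project's commits."""
    w = train_logistic(zscore([x for x, _ in source]), [y for _, y in source])
    return [predict_proba(w, x) for x in zscore(target)]


# Hypothetical metrics per commit: [lines added, files touched].
source = [([120, 8], 1), ([90, 6], 1), ([5, 1], 0), ([10, 2], 0)]
target = [[400, 20], [300, 15], [12, 1], [20, 2]]  # target labels are unknown
scores = cross_project_predict(source, target)
print([round(s, 2) for s in scores])
```

Normalizing each project separately is one common way to reduce scale differences between projects. It does not address the deeper mismatch in defect-related characteristics that the study targets.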

Objective:

This study aims to investigate the relationship between cross-project JITDP performance and project features, and thereby to improve the performance of cross-project models.

Method:

We propose a Feature-based ENSEmble modeling approach (FENSE) to cross-project JITDP. For a target project, FENSE pairs it with each source project and obtains 20 features for each pair. Using these features, it predicts the transferability of each off-the-shelf JITDP model. FENSE then identifies the most transferable models and combines them to make cross-project predictions. To support this, we conduct a large-scale empirical study of 113,906 project pairs on GitHub and investigate the impact of project features.
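The selection-and-combination step can be sketched as follows. The sketch rests on several assumptions, and all names and numbers in it are hypothetical:

- Each (target, source) pair is described by only two features instead of the 20 the study uses: a same-language flag and the source project's defect ratio. These echo the factors named in the Results.
- Transferability is predicted with a simple k-nearest-neighbour regressor over historical pairs. This is a stand-in for whatever predictor FENSE actually trains.
- The selected models are combined by averaging their defect probabilities.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: an off-the-shelf JITDP model maps commit metrics to a defect probability.
Model = Callable[[Sequence[float]], float]
# A historical record: (pair features, observed cross-project performance, e.g. an F1 score).
Record = Tuple[List[float], float]


def knn_transferability(history: List[Record], query: List[float], k: int = 3) -> float:
    """Stand-in predictor: mean observed performance of the k most similar past pairs."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(history, key=lambda rec: sq_dist(rec[0], query))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def select_and_ensemble(
    pair_features: Dict[str, List[float]],  # source name -> features of the (target, source) pair
    models: Dict[str, Model],               # source name -> model trained on that source
    history: List[Record],
    top_k: int = 2,
) -> Tuple[List[str], Model]:
    """Rank source models by predicted transferability and average the top_k of them."""
    predicted = {src: knn_transferability(history, f) for src, f in pair_features.items()}
    chosen = sorted(predicted, key=lambda src: predicted[src], reverse=True)[:top_k]

    def ensemble(commit: Sequence[float]) -> float:
        # Unweighted mean of the selected models' defect probabilities (assumed combination rule).
        return sum(models[src](commit) for src in chosen) / len(chosen)

    return chosen, ensemble


# Hypothetical pair features: [same programming language (1/0), source defect ratio].
history = [
    ([1, 0.30], 0.70), ([1, 0.25], 0.68), ([1, 0.28], 0.72),
    ([0, 0.05], 0.40), ([0, 0.10], 0.45), ([0, 0.08], 0.42),
]
pair_features = {"projA": [1, 0.27], "projB": [0, 0.07], "projC": [1, 0.31]}
# Constant-output stubs standing in for trained source-project models.
models = {"projA": lambda c: 0.8, "projB": lambda c: 0.1, "projC": lambda c: 0.6}

chosen, fense_like = select_and_ensemble(pair_features, models, history, top_k=2)
print(sorted(chosen), fense_like([120.0, 8.0]))
```

Swapping the stand-in regressor or the averaging rule leaves the structure intact. In this sketch, the ranking relies only on pair features and historical results, not on labeled commits from the target project.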

Results:

The results show that: (1) cross-project transferability is strongly related to features including the programming language and the defect ratio of the source project; (2) our feature-based model selection scheme can improve cross-project JITDP performance by 10%; (3) FENSE outperforms the compared models on five evaluation measures without incurring extra time or space costs.

Conclusions:

Our study suggests that project features can help identify powerful cross-project JITDP models and improve the performance of ensemble approaches.


Metadata
Title
FENSE: A feature-based ensemble modeling approach to cross-project just-in-time defect prediction
Authors
Tanghaoran Zhang
Yue Yu
Xinjun Mao
Yao Lu
Zhixing Li
Huaimin Wang
Publication date
01-12-2022
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 7/2022
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-022-10185-8