Skip to main content
Erschienen in: Empirical Software Engineering 5/2016

01.10.2016

Studying just-in-time defect prediction using cross-project models

verfasst von: Yasutaka Kamei, Takafumi Fukushima, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi, Ahmed E. Hassan

Erschienen in: Empirical Software Engineering | Ausgabe 5/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Unlike traditional defect prediction models that identify defect-prone modules, Just-In-Time (JIT) defect prediction models identify defect-inducing changes. As such, JIT defect models can provide earlier feedback for developers, while design decisions are still fresh in their minds. Unfortunately, similar to traditional defect models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this limitation in traditional defect prediction, prior work has proposed cross-project models, i.e., models learned from other projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT models in a cross-project context. Through an empirical study on 11 open source projects, we find that while JIT models rarely perform well in a cross-project context, their performance tends to improve when using approaches that: (1) select models trained using other projects that are similar to the testing project, (2) combine the data of several other projects to produce a larger pool of training data, and (3) combine the models of several other projects to produce an ensemble model. Our findings empirically confirm that JIT models learned using other projects are a viable solution for projects with limited historical data. However, JIT models tend to perform best in a cross-project context when the data used to learn them are carefully selected.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Fußnoten
5
We choose domain-aware similarity techniques, similarity merge (5) using domain-aware similarity and weighted similarity voting using domain-aware similarity, which show the best median value in each RQ, and within-project JIT models as ideal models. We check the difference of the median values among the four models using Tukey’s HSD. If we find that there is not statistically significant difference between within- and one cross-project JIT models, we find no evidence of a difference in the performance of within- and cross-project models in those cases (i.e., the cross-project model perform well).
 
Literatur
Zurück zum Zitat Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761CrossRef Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761CrossRef
Zurück zum Zitat Bettenburg N, Nagappan M, Hassan AE (2012) Think locally, act globally: Improving defect and effort prediction models. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’12), pp 60–69 Bettenburg N, Nagappan M, Hassan AE (2012) Think locally, act globally: Improving defect and effort prediction models. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’12), pp 60–69
Zurück zum Zitat Briand LC, Melo WL, Wüst J (2002) Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans Softw Eng 28(7):706–720CrossRef Briand LC, Melo WL, Wüst J (2002) Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans Softw Eng 28(7):706–720CrossRef
Zurück zum Zitat Coolidge FL (2012) Statistics: A Gentle Introduction. SAGE Publications (3rd ed.) Coolidge FL (2012) Statistics: A Gentle Introduction. SAGE Publications (3rd ed.)
Zurück zum Zitat D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’10), pp 31–41 D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’10), pp 31–41
Zurück zum Zitat Fukushima T, Kamei Y, McIntosh S, Yamashita K, Ubayashi N (2014) An empirical study of just-in-time defect prediction using cross-project models. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’14), pp 172–181 Fukushima T, Kamei Y, McIntosh S, Yamashita K, Ubayashi N (2014) An empirical study of just-in-time defect prediction using cross-project models. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’14), pp 172–181
Zurück zum Zitat Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661CrossRef Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661CrossRef
Zurück zum Zitat Guo P J, Zimmermann T, Nagappan N, Murphy B (2010) Characterizing and predicting which bugs get fixed: An empirical study of microsoft windows. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’10) vol 1, pp 495–504 Guo P J, Zimmermann T, Nagappan N, Murphy B (2010) Characterizing and predicting which bugs get fixed: An empirical study of microsoft windows. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’10) vol 1, pp 495–504
Zurück zum Zitat Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304CrossRef Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304CrossRef
Zurück zum Zitat Hassan AE (2009) Predicting faults using the complexity of code changes. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’09), pp 78–88 Hassan AE (2009) Predicting faults using the complexity of code changes. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’09), pp 78–88
Zurück zum Zitat He Z, Shu F, Yang Y, Li M, Wang Q (2012) An investigation on the feasibility of cross-project defect prediction. Automated Software Engg 19(2):167–199CrossRef He Z, Shu F, Yang Y, Li M, Wang Q (2012) An investigation on the feasibility of cross-project defect prediction. Automated Software Engg 19(2):167–199CrossRef
Zurück zum Zitat Jiang Y, Cukic B, Menzies T (2008) Can data transformation help in the detection of fault-prone modules?. In: Proc. Workshop on Defects in Large Software Systems (DEFECTS’08), pp 16–20 Jiang Y, Cukic B, Menzies T (2008) Can data transformation help in the detection of fault-prone modules?. In: Proc. Workshop on Defects in Large Software Systems (DEFECTS’08), pp 16–20
Zurück zum Zitat Kamei Y, Monden A, Matsumoto S, Kakimoto T, Matsumoto Ki (2007) The effects of over and under sampling on fault-prone module detection. In: Proc. Int’l Symposium on Empirical Softw. Eng. and Measurement (ESEM’07), pp 196–204 Kamei Y, Monden A, Matsumoto S, Kakimoto T, Matsumoto Ki (2007) The effects of over and under sampling on fault-prone module detection. In: Proc. Int’l Symposium on Empirical Softw. Eng. and Measurement (ESEM’07), pp 196–204
Zurück zum Zitat Kamei Y, Matsumoto S, Monden A, Matsumoto K, Adams B, Hassan AE (2010) Revisiting common bug prediction findings using effort aware models. In: Proc. Int’l Conf. on Software Maintenance (ICSM’10), pp 1–10 Kamei Y, Matsumoto S, Monden A, Matsumoto K, Adams B, Hassan AE (2010) Revisiting common bug prediction findings using effort aware models. In: Proc. Int’l Conf. on Software Maintenance (ICSM’10), pp 1–10
Zurück zum Zitat Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773CrossRef Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773CrossRef
Zurück zum Zitat Kampstra P (2008) Beanplot: A boxplot alternative for visual comparison of distributions. J Stat Softw,Code Snippets 28(1):1–9 Kampstra P (2008) Beanplot: A boxplot alternative for visual comparison of distributions. J Stat Softw,Code Snippets 28(1):1–9
Zurück zum Zitat Kim S, Whitehead EJ, Zhang Y (2008) Classifying software changes: Clean or buggy IEEE Trans Softw Eng 34(2):181–196CrossRef Kim S, Whitehead EJ, Zhang Y (2008) Classifying software changes: Clean or buggy IEEE Trans Softw Eng 34(2):181–196CrossRef
Zurück zum Zitat Kocaguneli E, Menzies T, Keung J (2012) On the value of ensemble effort estimation. IEEE Trans Softw Eng 38(6):1403–1416CrossRef Kocaguneli E, Menzies T, Keung J (2012) On the value of ensemble effort estimation. IEEE Trans Softw Eng 38(6):1403–1416CrossRef
Zurück zum Zitat Koru AG, Zhang D, El Emam K, Liu H (2009) An investigation into the functional form of the size-defect relationship for software modules. IEEE Trans Softw Eng 35(2):293–304CrossRef Koru AG, Zhang D, El Emam K, Liu H (2009) An investigation into the functional form of the size-defect relationship for software modules. IEEE Trans Softw Eng 35(2):293–304CrossRef
Zurück zum Zitat Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496CrossRef Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496CrossRef
Zurück zum Zitat Li PL, Herbsleb J, Shaw M, Robinson B (2006) Experiences and results from initiating field defect prediction and product test prioritization efforts at ABB Inc. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’06), pp 413–422 Li PL, Herbsleb J, Shaw M, Robinson B (2006) Experiences and results from initiating field defect prediction and product test prioritization efforts at ABB Inc. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’06), pp 413–422
Zurück zum Zitat Matsumoto S, Kamei Y, Monden A, Matsumoto K (2010) An analysis of developer metrics for fault prediction. In: Proc. Int’l Conf. on Predictive Models in Softw. Eng. (PROMISE’10), pp 18:1–18:9 Matsumoto S, Kamei Y, Monden A, Matsumoto K (2010) An analysis of developer metrics for fault prediction. In: Proc. Int’l Conf. on Predictive Models in Softw. Eng. (PROMISE’10), pp 18:1–18:9
Zurück zum Zitat Menzies T, Turhan B, Bener A, Gay G, Cukic B, Jiang Y (2008) Implications of ceiling effects in defect predictors. In: Proc. Int’l Conf. on Predictive Models in Softw. Eng. (PROMISE’10), pp 47–54 Menzies T, Turhan B, Bener A, Gay G, Cukic B, Jiang Y (2008) Implications of ceiling effects in defect predictors. In: Proc. Int’l Conf. on Predictive Models in Softw. Eng. (PROMISE’10), pp 47–54
Zurück zum Zitat Menzies T, Butcher A, Marcus A, Zimmermann T, Cok D (2011) Local vs. global models for effort estimation and defect prediction. In: Proc. Int’l Conf. on Automated Software Engineering (ASE’11), pp 343–351 Menzies T, Butcher A, Marcus A, Zimmermann T, Cok D (2011) Local vs. global models for effort estimation and defect prediction. In: Proc. Int’l Conf. on Automated Software Engineering (ASE’11), pp 343–351
Zurück zum Zitat Menzies T, Butcher A, Cok D, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2013) Local versus global lessons for defect prediction and effort estimation. IEEE Trans Softw Eng 39(6):822–834CrossRef Menzies T, Butcher A, Cok D, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2013) Local versus global lessons for defect prediction and effort estimation. IEEE Trans Softw Eng 39(6):822–834CrossRef
Zurück zum Zitat Minku LL, Yao X (2014) How to make best use of cross-company data in software effort estimation?. In: Proc. Int’l Conf. on Software Engineering (ICSE’14), pp 446–456 Minku LL, Yao X (2014) How to make best use of cross-company data in software effort estimation?. In: Proc. Int’l Conf. on Software Engineering (ICSE’14), pp 446–456
Zurück zum Zitat Mısırlı AT, Bener AB, Turhan B (2011) An industrial case study of classifier ensembles for locating software defects. Softw Qual J 19(3):515–536CrossRef Mısırlı AT, Bener AB, Turhan B (2011) An industrial case study of classifier ensembles for locating software defects. Softw Qual J 19(3):515–536CrossRef
Zurück zum Zitat Mockus A (2009) Amassing and indexing a large sample of version control systems: Towards the census of public source code history. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’09), pp 11–20 Mockus A (2009) Amassing and indexing a large sample of version control systems: Towards the census of public source code history. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’09), pp 11–20
Zurück zum Zitat Mockus A, Weiss DM (2000) Predicting risk of software changes. Bell Labs Tech J 5(2):169–180CrossRef Mockus A, Weiss DM (2000) Predicting risk of software changes. Bell Labs Tech J 5(2):169–180CrossRef
Zurück zum Zitat Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’08), 181–190 Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’08), 181–190
Zurück zum Zitat Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’05), pp 284–292 Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’05), pp 284–292
Zurück zum Zitat Nagappan N, Ball T, Zeller A (2006) Mining metrics to predict component failures. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’06), pp 452–461 Nagappan N, Ball T, Zeller A (2006) Mining metrics to predict component failures. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’06), pp 452–461
Zurück zum Zitat Nam J, Pan S J, Kim S (2013) Transfer defect learning. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’13), pp 382–391 Nam J, Pan S J, Kim S (2013) Transfer defect learning. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’13), pp 382–391
Zurück zum Zitat Purushothaman R, Perry DE (2005) Toward understanding the rhetoric of small source code changes. IEEE Trans Softw Eng 31(6):511–526CrossRef Purushothaman R, Perry DE (2005) Toward understanding the rhetoric of small source code changes. IEEE Trans Softw Eng 31(6):511–526CrossRef
Zurück zum Zitat Rahman F, Posnett D, Devanbu P (2012) Recalling the ”imprecision” of cross-project defect prediction. In: Proc. Int’l Symposium on the Foundations of Softw. Eng. (FSE’12), pp 61:1–61:11 Rahman F, Posnett D, Devanbu P (2012) Recalling the ”imprecision” of cross-project defect prediction. In: Proc. Int’l Symposium on the Foundations of Softw. Eng. (FSE’12), pp 61:1–61:11
Zurück zum Zitat Ratzinger J, Sigmund T, Gall HC (2008) On the relation of refactorings and software defect prediction. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’08), pp 35–38 Ratzinger J, Sigmund T, Gall HC (2008) On the relation of refactorings and software defect prediction. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’08), pp 35–38
Zurück zum Zitat Shihab E (2012) An exploration of challenges limiting pragmatic software defect prediction. PhD thesis, Queen’s University Shihab E (2012) An exploration of challenges limiting pragmatic software defect prediction. PhD thesis, Queen’s University
Zurück zum Zitat Shihab E, Hassan AE, Adams B, Jiang ZM (2012) An industrial study on the risk of software changes. In: Proc. Int’l Symposium on the Foundations of Softw. Eng. (FSE’12), pp 62:1–62:11 Shihab E, Hassan AE, Adams B, Jiang ZM (2012) An industrial study on the risk of software changes. In: Proc. Int’l Symposium on the Foundations of Softw. Eng. (FSE’12), pp 62:1–62:11
Zurück zum Zitat Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes?. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’05), pp 1–5 Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes?. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’05), pp 1–5
Zurück zum Zitat Tan M, Tan L, Dara S, Mayuex C (2015) Online defect prediction for imbalanced data. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’13 SEIP), (To appear) Tan M, Tan L, Dara S, Mayuex C (2015) Online defect prediction for imbalanced data. In: Proc. Int’l Conf. on Softw. Eng. (ICSE’13 SEIP), (To appear)
Zurück zum Zitat Thomas SW, Nagappan M, Blostein D, Hassan AE (2013) The impact of classifier configuration and classifier combination on bug localization. IEEE Trans Softw Eng 39(10):1427–1443CrossRef Thomas SW, Nagappan M, Blostein D, Hassan AE (2013) The impact of classifier configuration and classifier combination on bug localization. IEEE Trans Softw Eng 39(10):1427–1443CrossRef
Zurück zum Zitat Turhan B (2012) On the dataset shift problem in software engineering prediction models. Empirical Softw Engg 17(1-2):62–74MathSciNetCrossRef Turhan B (2012) On the dataset shift problem in software engineering prediction models. Empirical Softw Engg 17(1-2):62–74MathSciNetCrossRef
Zurück zum Zitat Turhan B, Menzies T, Bener AB, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14 (5):540–578CrossRef Turhan B, Menzies T, Bener AB, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14 (5):540–578CrossRef
Zurück zum Zitat Turhan B, Tosun A, Bener A (2011) Empirical evaluation of mixed-project defect prediction models. In: Proc. EUROMICRO Conf. on Software Engineering and Advanced Applications (SEAA’11), pp 396–403 Turhan B, Tosun A, Bener A (2011) Empirical evaluation of mixed-project defect prediction models. In: Proc. EUROMICRO Conf. on Software Engineering and Advanced Applications (SEAA’11), pp 396–403
Zurück zum Zitat Wu R, Zhang H, Kim S, Cheung SC (2011) Relink: recovering links between bugs and changes. In: Proc. European Softw. Eng. Conf. and Symposium on the Foundations of Softw. Eng. (ESEC/FSE’11), pp 15–25 Wu R, Zhang H, Kim S, Cheung SC (2011) Relink: recovering links between bugs and changes. In: Proc. European Softw. Eng. Conf. and Symposium on the Foundations of Softw. Eng. (ESEC/FSE’11), pp 15–25
Zurück zum Zitat Zhang F, Mockus A, Zou Y, Khomh F, Hassan AE (2013) How does context affect the distribution of software maintainability metrics?. In: Proc. Int’l Conf. on Software Maintenance (ICSM’13), pp 350–359 Zhang F, Mockus A, Zou Y, Khomh F, Hassan AE (2013) How does context affect the distribution of software maintainability metrics?. In: Proc. Int’l Conf. on Software Maintenance (ICSM’13), pp 350–359
Zurück zum Zitat Zhang F, Mockus A, Keivanloo I, Zou Y (2014) Towards building a universal defect prediction model. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’14), pp 182–191 Zhang F, Mockus A, Keivanloo I, Zou Y (2014) Towards building a universal defect prediction model. In: Proc. Int’l Working Conf. on Mining Software Repositories (MSR’14), pp 182–191
Zurück zum Zitat Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proc. European Softw. Eng. Conf. and Symposium on the Foundations of Softw. Eng. (ESEC/FSE’09), pp 91–100 Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proc. European Softw. Eng. Conf. and Symposium on the Foundations of Softw. Eng. (ESEC/FSE’09), pp 91–100
Metadaten
Titel
Studying just-in-time defect prediction using cross-project models
verfasst von
Yasutaka Kamei
Takafumi Fukushima
Shane McIntosh
Kazuhiro Yamashita
Naoyasu Ubayashi
Ahmed E. Hassan
Publikationsdatum
01.10.2016
Verlag
Springer US
Erschienen in
Empirical Software Engineering / Ausgabe 5/2016
Print ISSN: 1382-3256
Elektronische ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-015-9400-x

Weitere Artikel der Ausgabe 5/2016

Empirical Software Engineering 5/2016 Zur Ausgabe