Published in: Empirical Software Engineering 4-5/2012

1 August 2012

Time variance and defect prediction in software projects

Towards an exploitation of periods of stability and change as well as a notion of concept drift in software projects

Authors: Jayalath Ekanayake, Jonas Tappolet, Harald C. Gall, Abraham Bernstein

Abstract

It is crucial for a software manager to know whether or not one can rely on a bug prediction model. A wrong prediction of the number or the location of future bugs can lead to problems in the achievement of a project’s goals. In this paper we first verify the existence of variability in a bug prediction model’s accuracy over time both visually and statistically. Furthermore, we explore the reasons for such a high variability over time, which includes periods of stability and variability of prediction quality, and formulate a decision procedure for evaluating prediction models before applying them. To exemplify our findings we use data from four open source projects and empirically identify various project features that influence the defect prediction quality. Specifically, we observed that a change in the number of authors editing a file and the number of defects fixed by them influence the prediction quality. Finally, we introduce an approach to estimate the accuracy of prediction models that helps a project manager decide when to rely on a prediction model. Our findings suggest that one should be aware of the periods of stability and variability of prediction quality and should use approaches such as ours to assess their models’ accuracy in advance.
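
The idea of tracking prediction quality over time and relying on a model only during stable periods can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that quality is measured as the area under the ROC curve (AUC) per time period, and the function names `auc` and `is_stable`, the window size, and both thresholds are hypothetical choices made purely for illustration.

```python
from statistics import mean, pstdev


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean file")
    # Count (defective, clean) pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def is_stable(auc_history, window=3, min_auc=0.7, max_spread=0.05):
    """Hypothetical rule: rely on the model only if the last `window`
    periods scored well on average and varied little."""
    recent = auc_history[-window:]
    if len(recent) < window:
        return False
    return mean(recent) >= min_auc and pstdev(recent) <= max_spread


# Toy data (invented): one (predicted scores, actual defect labels)
# pair per period, oldest period first.
periods = [
    ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
    ([0.9, 0.7, 0.4, 0.1], [1, 1, 0, 0]),
    ([0.8, 0.6, 0.5, 0.2], [1, 1, 0, 0]),
    ([0.2, 0.9, 0.8, 0.3], [1, 0, 0, 1]),  # prediction quality collapses
]
history = [auc(scores, labels) for scores, labels in periods]

print(history)
print(is_stable(history[:3]), is_stable(history))
```

In this toy run the model ranks files well in the first three periods and badly in the fourth, so the rule would accept the model after period three and reject it after period four. Any real use would need thresholds calibrated on the project's own history.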

Footnotes
4
Tables can be found in Appendix A.
 
5
Note that a complete description can be found in Appendix B. For all features where authorship is relevant, the author is taken to be the person who committed the code to CVS rather than the developer named in the code comments. However, most active contributors to a project are also committers. For example, in the PDT project (http://www.eclipse.org/pdt/people/contributors.php#Seva-%28Wsevolod%29-Lapsha), 11 of the 12 participants are committers. Hence, this assumption will not have a great impact on the outcome of the experiments.
 
6
E.g., how the individual committers' coding behavior synchronizes towards a milestone.
 
7
Note that we used the Mann–Whitney test because the test for normality (one-sample Kolmogorov–Smirnov test: p = 0.055) produced a borderline result. As some still use the t-test for large collections of slightly non-parametric data, we also ran an independent-samples t-test and found it to be significant at α = 0.001.
 
8
As above, a t-test reconfirmed these findings at α = 0.001.
 
9
More precisely, we used FixCache as BugCache is only the theoretical model behind the method. Nevertheless, BugCache is the often-used term for both methods.
 
10
Note that the observed number of models (162) that pick random features is significantly different from the expected number of models (1,425) according to a χ²-test (p < 0.001).
 
11
A complete set of the figures can be found in the technical report by Ekanayake et al. (2011), available online at http://www.ifi.uzh.ch/research/publications/technical-reports.html.
 
References
Ancona D, Chong CL (1996) Entrainment: pace, cycle, and rhythm in organizational behavior. In: Research in organizational behavior, vol 18. JAI Press, Greenwich, pp 251–284
Antoniol G, Ayari K, Di Penta M, Khomh F, Guéhéneuc YG (2008) Is it a bug or an enhancement?: a text-based approach to classify change requests. In: Proceedings of the 2008 conference of the Center for Advanced Studies on Collaborative Research (CASCON). ACM, New York, pp 304–318
Bachmann A, Bernstein A (2009) Data retrieval, processing and linking for software process data analysis. Tech. Rep. IFI-2009.0003, University of Zurich, Department of Informatics
Bernstein A, Ekanayake J, Pinzger M (2007) Improving defect prediction using temporal features and non linear models. In: IWPSE ’07: ninth international workshop on principles of software evolution. ACM, New York, pp 11–18. doi:10.1145/1294948.1294953
Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering (ESEC/FSE). ACM, New York, pp 121–130
Brooks FP, Phillips F (1995) The mythical man-month: essays on software engineering. Addison-Wesley, Reading
Diehl S, Gall HC, Hassan AE (2009) Guest editors introduction: special issue on mining software repositories. Empir Software Eng 14(3):257–261
Eaddy M, Zimmermann T, Sherwood KD, Garg V, Murphy GC, Nagappan N, Aho AV (2008) Do crosscutting concerns cause defects? IEEE Trans Softw Eng 34(4):497–515
Ekanayake J, Tappolet J, Gall HC, Bernstein A (2011) Time variance and defect prediction in software projects—additional figures. Tech. Rep. IFI-2011.0004, University of Zurich, Department of Informatics
Hassan AE (2009) Predicting faults using the complexity of code changes. In: ICSE ’09: Proceedings of the 31st international conference on software engineering. IEEE Computer Society, Washington, DC, pp 78–88. doi:10.1109/ICSE.2009.5070510
Hassan AE, Holt RC (2005) The top ten list: dynamic fault prediction. In: ICSM ’05: Proceedings of the 21st IEEE international conference on software maintenance. IEEE Computer Society, Washington, DC, pp 263–272. doi:10.1109/ICSM.2005.91
Kagdi H, Collard ML, Maletic JI (2007) A survey and taxonomy of approaches for mining software repositories in the context of software evolution. J Softw Maint Evol 19(2):77–131. doi:10.1002/smr.344
Khoshgoftaar TM, Allen EB, Goel N, Nandi A, McMullan J (1996) Detection of software modules with high debug code churn in a very large legacy system. In: ISSRE ’96: Proceedings of the seventh international symposium on software reliability engineering. IEEE Computer Society, Washington, DC, p 364
Kim S, Zimmermann T, Whitehead Jr EJ, Zeller A (2007) Predicting faults from cached history. In: ICSE ’07: Proceedings of the 29th international conference on software engineering. IEEE Computer Society, Washington, DC, pp 489–498. doi:10.1109/ICSE.2007.66
Knab P, Pinzger M, Bernstein A (2006) Predicting defect densities in source code files with decision tree learners. In: MSR ’06: Proceedings of the 2006 international workshop on mining software repositories. ACM, New York, pp 119–125. doi:10.1145/1137983.1138012
Ko AJ, Chilana PK (2010) How power users help and hinder open bug reporting. In: CHI ’10: Proceedings of the 28th international conference on human factors in computing systems. ACM, Atlanta, pp 1665–1674
Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496. doi:10.1109/TSE.2008.35
Li PL, Herbsleb J, Shaw M (2005) Forecasting field defect rates using a combined time-based and metrics-based approach: a case study of OpenBSD. In: ISSRE ’05: Proceedings of the 16th IEEE international symposium on software reliability engineering. IEEE Computer Society, Washington, DC, pp 193–202. doi:10.1109/ISSRE.2005.19
Mockus A, Votta LG (2000) Identifying reasons for software changes using historic databases. In: ICSM ’00: Proceedings of the international conference on software maintenance (ICSM’00). IEEE Computer Society, Washington, DC, p 120
Nagappan N, Ball T (2005) Static analysis tools as early indicators of pre-release defect density. In: ICSE ’05: Proceedings of the 27th international conference on software engineering. ACM, New York, NY, pp 580–586. doi:10.1145/1062455.1062558
Ostrand T, Weyuker E, Bell R (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355
Provost F, Fawcett T (2001) Robust classification for imprecise environments. Mach Learn 42(3):203–231
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
Tsymbal A (2004) The problem of concept drift: definitions and related work. Tech. rep., Department of Computer Science, Trinity College
Vorburger P, Bernstein A (2006) Entropy-based concept shift detection. In: ICDM ’06: Proceedings of the sixth international conference on data mining. IEEE Computer Society, Washington, DC, pp 1113–1118. doi:10.1109/ICDM.2006.66
Widmer G, Kubat M (1993) Effective learning in dynamic environments by explicit context tracking. In: ECML ’93: Proceedings of the European conference on machine learning. Springer, London, pp 227–243
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, San Mateo
Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for Eclipse. In: PROMISE ’07: Proceedings of the third international workshop on predictor models in software engineering. IEEE Computer Society, Washington, DC, p 9. doi:10.1109/PROMISE.2007.10
Metadata
Title
Time variance and defect prediction in software projects
Towards an exploitation of periods of stability and change as well as a notion of concept drift in software projects
Authors
Jayalath Ekanayake
Jonas Tappolet
Harald C. Gall
Abraham Bernstein
Publication date
1 August 2012
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 4-5/2012
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-011-9180-x
