Published in: Empirical Software Engineering 3/2013

01.06.2013

The limited impact of individual developer data on software defect prediction

Authors: Robert M. Bell, Thomas J. Ostrand, Elaine J. Weyuker


Abstract

Previous research has provided evidence that a combination of static code metrics and software history metrics can predict with surprising success which files in the next release of a large system will have the largest numbers of defects. In contrast, very little research indicates whether information about individual developers can profitably be used to improve such predictions. We investigate whether files in a large system that are modified by an individual developer consistently contain either more or fewer faults than the average of all files in the system. The goal of the investigation is to determine whether information about which particular developer modified a file can improve defect predictions. We also extend earlier research evaluating the use of counts of the developers who modified a file as predictors of the file's future faultiness. We analyze change reports filed for three large systems, each containing 18 releases, with a combined total of nearly 4 million LOC and over 11,000 files. We define a buggy file ratio for each programmer: the proportion of all files modified by the programmer in Release R-1 that are faulty in Release R. We assess the consistency of the buggy file ratio across releases for individual programmers both visually and within the context of a fault prediction model. Buggy file ratios for individual programmers often varied widely across the releases in which they participated. A prediction model that takes into account the history of faulty files changed by individual developers improves on the standard negative binomial model by less than 0.13% according to one measure, and not at all according to another. In contrast, augmenting a standard model with cumulative counts of the developers who changed files in prior releases produced up to a 2% improvement in the percentage of faults detected in the top 20% of predicted faulty files.
The cumulative number of developers interacting with a file can be a useful variable for defect prediction. However, the study indicates that adding information about which particular developer modified a file to a model is unlikely to improve defect predictions.
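The abstract relies on three per-file or per-programmer quantities: the buggy file ratio, the cumulative number of developers who changed a file, and the percentage of faults falling in the top 20% of files ranked by predicted faults. The following is a minimal sketch of how they could be computed from per-release change data. It is not the authors' implementation. The data layout (dicts keyed by release number, mapping each programmer to the set of files they modified), the function names, the reading of "cumulative" as "distinct developers in all earlier releases", and rounding the top-20% slice down to a whole number of files are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Mapping, Set

# Hypothetical layout: modified_by[release][programmer] -> files changed in that release
ChangeLog = Mapping[int, Mapping[str, Set[str]]]


def buggy_file_ratio(modified_by: ChangeLog,
                     faulty_files: Mapping[int, Set[str]]) -> Dict[str, Dict[int, float]]:
    """ratio[p][R] = share of files p modified in release R-1 that are faulty in release R."""
    ratios: Dict[str, Dict[int, float]] = defaultdict(dict)
    for release, faulty in faulty_files.items():
        for programmer, files in modified_by.get(release - 1, {}).items():
            if files:  # a ratio is undefined for an empty set of files
                ratios[programmer][release] = len(files & faulty) / len(files)
    return dict(ratios)


def cumulative_developers(modified_by: ChangeLog, release: int) -> Dict[str, int]:
    """Distinct developers who changed each file in any release before `release` (assumed reading)."""
    devs: Dict[str, Set[str]] = defaultdict(set)
    for r, by_programmer in modified_by.items():
        if r < release:
            for programmer, files in by_programmer.items():
                for f in files:
                    devs[f].add(programmer)
    return {f: len(people) for f, people in devs.items()}


def percent_faults_in_top(predicted: Mapping[str, float],
                          actual: Mapping[str, int],
                          fraction: float = 0.2) -> float:
    """Percentage of actual faults found in the top `fraction` of files by predicted faults."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    k = int(fraction * len(ranked))  # assumption: round the slice size down
    total = sum(actual.get(f, 0) for f in ranked)
    if total == 0:
        return 0.0
    return 100.0 * sum(actual.get(f, 0) for f in ranked[:k]) / total
```

For example, with predictions for five files, the top 20% is the single highest-ranked file, so the metric reduces to that file's share of all actual faults. Under this layout, a programmer only receives a ratio for release R if they modified at least one file in release R-1.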

Footnotes
1
The fault-percentile average is not available for the first three systems we analyzed.
 
2
Rather than using raw predictions from the negative binomial models, which are sometimes extremely large, we used average fault counts for strata formed from the models' predicted values. At each release, we formed five strata of changed files, grouped by the raw predicted value from the negative binomial regression model.
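The stratification in footnote 2 can be sketched as follows. The footnote does not say how stratum boundaries were chosen or which data the averages came from, so this sketch assumes five near-equal-size groups of changed files ordered by raw predicted value, with each stratum's average taken over the observed fault counts of its own files. The function name `stratified_predictions` is hypothetical.

```python
from typing import Dict, List, Mapping


def stratified_predictions(predicted: Mapping[str, float],
                           observed: Mapping[str, int],
                           n_strata: int = 5) -> Dict[str, float]:
    """Replace each file's raw prediction with the mean observed fault count of its stratum.

    Assumptions: strata are contiguous, near-equal-size groups of files ordered by
    raw predicted value; averages use the observed fault counts of the same files.
    """
    ranked: List[str] = sorted(predicted, key=predicted.get)
    n = len(ranked)
    smoothed: Dict[str, float] = {}
    for s in range(n_strata):
        # integer boundaries split n files into n_strata contiguous groups
        stratum = ranked[s * n // n_strata:(s + 1) * n // n_strata]
        if not stratum:  # fewer files than strata leaves some groups empty
            continue
        avg = sum(observed.get(f, 0) for f in stratum) / len(stratum)
        for f in stratum:
            smoothed[f] = avg
    return smoothed
```

Because each file's value becomes a stratum mean, an extremely large raw prediction can place its file in the top stratum but cannot push that file's value above the average of the faults actually observed in that stratum.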
 
Metadata
Title: The limited impact of individual developer data on software defect prediction
Authors: Robert M. Bell, Thomas J. Ostrand, Elaine J. Weyuker
Publication date: 01.06.2013
Publisher: Springer US
Published in: Empirical Software Engineering, Issue 3/2013
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-011-9178-4
