Skip to main content
Erschienen in: Empirical Software Engineering 3/2010

01.06.2010

Comparing the effectiveness of several modeling methods for fault prediction

verfasst von: Elaine J. Weyuker, Thomas J. Ostrand, Robert M. Bell

Erschienen in: Empirical Software Engineering | Ausgabe 3/2010

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We compare the effectiveness of four modeling methods—negative binomial regression, recursive partitioning, random forests and Bayesian additive regression trees—for predicting the files likely to contain the most faults for 28 to 35 releases of three large industrial software systems. Predictor variables included lines of code, file age, faults in the previous release, changes in the previous two releases, and programming language. To compare the effectiveness of the different models, we use two metrics—the percent of faults contained in the top 20% of files identified by the model, and a new, more general metric, the fault-percentile-average. The negative binomial regression and random forests models performed significantly better than recursive partitioning and Bayesian additive regression trees, as assessed by either of the metrics. For each of the three systems, the negative binomial and random forests models identified 20% of the files in each release that contained an average of 76% to 94% of the faults.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Adams EN (1984) Optimizing preventive service of software products. IBM J Res Develop 28(1):2–14CrossRef Adams EN (1984) Optimizing preventive service of software products. IBM J Res Develop 28(1):2–14CrossRef
Zurück zum Zitat Arisholm E, Briand LC (2006) Predicting fault-prone components in a java legacy system. In: Proc ACM/IEEE ISESE, Rio de Janeiro Arisholm E, Briand LC (2006) Predicting fault-prone components in a java legacy system. In: Proc ACM/IEEE ISESE, Rio de Janeiro
Zurück zum Zitat Basili VR, Perricone BT (1984) Software errors and complexity: an empirical investigation. Commun ACM 27(1):42–52CrossRef Basili VR, Perricone BT (1984) Software errors and complexity: an empirical investigation. Commun ACM 27(1):42–52CrossRef
Zurück zum Zitat Bell RM, Ostrand TJ, Weyuker EJ (2006) Looking for bugs in all the right places. In: Proc ACM/international symposium on software testing and analysis (ISSTA2006), Portland, pp 61–71 Bell RM, Ostrand TJ, Weyuker EJ (2006) Looking for bugs in all the right places. In: Proc ACM/international symposium on software testing and analysis (ISSTA2006), Portland, pp 61–71
Zurück zum Zitat Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, BelmontMATH Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, BelmontMATH
Zurück zum Zitat Denaro G, Pezze M (2002) An empirical evaluation of fault-proneness models. In: Proc international conf on software engineering (ICSE2002), Miami Denaro G, Pezze M (2002) An empirical evaluation of fault-proneness models. In: Proc international conf on software engineering (ICSE2002), Miami
Zurück zum Zitat Eick SG, Graves TL, Karr AF, Marron JS, Mockus A (2001) Does code decay? Assessing the evidence from change management data. IEEE Trans Softw Eng 27(1):1–12CrossRef Eick SG, Graves TL, Karr AF, Marron JS, Mockus A (2001) Does code decay? Assessing the evidence from change management data. IEEE Trans Softw Eng 27(1):1–12CrossRef
Zurück zum Zitat Fenton NE, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814CrossRef Fenton NE, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814CrossRef
Zurück zum Zitat Guo L, Ma Y, Cukic B, Singh H (2004) Robust prediction of fault-proneness by random forests. In: Proc ISSRE 2004, Saint-Malo Guo L, Ma Y, Cukic B, Singh H (2004) Robust prediction of fault-proneness by random forests. In: Proc ISSRE 2004, Saint-Malo
Zurück zum Zitat Hatton L (1997) Reexamining the fault density—component size connection. IEEE Softw 14:89–97CrossRef Hatton L (1997) Reexamining the fault density—component size connection. IEEE Softw 14:89–97CrossRef
Zurück zum Zitat Jiang Y, Cukic B, Ma Y (2008) Techniques for evaluating fault prediction models. Empir Softw Eng 13:561–595CrossRef Jiang Y, Cukic B, Ma Y (2008) Techniques for evaluating fault prediction models. Empir Softw Eng 13:561–595CrossRef
Zurück zum Zitat Khoshgoftaar TM, Allen EB, Kalaichelvan KS, Goel N (1996) Early quality prediction: a case study in telecommunications. IEEE Softw 13:65–71CrossRef Khoshgoftaar TM, Allen EB, Kalaichelvan KS, Goel N (1996) Early quality prediction: a case study in telecommunications. IEEE Softw 13:65–71CrossRef
Zurück zum Zitat Khoshgoftaar TM, Allen EB, Deng J (2002) Using regression trees to classify fault-prone software modules. IEEE Trans Reliab 51(4):455–462CrossRef Khoshgoftaar TM, Allen EB, Deng J (2002) Using regression trees to classify fault-prone software modules. IEEE Trans Reliab 51(4):455–462CrossRef
Zurück zum Zitat Koru AG, Liu H (2005) An investigation of the effect of module size on defect prediction using static measures. In: 2005 Promise workshop, 15 May 2005 Koru AG, Liu H (2005) An investigation of the effect of module size on defect prediction using static measures. In: 2005 Promise workshop, 15 May 2005
Zurück zum Zitat Lessmann S, Baesens B, Ues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496CrossRef Lessmann S, Baesens B, Ues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496CrossRef
Zurück zum Zitat McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, LondonMATH McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, LondonMATH
Zurück zum Zitat Menzies T, Di Stefano JS, Cunanan C, Chapman R (2004) Mining repositories to assist in project planning and resource allocation. In: International workshop on mining software repositories Menzies T, Di Stefano JS, Cunanan C, Chapman R (2004) Mining repositories to assist in project planning and resource allocation. In: International workshop on mining software repositories
Zurück zum Zitat Moeller K-H, Paulish DJ (1993) An empirical investigation of software fault distribution. In: Proc IEEE first international software metrics symposium, Baltimore, 21–22 May 1993, pp 82–90 Moeller K-H, Paulish DJ (1993) An empirical investigation of software fault distribution. In: Proc IEEE first international software metrics symposium, Baltimore, 21–22 May 1993, pp 82–90
Zurück zum Zitat Munson JC, Khoshgoftaar TM (1992) The detection of fault-prone programs. IEEE Trans Softw Eng 18(5):423–433CrossRef Munson JC, Khoshgoftaar TM (1992) The detection of fault-prone programs. IEEE Trans Softw Eng 18(5):423–433CrossRef
Zurück zum Zitat Ohlsson N, Alberg H (1996) Predicting fault-prone software modules in telephone switches. IEEE Trans Softw Eng 22(12):886–894CrossRef Ohlsson N, Alberg H (1996) Predicting fault-prone software modules in telephone switches. IEEE Trans Softw Eng 22(12):886–894CrossRef
Zurück zum Zitat Ostrand T, Weyuker EJ (2002) The distribution of faults in a large industrial software system. In: Proc ACM/international symposium on software testing and analysis (ISSTA2002), Rome, pp 55–64 Ostrand T, Weyuker EJ (2002) The distribution of faults in a large industrial software system. In: Proc ACM/international symposium on software testing and analysis (ISSTA2002), Rome, pp 55–64
Zurück zum Zitat Ostrand T, Weyuker EJ (2007) How to measure success of software prediction models. In: Proc fourth international workshop on software quality assurance, Dubrovnik, September 2007 Ostrand T, Weyuker EJ (2007) How to measure success of software prediction models. In: Proc fourth international workshop on software quality assurance, Dubrovnik, September 2007
Zurück zum Zitat Ostrand TJ, Weyuker EJ, Bell RM (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355CrossRef Ostrand TJ, Weyuker EJ, Bell RM (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355CrossRef
Zurück zum Zitat Ostrand TJ, Weyuker EJ, Bell RM (2007) Automating algorithms for the identification of fault-prone files. In: Proc ACM/international symposium on software testing and analysis (ISSTA07), London Ostrand TJ, Weyuker EJ, Bell RM (2007) Automating algorithms for the identification of fault-prone files. In: Proc ACM/international symposium on software testing and analysis (ISSTA07), London
Zurück zum Zitat Pighin M, Marzona A (2003) An empirical analysis of fault persistence through software releases. In: Proc IEEE/ACM ISESE 2003, pp 206–212 Pighin M, Marzona A (2003) An empirical analysis of fault persistence through software releases. In: Proc IEEE/ACM ISESE 2003, pp 206–212
Zurück zum Zitat Succi G, Pedrycz W, Stefanovic M, Miller J (2003) Practical assessment of the models for identification of defect-prone classes in object-oriented commercial systems using design metrics. J Syst Softw 65(1):1–12CrossRef Succi G, Pedrycz W, Stefanovic M, Miller J (2003) Practical assessment of the models for identification of defect-prone classes in object-oriented commercial systems using design metrics. J Syst Softw 65(1):1–12CrossRef
Zurück zum Zitat White H (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrics 48:817–838MATHCrossRef White H (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrics 48:817–838MATHCrossRef
Metadaten
Titel
Comparing the effectiveness of several modeling methods for fault prediction
verfasst von
Elaine J. Weyuker
Thomas J. Ostrand
Robert M. Bell
Publikationsdatum
01.06.2010
Verlag
Springer US
Erschienen in
Empirical Software Engineering / Ausgabe 3/2010
Print ISSN: 1382-3256
Elektronische ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-009-9111-2

Premium Partner