Evaluating defect prediction approaches: a benchmark and an extensive comparison

Authors: Marco D’Ambros, Michele Lanza, Romain Robbes

Published in: Empirical Software Engineering | Issue 4-5/2012

Published: 01.08.2012

Abstract

Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.

http://promisedata.org

http://bug.inf.usi.ch

http://promisedata.org/data

http://mdp.ivv.nasa.gov, also part of PROMISE.

We employ JUnit 3 naming conventions to detect test classes, i.e., classes whose names end with “Test” are detected as tests.
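
The naming heuristic in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the function names `is_test_class` and `split_classes` and the sample class names are hypothetical, and the check covers only the "Test" suffix described above.

```python
def is_test_class(qualified_name: str) -> bool:
    """Return True if the simple class name ends with "Test" (JUnit 3 naming convention)."""
    # Strip the package prefix, e.g. "org.example.ParserTest" -> "ParserTest".
    simple_name = qualified_name.rsplit(".", 1)[-1]
    return simple_name.endswith("Test")


def split_classes(names):
    """Partition class names into (production, test) lists using the suffix rule."""
    production = [n for n in names if not is_test_class(n)]
    tests = [n for n in names if is_test_class(n)]
    return production, tests


if __name__ == "__main__":
    # Hypothetical class names, used only to exercise the heuristic.
    sample = ["org.example.Parser", "org.example.ParserTest", "org.example.TestUtil"]
    print(split_classes(sample))
```

Because only the suffix is checked, a class such as `TestUtil` is treated as production code in this sketch.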

Available at http://www.intooitus.com/.

Available at http://www.moosetechnology.org.

Available at http://churrasco.inf.usi.ch.

For instance, the churn of FanIn linearly decayed and churn of FanIn logarithmically decayed have a very high correlation.

This is not in contradiction with Antoniol et al. (2008): Bugs mentioned as fixes in CVS comments are intuitively more likely to be real bugs, as they got fixed.

Antoniol G, Ayari K, Di Penta M, Khomh F, Guéhéneuc Y-G (2008) Is it a bug or an enhancement?: a text-based approach to classify change requests. In: Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds (CASCON 2008). ACM, New York, pp 304–318

Arisholm E, Briand LC (2006) Predicting fault-prone components in a Java legacy system. In: Proceedings of the 2006 ACM/IEEE international symposium on empirical software engineering (ISESE 2006). ACM, New York, pp 8–17

Arisholm E, Briand LC, Johannessen EB (2010) A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. J Syst Softw 83(1):2–17

Arnaoudova V, Eshkevari L, Oliveto R, Gueheneuc Y-G, Antoniol G (2010) Physical and conceptual identifier dispersion: measures and relation to fault proneness. In: Proceedings of the 26th IEEE international conference on software maintenance (ICSM 2010). IEEE CS, Washington, pp 1–5

Bacchelli A, D’Ambros M, Lanza M (2010) Are popular classes more defect prone? In: Proceedings of the 13th international conference on fundamental approaches to software engineering (FASE 2010). Springer, Berlin, pp 59–73

Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761

Bernstein A, Ekanayake J, Pinzger M (2007) Improving defect prediction using temporal features and non linear models. In: Proceedings of the ninth international workshop on principles of software evolution (IWPSE 2007). ACM, New York, pp 11–18

Binkley AB, Schach SR (1998) Validation of the coupling dependency metric as a predictor of run-time failures and maintenance measures. In: Proceedings of the 20th international conference on software engineering (ICSE 1998). IEEE CS, Washington, pp 452–455

Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering (ESEC/FSE 2009). ACM, New York, pp 121–130

Briand LC, Daly JW, Wüst J (1999) A unified framework for coupling measurement in object-oriented systems. IEEE Trans Softw Eng 25(1):91–121

Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493

D’Ambros M, Lanza M (2010) Distributed and collaborative software evolution analysis with Churrasco. J Sci Comput Program (SCP) 75(4):276–287

D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: Proceedings of the 7th international working conference on mining software repositories (MSR 2010). IEEE CS, Washington, pp 31–41

Demeyer S, Tichelaar S, Ducasse S (2001) FAMIX 2.1—The FAMOOS information exchange model. Technical report, University of Bern

Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

Ducasse S, Gîrba T, Nierstrasz O (2005) Moose: an agile reengineering environment. In: Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on foundations of software engineering (ESEC/FSE 2005). ACM, New York, pp 99–102. Tool demo

El Emam K, Melo W, Machado JC (2001) The prediction of faulty classes using object-oriented design metrics. J Syst Softw 56(1):63–75

Fenton NE, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814

Fischer M, Pinzger M, Gall H (2003) Populating a release history database from version control and bug tracking systems. In: Proceedings of the international conference on software maintenance (ICSM 2003). IEEE CS, Washington, pp 23–32

Foss T, Stensrud E, Kitchenham B, Myrtveit I (2003) A simulation study of the model evaluation criterion MMRE. IEEE Trans Softw Eng 29(11):985–995

Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701

Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661

Gyimóthy T, Ferenc R, Siket I (2005) Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans Softw Eng 31(10):897–910

Hall MA (2000) Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the seventeenth international conference on machine learning (ICML 2000). Morgan Kaufmann, San Mateo, pp 359–366

Hall M, Holmes G (2003) Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans Knowl Data Eng 15(6):1437–1447

Hassan AE (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering (ICSE 2009). IEEE CS, Washington, pp 78–88

Hassan AE, Holt RC (2005) The top ten list: dynamic fault prediction. In: Proceedings of the 21st IEEE international conference on software maintenance (ICSM 2005). IEEE CS, Washington, pp 263–272

Ho YC, Pepyne DL (2002) Simple explanation of the no-free-lunch theorem and its implications. J Optim Theory Appl 115(3):549–570

Jackson EJ (2003) A users guide to principal components. Wiley, New York

Jiang Y, Cukic B, Ma Y (2008) Techniques for evaluating fault prediction models. Empir Software Eng 13:561–595

Juristo NJ, Vegas S (2009) Using differences among replications of software engineering experiments to gain knowledge. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement (ESEM 2009). IEEE CS, Washington, pp 356–366

Kamei Y, Matsumoto S, Monden A, Matsumoto K-i, Adams B, Hassan AE (2010) Revisiting common bug prediction findings using effort aware models. In: Proceedings of the 26th IEEE international conference on software maintenance (ICSM 2010). IEEE CS, Washington, pp 1–10

Kim S, Zimmermann T, Whitehead J, Zeller A (2007) Predicting faults from cached history. In: Proceedings of the 29th international conference on software engineering (ICSE 2007). IEEE CS, Washington, pp 489–498

Kira K, Rendell LA (1992) A practical approach to feature selection. In: Proceedings of the ninth international workshop on machine learning (ML 1992). Morgan Kaufmann, San Mateo, pp 249–256

Khoshgoftaar TM, Allen EB (1999) A comparative study of ordering and classification of fault-prone software modules. Empir Software Eng 4:159–186

Khoshgoftaar TM, Allen EB, Goel N, Nandi A, McMullan J (1996) Detection of software modules with high debug code churn in a very large legacy system. In: Proceedings of the seventh international symposium on software reliability engineering (ISSRE 1996). IEEE CS, Washington, pp 364–371

Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97:273–324

Kollmann R, Selonen P, Stroulia E (2002) A study on the current state of the art in tool-supported UML-based static reverse engineering. In: Proceedings of the ninth working conference on reverse engineering (WCRE 2002). IEEE CS, Washington, pp 22–32

Kononenko I (1994) Estimating attributes: analysis and extensions of relief. Springer, Berlin, pp 171–182

Koru AG, Zhang D, Liu H (2007) Modeling the effect of size on defect proneness for open-source software. In: Proceedings of the third international workshop on predictor models in software engineering (PROMISE 2007). IEEE CS, Washington, pp 10–19

Koru AG, El Emam K, Zhang D, Liu H, Mathew D (2008) Theory of relative defect proneness. Empir Software Eng 13:473–498

Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496

Marcus A, Poshyvanyk D, Ferenc R (2008) Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Trans Softw Eng 34(2):287–300

Mende T (2010) Replication of defect prediction studies: problems, pitfalls and recommendations. In: Proceedings of the 6th international conference on predictive models in software engineering (PROMISE 2010). ACM, New York, pp 1–10

Mende T, Koschke R (2009) Revisiting the evaluation of defect prediction models. In: Proceedings of the 5th international conference on predictive models in software engineering (PROMISE 2009). ACM, New York, pp 1–10

Mende T, Koschke R (2010) Effort-aware defect prediction models. In: Proceedings of the 14th European conference on software maintenance and reengineering (CSMR 2010). IEEE CS, Washington, pp 109–118

Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13

Menzies T, Milton Z, Turhan B, Cukic B, Jiang Y, Bener A (2010) Defect prediction from static code features: current results, limitations, new approaches. Autom Softw Eng 17:375–407

Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th international conference on software engineering (ICSE 2008). ACM, New York, pp 181–190

Myrtveit I, Stensrud E, Shepperd MJ (2005) Reliability and validity in comparative studies of software prediction models. IEEE Trans Softw Eng 31(5):380–391

Nagappan N, Ball T (2005a) Static analysis tools as early indicators of pre-release defect density. In: Proceedings of the 27th international conference on software engineering (ICSE 2005). ACM, New York, pp 580–586

Nagappan N, Ball T (2005b) Use of relative code churn measures to predict system defect density. In: Proceedings of the 27th international conference on software engineering (ICSE 2005). ACM, New York, pp 284–292

Nagappan N, Ball T, Zeller A (2006) Mining metrics to predict component failures. In: Proceedings of the 28th international conference on software engineering (ICSE 2006). ACM, New York, pp 452–461

Nikora AP, Munson JC (2003) Developing fault predictors for evolving software systems. In: Proceedings of the 9th international symposium on software metrics (METRICS 2003). IEEE CS, Washington, pp 338–349

Neuhaus S, Zimmermann T, Holler C, Zeller A (2007) Predicting vulnerable software components. In: Proceedings of the 14th ACM conference on computer and communications security (CCS 2007). ACM, New York, pp 529–540

Ohlsson N, Alberg H (1996) Predicting fault-prone software modules in telephone switches. IEEE Trans Softw Eng 22(12):886–894

Ostrand TJ, Weyuker EJ (2002) The distribution of faults in a large industrial software system. In: Proceedings of the ACM SIGSOFT international symposium on software testing and analysis (ISSTA 2002). ACM, New York, pp 55–64

Ostrand TJ, Weyuker EJ, Bell RM (2004) Where the bugs are. In: Proceedings of the ACM SIGSOFT international symposium on software testing and analysis (ISSTA 2004). ACM, New York, pp 86–96

Ostrand TJ, Weyuker EJ, Bell RM (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355

Ostrand TJ, Weyuker EJ, Bell RM (2007) Automating algorithms for the identification of fault-prone files. In: Proceedings of the ACM SIGSOFT international symposium on software testing and analysis (ISSTA 2007). ACM, New York, pp 219–227

Pinzger M, Nagappan N, Murphy B (2008) Can developer-module networks predict failures? In: Proceedings of the 16th ACM SIGSOFT international symposium on foundations of software engineering (FSE 2008). ACM, New York, pp 2–12

Robles G (2010) Replicating MSR: a study of the potential replicability of papers published in the mining software repositories proceedings. In: Proceedings of the 7th international working conference on mining software repositories (MSR 2010). IEEE CS, Washington, pp 171–180

Shin Y, Bell RM, Ostrand TJ, Weyuker EJ (2009) Does calling structure information improve the accuracy of fault prediction? In: Proceedings of the 7th international working conference on mining software repositories (MSR 2009). IEEE CS, Washington, pp 61–70

Sim SE, Easterbrook SM, Holt RC (2003) Using benchmarking to advance research: a challenge to software engineering. In: Proceedings of the 25th international conference on software engineering (ICSE 2003). IEEE CS, Washington, pp 74–83

Subramanyam R, Krishnan MS (2003) Empirical analysis of CK metrics for object-oriented design complexity: implications for software defects. IEEE Trans Softw Eng 29(4):297–310

Turhan B, Menzies T, Bener AB, Di Stefano JS (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Software Eng 14(5):540–578

Turhan B, Bener AB, Menzies T (2010) Regularities in learning defect predictors. In: Proceedings of the 11th international conference on product-focused software process improvement (PROFES 2010). Springer, Berlin, pp 116–130

Wolf T, Schröter A, Damian D, Nguyen THD (2009) Predicting build failures using social network analysis on developer communication. In: Proceedings of the 31st international conference on software engineering (ICSE 2009). IEEE CS, Washington, pp 1–11

Zimmermann T, Nagappan N (2008) Predicting defects using network analysis on dependency graphs. In: Proceedings of the 30th international conference on software engineering (ICSE 2008). ACM, New York, pp 531–540

Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for Eclipse. In: Proceedings of the 3rd international workshop on predictive models in software engineering (PROMISE 2007). IEEE CS, Washington, pp 9–15

Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering (ESEC/FSE 2009). ACM, New York, pp 91–100

Title: Evaluating defect prediction approaches: a benchmark and an extensive comparison
Authors: Marco D’Ambros, Michele Lanza, Romain Robbes
Publication date: 01.08.2012
Publisher: Springer US
Published in: Empirical Software Engineering / Issue 4-5/2012
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-011-9173-9
