2024 | OriginalPaper | Chapter

An Experience in the Evaluation of Fault Prediction

Authors: Luigi Lavazza, Sandro Morasca, Gabriele Rotoloni

Published in: Product-Focused Software Process Improvement

Publisher: Springer Nature Switzerland

Abstract

Background. ROC (Receiver Operating Characteristic) curves are widely used to represent the performance (i.e., degree of correctness) of fault proneness models. AUC, the Area Under the ROC Curve, is a popular performance metric that summarizes in a single number the goodness of the predictions represented by the ROC curve. Alternative techniques have been proposed for evaluating the performance represented by a ROC curve, among them RRA (Ratio of Relevant Areas) and \(\phi \) (also known as the Matthews Correlation Coefficient).
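
For reference, the standard definition of \(\phi \) in terms of the confusion matrix is given below. The symbols \(TP\), \(FP\), \(TN\), and \(FN\) (true/false positives and negatives) are not used in the abstract and are introduced here only for illustration:

\[
\phi = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
\]

\(\phi \) ranges from \(-1\) (total disagreement) to \(1\) (perfect prediction), with \(0\) corresponding to a prediction that is uncorrelated with actual faultiness.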

Objectives. In this paper, we aim to evaluate AUC as a performance metric, also in comparison with alternative proposals.

Method. We carry out an empirical study by replicating a previously published fault prediction study and measuring the performance of the obtained faultiness models using AUC, RRA, and a recently proposed way of relating a specific kind of ROC curve to \(\phi \), based on iso-\(\phi \) ROC curves, i.e., ROC curves with constant \(\phi \). We take into account prevalence, i.e., the proportion of faulty modules in the dataset that is the object of predictions.

Results. AUC appears to provide indications that are concordant with \(\phi \) for fairly balanced datasets, while it is much more optimistic than \(\phi \) for quite imbalanced datasets. RRA’s indications appear to be moderately affected by the degree of balance in a dataset. In addition, RRA appears to agree with \(\phi \).

Conclusions. Based on the collected evidence, AUC does not seem to be suitable for evaluating the performance of fault proneness models when used with imbalanced datasets. In these cases, using RRA can be a better choice. At any rate, more research is needed to generalize these conclusions.
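
The dependence on prevalence mentioned in the Results can be illustrated with a small sketch. It is not the authors' procedure; it only applies the standard confusion-matrix definition of \(\phi \) to a single ROC point. The function name `phi_at_roc_point` and the example values (true positive rate 0.8, false positive rate 0.2) are assumptions chosen for illustration. A ROC point is defined by the true and false positive rates alone, so it stays the same when prevalence changes, whereas \(\phi \) at that point does not.

```python
import math


def phi_at_roc_point(tpr: float, fpr: float, prevalence: float) -> float:
    """Return phi (MCC) at the ROC point (fpr, tpr) for a dataset in which
    a fraction `prevalence` of the modules is actually faulty.

    Hypothetical helper for illustration only.
    """
    # Confusion-matrix cells expressed as fractions of the whole dataset.
    tp = prevalence * tpr
    fn = prevalence * (1.0 - tpr)
    fp = (1.0 - prevalence) * fpr
    tn = (1.0 - prevalence) * (1.0 - fpr)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        # Degenerate case (a row or column of the matrix is empty);
        # returning 0 is a common convention, assumed here.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Same ROC point, different proportions of faulty modules.
    for prevalence in (0.5, 0.2, 0.05):
        value = phi_at_roc_point(tpr=0.8, fpr=0.2, prevalence=prevalence)
        print(f"prevalence={prevalence:.2f}  phi={value:.3f}")
```

With these example values, \(\phi \) is 0.6 on a perfectly balanced dataset and drops to about 0.31 when only 5% of the modules are faulty, even though the ROC point is unchanged.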

Title: An Experience in the Evaluation of Fault Prediction
Authors: Luigi Lavazza, Sandro Morasca, Gabriele Rotoloni
Publisher: Springer Nature Switzerland
Book: Product-Focused Software Process Improvement
Print ISBN: 978-3-031-49265-5
Electronic ISBN: 978-3-031-49266-2

Copyright Year: 2024
DOI: https://doi.org/10.1007/978-3-031-49266-2_22
