
2024 | OriginalPaper | Chapter

An Experience in the Evaluation of Fault Prediction

Authors: Luigi Lavazza, Sandro Morasca, Gabriele Rotoloni

Published in: Product-Focused Software Process Improvement

Publisher: Springer Nature Switzerland


Abstract

Background. ROC (Receiver Operating Characteristic) curves are widely used to represent the performance (i.e., degree of correctness) of fault proneness models. AUC, the Area Under the ROC Curve, is a popular performance metric that summarizes in a single number the quality of the predictions represented by the ROC curve. Alternative techniques have been proposed for evaluating the performance represented by a ROC curve, among them RRA (Ratio of Relevant Areas) and \(\phi \) (also known as the Matthews Correlation Coefficient).
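
For reference, \(\phi \) is computed from the confusion matrix of a binary classification (TP, TN, FP, FN denote true/false positives/negatives), while AUC integrates the true positive rate (TPR) over the false positive rate (FPR); these are the standard definitions, recalled here only as background:

\[
\phi = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)\,(TP+FN)\,(TN+FP)\,(TN+FN)}},
\qquad
AUC = \int_0^1 TPR \;\mathrm{d}FPR .
\]
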
Objectives. In this paper, we aim to evaluate AUC as a performance metric, also in comparison with alternative proposals.
Method. We carry out an empirical study by replicating a previously published fault prediction study and measuring the performance of the obtained faultiness models using AUC, RRA, and a recently proposed way of relating a specific kind of ROC curve to \(\phi \), based on iso-\(\phi \) ROC curves, i.e., ROC curves with constant \(\phi \). We take into account prevalence, i.e., the proportion of faulty modules in the dataset on which predictions are made.
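
A minimal sketch of how such performance measurements can be obtained is shown below (Python with scikit-learn; the synthetic data, logistic regression model, and 0.5 threshold are illustrative assumptions rather than the study's actual setup, and RRA is omitted because it has no off-the-shelf library implementation):

    # Illustrative sketch only: synthetic data stands in for a real fault dataset.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, matthews_corrcoef

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))                           # stand-in module metrics
    y = (X[:, 0] + rng.normal(size=500) > 1.5).astype(int)  # imbalanced faultiness labels

    model = LogisticRegression().fit(X, y)                  # fault proneness model
    p = model.predict_proba(X)[:, 1]                        # predicted fault probability

    prevalence = y.mean()                                   # proportion of faulty modules
    auc = roc_auc_score(y, p)                               # area under the ROC curve
    phi = matthews_corrcoef(y, (p >= 0.5).astype(int))      # phi (MCC) at one threshold

    print(f"prevalence={prevalence:.2f}  AUC={auc:.2f}  phi={phi:.2f}")

Note that \(\phi \) is computed at a specific classification threshold (a single operating point on the ROC curve), whereas AUC summarizes the entire curve.
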
Results. AUC appears to provide indications that are concordant with \(\phi \) for fairly balanced datasets, while it is much more optimistic than \(\phi \) for quite imbalanced datasets. RRA’s indications appear to be moderately affected by the degree of balance in a dataset. In addition, RRA appears to agree with \(\phi \).
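
A small worked example with hypothetical numbers, chosen only to illustrate the imbalanced case, shows how large the divergence can be: in a dataset of 1000 modules of which 10 are faulty (prevalence 1%), an operating point with TPR = 0.9 and FPR = 0.1 yields TP = 9, FN = 1, FP = 99, TN = 891, so that

\[
\phi = \frac{9 \cdot 891 - 99 \cdot 1}{\sqrt{(9+99)(9+1)(891+99)(891+1)}} \approx 0.26 ,
\]

whereas a ROC curve passing through that operating point typically has an AUC of about 0.9.
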
Conclusions. Based on the collected evidence, AUC does not seem to be suitable for evaluating the performance of fault proneness models when used with imbalanced datasets. In these cases, using RRA can be a better choice. At any rate, more research is needed to generalize these conclusions.

Metadata
Title
An Experience in the Evaluation of Fault Prediction
Authors
Luigi Lavazza
Sandro Morasca
Gabriele Rotoloni
Copyright Year
2024
DOI
https://doi.org/10.1007/978-3-031-49266-2_22
