Published in: Empirical Software Engineering 5/2008

01.10.2008

On the effectiveness of early life cycle defect prediction with Bayesian Nets

Authors: Norman Fenton, Martin Neil, William Marsh, Peter Hearty, Łukasz Radliński, Paul Krause

Abstract

Standard practice in building models in software engineering normally involves three steps: (1) collecting domain knowledge (previous results, expert knowledge); (2) building a skeleton of the model based on step 1, including as yet unknown parameters; (3) estimating the model parameters using historical data. Our experience shows that it is extremely difficult to obtain reliable data of the required granularity, or of the volume required to generalize our conclusions later. Therefore, in searching for a method for building a model, we cannot consider methods requiring large volumes of data. This paper discusses an experiment to develop a causal model (Bayesian net) for predicting the number of residual defects that are likely to be found during independent testing or operational usage. The approach supports (1) and (2), does not require (3), yet still makes accurate defect predictions (an R² of 0.93 between predicted and actual defects). Since our method does not require detailed domain knowledge, it can be applied very early in the process life cycle. The model incorporates a set of quantitative and qualitative factors describing a project and its development process, which are inputs to the model. The model variables, as well as the relationships between them, were identified as part of a major collaborative project. A dataset, elicited from 31 completed software projects in the consumer electronics industry, was gathered using a questionnaire distributed to managers of recent projects. We used this dataset to validate the model by analyzing several popular evaluation measures (R², measures based on the relative error, and Pred). The validation results also confirm the need for using the qualitative factors in the model. The dataset may be of interest to other researchers evaluating models with similar aims. Based on some typical scenarios, we demonstrate how the model can be used for better decision support in operational environments.
We also performed a sensitivity analysis in which we identified the variables with the greatest influence on the number of residual defects. This showed that project size, scale of distributed communication and project complexity account for most of the variation in the number of defects in our model. We make both the dataset and the causal model available for research use.
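
The evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions below are assumptions based on common usage in the estimation literature: R² as the squared Pearson correlation between predicted and actual defect counts, MMRE as the mean magnitude of relative error, and Pred(l) as the fraction of projects whose relative error is at most l. The function names and the four sample projects are hypothetical and are not taken from the paper's dataset.

```python
def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values
    (assumed definition of R^2; the abstract does not spell it out)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


def mre(actual, predicted):
    """Magnitude of relative error per project: |actual - predicted| / actual."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def mmre(actual, predicted):
    """Mean magnitude of relative error over all projects."""
    errors = mre(actual, predicted)
    return sum(errors) / len(errors)


def pred(actual, predicted, level=0.25):
    """Pred(level): share of projects with relative error <= level."""
    errors = mre(actual, predicted)
    return sum(1 for e in errors if e <= level) / len(errors)


# Hypothetical defect counts for four projects (illustration only).
actual_defects = [100, 200, 50, 400]
predicted_defects = [110, 180, 80, 400]

print("R^2:     ", r_squared(actual_defects, predicted_defects))
print("MMRE:    ", mmre(actual_defects, predicted_defects))
print("Pred(25):", pred(actual_defects, predicted_defects, level=0.25))
```

For these made-up numbers the relative errors are 0.1, 0.1, 0.6 and 0.0, so MMRE is about 0.2 and Pred(25) is 0.75. The relative-error measures divide by the actual count, so a project with zero actual defects would need separate handling.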

Metadata
Title
On the effectiveness of early life cycle defect prediction with Bayesian Nets
Authors
Norman Fenton
Martin Neil
William Marsh
Peter Hearty
Łukasz Radliński
Paul Krause
Publication date
01.10.2008
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 5/2008
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-008-9072-x
