Published in: Empirical Software Engineering 5/2008

01-10-2008

Theory of relative defect proneness

Replicated studies on the functional form of the size-defect relationship

Authors: A. Güneş Koru, Khaled El Emam, Dongsong Zhang, Hongfang Liu, Divya Mathew


Abstract

In this study, we investigated the functional form of the size-defect relationship for software modules through replicated studies conducted on ten open-source products. We consistently observed a power-law relationship in which defect proneness increases at a slower rate than size. Therefore, smaller modules are proportionally more defect-prone. We externally validated the application of our results on two commercial systems. Given limited and fixed resources for code inspections, preferring a smallest-first strategy over a largest-first one would substantially improve cost-effectiveness, by as much as 341% in one of the systems. The consistent results obtained in this study led us to state a theory of relative defect proneness (RDP): in large-scale software systems, smaller modules will be proportionally more defect-prone than larger ones. We suggest that practitioners consider our results and give higher priority to smaller modules in their focused quality assurance efforts.
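The practical consequence of a sub-linear power law can be illustrated numerically. Assume, purely for illustration, that the expected number of defects in a module of S lines is a·S^b with 0 < b < 1; defect density is then a·S^(b-1), which falls as S grows, so a fixed inspection budget spent on many small modules covers more expected defects than the same budget spent on one large module. The constants A = 0.05 and B = 0.5, the module sizes, the greedy budget rule, and the helper names below are all hypothetical: they are not the study's fitted model, and the output does not reproduce the 341% figure reported above.

```python
from typing import List

# Hypothetical power-law parameters: expected defects = A * size**B.
# B < 1 encodes "defect proneness grows more slowly than size".
# Illustrative values only, not coefficients fitted in the study.
A = 0.05
B = 0.5


def expected_defects(size_loc: int, a: float = A, b: float = B) -> float:
    """Expected defect count for a module of size_loc lines under the power-law model."""
    return a * size_loc ** b


def defect_density(size_loc: int, a: float = A, b: float = B) -> float:
    """Expected defects per line, a * size**(b - 1); decreases with size when b < 1."""
    return expected_defects(size_loc, a, b) / size_loc


def defects_found(sizes: List[int], budget_loc: int, smallest_first: bool) -> float:
    """Greedily inspect whole modules in size order within a fixed LOC budget.

    Simplifying assumption: an inspected module yields all of its expected
    defects. Modules that do not fit in the remaining budget are skipped.
    """
    remaining = budget_loc
    found = 0.0
    # Ascending order for smallest-first, descending for largest-first.
    for size in sorted(sizes, reverse=not smallest_first):
        if size <= remaining:
            remaining -= size
            found += expected_defects(size)
    return found


if __name__ == "__main__":
    # Hypothetical system: ten 100-line modules and one 1,000-line module.
    sizes = [100] * 10 + [1000]
    budget = 1000  # lines of code the inspection team can afford to read

    small = defects_found(sizes, budget, smallest_first=True)
    large = defects_found(sizes, budget, smallest_first=False)
    print(f"smallest-first: {small:.3f}")          # smallest-first: 5.000
    print(f"largest-first:  {large:.3f}")          # largest-first:  1.581
    print(f"ratio:          {small / large:.3f}")  # ratio:          3.162
```

With B = 0.5, ten 100-line modules carry sqrt(10), roughly 3.16 times, the expected defects of a single 1,000-line module of the same total size, so the smallest-first order wins. With B = 1 (defects strictly proportional to size) the two orders would tie on this example, which is why the advantage depends on the exponent being below one.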

Footnotes
1. Webcite link: http://www.webcitation.org/5RqqbCKKm (cached Sep. 14, 2007)
2. Webcite link: http://www.webcitation.org/5Rqr0BSz8 (cached Sep. 14, 2007)
3. CVS was the source code control system used by the KOffice developers. Webcite link: http://www.webcitation.org/5RrT2BaV1 (cached Sep. 14, 2007)
4. Perl is a stable, cross-platform programming language. Webcite link: http://www.webcitation.org/5RrTDEdYV (cached Sep. 14, 2007)
Metadata
Title
Theory of relative defect proneness
Replicated studies on the functional form of the size-defect relationship
Authors
A. Güneş Koru
Khaled El Emam
Dongsong Zhang
Hongfang Liu
Divya Mathew
Publication date
01-10-2008
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 5/2008
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-008-9080-x