Published in: Empirical Software Engineering, Issue 1/2013

1 February 2013

All complaints are not created equal: text analysis of open source software defect reports

Author: Uzma Raja

Abstract

As the use of Open Source Software (OSS) systems increases in the corporate environment, it is important to examine the maintenance process of these projects. OSS projects allow end users to submit reports directly when they encounter operational issues. Timely resolution of these defect reports requires effective management of maintenance resources. This study analyzes the usefulness of the textual content of defect reports as an early indicator of their resolution time. Text mining techniques are used to categorize the defect reports of five OSS projects. Significant variation in defect resolution time among the resulting categories, for each of the sample projects, indicates that a text-based classification of defect reports can be useful for early assessment of resolution time, before source-code-level analysis. Such a technique can assist in allocating sufficient maintenance resources to targeted defects and can also enable project teams to manage customer expectations regarding defect resolution times.
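The two steps named in the abstract, categorizing reports by their text and then comparing resolution times across categories, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the report texts, the resolution times in days, the seed vocabularies, and the helper names are hypothetical, and the abstract does not say which text mining technique or statistical test the study used. The sketch swaps in a simple cosine match against hand-picked seed terms and a one-way ANOVA F statistic.

```python
import math
import re
from collections import Counter, defaultdict


def term_counts(text):
    """Lowercase the text and count alphabetic tokens (bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(report_text, seeds):
    """Assign the report to the seed category with the most similar vocabulary."""
    vec = term_counts(report_text)
    return max(seeds, key=lambda name: cosine(vec, seeds[name]))


def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    ss_between = sum(len(g) * (means[n] - grand_mean) ** 2 for n, g in groups.items())
    ss_within = sum((v - means[n]) ** 2 for n, g in groups.items() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical seed vocabularies standing in for categories found by text mining.
SEEDS = {
    "crash": term_counts("crash exception segfault freeze"),
    "ui": term_counts("button layout font display"),
}

# Hypothetical (report text, resolution time in days) pairs.
REPORTS = [
    ("Application crash with null pointer exception on startup", 12.0),
    ("Segfault and freeze when opening large file", 15.0),
    ("Crash after exception in parser", 10.0),
    ("Button layout broken in settings dialog", 2.0),
    ("Font display too small on toolbar button", 3.0),
    ("Wrong font in layout preview", 4.0),
]

groups = defaultdict(list)
for text, days in REPORTS:
    groups[categorize(text, SEEDS)].append(days)

f_stat = one_way_f(groups)
print(dict(groups))
print(f"F = {f_stat:.2f}")
```

A large F statistic means that resolution times differ more between categories than within them, which is the kind of variation the abstract describes. On real data the categories would come from a proper text mining pipeline, and the F statistic would be compared against the F distribution to obtain a p-value.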


Footnotes
1. Some OSS projects do offer paid versions that include software maintenance services as well.
2. The term "significant" hereafter, in this section, is used to indicate statistical significance.
3. An asterisk indicates statistical significance at p < 0.05.
 
Metadata
Title: All complaints are not created equal: text analysis of open source software defect reports
Author: Uzma Raja
Publication date: 1 February 2013
Publisher: Springer US
Published in: Empirical Software Engineering, Issue 1/2013
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-012-9197-9
