ABSTRACT
Bug tracking systems are valuable assets for managing maintenance activities. They are widely used in open-source projects as well as in the software industry. They collect many different kinds of issues: requests for defect fixes, enhancements, refactoring/restructuring activities, and organizational issues. Yet these different kinds of issues are often simply labeled "bug", for lack of better classification support or of knowledge about the possible kinds.
This paper investigates whether the text of the issues posted in bug tracking systems is enough to classify them into corrective maintenance and other kinds of activities.
We show that alternating decision trees, naive Bayes classifiers, and logistic regression can be used to accurately distinguish bugs from other kinds of issues. Empirical studies on issues from Mozilla, Eclipse, and JBoss indicate that issues can be classified with 77% to 82% correct decisions.
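As a minimal sketch of the text-only idea, the Python code below trains a multinomial naive Bayes classifier with add-one smoothing on the words of each issue and then reports the fraction of correct decisions. It is not the authors' setup: the tokenizer, the smoothing, the class names `bug` and `non-bug`, and the six training issues are illustrative assumptions, and the alternating decision tree and logistic regression models are not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the issue text and split it into alphanumeric terms (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesIssueClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over issue terms."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # label -> number of issues
        self.term_counts = defaultdict(Counter)      # label -> term -> frequency
        for text, label in zip(texts, labels):
            self.term_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        self.total_terms = {c: sum(self.term_counts[c].values()) for c in self.class_counts}
        return self

    def log_posterior(self, text, label):
        # log P(label) + sum of log P(term | label), up to a shared constant
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_terms[label] + len(self.vocab)
        for term in tokenize(text):
            if term in self.vocab:  # terms never seen in training are ignored
                score += math.log((self.term_counts[label][term] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


def accuracy(classifier, texts, labels):
    """Fraction of correct decisions on labeled issues."""
    correct = sum(classifier.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy issues, invented for illustration only.
train_texts = [
    "crash when opening file, null pointer exception",
    "segfault on startup after update",
    "wrong result returned by parser, regression",
    "add option to export report as csv",
    "please support dark theme in preferences",
    "refactor build scripts to simplify configuration",
]
train_labels = ["bug", "bug", "bug", "non-bug", "non-bug", "non-bug"]

test_texts = ["null pointer exception crash on save", "add csv export option"]
test_labels = ["bug", "non-bug"]

clf = NaiveBayesIssueClassifier().fit(train_texts, train_labels)
print(clf.predict(test_texts[0]))                 # prints "bug"
print(accuracy(clf, test_texts, test_labels))     # prints 1.0
```

On real data, accuracy would be measured on held-out issues (for example with cross-validation) rather than on a hand-picked pair as here; this toy result says nothing about the figures reported for Mozilla, Eclipse, and JBoss.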
Is it a bug or an enhancement?: a text-based approach to classify change requests
CASCON '18: Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering