Abstract
A key collaborative hub for many software development projects is the bug report repository. Although its use can improve the software development process in a number of ways, reports added to the repository need to be triaged. A triager determines if a report is meaningful. Meaningful reports are then organized for integration into the project's development process.
To assist triagers with their work, this article presents a machine learning approach to creating recommenders that support a variety of decisions aimed at streamlining the development process. The recommenders created with this approach are accurate; for instance, the recommenders we created to suggest which developer should be assigned a report have a precision between 70% and 98% over five open source projects. Because configuring a recommender for a particular project can be time-consuming and require substantial effort, we also present an approach to assist with configuration that significantly lowers the cost of putting a recommender in place for a project. We show that recommenders for which developer should fix a bug can be quickly configured with this approach and that the configured recommenders are within 15% precision of hand-tuned developer recommenders.
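To make the precision figures above concrete, the following is a minimal sketch of one common way to score a developer recommender: the fraction of recommended developers who were actually appropriate for the report. The abstract does not state how precision is computed, so this definition, the function names, and the sample data are all assumptions made for illustration only.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision(recommended: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of recommended developers that appear in the relevant set.

    Assumed definition: hits / number of recommendations made.
    Returns 0.0 when no recommendation was made.
    """
    if not recommended:
        return 0.0
    hits = sum(1 for dev in recommended if dev in relevant)
    return hits / len(recommended)


def mean_precision(cases: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average per-report precision over a collection of test reports."""
    scores = [precision(rec, rel) for rec, rel in cases]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical test set: (developers recommended, developers judged appropriate).
cases = [
    (["alice"], {"alice", "bob"}),
    (["carol"], {"dave"}),
    (["bob"], {"bob"}),
    (["erin"], {"erin", "alice"}),
]

if __name__ == "__main__":
    print(f"mean precision: {mean_precision(cases):.2f}")
```

Under this assumed definition, three of the four hypothetical reports receive an appropriate recommendation, so the mean precision is 0.75.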
Index Terms
- Reducing the effort of bug report triage: Recommenders for development-oriented decisions