ABSTRACT
Software defect information, including links between bugs and committed changes, plays an important role in software maintenance tasks such as measuring quality and predicting defects. These links are usually mined automatically from change logs and bug reports using heuristics, such as searching change logs for specific keywords and bug IDs. However, the accuracy of these heuristics depends on the quality of the change logs. Bird et al. found that many links are missing because change logs omit bug references. They also found that the missing links bias defect information and affect defect prediction performance. We manually inspected explicit links, that is, links whose change logs contain explicit bug IDs, and observed that they exhibit certain features. Based on this observation, we developed ReLink, an automatic link recovery algorithm that learns feature criteria from explicit links and uses them to recover missing links. We applied ReLink to three open source projects. ReLink reliably identified links with 89% precision and 78% recall on average, whereas the traditional heuristics alone achieved 91% precision and 64% recall. We also evaluated the impact of the recovered links on software maintainability measurement and defect prediction, and found that the results obtained with ReLink are significantly more accurate than those obtained with the traditional heuristics.
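The traditional heuristic mentioned in the abstract (searching change logs for keywords and bug IDs) can be illustrated with a minimal sketch. This is not ReLink and not the authors' implementation: the regular expression, the function names `mine_explicit_links` and `precision_recall`, and the sample data are all assumptions made for illustration. Real projects use project-specific conventions for referencing bugs.

```python
import re

# Hypothetical pattern: a keyword such as "bug", "issue", or "fix/fixes/fixed",
# optionally followed by "#" or ":", then a numeric bug ID.
BUG_REF = re.compile(r"\b(?:bug|issue|fix(?:es|ed)?)\s*[#:]?\s*(\d+)", re.IGNORECASE)


def mine_explicit_links(change_logs, known_bug_ids):
    """Return (commit_id, bug_id) pairs whose log message references a known bug ID."""
    links = set()
    for commit_id, message in change_logs.items():
        for match in BUG_REF.finditer(message):
            bug_id = int(match.group(1))
            # Keep only IDs that actually exist in the bug tracker.
            if bug_id in known_bug_ids:
                links.add((commit_id, bug_id))
    return links


def precision_recall(predicted, actual):
    """Compute precision and recall of predicted links against the true links."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Illustrative data only (not taken from the paper).
logs = {
    "c1": "Fixed bug #123: NPE in parser",
    "c2": "Handle empty input in tokenizer",  # a fix with no bug reference
    "c3": "issue 77 resolved",
    "c4": "fix 999",  # ID not present in the tracker
}
bugs = {123, 77, 55}
true_links = {("c1", 123), ("c3", 77), ("c2", 55)}

found = mine_explicit_links(logs, bugs)
precision, recall = precision_recall(found, true_links)
```

In this toy data the commit `c2` fixes a bug without mentioning its ID, so the heuristic misses that link: every link it reports is correct, but recall drops. This is the kind of missing link the abstract describes.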
References
- J. Aranda and G. Venolia. The secret life of bugs: Going past the errors and omissions in software repositories. In ICSE'09, pages 298--308, May 2009.
- G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. 2002. Recovering Traceability Links between Code and Documentation, IEEE Trans. Softw. Eng. 28, 10 October 2002, 970--983.
- A. Bacchelli, M. D'Ambros, M. Lanza, R. Robbes, Benchmarking Lightweight Techniques to Link E-Mails and Source Code. In WCRE'09, Lille, France, pp. 205--214, Oct 2009.
- A. Bacchelli, M. Lanza, and R. Robbes, Linking e-mails and source code artifacts. In ICSE '10, Vol. 1. ACM, New York, NY, USA, 375--384.
- A. Bachmann and A. Bernstein. Software process data quality and characteristics - a historical view on open and closed source projects. In IWPSE-Evol'09, pages 119--128, Amsterdam, The Netherlands, August 2009.
- A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein, The Missing Links: Bugs and Bug-fix Commits. In FSE'10, 97--106, Santa Fe, New Mexico, USA, Nov 2010.
- C. Bird, A. Bachmann, E. Aune, J. Duffy, A. Bernstein, V. Filkov, and P. Devanbu, Fair and balanced?: bias in bug-fixing datasets. In ESEC/FSE'09, Aug. 2009, 121--130.
- C. Bird, A. Bachmann, F. Rahman, and A. Bernstein, LINKSTER: enabling efficient manual inspection and annotation of mined data. In FSE'10, 369--370, Santa Fe, New Mexico, USA, Nov 2010.
- R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley, 1999.
- N. Bettenburg, R. Premraj, T. Zimmermann, and S. Kim. Duplicate bug reports considered harmful... really? In ICSM'08, pages 337--345, October 2008.
- K. Chen, S. R. Schach, L. Yu, J. Offutt, and G. Z. Heller. Open-source change logs. Emp. Softw. Eng., 9(3):197--210, 2004.
- C. Fellbaum, WordNet: An Electronic Lexical Database, Cambridge, MA: MIT Press, 1998.
- M. Fischer, M. Pinzger, and H. Gall. Analyzing and relating bug report data for feature tracking. In WCRE'03, pages 90--99, Victoria, Canada, November 2003.
- M. Fischer, M. Pinzger, and H. C. Gall. Populating a release history database from version control and bug tracking systems. In ICSM'03, pages 23--32, Amsterdam, Netherlands, September 2003.
- A. Hindle, D. M. German, R. C. Holt: What do large commits tell us?: a taxonomical study of large commits. In MSR 2008, pp. 99--108, May 2008.
- S. Kim, T. Zimmermann, K. Pan and E. Whitehead Jr., Automatic Identification of Bug-Introducing Changes. In ASE'06, Tokyo, Japan, September 2006.
- S. Kim, T. Zimmermann, E. J. Whitehead Jr., and A. Zeller. Predicting faults from cached history. In ICSE'07, pages 489--498, Washington, DC, USA, 2007.
- S. Kim, H. Zhang, R. Wu and L. Gong, Dealing with Noise in Defect Prediction. In ICSE'11, Honolulu, Hawaii, USA, May 2011.
- G. Liebchen and M. Shepperd. Data sets and data quality in software engineering. In PROMISE'08, 39--44, May 2008.
- A. Mockus, Missing Data in Software Engineering, Empirical Methods in Software Engineering. The MIT Press, 2000.
- A. Mockus and L. G. Votta, Identifying Reasons for Software Changes Using Historic Databases. In ICSM 2000, San Jose, CA, USA, 2000, pp. 120--130.
- A. Mockus, R. T. Fielding, and J. D. Herbsleb. Two case studies of open source software development: Apache and Mozilla. ACM Trans. Softw. Eng. Methodol., 11(3):309--346, 2002.
- A. Murgia, G. Concas, M. Marchesi, R. Tonelli, A machine learning approach for text categorization of fixing-issue commits on CVS. In ESEM 2010, Bolzano-Bozen, Italy, Sep 2010.
- I. Myrtveit, E. Stensrud, and U. H. Olsson. Analyzing Data Sets with Missing Data: An Empirical Evaluation of Imputation Methods and Likelihood-Based Methods. IEEE Trans. on Software Engineering, 27(11), pp. 999--1013, 2001.
- T. H. D. Nguyen, B. Adams, A. E. Hassan, A Case Study of Bias in Bug-Fix Datasets. In WCRE'10, pp. 259--268.
- P. Runeson, M. Alexanderson, O. Nyholm, Detection of Duplicate Defect Reports Using Natural Language Processing. In ICSE'07, 499--510, May 2007.
- A. Schroter, T. Zimmermann, R. Premraj, and A. Zeller. If your bug database could talk... In ICSE'06, pages 18--20, Rio de Janeiro, Brazil, September 2006.
- J. Sliwerski, T. Zimmermann, and A. Zeller. When do changes induce fixes? In MSR'05, pages 24--28, Saint Louis, Missouri, USA, May 2005. ACM.
- K. Strike, K. E. Emam, and N. Madhavji. Software Cost Estimation with Incomplete Data. IEEE Trans. on Software Engineering, 27(10), pp. 890--908, 2001.
- X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun, An approach to detecting duplicate bug reports using natural language and execution information. In ICSE'08, pages 461--470, Leipzig, Germany, 2008.
- WEKA: http://www.cs.waikato.ac.nz/ml/weka/
- I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementation, second ed., Morgan Kaufmann, 2005.
- T. Zimmermann, R. Premraj, and A. Zeller. Predicting defects for Eclipse. In PROMISE'07, pages 1--9, Minneapolis, Minnesota, USA, May 2007.
- T. Zimmermann and P. Weissgerber. Preprocessing CVS data for fine-grained analysis. In MSR'04, pages 2--6, Edinburgh, Scotland, UK, May 2004.
- H. Zhang and R. Wu, Sampling Program Quality, Proc. ICSM 2010, Timisoara, Romania, Sep 2010, pp. 1--10.
- H. Zhang, An Investigation of the Relationships between Lines of Code and Defects. In ICSM'09, Edmonton, Canada, September 2009, pp. 274--28.
Index Terms
- ReLink: recovering links between bugs and changes