DOI: 10.1145/2025113.2025120
research-article

ReLink: recovering links between bugs and changes

Published: 09 September 2011

ABSTRACT

Software defect information, including links between bugs and committed changes, plays an important role in software maintenance tasks such as measuring quality and predicting defects. Usually, the links are automatically mined from change logs and bug reports using heuristics such as searching for specific keywords and bug IDs in change logs. However, the accuracy of these heuristics depends on the quality of change logs. Bird et al. found that many links are missing because change logs lack bug references. They also found that the missing links lead to biased defect information, which affects defect prediction performance. We manually inspected the explicit links, which have explicit bug IDs in change logs, and observed that the links exhibit certain features. Based on our observation, we developed an automatic link recovery algorithm, ReLink, which automatically learns criteria of features from explicit links to recover missing links. We applied ReLink to three open source projects. ReLink reliably identified links with 89% precision and 78% recall on average, while the traditional heuristics alone achieve 91% precision and 64% recall. We also evaluated the impact of recovered links on software maintainability measurement and defect prediction, and found that the results of ReLink yield significantly better accuracy than those of traditional heuristics.
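As a rough illustration of the traditional heuristic the abstract refers to (searching change logs for keywords and bug IDs), the following is a minimal sketch. It is not the ReLink algorithm. The pattern `BUG_REF`, the function names, and the sample data are all assumptions made for illustration and are not taken from the paper. The second function shows how precision and recall can be computed against a known set of true links.

```python
import re

# Hypothetical pattern (an assumption, not from the paper): a fix-related
# keyword followed by an optional "#" or ":" and a bug number.
BUG_REF = re.compile(
    r"\b(?:bug|issue|fix(?:es|ed)?)\s*(?:id)?\s*[#:]?\s*(\d+)",
    re.IGNORECASE,
)


def find_explicit_links(change_logs, known_bug_ids):
    """Return (revision, bug_id) pairs whose log message names a known bug ID."""
    links = []
    for revision, message in change_logs.items():
        for match in BUG_REF.finditer(message):
            bug_id = int(match.group(1))
            # Only accept numbers that exist in the bug tracker.
            if bug_id in known_bug_ids:
                links.append((revision, bug_id))
    return links


def precision_recall(found, truth):
    """Compute precision and recall of found links against the true links."""
    found, truth = set(found), set(truth)
    hits = len(found & truth)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Made-up sample data: revision r2 fixes bug 789 but its log never says so,
# which is the kind of missing link the abstract describes.
logs = {
    "r1": "Fixed bug #123: null pointer in parser",
    "r2": "refactor parser error handling",
    "r3": "fix 456",
}
true_links = [("r1", 123), ("r2", 789), ("r3", 456)]

found_links = find_explicit_links(logs, {123, 456, 789})
precision, recall = precision_recall(found_links, true_links)
```

In this made-up sample the heuristic finds only the two links whose logs mention a bug ID, so every reported link is correct but one true link is missed. That is the same shape as the abstract's observation that keyword heuristics can be precise while leaving links unrecovered.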

References

  1. J. Aranda and G. Venolia. The secret life of bugs: Going past the errors and omissions in software repositories. In ICSE'09, pages 298--308, May 2009.
  2. G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. Recovering Traceability Links between Code and Documentation. IEEE Trans. Softw. Eng. 28, 10, October 2002, 970--983.
  3. A. Bacchelli, M. D'Ambros, M. Lanza, and R. Robbes. Benchmarking Lightweight Techniques to Link E-Mails and Source Code. In WCRE'09, Lille, France, pp. 205--214, Oct 2009.
  4. A. Bacchelli, M. Lanza, and R. Robbes. Linking e-mails and source code artifacts. In ICSE '10, Vol. 1. ACM, New York, NY, USA, 375--384.
  5. A. Bachmann and A. Bernstein. Software process data quality and characteristics - a historical view on open and closed source projects. In IWPSE-Evol'09, pages 119--128, Amsterdam, The Netherlands, August 2009.
  6. A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein. The Missing Links: Bugs and Bug-fix Commits. In FSE'10, 97--106, Santa Fe, New Mexico, USA, Nov 2010.
  7. C. Bird, A. Bachmann, E. Aune, J. Duffy, A. Bernstein, V. Filkov, and P. Devanbu. Fair and balanced?: bias in bug-fixing datasets. In ESEC/FSE'09, Aug. 2009, 121--130.
  8. C. Bird, A. Bachmann, F. Rahman, and A. Bernstein. LINKSTER: enabling efficient manual inspection and annotation of mined data. In FSE'10, 369--370, Santa Fe, New Mexico, USA, Nov 2010.
  9. R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999.
  10. N. Bettenburg, R. Premraj, T. Zimmermann, and S. Kim. Duplicate bug reports considered harmful... really? In ICSM'08, pages 337--345, October 2008.
  11. K. Chen, S. R. Schach, L. Yu, J. Offutt, and G. Z. Heller. Open-source change logs. Emp. Softw. Eng., 9(3):197--210, 2004.
  12. C. Fellbaum. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.
  13. M. Fischer, M. Pinzger, and H. Gall. Analyzing and relating bug report data for feature tracking. In WCRE'03, pages 90--99, Victoria, Canada, November 2003.
  14. M. Fischer, M. Pinzger, and H. C. Gall. Populating a release history database from version control and bug tracking systems. In ICSM'03, pages 23--32, Amsterdam, Netherlands, September 2003.
  15. A. Hindle, D. M. German, and R. C. Holt. What do large commits tell us?: a taxonomical study of large commits. In MSR 2008, pp. 99--108, May 2008.
  16. S. Kim, T. Zimmermann, K. Pan, and E. Whitehead Jr. Automatic Identification of Bug-Introducing Changes. In ASE'06, Tokyo, Japan, September 2006.
  17. S. Kim, T. Zimmermann, E. J. Whitehead Jr., and A. Zeller. Predicting faults from cached history. In ICSE'07, pages 489--498, Washington, DC, USA, 2007.
  18. S. Kim, H. Zhang, R. Wu, and L. Gong. Dealing with Noise in Defect Prediction. In ICSE'11, Honolulu, Hawaii, USA, May 2011.
  19. G. Liebchen and M. Shepperd. Data sets and data quality in software engineering. In PROMISE'08, 39--44, May 2008.
  20. A. Mockus. Missing Data in Software Engineering, Empirical Methods in Software Engineering. The MIT Press, 2000.
  21. A. Mockus and L. G. Votta. Identifying Reasons for Software Changes Using Historic Databases. In ICSM 2000, San Jose, CA, USA, 2000, pp. 120--130.
  22. A. Mockus, R. T. Fielding, and J. D. Herbsleb. Two case studies of open source software development: Apache and mozilla. ACM Trans. Softw. Eng. Methodol., 11(3):309--346, 2002.
  23. A. Murgia, G. Concas, M. Marchesi, and R. Tonelli. A machine learning approach for text categorization of fixing-issue commits on CVS. In ESEM 2010, Bolzano-Bozen, Italy, Sep 2010.
  24. I. Myrtveit, E. Stensrud, and U. H. Olsson. Analyzing Data Sets with Missing Data: An Empirical Evaluation of Imputation Methods and Likelihood-Based Methods. IEEE Trans. on Software Engineering, 27(11), pp. 999--1013, 2001.
  25. T. H. D. Nguyen, B. Adams, and A. E. Hassan. A Case Study of Bias in Bug-Fix Datasets. In WCRE'10, pp. 259--268.
  26. P. Runeson, M. Alexanderson, and O. Nyholm. Detection of Duplicate Defect Reports Using Natural Language Processing. In ICSE'07, 499--510, May 2007.
  27. A. Schroter, T. Zimmermann, R. Premraj, and A. Zeller. If your bug database could talk... In ICSE'06, pages 18--20, Rio de Janeiro, Brazil, September 2006.
  28. J. Sliwerski, T. Zimmermann, and A. Zeller. When do changes induce fixes? In MSR'05, pages 24--28, Saint Louis, Missouri, USA, May 2005. ACM.
  29. K. Strike, K. E. Emam, and N. Madhavji. Software Cost Estimation with Incomplete Data. IEEE Trans. on Software Engineering, 27(10), pp. 890--908, 2001.
  30. X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun. An approach to detecting duplicate bug reports using natural language and execution information. In ICSE'08, pages 461--470, Leipzig, Germany, 2008.
  31. WEKA: http://www.cs.waikato.ac.nz/ml/weka/
  32. I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementation, second ed. Morgan Kaufmann, 2005.
  33. T. Zimmermann, R. Premraj, and A. Zeller. Predicting defects for eclipse. In PROMISE'07, pages 1--9, Minneapolis, Minnesota, USA, May 2007.
  34. T. Zimmermann and P. Weissgerber. Preprocessing cvs data for Fine-grained analysis. In MSR'04, pages 2--6, Edinburgh, Scotland, UK, May 2004.
  35. H. Zhang and R. Wu. Sampling Program Quality. Proc. ICSM 2010, Timisoara, Romania, Sep 2010, pp. 1--10.
  36. H. Zhang. An Investigation of the Relationships between Lines of Code and Defects. In ICSM'09, Edmonton, Canada, September 2009, pp. 274--28.

    • Published in

      ESEC/FSE '11: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
      September 2011
      548 pages
      ISBN:9781450304436
      DOI:10.1145/2025113

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 17 of 128 submissions, 13%