DOI:10.1145/1081706.1081754

DynaMine: finding common error patterns by mining software revision histories

Published: 01 September 2005

ABSTRACT

A great deal of attention has lately been given to addressing software bugs such as errors in operating system drivers or security bugs. However, there are many other lesser-known errors specific to individual applications or APIs, and these violations of application-specific coding rules are responsible for a multitude of errors. In this paper we propose DynaMine, a tool that analyzes source code check-ins to find highly correlated method calls as well as common bug fixes in order to automatically discover application-specific coding patterns. Potential patterns discovered through mining are passed to a dynamic analysis tool for validation; finally, the results of dynamic analysis are presented to the user.

The combination of revision history mining and dynamic analysis techniques leveraged in DynaMine proves effective both for discovering new application-specific patterns and for finding errors when applied to very large applications with many man-years of development and debugging effort behind them. We have analyzed Eclipse and jEdit, two widely used, mature, highly extensible applications consisting of more than 3,600,000 lines of code combined. By mining revision histories, we have discovered 56 previously unknown, highly application-specific patterns. Of these, 21 were dynamically confirmed as very likely valid patterns, and a total of 263 pattern violations were found.
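
To make the idea of "highly correlated method calls" concrete, the following is a minimal sketch of pairwise co-occurrence mining over check-ins. It is an illustration under stated assumptions, not DynaMine's actual algorithm: the function name `mine_call_pairs`, the representation of a check-in as a set of added method names, the thresholds, and the sample history are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_call_pairs(checkins, min_support=2, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for calls often added together.

    checkins: iterable of sets, each holding the names of methods whose calls
    were added in one check-in (a hypothetical, pre-extracted representation).
    """
    single = Counter()  # number of check-ins that add a call to each method
    pair = Counter()    # number of check-ins that add calls to both methods
    for added in checkins:
        methods = sorted(set(added))
        single.update(methods)
        pair.update(combinations(methods, 2))

    rules = []
    for (a, b), support in pair.items():
        if support < min_support:
            continue
        # Confidence is directional: how often rhs accompanies lhs.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest candidates first; names break ties deterministically.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


# Made-up revision history, for illustration only.
history = [
    {"addListener", "removeListener"},
    {"addListener", "removeListener", "log"},
    {"log"},
    {"addListener", "removeListener"},
    {"log", "addListener"},
]
rules = mine_call_pairs(history)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support}, confidence={confidence:.2f}")
```

In a pipeline like the one the abstract describes, candidate pairs of this kind would only be hypotheses; the dynamic analysis stage is what confirms or rejects them.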

Published in

ESEC/FSE-13: Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
September 2005
402 pages
ISBN:1595930140
DOI:10.1145/1081706

ACM SIGSOFT Software Engineering Notes, Volume 30, Issue 5
September 2005
462 pages
ISSN:0163-5948
DOI:10.1145/1095430

Copyright © 2005 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%
