ABSTRACT
A great deal of attention has lately been given to addressing software bugs such as errors in operating system drivers or security bugs. However, there are many other lesser-known errors specific to individual applications or APIs, and these violations of application-specific coding rules are responsible for a multitude of errors. In this paper we propose DynaMine, a tool that analyzes source code check-ins to find highly correlated method calls as well as common bug fixes in order to automatically discover application-specific coding patterns. Potential patterns discovered through mining are passed to a dynamic analysis tool for validation; finally, the results of dynamic analysis are presented to the user.

The combination of revision history mining and dynamic analysis techniques leveraged in DynaMine proves effective both for discovering new application-specific patterns and for finding errors when applied to very large applications with many man-years of development and debugging effort behind them. We have analyzed Eclipse and jEdit, two widely used, mature, highly extensible applications consisting of more than 3,600,000 lines of code combined. By mining revision histories, we have discovered 56 previously unknown, highly application-specific patterns. Out of these, 21 were dynamically confirmed as very likely valid patterns, and a total of 263 pattern violations were found.
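The idea of finding highly correlated method calls in check-ins can be illustrated with a small sketch. This is not DynaMine's actual algorithm; it is a minimal association-style count under stated assumptions: each check-in is reduced to the set of method names whose calls it adds, and a pair is reported when its support (number of check-ins containing both calls) and confidence (support divided by the number of check-ins containing the first call) pass thresholds. The function name `mine_call_pairs`, the thresholds, and the method names in the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_call_pairs(checkins, min_support=2, min_confidence=0.5):
    """Return (a, b, support, confidence) rules for calls often added together.

    checkins: iterable of sets of method names added in one check-in
    (a hypothetical, simplified input format; an assumption of this sketch).
    """
    single = Counter()  # check-ins in which each call was added
    pair = Counter()    # check-ins in which both calls of a pair were added
    for added in checkins:
        calls = sorted(set(added))
        single.update(calls)
        pair.update(combinations(calls, 2))

    rules = []
    for (a, b), support in pair.items():
        if support < min_support:
            continue
        # Confidence is directional, so evaluate a -> b and b -> a separately.
        for x, y in ((a, b), (b, a)):
            confidence = support / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))

    # Rank by support, then confidence (highest first); names break ties.
    rules.sort(key=lambda r: (-r[2], -r[3], r[0], r[1]))
    return rules


# Illustrative check-ins only; these are not data from the paper.
checkins = [
    {"addListener", "removeListener"},
    {"addListener", "removeListener", "log"},
    {"addListener", "log"},
    {"open"},
]
rules = mine_call_pairs(checkins)
for first, second, support, confidence in rules:
    print(f"{first} -> {second}: support={support}, confidence={confidence:.2f}")
```

In a pipeline like the one the abstract describes, highly ranked pairs would only be candidate patterns; they would still need validation, for example by the dynamic analysis step, before being treated as coding rules.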