ABSTRACT
The reliability and correctness of complex software systems can be significantly enhanced through well-defined specifications that dictate the use of various units of abstraction (e.g., modules, or procedures). Often times, however, specifications are either missing, imprecise, or simply too complex to encode within a signature, necessitating specification inference. The process of inferring specifications from complex software systems forms the focus of this paper. We describe a static inference mechanism for identifying the preconditions that must hold whenever a procedure is called. These preconditions may reflect both data flow properties (e.g., whenever p is called, variable x must be non-null) as well as control-flow properties (e.g., every call to p must bepreceded by a call to q). We derive these preconditions using a ninter-procedural path-sensitive dataflow analysis that gathers predicates at each program point. We apply mining techniques to these predicates to make specification inference robust to errors. This technique also allows us to derive higher-level specifications that abstract structural similarities among predicates (e.g., procedure p is called immediately after a conditional test that checks whether some variable v is non-null.) We describe an implementation of these techniques, and validate the effectiveness of the approach on a number of large open-source benchmarks. Experimental results confirm that our mining algorithms are efficient, and that the specifications derived are both precise and useful-the implementation discovers several critical, yet previously, undocumented preconditions for well-tested libraries.
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487--499, 1994. Google ScholarDigital Library
- R. Agrawal and R. Srikant. Mining sequential patterns. In Eleventh International Conference on Data Engineering, pages 3--14, 1995. Google ScholarDigital Library
- R. Alur, P. Cerny, P. Madhusudan, and W. Nam. Synthesis of interface specidications for java classes. In POPL '05: Proceedings of the 32nd ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages, pages 98--109, 2005. Google ScholarDigital Library
- G. Ammons, R. Bodik, and J. Larus. Mining specifications. In POPL'02: Proceedings of the 29th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages, pages 4--16, 2002. Google ScholarDigital Library
- G. Ammons, D. Mandelin, R. Bodik, and J. Larus. Debugging temporal specifications with concept analysis. In PLDI '03: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pages 182--195, 2003. Google ScholarDigital Library
- P. Anderson, T. Reps, and T. Teitelbaum. Design and implementation of a fine-grained software inspection tool. IEEE Trans. on Software Engineering, 29(8):721--733, August 2003. Google ScholarDigital Library
- T. Ball and S.K. Rajamani. Automatically validating temporal safety properties of interfaces. In SPIN 2001, Workshop on Model Checking of Software, LNCS 2057, pages 103--122, May 2001. Google ScholarDigital Library
- B. Blanchet. A computationally sound mechanized prover for security protocols. In IEEE Symposium on Security and Privacy, pages 140--154, 2006. Google ScholarDigital Library
- D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu. Mafia: A performance study of mining maximal frequent itemsets. In Workshop on Frequent Itemset Mining Implementations (FIMI'03), 2003.Google Scholar
- M. Castro, M. Costa, and T. Harris. Securing software by enforcing data-flow integrity. In OSDI '06: Proceedings of the 7th Usenix Symposium on Operating Systems Design and Implementation, pages 147--160, 2006. Google ScholarDigital Library
- B. Chin, S. Markstrum, and T. Millstein. Semantic type qualifiers. In PLDI '05: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 85--95, 2005. Google ScholarDigital Library
- C. Cortes, K. Fisher, D. Pregibon, and A. Rogers. Hancock: a language for extracting signatures from data streams. In KDD '00: Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 9--17, 2000. Google ScholarDigital Library
- D. Engler, D. Chen, and A. Chou. Bugs as inconsistent behavior: A general approach to inferring errors in systems code. In SOSP'01: Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 57--72, 2001. Google ScholarDigital Library
- M. Ernst, J. Cockrell, W. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE TSE, 27(2):1--25, February 2001. Google ScholarDigital Library
- J. Fischer, R. Jhala, and R. Majumdar. Joining dataflow with predicates. In ESEC--FSE '05: 10th European Software Engineering Conference and 13th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 227--236, 2005. Google ScholarDigital Library
- J. Foster, T. Terauchi, and A. Aiken. Flow-sensitive type qualifiers. In PLDI '02: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 1--12, 2002. Google ScholarDigital Library
- M. Furr and J. Foster. Checking type safety of foreign function calls. In PLDI '05: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 62--72, 2005. Google ScholarDigital Library
- P. Godefroid, N. Klarslund, and K. Sen. Dart: Directed automated random testing. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 213--223, Chicago, Il, 2005. Google ScholarDigital Library
- D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. Sharma. Discovering all most specific sentences. ACM Transactions on Database Systems, 28:140--174, 2003. Google ScholarDigital Library
- T. Henzinger, R. Jhala, and R. Majumdar. Permissive interfaces. SIGSOFT Softw. Eng. Notes, 30(5):31--40, 2005. Google ScholarDigital Library
- G. J. Holzmann. The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley, 2004. Google ScholarDigital Library
- R. Jhala and R. Majumdar. Path slicing. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 38--47, 2005. Google ScholarDigital Library
- T. Kremenek, P. Twohey, G. Back, A. Ng, and D. Engler. From uncertainty to belief: Inferring the specification within. In OSDI'06: Proceedings of the 7th Usenix Symposium on Operating Systems Design and Implementation, pages 161--176, 2006. Google ScholarDigital Library
- P. Lam, V. Kuncak, and M. Rinard. Generalized typestate checking for data structure consistency. In VMCAI '05: Proceedings of 6th International Conference on Verification, Model Checking and Abstract Interpretation, pages 430--447, 2005. Google ScholarDigital Library
- Z. Li, S. Lu, S. Myagmar, and Y. Zhou. Cp-miner: A tool for finding copy-paste and related bugs in operating system code. In OSDI '04: Proceedings of the 6th Usenix Symposium on Operating Systems Design and Implementation, pages 289--302, 2004. Google ScholarDigital Library
- Z. Li and Y. Zhou. Pr-miner: Automatically extracting implicit programming rules and detecting violations in large software code. In ESEC-FSE '05: 10th European Software Engineering Conference and 13th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 306--315, 2005. Google ScholarDigital Library
- B. Liblit, M. Naik, A. Zheng, A. Aiken, and M. Jordan. Scalable statistical bug isolation. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 15--26, Chicago, Illinois, 2005. Google ScholarDigital Library
- B. Livshits and T. Zimmermann. Dynamine: Finding common error patterns by mining software revision histories. In ESEC-FSE '05: 10th European Software Engineering Conference and 13th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 296--305, 2005. Google ScholarDigital Library
- D. Mandelin, L. Xu, R. Bodik, and D. Kimelman. Jungloid mining: helping to navigate the api jungle. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 48--61, 2005. Google ScholarDigital Library
- J. Palsberg. Closure analysis in constraint form. ACM Trans. Program. Lang. Syst., 17(1):47--62, 1995. Google ScholarDigital Library
- M. K. Ramanathan, A. Grama, and S. Jagannathan. Path-sensitive inference of function precedence protocols. In ICSE '07: Proceedings of the 29th International Conference on Software Engineering, 2007. Google ScholarDigital Library
- M. Rinard, C. Cadar, D. Dumitran, D. M. Roy, T. Leu, and W.S. Beebee. Enhancing server availability and security through failureoblivious computing. In OSDI '04: Proceedings of the 6th Usenix Symposium on Operating Systems Design and Implementation, 2004. Google ScholarDigital Library
- O. Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, Carnegie Mellon University, May 1991. Google ScholarDigital Library
- W. Weimer and G. Necula. Mining temporal specifications for error detection. In TACAS '05: Eleventh International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 461--476, April 2005. Google ScholarDigital Library
- J. Whaley, M. Martin, and M. Lam. Automatic extraction of objectoriented component interfaces. In Proceedings of the International Symposium of Software Testing and Analysis, ISSTA, 2002. Google ScholarDigital Library
- Y. Xie and A. Aiken. Scalable error detection using boolean satisfiability. In POPL '05: Proceedings of the 32nd ACM Symposium on Principles of Programming Languages, pages 351--363, 2005. Google ScholarDigital Library
- G. Yang. The complexity of mining maximal frequent itemsets and maximal frequent patterns. In KDD '04: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 344--353, 2004. Google ScholarDigital Library
- J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M. Das. Perracotta: Mining temporal api rules from imperfect traces. In ICSE '06: Proceedings of the International Conference on Software Engineering, 2006. Google ScholarDigital Library
- J. Yang, P. Twohey, D. Engler, and M. Musuvathi. Using model checking to find serious file system errors. In OSDI '04: Proceedings of the 6th Usenix Symposium on Operating Systems Design and Implementation, pages 273--288, San Francisco, CA, 2004. Google ScholarDigital Library
Index Terms
- Static specification inference using predicate mining
Recommendations
Mining API patterns as partial orders from source code: from usage scenarios to specifications
ESEC-FSE '07: Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineeringA software system interacts with third-party libraries through various APIs. Using these library APIs often needs tofollow certain usage patterns. Furthermore, ordering rules (specifications) exist between APIs, and these rules govern the secure and ...
Mining specifications
POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languagesProgram verification is a promising approach to improving program quality, because it can search all possible program executions for specific errors. However, the need to formally describe correct behavior or errors is a major barrier to the widespread ...
Graph-based mining of multiple object usage patterns
ESEC/FSE '09: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineeringThe interplay of multiple objects in object-oriented programming often follows specific protocols, for example certain orders of method calls and/or control structure constraints among them that are parts of the intended object usages. Unfortunately, ...
Comments