skip to main content
10.1145/1250734.1250749acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article

Static specification inference using predicate mining

Published:10 June 2007Publication History

ABSTRACT

The reliability and correctness of complex software systems can be significantly enhanced through well-defined specifications that dictate the use of various units of abstraction (e.g., modules, or procedures). Often times, however, specifications are either missing, imprecise, or simply too complex to encode within a signature, necessitating specification inference. The process of inferring specifications from complex software systems forms the focus of this paper. We describe a static inference mechanism for identifying the preconditions that must hold whenever a procedure is called. These preconditions may reflect both data flow properties (e.g., whenever p is called, variable x must be non-null) as well as control-flow properties (e.g., every call to p must bepreceded by a call to q). We derive these preconditions using a ninter-procedural path-sensitive dataflow analysis that gathers predicates at each program point. We apply mining techniques to these predicates to make specification inference robust to errors. This technique also allows us to derive higher-level specifications that abstract structural similarities among predicates (e.g., procedure p is called immediately after a conditional test that checks whether some variable v is non-null.) We describe an implementation of these techniques, and validate the effectiveness of the approach on a number of large open-source benchmarks. Experimental results confirm that our mining algorithms are efficient, and that the specifications derived are both precise and useful-the implementation discovers several critical, yet previously, undocumented preconditions for well-tested libraries.

References

  1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487--499, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Agrawal and R. Srikant. Mining sequential patterns. In Eleventh International Conference on Data Engineering, pages 3--14, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Alur, P. Cerny, P. Madhusudan, and W. Nam. Synthesis of interface specidications for java classes. In POPL '05: Proceedings of the 32nd ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages, pages 98--109, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Ammons, R. Bodik, and J. Larus. Mining specifications. In POPL'02: Proceedings of the 29th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages, pages 4--16, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. G. Ammons, D. Mandelin, R. Bodik, and J. Larus. Debugging temporal specifications with concept analysis. In PLDI '03: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pages 182--195, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Anderson, T. Reps, and T. Teitelbaum. Design and implementation of a fine-grained software inspection tool. IEEE Trans. on Software Engineering, 29(8):721--733, August 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. T. Ball and S.K. Rajamani. Automatically validating temporal safety properties of interfaces. In SPIN 2001, Workshop on Model Checking of Software, LNCS 2057, pages 103--122, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. B. Blanchet. A computationally sound mechanized prover for security protocols. In IEEE Symposium on Security and Privacy, pages 140--154, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu. Mafia: A performance study of mining maximal frequent itemsets. In Workshop on Frequent Itemset Mining Implementations (FIMI'03), 2003.Google ScholarGoogle Scholar
  10. M. Castro, M. Costa, and T. Harris. Securing software by enforcing data-flow integrity. In OSDI '06: Proceedings of the 7th Usenix Symposium on Operating Systems Design and Implementation, pages 147--160, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. B. Chin, S. Markstrum, and T. Millstein. Semantic type qualifiers. In PLDI '05: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 85--95, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Cortes, K. Fisher, D. Pregibon, and A. Rogers. Hancock: a language for extracting signatures from data streams. In KDD '00: Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 9--17, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Engler, D. Chen, and A. Chou. Bugs as inconsistent behavior: A general approach to inferring errors in systems code. In SOSP'01: Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 57--72, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Ernst, J. Cockrell, W. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE TSE, 27(2):1--25, February 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Fischer, R. Jhala, and R. Majumdar. Joining dataflow with predicates. In ESEC--FSE '05: 10th European Software Engineering Conference and 13th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 227--236, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Foster, T. Terauchi, and A. Aiken. Flow-sensitive type qualifiers. In PLDI '02: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 1--12, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Furr and J. Foster. Checking type safety of foreign function calls. In PLDI '05: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 62--72, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. Godefroid, N. Klarslund, and K. Sen. Dart: Directed automated random testing. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 213--223, Chicago, Il, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. Sharma. Discovering all most specific sentences. ACM Transactions on Database Systems, 28:140--174, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. Henzinger, R. Jhala, and R. Majumdar. Permissive interfaces. SIGSOFT Softw. Eng. Notes, 30(5):31--40, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. G. J. Holzmann. The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. R. Jhala and R. Majumdar. Path slicing. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 38--47, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Kremenek, P. Twohey, G. Back, A. Ng, and D. Engler. From uncertainty to belief: Inferring the specification within. In OSDI'06: Proceedings of the 7th Usenix Symposium on Operating Systems Design and Implementation, pages 161--176, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. P. Lam, V. Kuncak, and M. Rinard. Generalized typestate checking for data structure consistency. In VMCAI '05: Proceedings of 6th International Conference on Verification, Model Checking and Abstract Interpretation, pages 430--447, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Z. Li, S. Lu, S. Myagmar, and Y. Zhou. Cp-miner: A tool for finding copy-paste and related bugs in operating system code. In OSDI '04: Proceedings of the 6th Usenix Symposium on Operating Systems Design and Implementation, pages 289--302, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Z. Li and Y. Zhou. Pr-miner: Automatically extracting implicit programming rules and detecting violations in large software code. In ESEC-FSE '05: 10th European Software Engineering Conference and 13th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 306--315, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. B. Liblit, M. Naik, A. Zheng, A. Aiken, and M. Jordan. Scalable statistical bug isolation. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 15--26, Chicago, Illinois, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. B. Livshits and T. Zimmermann. Dynamine: Finding common error patterns by mining software revision histories. In ESEC-FSE '05: 10th European Software Engineering Conference and 13th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 296--305, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. D. Mandelin, L. Xu, R. Bodik, and D. Kimelman. Jungloid mining: helping to navigate the api jungle. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 48--61, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Palsberg. Closure analysis in constraint form. ACM Trans. Program. Lang. Syst., 17(1):47--62, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. K. Ramanathan, A. Grama, and S. Jagannathan. Path-sensitive inference of function precedence protocols. In ICSE '07: Proceedings of the 29th International Conference on Software Engineering, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M. Rinard, C. Cadar, D. Dumitran, D. M. Roy, T. Leu, and W.S. Beebee. Enhancing server availability and security through failureoblivious computing. In OSDI '04: Proceedings of the 6th Usenix Symposium on Operating Systems Design and Implementation, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. O. Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, Carnegie Mellon University, May 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. W. Weimer and G. Necula. Mining temporal specifications for error detection. In TACAS '05: Eleventh International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 461--476, April 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. Whaley, M. Martin, and M. Lam. Automatic extraction of objectoriented component interfaces. In Proceedings of the International Symposium of Software Testing and Analysis, ISSTA, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Y. Xie and A. Aiken. Scalable error detection using boolean satisfiability. In POPL '05: Proceedings of the 32nd ACM Symposium on Principles of Programming Languages, pages 351--363, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. G. Yang. The complexity of mining maximal frequent itemsets and maximal frequent patterns. In KDD '04: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 344--353, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M. Das. Perracotta: Mining temporal api rules from imperfect traces. In ICSE '06: Proceedings of the International Conference on Software Engineering, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. J. Yang, P. Twohey, D. Engler, and M. Musuvathi. Using model checking to find serious file system errors. In OSDI '04: Proceedings of the 6th Usenix Symposium on Operating Systems Design and Implementation, pages 273--288, San Francisco, CA, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Static specification inference using predicate mining

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
          June 2007
          508 pages
          ISBN:9781595936332
          DOI:10.1145/1250734
          • cover image ACM SIGPLAN Notices
            ACM SIGPLAN Notices  Volume 42, Issue 6
            Proceedings of the 2007 PLDI conference
            June 2007
            491 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/1273442
            Issue’s Table of Contents

          Copyright © 2007 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 10 June 2007

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate406of2,067submissions,20%

          Upcoming Conference

          PLDI '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader