skip to main content
article

Abstraction refinement guided by a learnt probabilistic model

Published:11 January 2016Publication History
Skip Abstract Section

Abstract

The core challenge in designing an effective static program analysis is to find a good program abstraction -- one that retains only details relevant to a given query. In this paper, we present a new approach for automatically finding such an abstraction. Our approach uses a pessimistic strategy, which can optionally use guidance from a probabilistic model. Our approach applies to parametric static analyses implemented in Datalog, and is based on counterexample-guided abstraction refinement. For each untried abstraction, our probabilistic model provides a probability of success, while the size of the abstraction provides an estimate of its cost in terms of analysis time. Combining these two metrics, probability and cost, our refinement algorithm picks an optimal abstraction. Our probabilistic model is a variant of the Erdos--Renyi random graph model, and it is tunable by what we call hyperparameters. We present a method to learn good values for these hyperparameters, by observing past runs of the analysis on an existing codebase. We evaluate our approach on an object sensitive pointer analysis for Java programs, with two client analyses (PolySite and Downcast).

References

  1. Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles A. Sutton. Learning natural coding conventions. In FSE, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Miltiadis Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. Bimodal modelling of source code and natural language. In ICML, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Carlos Ans´otegui, Maria Luisa Bonet, and Jordi Levy. SAT-based MaxSAT algorithms. Artif. Intell., 196:77–105, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Ball and S. Rajamani. The SLAM project: Debugging system software via static analysis. In POPL, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Nels E. Beckman and Aditya V. Nori. Probabilistic, modular and scalable inference of typestate specifications. In PLDI, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Bravenboer and Y. Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In OOPSLA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Chaki, E. Clarke, A. Groce, S. Jha, and H. Veith. Modular verification of software components in C. In ICSE, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexampleguided abstraction refinement for symbolic model checking. JACM, 50(5), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. E. Clarke, A. Gupta, J. Kukula, and O. Shrichman. SAT based abstraction-refinement using ILP and machine learning techniques. In CAV, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Paul Erd˝os and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 1960.Google ScholarGoogle Scholar
  12. Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Sht. Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. Inference and learning in probabilistic logic programs using weighted boolean formulas. To appear in Theory and Practice of Logic Programming, 2013.Google ScholarGoogle Scholar
  13. Lise Getoor and Ben Taskar, editors. Introduction to Statistical Relational Learning. The MIT Press, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Roberto Giacobazzi, Francesco Logozzo, and Francesco Ranzato. Analyzing program analyses. In POPL, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Roberto Giacobazzi, Francesco Ranzato, and Francesca Scozzari. Making abstract interpretations complete. J. ACM, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Grebenshchikov, A. Gupta, N. Lopes, C. Popeea, and A. Rybalchenko. HSF(C): A software verifier based on Horn clauses. In TACAS, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Sergey Grebenshchikov, Nuno P. Lopes, Corneliu Popeea, and Andrey Rybalchenko. Synthesizing software verifiers from proof rules. In PLDI, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Anubhav Gupta. Learning Abstractions for Model Checking. PhD thesis, Carnegie Mellon, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Bernd Gutmann, Ingo Thon, and Luc De Raedt. Learning the parameters of probabilistic logic programs from interpretations. In ECML PKDD, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. Henzinger, R. Jhala, R. Majumdar, and K. McMillan. Abstractions from proofs. In POPL, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar T. Devanbu. On the naturalness of software. In ICSE, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Mikoláˇs Janota. MiFuMax — a literate MaxSat solver. http: //sat.inesc-id.pt/~mikolas/sw/mifumax/, 2013.Google ScholarGoogle Scholar
  23. Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. {Online; accessed 2015-07-06}.Google ScholarGoogle Scholar
  24. Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Svetoslav Karaivanov, Veselin Raychev, and Martin T. Vechev. Phrasebased statistical translation of programming languages. In Onward!, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Richard M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, 1972.Google ScholarGoogle ScholarCross RefCross Ref
  27. Ted Kremenek, Andrew Y. Ng, and Dawson R. Engler. A factor graph model for software bug finding. In IJCAI, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Joohyung Lee. A model-theoretic counterpart of loop formulas. In IJCAI, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Percy Liang and Mayur Naik. Scaling abstraction refinement via pruning. In PLDI, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Ben Liblit, Alexander Aiken, Alice X. Zheng, and Michael I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Ben Liblit, Mayur Naik, Alice X. Zheng, Alexander Aiken, and Michael I. Jordan. Scalable statistical bug isolation. In PLDI, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Vladimir Lifschitz and Alexander A. Razborov. Why are there so many loop formulas? ACM Trans. Comput. Log., 7(2):261–268, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. V. Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani, and Anindya Banerjee. Merlin: specification inference for explicit information flow problems. In PLDI, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Alexey Loginov, Thomas W. Reps, and Shmuel Sagiv. Abstraction refinement via inductive learning. In CAV, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Chris J. Maddison and Daniel Tarlow. Structured generative models of natural source code. In ICML, 2014.Google ScholarGoogle Scholar
  36. Jo˜ao Marques-Silva, Federico Heras, Mikolás Janota, Alessandro Previti, and Anton Belov. On computing minimal correction subsets. In IJCAI, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Alon Mishne, Sharon Shoham, and Eran Yahav. Typestate-based semantic code search over partial programs. In OOPSLA, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Ant´onio Morgado, Federico Heras, Mark H. Liffiton, Jordi Planes, and Jo˜ao Marques-Silva. Iterative and core-guided MaxSAT solving: A survey and assessment. Constraints, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Mayur Naik. jChord. https://bitbucket.org/pag-lab/jchord.Google ScholarGoogle Scholar
  40. Mayur Naik, Hongseok Yang, Ghila Castelnuovo, and Mooly Sagiv. Abstractions from tests. In POPL, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Aditya V. Nori and Rahul Sharma. Termination proofs from tests. In FSE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, and Kwangkeun Yi. Selective context-sensitivity guided by impact preanalysis. In PLDI, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Veselin Raychev, Martin T. Vechev, and Andreas Krause. Predicting program properties from “Big Code”. In POPL, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Veselin Raychev, Martin T. Vechev, and Eran Yahav. Code completion with statistical language models. In PLDI, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Andrey Rybalchenko and Viorica Sofronie-Stokkermans. Constraint solving for interpolation. In VMCAI, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Sriram Sankaranarayanan, Franjo Ivancic, and Aarti Gupta. Mining library specifications using inductive logic programming. In ICSE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Taisuke Sato. A statistical learning method for logic programs with distribution semantics. In ICLP, 1995.Google ScholarGoogle Scholar
  48. Rahul Sharma, Aditya V. Nori, and Alex Aiken. Interpolants as classifiers. In CAV, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Y. Smaragdakis and M. Bravenboer. Using Datalog for fast and easy program analysis. In Datalog 2.0 Workshop, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Yannis Smaragdakis, George Kastrinis, and George Balatsouras. Introspective analysis: context-sensitivity, across the board. In PLDI, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Tachio Terauchi. Explaining the effectiveness of small refinement heuristics in program verification with CEGAR. In SAS, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  52. Martin J. Wainwright and Michael I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. J. Whaley, D. Avots, M. Carbin, and M. Lam. Using Datalog with binary decision diagrams for program analysis. In APLAS, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Kwangkeun Yi, Hosik Choi, Jaehwang Kim, and Yongdai Kim. An empirical study on classification methods for alarms from a bug-finding static C analyzer. Inf. Process. Lett., 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. X. Zhang, R. Mangal, R. Grigore, M. Naik, and H. Yang. On abstraction refinement for program analyses in Datalog. In PLDI, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Xin Zhang, Mayur Naik, and Hongseok Yang. Finding optimum abstractions in parametric dataflow analysis. In PLDI, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Alice X. Zheng, Michael I. Jordan, Ben Liblit, Mayur Naik, and Alex Aiken. Statistical debugging: simultaneous identification of multiple bugs. In ICML, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Abstraction refinement guided by a learnt probabilistic model

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM SIGPLAN Notices
              ACM SIGPLAN Notices  Volume 51, Issue 1
              POPL '16
              January 2016
              815 pages
              ISSN:0362-1340
              EISSN:1558-1160
              DOI:10.1145/2914770
              • Editor:
              • Andy Gill
              Issue’s Table of Contents
              • cover image ACM Conferences
                POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
                January 2016
                815 pages
                ISBN:9781450335492
                DOI:10.1145/2837614

              Copyright © 2016 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 11 January 2016

              Check for updates

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader