article

Abstraction refinement guided by a learnt probabilistic model

Authors:
Radu Grigore

University of Oxford, UK

University of Oxford, UK
View Profile

,
Hongseok Yang

University of Oxford, UK

University of Oxford, UK
View Profile

Authors Info & Claims

ACM SIGPLAN Notices Volume 51 Issue 1January 2016pp 485–498https://doi.org/10.1145/2914770.2837663

Published:11 January 2016Publication History

ACM SIGPLAN Notices

Abstract

The core challenge in designing an effective static program analysis is to find a good program abstraction -- one that retains only details relevant to a given query. In this paper, we present a new approach for automatically finding such an abstraction. Our approach uses a pessimistic strategy, which can optionally use guidance from a probabilistic model. Our approach applies to parametric static analyses implemented in Datalog, and is based on counterexample-guided abstraction refinement. For each untried abstraction, our probabilistic model provides a probability of success, while the size of the abstraction provides an estimate of its cost in terms of analysis time. Combining these two metrics, probability and cost, our refinement algorithm picks an optimal abstraction. Our probabilistic model is a variant of the Erdos--Renyi random graph model, and it is tunable by what we call hyperparameters. We present a method to learn good values for these hyperparameters, by observing past runs of the analysis on an existing codebase. We evaluate our approach on an object sensitive pointer analysis for Java programs, with two client analyses (PolySite and Downcast).

References

Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles A. Sutton. Learning natural coding conventions. In FSE, 2014. Google ScholarDigital Library
Miltiadis Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. Bimodal modelling of source code and natural language. In ICML, 2015.Google ScholarDigital Library
Carlos Ans´otegui, Maria Luisa Bonet, and Jordi Levy. SAT-based MaxSAT algorithms. Artif. Intell., 196:77–105, 2013. Google ScholarDigital Library
T. Ball and S. Rajamani. The SLAM project: Debugging system software via static analysis. In POPL, 2002. Google ScholarDigital Library
Nels E. Beckman and Aditya V. Nori. Probabilistic, modular and scalable inference of typestate specifications. In PLDI, 2011. Google ScholarDigital Library
M. Bravenboer and Y. Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In OOPSLA, 2009. Google ScholarDigital Library
S. Chaki, E. Clarke, A. Groce, S. Jha, and H. Veith. Modular verification of software components in C. In ICSE, 2003. Google ScholarDigital Library
E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexampleguided abstraction refinement for symbolic model checking. JACM, 50(5), 2003. Google ScholarDigital Library
E. Clarke, A. Gupta, J. Kukula, and O. Shrichman. SAT based abstraction-refinement using ILP and machine learning techniques. In CAV, 2002. Google ScholarDigital Library
Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, 1977. Google ScholarDigital Library
Paul Erd˝os and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 1960.Google Scholar
Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Sht. Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. Inference and learning in probabilistic logic programs using weighted boolean formulas. To appear in Theory and Practice of Logic Programming, 2013.Google Scholar
Lise Getoor and Ben Taskar, editors. Introduction to Statistical Relational Learning. The MIT Press, 2007. Google ScholarDigital Library
Roberto Giacobazzi, Francesco Logozzo, and Francesco Ranzato. Analyzing program analyses. In POPL, 2015. Google ScholarDigital Library
Roberto Giacobazzi, Francesco Ranzato, and Francesca Scozzari. Making abstract interpretations complete. J. ACM, 2000. Google ScholarDigital Library
S. Grebenshchikov, A. Gupta, N. Lopes, C. Popeea, and A. Rybalchenko. HSF(C): A software verifier based on Horn clauses. In TACAS, 2012. Google ScholarDigital Library
Sergey Grebenshchikov, Nuno P. Lopes, Corneliu Popeea, and Andrey Rybalchenko. Synthesizing software verifiers from proof rules. In PLDI, 2012. Google ScholarDigital Library
Anubhav Gupta. Learning Abstractions for Model Checking. PhD thesis, Carnegie Mellon, 2006. Google ScholarDigital Library
Bernd Gutmann, Ingo Thon, and Luc De Raedt. Learning the parameters of probabilistic logic programs from interpretations. In ECML PKDD, 2011. Google ScholarDigital Library
T. Henzinger, R. Jhala, R. Majumdar, and K. McMillan. Abstractions from proofs. In POPL, 2004. Google ScholarDigital Library
Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar T. Devanbu. On the naturalness of software. In ICSE, 2012. Google ScholarDigital Library
Mikoláˇs Janota. MiFuMax — a literate MaxSat solver. http: //sat.inesc-id.pt/~mikolas/sw/mifumax/, 2013.Google Scholar
Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. {Online; accessed 2015-07-06}.Google Scholar
Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 1999. Google ScholarDigital Library
Svetoslav Karaivanov, Veselin Raychev, and Martin T. Vechev. Phrasebased statistical translation of programming languages. In Onward!, 2014. Google ScholarDigital Library
Richard M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, 1972.Google ScholarCross Ref
Ted Kremenek, Andrew Y. Ng, and Dawson R. Engler. A factor graph model for software bug finding. In IJCAI, 2007. Google ScholarDigital Library
Joohyung Lee. A model-theoretic counterpart of loop formulas. In IJCAI, 2005. Google ScholarDigital Library
Percy Liang and Mayur Naik. Scaling abstraction refinement via pruning. In PLDI, 2011. Google ScholarDigital Library
Ben Liblit, Alexander Aiken, Alice X. Zheng, and Michael I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003. Google ScholarDigital Library
Ben Liblit, Mayur Naik, Alice X. Zheng, Alexander Aiken, and Michael I. Jordan. Scalable statistical bug isolation. In PLDI, 2005. Google ScholarDigital Library
Vladimir Lifschitz and Alexander A. Razborov. Why are there so many loop formulas? ACM Trans. Comput. Log., 7(2):261–268, 2006. Google ScholarDigital Library
V. Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani, and Anindya Banerjee. Merlin: specification inference for explicit information flow problems. In PLDI, 2009. Google ScholarDigital Library
Alexey Loginov, Thomas W. Reps, and Shmuel Sagiv. Abstraction refinement via inductive learning. In CAV, 2005. Google ScholarDigital Library
Chris J. Maddison and Daniel Tarlow. Structured generative models of natural source code. In ICML, 2014.Google Scholar
Jo˜ao Marques-Silva, Federico Heras, Mikolás Janota, Alessandro Previti, and Anton Belov. On computing minimal correction subsets. In IJCAI, 2013.Google ScholarDigital Library
Alon Mishne, Sharon Shoham, and Eran Yahav. Typestate-based semantic code search over partial programs. In OOPSLA, 2012. Google ScholarDigital Library
Ant´onio Morgado, Federico Heras, Mark H. Liffiton, Jordi Planes, and Jo˜ao Marques-Silva. Iterative and core-guided MaxSAT solving: A survey and assessment. Constraints, 2013. Google ScholarDigital Library
Mayur Naik. jChord. https://bitbucket.org/pag-lab/jchord.Google Scholar
Mayur Naik, Hongseok Yang, Ghila Castelnuovo, and Mooly Sagiv. Abstractions from tests. In POPL, 2012. Google ScholarDigital Library
Aditya V. Nori and Rahul Sharma. Termination proofs from tests. In FSE, 2013. Google ScholarDigital Library
Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, and Kwangkeun Yi. Selective context-sensitivity guided by impact preanalysis. In PLDI, 2014. Google ScholarDigital Library
Veselin Raychev, Martin T. Vechev, and Andreas Krause. Predicting program properties from “Big Code”. In POPL, 2015. Google ScholarDigital Library
Veselin Raychev, Martin T. Vechev, and Eran Yahav. Code completion with statistical language models. In PLDI, 2014. Google ScholarDigital Library
Andrey Rybalchenko and Viorica Sofronie-Stokkermans. Constraint solving for interpolation. In VMCAI, 2007. Google ScholarDigital Library
Sriram Sankaranarayanan, Franjo Ivancic, and Aarti Gupta. Mining library specifications using inductive logic programming. In ICSE, 2008. Google ScholarDigital Library
Taisuke Sato. A statistical learning method for logic programs with distribution semantics. In ICLP, 1995.Google Scholar
Rahul Sharma, Aditya V. Nori, and Alex Aiken. Interpolants as classifiers. In CAV, 2012. Google ScholarDigital Library
Y. Smaragdakis and M. Bravenboer. Using Datalog for fast and easy program analysis. In Datalog 2.0 Workshop, 2010. Google ScholarDigital Library
Yannis Smaragdakis, George Kastrinis, and George Balatsouras. Introspective analysis: context-sensitivity, across the board. In PLDI, 2014. Google ScholarDigital Library
Tachio Terauchi. Explaining the effectiveness of small refinement heuristics in program verification with CEGAR. In SAS, 2015.Google ScholarCross Ref
Martin J. Wainwright and Michael I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 2008. Google ScholarDigital Library
J. Whaley, D. Avots, M. Carbin, and M. Lam. Using Datalog with binary decision diagrams for program analysis. In APLAS, 2005. Google ScholarDigital Library
Kwangkeun Yi, Hosik Choi, Jaehwang Kim, and Yongdai Kim. An empirical study on classification methods for alarms from a bug-finding static C analyzer. Inf. Process. Lett., 2007. Google ScholarDigital Library
X. Zhang, R. Mangal, R. Grigore, M. Naik, and H. Yang. On abstraction refinement for program analyses in Datalog. In PLDI, 2014. Google ScholarDigital Library
Xin Zhang, Mayur Naik, and Hongseok Yang. Finding optimum abstractions in parametric dataflow analysis. In PLDI, 2013. Google ScholarDigital Library
Alice X. Zheng, Michael I. Jordan, Ben Liblit, Mayur Naik, and Alex Aiken. Statistical debugging: simultaneous identification of multiple bugs. In ICML, 2006. Google ScholarDigital Library

Index Terms

Abstraction refinement guided by a learnt probabilistic model

Recommendations

Abstraction refinement guided by a learnt probabilistic model
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

The core challenge in designing an effective static program analysis is to find a good program abstraction -- one that retains only details relevant to a given query. In this paper, we present a new approach for automatically finding such an ...
Read More
Probabilistic points-to analysis for java
CC'11/ETAPS'11: Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software

Probabilistic points-to analysis is an analysis technique for defining the probabilities on the points-to relations in programs. It provides the compiler with some optimization chances such as speculative dead store elimination, speculative redundancy ...
Read More
Counterexample Guided Abstraction Refinement for Polyhedral Probabilistic Hybrid Systems
Special Issue ESWEEK 2019, CASES 2019, CODES+ISSS 2019 and EMSOFT 2019

We consider the problem of safety analysis of probabilistic hybrid systems, which capture discrete, continuous and probabilistic behaviors. We present a novel counterexample guided abstraction refinement (CEGAR) algorithm for a subclass of probabilistic ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGPLAN Notices Volume 51, Issue 1
POPL '16
January 2016
815 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2914770
Editor:
Andy Gill
University of Kansas, Lawrence, KS
Issue’s Table of Contents
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2016
815 pages
ISBN:9781450335492
DOI:10.1145/2837614
General Chair:
Rastislav Bodik
UC Berkeley, USA
,
Program Chair:
Rupak Majumdar
MPI-SWS, Germany
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 11 January 2016
Check for updates
Author Tags
Datalog
Horn
hypergraph
probability
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 17
  Total Citations
  View Citations
- 330
  Total Downloads
- Downloads (Last 12 months)5
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Abstraction refinement guided by a learnt probabilistic model

ACM SIGPLAN Notices

Abstract

References

Cited By

Index Terms

Recommendations

Abstraction refinement guided by a learnt probabilistic model

Probabilistic points-to analysis for java

Counterexample Guided Abstraction Refinement for Polyhedral Probabilistic Hybrid Systems