Abstract
The core challenge in designing an effective static program analysis is to find a good program abstraction -- one that retains only details relevant to a given query. In this paper, we present a new approach for automatically finding such an abstraction. Our approach uses a pessimistic strategy, which can optionally use guidance from a probabilistic model. Our approach applies to parametric static analyses implemented in Datalog, and is based on counterexample-guided abstraction refinement. For each untried abstraction, our probabilistic model provides a probability of success, while the size of the abstraction provides an estimate of its cost in terms of analysis time. Combining these two metrics, probability and cost, our refinement algorithm picks an optimal abstraction. Our probabilistic model is a variant of the Erdos--Renyi random graph model, and it is tunable by what we call hyperparameters. We present a method to learn good values for these hyperparameters, by observing past runs of the analysis on an existing codebase. We evaluate our approach on an object sensitive pointer analysis for Java programs, with two client analyses (PolySite and Downcast).
- Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles A. Sutton. Learning natural coding conventions. In FSE, 2014. Google ScholarDigital Library
- Miltiadis Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. Bimodal modelling of source code and natural language. In ICML, 2015.Google ScholarDigital Library
- Carlos Ans´otegui, Maria Luisa Bonet, and Jordi Levy. SAT-based MaxSAT algorithms. Artif. Intell., 196:77–105, 2013. Google ScholarDigital Library
- T. Ball and S. Rajamani. The SLAM project: Debugging system software via static analysis. In POPL, 2002. Google ScholarDigital Library
- Nels E. Beckman and Aditya V. Nori. Probabilistic, modular and scalable inference of typestate specifications. In PLDI, 2011. Google ScholarDigital Library
- M. Bravenboer and Y. Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In OOPSLA, 2009. Google ScholarDigital Library
- S. Chaki, E. Clarke, A. Groce, S. Jha, and H. Veith. Modular verification of software components in C. In ICSE, 2003. Google ScholarDigital Library
- E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexampleguided abstraction refinement for symbolic model checking. JACM, 50(5), 2003. Google ScholarDigital Library
- E. Clarke, A. Gupta, J. Kukula, and O. Shrichman. SAT based abstraction-refinement using ILP and machine learning techniques. In CAV, 2002. Google ScholarDigital Library
- Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, 1977. Google ScholarDigital Library
- Paul Erd˝os and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 1960.Google Scholar
- Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Sht. Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. Inference and learning in probabilistic logic programs using weighted boolean formulas. To appear in Theory and Practice of Logic Programming, 2013.Google Scholar
- Lise Getoor and Ben Taskar, editors. Introduction to Statistical Relational Learning. The MIT Press, 2007. Google ScholarDigital Library
- Roberto Giacobazzi, Francesco Logozzo, and Francesco Ranzato. Analyzing program analyses. In POPL, 2015. Google ScholarDigital Library
- Roberto Giacobazzi, Francesco Ranzato, and Francesca Scozzari. Making abstract interpretations complete. J. ACM, 2000. Google ScholarDigital Library
- S. Grebenshchikov, A. Gupta, N. Lopes, C. Popeea, and A. Rybalchenko. HSF(C): A software verifier based on Horn clauses. In TACAS, 2012. Google ScholarDigital Library
- Sergey Grebenshchikov, Nuno P. Lopes, Corneliu Popeea, and Andrey Rybalchenko. Synthesizing software verifiers from proof rules. In PLDI, 2012. Google ScholarDigital Library
- Anubhav Gupta. Learning Abstractions for Model Checking. PhD thesis, Carnegie Mellon, 2006. Google ScholarDigital Library
- Bernd Gutmann, Ingo Thon, and Luc De Raedt. Learning the parameters of probabilistic logic programs from interpretations. In ECML PKDD, 2011. Google ScholarDigital Library
- T. Henzinger, R. Jhala, R. Majumdar, and K. McMillan. Abstractions from proofs. In POPL, 2004. Google ScholarDigital Library
- Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar T. Devanbu. On the naturalness of software. In ICSE, 2012. Google ScholarDigital Library
- Mikoláˇs Janota. MiFuMax — a literate MaxSat solver. http: //sat.inesc-id.pt/~mikolas/sw/mifumax/, 2013.Google Scholar
- Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. {Online; accessed 2015-07-06}.Google Scholar
- Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 1999. Google ScholarDigital Library
- Svetoslav Karaivanov, Veselin Raychev, and Martin T. Vechev. Phrasebased statistical translation of programming languages. In Onward!, 2014. Google ScholarDigital Library
- Richard M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, 1972.Google ScholarCross Ref
- Ted Kremenek, Andrew Y. Ng, and Dawson R. Engler. A factor graph model for software bug finding. In IJCAI, 2007. Google ScholarDigital Library
- Joohyung Lee. A model-theoretic counterpart of loop formulas. In IJCAI, 2005. Google ScholarDigital Library
- Percy Liang and Mayur Naik. Scaling abstraction refinement via pruning. In PLDI, 2011. Google ScholarDigital Library
- Ben Liblit, Alexander Aiken, Alice X. Zheng, and Michael I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003. Google ScholarDigital Library
- Ben Liblit, Mayur Naik, Alice X. Zheng, Alexander Aiken, and Michael I. Jordan. Scalable statistical bug isolation. In PLDI, 2005. Google ScholarDigital Library
- Vladimir Lifschitz and Alexander A. Razborov. Why are there so many loop formulas? ACM Trans. Comput. Log., 7(2):261–268, 2006. Google ScholarDigital Library
- V. Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani, and Anindya Banerjee. Merlin: specification inference for explicit information flow problems. In PLDI, 2009. Google ScholarDigital Library
- Alexey Loginov, Thomas W. Reps, and Shmuel Sagiv. Abstraction refinement via inductive learning. In CAV, 2005. Google ScholarDigital Library
- Chris J. Maddison and Daniel Tarlow. Structured generative models of natural source code. In ICML, 2014.Google Scholar
- Jo˜ao Marques-Silva, Federico Heras, Mikolás Janota, Alessandro Previti, and Anton Belov. On computing minimal correction subsets. In IJCAI, 2013.Google ScholarDigital Library
- Alon Mishne, Sharon Shoham, and Eran Yahav. Typestate-based semantic code search over partial programs. In OOPSLA, 2012. Google ScholarDigital Library
- Ant´onio Morgado, Federico Heras, Mark H. Liffiton, Jordi Planes, and Jo˜ao Marques-Silva. Iterative and core-guided MaxSAT solving: A survey and assessment. Constraints, 2013. Google ScholarDigital Library
- Mayur Naik. jChord. https://bitbucket.org/pag-lab/jchord.Google Scholar
- Mayur Naik, Hongseok Yang, Ghila Castelnuovo, and Mooly Sagiv. Abstractions from tests. In POPL, 2012. Google ScholarDigital Library
- Aditya V. Nori and Rahul Sharma. Termination proofs from tests. In FSE, 2013. Google ScholarDigital Library
- Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, and Kwangkeun Yi. Selective context-sensitivity guided by impact preanalysis. In PLDI, 2014. Google ScholarDigital Library
- Veselin Raychev, Martin T. Vechev, and Andreas Krause. Predicting program properties from “Big Code”. In POPL, 2015. Google ScholarDigital Library
- Veselin Raychev, Martin T. Vechev, and Eran Yahav. Code completion with statistical language models. In PLDI, 2014. Google ScholarDigital Library
- Andrey Rybalchenko and Viorica Sofronie-Stokkermans. Constraint solving for interpolation. In VMCAI, 2007. Google ScholarDigital Library
- Sriram Sankaranarayanan, Franjo Ivancic, and Aarti Gupta. Mining library specifications using inductive logic programming. In ICSE, 2008. Google ScholarDigital Library
- Taisuke Sato. A statistical learning method for logic programs with distribution semantics. In ICLP, 1995.Google Scholar
- Rahul Sharma, Aditya V. Nori, and Alex Aiken. Interpolants as classifiers. In CAV, 2012. Google ScholarDigital Library
- Y. Smaragdakis and M. Bravenboer. Using Datalog for fast and easy program analysis. In Datalog 2.0 Workshop, 2010. Google ScholarDigital Library
- Yannis Smaragdakis, George Kastrinis, and George Balatsouras. Introspective analysis: context-sensitivity, across the board. In PLDI, 2014. Google ScholarDigital Library
- Tachio Terauchi. Explaining the effectiveness of small refinement heuristics in program verification with CEGAR. In SAS, 2015.Google ScholarCross Ref
- Martin J. Wainwright and Michael I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 2008. Google ScholarDigital Library
- J. Whaley, D. Avots, M. Carbin, and M. Lam. Using Datalog with binary decision diagrams for program analysis. In APLAS, 2005. Google ScholarDigital Library
- Kwangkeun Yi, Hosik Choi, Jaehwang Kim, and Yongdai Kim. An empirical study on classification methods for alarms from a bug-finding static C analyzer. Inf. Process. Lett., 2007. Google ScholarDigital Library
- X. Zhang, R. Mangal, R. Grigore, M. Naik, and H. Yang. On abstraction refinement for program analyses in Datalog. In PLDI, 2014. Google ScholarDigital Library
- Xin Zhang, Mayur Naik, and Hongseok Yang. Finding optimum abstractions in parametric dataflow analysis. In PLDI, 2013. Google ScholarDigital Library
- Alice X. Zheng, Michael I. Jordan, Ben Liblit, Mayur Naik, and Alex Aiken. Statistical debugging: simultaneous identification of multiple bugs. In ICML, 2006. Google ScholarDigital Library
Index Terms
- Abstraction refinement guided by a learnt probabilistic model
Recommendations
Abstraction refinement guided by a learnt probabilistic model
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming LanguagesThe core challenge in designing an effective static program analysis is to find a good program abstraction -- one that retains only details relevant to a given query. In this paper, we present a new approach for automatically finding such an ...
Probabilistic points-to analysis for java
CC'11/ETAPS'11: Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of softwareProbabilistic points-to analysis is an analysis technique for defining the probabilities on the points-to relations in programs. It provides the compiler with some optimization chances such as speculative dead store elimination, speculative redundancy ...
Counterexample Guided Abstraction Refinement for Polyhedral Probabilistic Hybrid Systems
Special Issue ESWEEK 2019, CASES 2019, CODES+ISSS 2019 and EMSOFT 2019We consider the problem of safety analysis of probabilistic hybrid systems, which capture discrete, continuous and probabilistic behaviors. We present a novel counterexample guided abstraction refinement (CEGAR) algorithm for a subclass of probabilistic ...
Comments