skip to main content
research-article
Open Access

Global Sparse Analysis Framework

Published:25 September 2014Publication History
Skip Abstract Section

Abstract

In this article, we present a general method for achieving global static analyzers that are precise and sound, yet also scalable. Our method, on top of the abstract interpretation framework, is a general sparse analysis technique that supports relational as well as nonrelational semantics properties for various programming languages. Analysis designers first use the abstract interpretation framework to have a global and correct static analyzer whose scalability is unattended. Upon this underlying sound static analyzer, analysis designers add our generalized sparse analysis techniques to improve its scalability while preserving the precision of the underlying analysis. Our method prescribes what to prove to guarantee that the resulting sparse version should preserve the precision of the underlying analyzer.

We formally present our framework and show that existing sparse analyses are all restricted instances of our framework. In addition, we show more semantically elaborate design examples of sparse nonrelational and relational static analyses. We then present their implementation results that scale to globally analyze up to one million lines of C programs. We also show a set of implementation techniques that turn out to be critical to economically support the sparse analysis process.

Skip Supplemental Material Section

Supplemental Material

References

  1. Xavier Allamigeon, Wenceslas Godard, and Charles Hymans. 2006. Static analysis of string manipulations in critical embedded C programs. In Proceedings of the International Symposium on Static Analysis. 35--51. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Gogul Balakrishnan and Thomas Reps. 2004. Analyzing memory accesses in ×86 binary executables. In Proceedings of the International Conference on Compiler Construction. 5--23.Google ScholarGoogle ScholarCross RefCross Ref
  3. Bruno Blanchet, Patrick Cousot, Radhia Cousot, Jerome Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, and Xavier Rival. 2003. A static analyzer for large safety-critical software. In Proceedings of the ACM SIGPLAN-SIGACT Conference on Programming Language Design and Implementation. 196--207. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Francois Bourdoncle. 1993. Efficient chaotic iteration strategies with widenings. In Proceedings of the International Conference on Formal Methods in Programming and Their Applications. 128--141.Google ScholarGoogle ScholarCross RefCross Ref
  5. David R. Chase, Mark Wegman, and F. Kenneth Zadeck. 1990. Analysis of pointers and structures. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. 296--310. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Jong-Deok Choi, Ron Cytron, and Jeanne Ferrante. 1991. Automatic construction of sparse data flow evaluation graphs. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 55--66. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Jong-Deok Choi, Vivek Sarkar, and Edith Schonberg. 1996. Incremental computation of static single assignment form. In Proceedings of the 6th International Conference on Compiler Construction (CC'96). Springer-Verlag, London, UK, 223--237. http://dl.acm.org/citation.cfm?id=647473.727451. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Patrick Cousot and Radhia Cousot. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 238--252. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Patrick Cousot and Radhia Cousot. 1979. Systematic design of program analysis frameworks. In Conference Record of the 6th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, New York, NY, 269--282. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Patrick Cousot and Radhia Cousot. 1992. Abstract interpretation frameworks. Journal of Logic and Computation 2, 4, 511--547.Google ScholarGoogle ScholarCross RefCross Ref
  11. Patrick Cousot, Radhia Cousot, Jerome Feret, Laurent Mauborgne, Antoine Miné, and Xavier Rival. 2009. Why does Astre scale up? Formal Methods in System Design 35, 3, 229--264. DOI:http://dx.doi.org/10.1007/s10703-009-0089-6 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Patrick Cousot and Nicolas Halbwachs. 1978. Automatic discovery of linear restraints among variables of a program. In Conference Record of the 5th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, New York, NY, 84--97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems 13, 4, 451--490. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Ron K. Cytron and Jeanne Ferrante. 1995. Efficiently computing &phis;-nodes on-the-fly. ACM Transactions on Programming Languages and Systems 17, 3, 487--506. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Dhananjay M. Dhamdhere, Barry K. Rosen, and F. Kenneth Zadeck. 1992. How to analyze large programs efficiently and informatively. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'92). ACM, New York, NY, 212--223. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Isil Dillig, Thomas Dillig, and Alex Aiken. 2008. Sound, complete and scalable path-sensitive analysis. In Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'08). ACM, New York, NY, 270--280. DOI:http://dx.doi.org/10.1145/1375581.1375615 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Isil Dillig, Thomas Dillig, and Alex Aiken. 2011. Precise reasoning for programs using containers. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'11). ACM, New York, NY, 187--200. DOI:http://dx.doi.org/10.1145/1926385.1926407 Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Azadeh Farzan and Zachary Kincaid. 2012. Verification of parameterized concurrent programs by modular reasoning about data and control. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'12). ACM, New York, NY, 297--308. DOI:http://dx.doi.org/10.1145/2103656.2103693 Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Azadeh Farzan, Zachary Kincaid, and Andreas Podelski. 2013. Inductive data flow graphs. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'13). ACM, New York, NY, 129--142. DOI:http://dx.doi.org/10.1145/2429069.2429086 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: Fast and accurate pointer analysis for millions of lines of code. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'07). ACM, New York, NY, 290--299. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ben Hardekopf and Calvin Lin. 2009. Semi-sparse flow-sensitive pointer analysis. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 226--238. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Ben Hardekopf and Calvin Lin. 2011. Flow-sensitive pointer analysis for millions of lines of code. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization. 289--298. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Michael Hind and Anthony Pioli. 1998. Assessing the effects of flow-sensitivity on pointer alias analyses. In Proceedings of the International Symposium on Static Analysis. 57--81. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Bertrand Jeannet and Antoine Miné. 2009. APRON: A library of numerical abstract domains for static analysis. In Proceedings of the 21st International Conference on Computer Aided Verification (CAV'09). 661--667. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Yongin Jhee, Minsik Jin, Yungbum Jung, Deokhwan Kim, Soonho Kong, Heejong Lee, Hakjoo Oh, Daejun Park, and Kwangkeun Yi. 2008. Abstract interpretation + impure catalysts: Our sparrow experience. Presentation at the Workshop of the 30 Years of Abstract Interpretation. Available at ropas. snu.ac.kr/~kwang/paper/30yai-08.pdf.Google ScholarGoogle Scholar
  26. Richard Johnson and Keshav Pingali. 1993. Dependence-based program analysis. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. 78--89. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Yungbum Jung, Jaehwang Kim, Jaeho Shin, and Kwangkeun Yi. 2005. Taming false alarms from a domain-unaware C analyzer by a Bayesian statistical post analysis. In Proceedings of the International Symposium on Static Analysis. 203--217. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. John B. Kam and Jeffrey D. Ullman. 1977. Monotone data flow analysis frameworks. Acta Informatica 7, 3, 305--317. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Chris Lattner, Andrew Lenharth, and Vikram Adve. 2007. Making context-sensitive points-to analysis with heap cloning practical for the real world. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Woosuk Lee, Wonchan Lee, and Kwangkeun Yi. 2012. Sound non-statistical clustering of static analysis alarms. In Proceedings of the 13th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI'12). Lecture Notes in Computer Science, Vol. 7148. Springer, 299--314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Lian Li, Cristina Cifuentes, and Nathan Keynes. 2011. Boosting the performance of flow-sensitive points-to analysis using value flow. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE'11). ACM, New York, NY, 343-- 353. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Jorn Lind-Nielson. n.d. BuDDy, a Binary Decision Diagram Package. Retrieved July 27, 2014, from http://vlsicad.eecs.umich.edu/BK/Slots/cache/www.itu.dk/research/buddy/.Google ScholarGoogle Scholar
  33. MathWorks. n.d. Polyspace Embedded Software Verification. Retrieved July 27, 2014, from http://www.mathworks.com/products/polyspace/index.html.Google ScholarGoogle Scholar
  34. Laurent Mauborgne and Xavier Rival. 2005. Trace partitioning in abstract interpretation based static analyzers. In Proceedings of the European Symposium on Programming. Lecture Notes in Computer Science, Vol. 3444. Springer, 5--20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Matthew Might and Olin Shivers. 2006. Improving flow analyses via ΓCFA: Abstract garbage collection and counting. In Proceedings of the ACM SIGPLAN-SIGACT International Conference on Functional Programming. 13--25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Ana Milanova, Atanas Rountev, and Barbara G. Ryder. 2004. Precise and efficient call graph construction for C programs with function pointers. Journal of Automated Software Engineering 11, 1, 7--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Antoine Miné. 2006a. Field-sensitive value analysis of embedded C programs with union types and pointer arithmetics. In Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'06). ACM, New York, NY, 54--63. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Antoine Miné. 2006b. The octagon abstract domain. Higher-Order and Symbolic Computation 19, 1, 31-- 100. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Hakjoo Oh. 2009. Large spurious cycle in global static analyses and its algorithmic mitigation. In Proceedings of the Asian Symposium on Programming Languages and Systems. Lecture Notes in Computer Science, Vol. 5904. Springer, 14--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Hakjoo Oh, Lucas Brutschy, and Kwangkeun Yi. 2011. Access analysis-based tight localization of abstract memories. In Proceedings of the 12th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI'11). Lecture Notes in Computer Science, Vol. 6538. Springer, 356--370. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Hakjoo Oh, Kihong Heo, Wonchan Lee, Woosuk Lee, and Kwangkeun Yi. 2012. Design and implementation of sparse global analyses for C-like languages. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Hakjoo Oh and Kwangkeun Yi. 2010. An algorithmic mitigation of large spurious interprocedural cycles in static analysis. Software: Practice and Experience 40, 8, 585--603. DOI:http://dx.doi.org/10.1002/spe.969 Google ScholarGoogle ScholarCross RefCross Ref
  43. Hakjoo Oh and Kwangkeun Yi. 2011. Access-based localization with bypassing. In Proceedings of the 9th Asian Symposium on Programming Languages and Systems (APLAS'11). Lecture Notes in Computer Science, Vol. 7078. Springer, 50--65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Ganesan Ramalingam. 2002. On sparse evaluation representations. Theoretical Computer Science 277, 1--2, 119--147. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. John H. Reif and Harry R. Lewis. 1977. Symbolic evaluation and the global value graph. In Proceedings of the 4th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 104--118. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Noam Rinetzky, Jörg Bauer, Thomas Reps, Mooly Sagiv, and Reinhard Wilhelm. 2005. A semantics for procedure local heaps and its abstractions. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 296--309. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Teck Bok Tok, Samuel Z. Guyer, and Calvin Lin. 2006. Efficient flow-sensitive interprocedural data-flow analysis in the presence of pointers. In Proceedings of the International Conference on Compiler Construction. 17--31. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Arnaud Venet and Guillaume Brat. 2004. Precise and efficient static array bound checking for large embedded C programs. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI'04). ACM, New York, NY, 231--242. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Mark N. Wegman and F. Kenneth Zadeck. 1991. Constant propagation with conditional branches. ACM Transactions on Programming Languages and Systems 13, 2, 181--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Hongseok Yang, Oukseh Lee, Josh Berdine, Cristiano Calcagno, Byron Cook, Dino Distefano, and Peter O'Hearn. 2008. Scalable shape analysis for systems code. In Proceedings of the International Conference on Computer Aided Verification. 385--398. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Hongtao Yu, Jingling Xue, Wei Huo, Xiaobing Feng, and Zhaoqing Zhang. 2010. Level by level: Making flow- and context-sensitive pointer analysis scalable for millions of lines of code. In Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO'10). ACM, New York, NY, 218--229. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Misha Zitser, D. E. Shaw Group, and Tim Leek. 2004. Testing static analysis tools using exploitable buffer overflows from open source code. In Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering (SIGSOFT'04/FSE-12). ACM, New York NY, 97--106. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Global Sparse Analysis Framework

    Recommendations

    Reviews

    Jacques Carette

    High-profile bugs have increased the urgency of designing good program analysis frameworks that can detect such bugs before the programs are released into the wild. Luckily, we possess a great framework for the systematic design of analyses: abstract interpretation (AI). Unfortunately, a straightforward implementation does not scale to realistic programs. A simple but powerful observation is at play in the framework presented here: even very large programs tend to modify a very small number of memory locations in any given block of code. So if we can take advantage of this "sparsity," our analyses ought to scale much better. Of course, to take this observation, turn it first into sound theory, and then implement it requires quite a lot of careful work. This is exactly the kind of work that this paper describes. The reader should be warned that the results are extremely technical. There are helpful examples throughout to illustrate the most important points, two sections on how to design various kinds of analyses on top of this framework, a helpful section on implementation, and a lucid section on the results of experimental evaluation of their implementation on a wide variety of programs. The authors cleverly relegated all proofs to appendices. Nevertheless, anyone not fully versed in the lore of abstract interpretation would find this paper extremely challenging. But for those who do know AI, this paper is well worth reading. I was disappointed not to find a link to the authors' software anywhere; it is past time that all papers evaluating an implementation provide the source code electronically for independent evaluation. It is also slightly disappointing that the proofs are not automated, but perhaps mainstream CS has not quite reached that stage yet. Lest the reader of this review misunderstand, let me be quite clear: this is a superb paper. Well written, well structured, and smoothly bridging theory and practice, it could serve as a model for writing solid CS papers. Online Computing Reviews Service

    Access critical reviews of Computing literature here

    Become a reviewer for Computing Reviews.

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Programming Languages and Systems
      ACM Transactions on Programming Languages and Systems  Volume 36, Issue 3
      September 2014
      124 pages
      ISSN:0164-0925
      EISSN:1558-4593
      DOI:10.1145/2669618
      Issue’s Table of Contents

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 September 2014
      • Accepted: 1 January 2014
      • Revised: 1 October 2013
      • Received: 1 March 2013
      Published in toplas Volume 36, Issue 3

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader