research-article

Context-sensitive data-dependence analysis via linear conjunctive language reachability

Authors:
Qirun Zhang

University of California at Davis, USA

University of California at Davis, USA
View Profile

,
Zhendong Su

University of California at Davis, USA

University of California at Davis, USA
View Profile

Authors Info & Claims

ACM SIGPLAN Notices Volume 52 Issue 1January 2017pp 344–358https://doi.org/10.1145/3093333.3009848

Published:01 January 2017Publication History

ACM SIGPLAN Notices

Abstract

Many program analysis problems can be formulated as graph reachability problems. In the literature, context-free language (CFL) reachability has been the most popular formulation and can be computed in subcubic time. The context-sensitive data-dependence analysis is a fundamental abstraction that can express a broad range of program analysis problems. It essentially describes an interleaved matched-parenthesis language reachability problem. The language is not context-free, and the problem is well-known to be undecidable. In practice, many program analyses adopt CFL-reachability to exactly model the matched parentheses for either context-sensitivity or structure-transmitted data-dependence, but not both. Thus, the CFL-reachability formulation for context-sensitive data-dependence analysis is inherently an approximation.

To support more precise and scalable analyses, this paper introduces linear conjunctive language (LCL) reachability, a new, expressive class of graph reachability. LCL not only contains the interleaved matched-parenthesis language, but is also closed under all set-theoretic operations. Given a graph with n nodes and m edges, we propose an O(mn) time approximation algorithm for solving all-pairs LCL-reachability, which is asymptotically better than known CFL-reachability algorithms. Our formulation and algorithm offer a new perspective on attacking the aforementioned undecidable problem - the LCL-reachability formulation is exact, while the LCL-reachability algorithm yields a sound approximation. We have applied the LCL-reachability framework to two existing client analyses. The experimental results show that the LCL-reachability framework is both more precise and scalable than the traditional CFL-reachability framework. This paper opens up the opportunity to exploit LCL-reachability in program analysis.

References

DaCapo benchmark suite. http://dacapobench.org/.Google Scholar
R. Alur and P. Madhusudan. Visibly pushdown languages. In STOC, pages 202–211, 2004. Google ScholarDigital Library
O. Bastani, S. Anand, and A. Aiken. Specification inference using context-free language reachability. In POPL, pages 553–566, 2015. Google ScholarDigital Library
S. Chaudhuri. Subcubic algorithms for recursive state machines. In POPL, pages 159–169, 2008. Google ScholarDigital Library
K. Culik II, J. Gruska, and A. Salomaa. Systolic trellis automata I. International Journal of Computer Mathematics, 15:195–212, 1984.Google ScholarCross Ref
K. Culik II, J. Gruska, and A. Salomaa. Systolic trellis automatat II. International Journal of Computer Mathematics, 16:3–22, 1984.Google ScholarCross Ref
M. A. Harrison. Introduction to Formal Language Theory. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1978. Google ScholarDigital Library
N. Hollingum and B. Scholz. Towards a scalable framework for contextfree language reachability. In CC, pages 193–211, 2015.Google Scholar
J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979. Google ScholarDigital Library
W. Huang, Y. Dong, A. Milanova, and J. Dolby. Scalable and precise taint analysis for Android. In ISSTA, pages 106–117, 2015. Google ScholarDigital Library
O. H. Ibarra and S. M. Kim. Characterizations and computational complexity of systolic trellis automata. Theor. Comput. Sci., 29:123– 153, 1984.Google ScholarCross Ref
O. H. Ibarra, M. A. Palis, and S. M. Kim. Designing systolic algorithms using sequential machines. In FOCS, pages 46–55, 1984. Google ScholarDigital Library
O. H. Ibarra, S. M. Kim, and S. Moran. Sequential machine characterizations of trellis and cellular automata and applications. SIAM J. Comput., 14(2):426–447, 1985.Google ScholarDigital Library
A. K. Joshi, L. S. Levy, and M. Takahashi. Tree adjunct grammars. Journal of Computer and System Sciences, 10(1):136–163, 1975. Google ScholarDigital Library
V. Kahlon. Boundedness vs. unboundedness of lock chains: Characterizing decidability of pairwise CFL-reachability for threads communicating via locks. In LICS, pages 27–36, 2009. Google ScholarDigital Library
J. Kodumal and A. Aiken. The set constraint/CFL reachability connection in practice. In PLDI, pages 207–218, 2004. Google ScholarDigital Library
J. Kodumal and A. Aiken. Regularly annotated set constraints. In PLDI, pages 331–341, 2007. Google ScholarDigital Library
S. La Torre and M. Napoli. Reachability of multistack pushdown systems with scope-bounded matching relations. In CONCUR, pages 203–218, 2011. Google ScholarDigital Library
S. La Torre, P. Madhusudan, and G. Parlato. A robust class of contextsensitive languages. In LICS, pages 161–170, 2007. Google ScholarDigital Library
S. La Torre, M. Napoli, and G. Parlato. Scope-bounded pushdown languages. Int. J. Found. Comput. Sci., 27(2):215–234, 2016.Google ScholarCross Ref
M. Might, Y. Smaragdakis, and D. V. Horn. Resolving and exploiting the k-CFA paradox: illuminating functional vs. object-oriented program analysis. In PLDI, pages 305–315, 2010. Google ScholarDigital Library
A. Milanova, W. Huang, and Y. Dong. CFL-reachability and contextsensitive integrity types. In PPPJ, pages 99–109, 2014. Google ScholarCross Ref
M. Nederhof. Practical experiments with regular approximation of context-free languages. Computational Linguistics, 26(1):17–44, 2000. Google ScholarDigital Library
A. Okhotin. On the closure properties of linear conjunctive languages. Theor. Comput. Sci., 1-3(299):663–685, 2003. Google ScholarDigital Library
A. Okhotin. On the equivalence of linear conjunctive grammars and trellis automata. Informatique Théorique et Applications, 38(1):69–88, 2004.Google ScholarCross Ref
P. Pratikakis, J. S. Foster, and M. Hicks. Existential label flow inference via CFL reachability. In SAS, pages 88–106, 2006. Google ScholarDigital Library
G. Ramalingam. Context-sensitive synchronization-sensitive analysis is undecidable. ACM Trans. Program. Lang. Syst., 22(2):416–430, 2000. Google ScholarDigital Library
J. Rehof and M. Fähndrich. Type-base flow analysis: from polymorphic subtyping to CFL-reachability. In POPL, pages 54–66, 2001. Google ScholarDigital Library
T. W. Reps. Shape analysis as a generalized path problem. In PEPM, pages 1–11, 1995. Google ScholarDigital Library
T. W. Reps. Program analysis via graph reachability. Information & Software Technology, 40(11-12):701–726, 1998.Google ScholarCross Ref
T. W. Reps. Undecidability of context-sensitive data-dependence analysis. ACM Trans. Program. Lang. Syst., 22(1):162–186, 2000. Google ScholarDigital Library
T. W. Reps, S. Horwitz, S. Sagiv, and G. Rosay. Speeding up slicing. In SIGSOFT FSE, pages 11–20, 1994. Google ScholarDigital Library
T. W. Reps, S. Horwitz, and S. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In POPL, pages 49–61, 1995. Google ScholarDigital Library
M. Sharir and A. Pnueli. Two approaches to interprocedural data flow analysis. In S. S. Muchnick and N. D. Jones, editors, Program Flow Analysis: Theory and Applications, pages 189–234. Prentice-Hall, 1981.Google Scholar
M. Sridharan and R. Bod´ık. Refinement-based context-sensitive pointsto analysis for java. In PLDI, pages 387–400, 2006. Google ScholarDigital Library
H. Tang, X. Wang, L. Zhang, B. Xie, L. Zhang, and H. Mei. Summarybased context-sensitive data-dependence analysis in presence of callbacks. In POPL, pages 83–95, 2015. Google ScholarDigital Library
L. G. Valiant. Regularity and related problems for deterministic pushdown automata. J. ACM, 22(1):1–10, 1975. Google ScholarDigital Library
X. Xiao, Q. Zhang, J. Zhou, and C. Zhang. Persistent pointer information. In PLDI, pages 463–474, 2014. Google ScholarDigital Library
G. Xu, A. Rountev, and M. Sridharan. Scaling CFL-reachability-based points-to analysis using context-sensitive must-not-alias analysis. In ECOOP, pages 98–122, 2009. Google ScholarDigital Library
D. Yan, G. H. Xu, and A. Rountev. Demand-driven context-sensitive alias analysis for Java. In ISSTA, pages 155–165, 2011. Google ScholarDigital Library
Q. Zhang, M. R. Lyu, H. Yuan, and Z. Su. Fast algorithms for Dyck-CFL-reachability with applications to alias analysis. In PLDI, pages 435–446, 2013. Google ScholarDigital Library
Q. Zhang, X. Xiao, C. Zhang, H. Yuan, and Z. Su. Efficient subcubic alias analysis for C. In OOPSLA, pages 829–845, 2014. Google ScholarDigital Library
X. Zheng and R. Rugina. Demand-driven alias analysis for C. In POPL, pages 197–208, 2008. Google ScholarDigital Library

Index Terms

Context-sensitive data-dependence analysis via linear conjunctive language reachability
1. Software and its engineering
  1. Software notations and tools
    1. Formal language definitions
2. Theory of computation

Recommendations

Context-sensitive data-dependence analysis via linear conjunctive language reachability
POPL '17: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages

Many program analysis problems can be formulated as graph reachability problems. In the literature, context-free language (CFL) reachability has been the most popular formulation and can be computed in subcubic time. The context-sensitive data-...
Read More
Interprocedural pointer alias analysis

We present practical approximation methods for computing and representing interprocedural aliases for a program written in a language that includes pointers, reference parameters, and recursion. We present the following contributions: (1) a framework ...
Read More
Undecidability of context-sensitive data-dependence analysis

A number of program-analysis problems can be tackled by transforming them into certain kinds of graph-reachability problems in labeled directed graphs. The edge labels can be used to filter out paths that are not interest: a path P from vertex s to ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGPLAN Notices Volume 52, Issue 1
POPL '17
January 2017
901 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3093333
Editor:
Matthew Fluet
Issue’s Table of Contents
POPL '17: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages
January 2017
901 pages
ISBN:9781450346603
DOI:10.1145/3009837
General Chair:
Giuseppe Castagna
Paris Diderot University, France / CNRS, France
,
Program Chair:
Andrew D. Gordon
Microsoft Research, UK / University of Edinburgh, UK
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 January 2017
Check for updates
Author Tags
Context-free language reachability
linear conjunctive grammar
program analysis
trellis automata
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 37
  Total Citations
  View Citations
- 527
  Total Downloads
- Downloads (Last 12 months)25
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Context-sensitive data-dependence analysis via linear conjunctive language reachability

ACM SIGPLAN Notices

Abstract

References

Cited By

Index Terms

Recommendations

Context-sensitive data-dependence analysis via linear conjunctive language reachability

Interprocedural pointer alias analysis

Undecidability of context-sensitive data-dependence analysis