ABSTRACT
Finite automata and finite transducers are used in a wide range of applications in software engineering, from regular expressions to specification languages. We extend these classic objects with symbolic alphabets represented as parametric theories. Admitting potentially infinite alphabets makes this representation strictly more general and succinct than classical finite transducers and automata over strings. Despite this, the main operations, including composition, checking that a transducer is single-valued, and equivalence checking for single-valued symbolic finite transducers are effective given a decision procedure for the background theory. We provide novel algorithms for these operations and extend composition to symbolic transducers augmented with registers. Our base algorithms are unusual in that they are nonconstructive, therefore, we also supply a separate model generation algorithm that can quickly find counterexamples in the case two symbolic finite transducers are not equivalent. The algorithms give rise to a complete decidable algebra of symbolic transducers. Unlike previous work, we do not need any syntactic restriction of the formulas on the transitions, only a decision procedure. In practice we leverage recent advances in satisfiability modulo theory (SMT) solvers. We demonstrate our techniques on four case studies, covering a wide range of applications. Our techniques can synthesize string pre-images in excess of 8,000 bytes in roughly a minute, and we find that our new encodings significantly outperform previous techniques in succinctness and speed of analysis.
Supplemental Material
- R. Alur and P. Cerný. Streaming transducers for algorithmic verification of single-pass list-processing programs. In POPL'11, pages 599--610. ACM, 2011. Google ScholarDigital Library
- D. E. Bakke, R. Parameswaran, D. M. Blough, A. A. Franz, and T. J. Palmer. Data obfuscation: Anonymity and desensitization of usable data sets. IEEE Security and Privacy, Apr. 2004. Google ScholarDigital Library
- D. Balzarotti, M. Cova, V. Felmetsger, N. Jovanovic, E. Kirda, C. Kruegel, and G. Vigna. Saner: Composing static and dynamic analysis to validate sanitization in web applications. In IEEE Oakland Security and Privacy, 2008. Google ScholarDigital Library
- Bek. http://research.microsoft.com/bek.Google Scholar
- M. Benedikt, C. Ley, and G. Puppis. Automata vs. logics on data words. In CSL, volume 6247 of LNCS, pages 110--124. Springer, 2010. Google ScholarDigital Library
- er and Veanes(2011)}BV11N. Bjørner and M. Veanes. Symbolic transducers. Technical Report MSR-TR-2011--3, Microsoft Research, January 2011.Google Scholar
- M. Boja'nczyk, A. Muscholl, T. Schwentick, L. Segoufin, and C. David. Two-variable logic on words with data. In phLICS, pages 7--16. IEEE, 06. Google ScholarDigital Library
- A. Brush, J. Krumm, and J. Scott. Exploring end user preferences for location obfuscation, location-based services, and the value of location. In UbiComp, September 2010. Google ScholarDigital Library
- K. Z. Chen, G. Gu, J. Nazario, X. Han, and J. Zhuge. WebPatrol: Automated collection and replay of web-based malware scenarios. In ASIACCS, March 2011. Google ScholarDigital Library
- B. Courcelle and P. Franchi-Zannettacchi. Attribute grammars and recursive program schemes. Theoretical Computer Science, 17: 163--191, 1982.Google ScholarCross Ref
- M. Cova, C. Kruegel, and G. Vigna. Detection and analysis of drive-by-download attacks and malicious JavaScript code. In WWW Conference, Raleigh, NC, April 2010. Google ScholarDigital Library
- K. Culic and J. Karhumaki. The equivalence of finite-valued transducers (on HDTOL languages) is decidable. Theoretical Computer Science, 47: 71--84, 1986. Google ScholarDigital Library
- C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert. Zozzle: Low-overhead mostly static javascript malware detection. In Proceedings of the Usenix Security Symposium, Aug. 2011. Google ScholarDigital Library
- G. B. Dantzig and B. C. Eaves. Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory (A), 14: 288--297, 1973.Google ScholarCross Ref
- L. de Moura and N. Bjørner. Z3: An Efficient SMT Solver. In TACAS'08, LNCS, 2008. Google ScholarDigital Library
- A. J. Demers, C. Keleman, and B. Reusch. On some decidable properties of finite state translations. Acta Informatica, 17: 349--364, 1982.Google ScholarDigital Library
- P. Eckersley. How unique is your web browser? In Privacy Enhancing Technologies, pages 1--18, 2010. Google ScholarDigital Library
- J. Engelfriet. Some open questions and recent results on tree transducers and tree languages. In R. V. Book, editor, Formal Language Theory, pages 241--286. Academic Press, 1980.Google ScholarCross Ref
- J. Engelfriet and S. Maneth. A comparison of pebble tree transducers with macro tree transducers. Acta Informatica, 39: 2003, 2003.Google ScholarDigital Library
- Z. Esik. Decidability results concerning tree transducers. Acta Cybernetica, 5: 1--20, 1980.Google Scholar
- Z. Fülöp and H. Vogler. Syntax-Directed Semantics: Formal Models Based on Tree Transducers. EATCS. Springer, 1998. Google ScholarDigital Library
- T. L. Gall and B. Jeannet. Lattice automata: A representation for languages on infinite alphabets, and some applications to verification. In SAS 2007, volume 4634 of LNCS, pages 52--68, 2007. Google ScholarDigital Library
- T. Griffiths. The unsolvability of the equivalence problem for lambda-free nondeterministic generalized machines. J. ACM, 15: 409--413, 1968. Google ScholarDigital Library
- P. Hooimeijer and M. Veanes. An evaluation of automata algorithms for string analysis. In VMCAI'11, LNCS. Springer, 2011. Google ScholarDigital Library
- P. Hooimeijer and W. Weimer. A decision procedure for subset constraints over regular languages. In Proceedings of the Conference on Programming Language Design and Implementation, pages 188--198, New York, NY, USA, 2009. ACM. ISBN 978--1--60558--392--1. Google ScholarDigital Library
- P. Hooimeijer and W. Weimer. Solving string constraints lazily. In ASE, 2010. Google ScholarDigital Library
- P. Hooimeijer, B. Livshits, D. Molnar, P. Saxena, and M. Veanes. Fast and precise sanitizer analysis with bek. In Proceedings of the USENIX Security Symposium, August 2011. Google ScholarDigital Library
- O. Ibarra. The unsolvability of the equivalence problem for Efree NGSM's with unary input (output) alphabet and applications. SIAM Journal on Computing, 4: 524--532, 1978.Google ScholarCross Ref
- M. Kaminski and N. Francez. Finite-memory automata. In 31st Annual Symposium on Foundations of Computer Science (FOCS 1990), volume 2, pages 683--688. IEEE, 1990. Google ScholarDigital Library
- A. Kiezun, V. Ganesh, P. J. Guo, P. Hooimeijer, and M. D. Ernst. HAMPI: a solver for string constraints. In ISSTA, 2009. Google ScholarDigital Library
- N. Kobayashi, N. Tabuchi, and H. Unno. Higher-order multi-parameter tree transducers and recursion schemes for program verification. In POPL, pages 495--508. ACM, 2010. Google ScholarDigital Library
- J. Krumm. A survey of computational location privacy. Personal Ubiquitous Comput., 13: 391--399, August 2009. Google ScholarDigital Library
- A. Maletti, J. Graehl, M. Hopkins, and K. Knight. The power of extended top-down tree transducers. SIAM J. Comput., 39: 410--430, June 2009. Google ScholarDigital Library
- T. Milo, D. Suciu, and V. Vianu. Typechecking for XML transformers. In Proc. 19th ACM Symposium on Principles of Database Systems (PODS'2000), pages 11--22. ACM, 2000. Google ScholarDigital Library
- Y. Minamide. Static approximation of dynamically generated web pages. In WWW '05: Proceedings of the 14th International Conference on the World Wide Web, pages 432--441, 2005. ISBN 1--59593-046--9. Google ScholarDigital Library
- K. Mowery, D. Bogenreif, S. Yilek, and H. Shacham. Fingerprinting information in javascript implementations. In Proceedings of Web 2.0 Security and Privacy 2011 (W2SP), May 2011.Google Scholar
- F. Neven, T. Schwentick, and V. Vianu. Finite state machines for strings over infinite alphabets. ACM Trans. CL, 5: 403--435, 2004. Google ScholarDigital Library
- G. V. Noord and D. Gerdemann. Finite state transducers with predicates and identities. Grammars, 4: 263--286, 2001.Google ScholarCross Ref
- C.-H. L. Ong and S. J. Ramsay. Verifying higher-order functional programs with pattern-matching algebraic data types. In POPL'11, pages 587--598. ACM, 2011. Google ScholarDigital Library
- J. R. Parker. Algorithms for Image Processing and Computer Vision. Wiley and Sons, 2006.Google Scholar
- P. Ratanaworabhan, B. Livshits, and B. Zorn. Nozzle: A defense against heap-spraying code injection attacks. In Proceedings of the Usenix Security Symposium, Aug. 2009. Google ScholarDigital Library
- P. Saxena, D. Akhawe, S. Hanna, S. McCamant, F. Mao, and D. Song. A symbolic execution framework for javascript. In IEEE Security and Privacy, 2010. Google ScholarDigital Library
- M. P. Schützenberger. Sur les relations rationnelles. In GI Conference on Automata Theory and Formal Languages, volume 33 of LNCS, pages 209--213, 1975. Google ScholarDigital Library
- L. Segoufin. Automata and logics for words and trees over an infinite alphabet. In Z. Ésik, editor, CSL, volume 4207 of LNCS, pages 41--57, 2006. Google ScholarDigital Library
- H. Seidl. Equivalence of finite-valued tree transducers is decidable. Math. Systems Theory, 27: 285--346, 1994. Google ScholarDigital Library
- M. Veanes and N. Bjørner. Symbolic tree transducers. In Perspectives of System Informatics (PSI'11), 2011. Google ScholarDigital Library
- M. Veanes, N. Bjørner, and L. de Moura. Symbolic automata constraint solving. In C. Fermüller and A. Voronkov, editors, LPAR-17, volume 6397 of LNCS, pages 640--654, 2010. Google ScholarDigital Library
- M. Veanes, P. de Halleux, and N. Tillmann. Rex: Symbolic Regular Expression Explorer. In ICST'10. IEEE, 2010. Google ScholarDigital Library
- G. Wassermann, D. Yu, A. Chander, D. Dhurjati, H. Inamura, and Z. Su. Dynamic test input generation for web applications. In ISSTA, 2008. Google ScholarDigital Library
- A. Weber. Decomposing finite-valued transducers and deciding their equivalence. SIAM Journal on Computing, 22 (1): 175--202, February 1993. Google ScholarDigital Library
- F. Yu, T. Bultan, and O. H. Ibarra. Relational string verification using multi-track automata. In Proceedings of the 15th international conference on Implementation and application of automata, CIAA'10, pages 290--299, 2011. Google ScholarDigital Library
Index Terms
- Symbolic finite state transducers: algorithms and applications
Recommendations
Symbolic finite state transducers: algorithms and applications
POPL '12Finite automata and finite transducers are used in a wide range of applications in software engineering, from regular expressions to specification languages. We extend these classic objects with symbolic alphabets represented as parametric theories. ...
Extended symbolic finite automata and transducers
Symbolic finite automata and transducers augment classic automata and transducers with symbolic alphabets represented as parametric theories. This extension enables to succinctly represent large and potentially infinite alphabets while preserving ...
From Two-Way to One-Way Finite State Transducers
LICS '13: Proceedings of the 2013 28th Annual ACM/IEEE Symposium on Logic in Computer ScienceAny two-way finite state automaton is equivalent to some one-way finite state automaton. This well-known result, shown by Rabin and Scott and independently by Shepherd son, states that two-way finite state automata (even non-deterministic) characterize ...
Comments