nach oben

Erschienen in:

2020 | OriginalPaper | Buchkapitel

Harnessing Static Analysis to Help Learn Pseudo-Inverses of String Manipulating Procedures for Automatic Test Generation

verfasst von : Oren Ish-Shalom, Shachar Itzhaky, Roman Manevich, Noam Rinetzky

Erschienen in: Verification, Model Checking, and Abstract Interpretation

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

We present a novel approach based on supervised machine-learning for inverting String Manipulating Procedures (SMPs), i.e., given an SMP \(p:\bar{\Sigma }\rightarrow \bar{\Sigma }\), we compute a partial pseudo-inverse function \(p^{-1}\) such that given a target string \(t\in \overline{\Sigma }\), if \(p^{-1}(t)\ne \bot \) then \(p(p^{-1}(t))=t\). The motivation for addressing this problem is the difficulties faced by modern symbolic execution tools, e.g., KLEE, to find ways to execute loops inside SMPs in a way which produces specific outputs required to enter a specific branch. Thus, we find ourselves in a pleasant situation where program analysis assists machine learning to help program analysis.

Our basic attack on the problem is to train a machine learning algorithm using (output, input) pairs generated by executing p on random inputs. Unfortunately, naively applying this technique is extremely expensive due to the size of the alphabet. To remedy this situation, we present a specialized static analysis algorithm that can drastically reduce the size of the alphabet \(\Sigma \) from which examples are drawn without sacrificing the ability to cover all the behaviors of the analyzed procedure. Our key observation is that often a procedure treats many characters in a particular uniform way: it only copies them from the input to the output in an order-preserving fashion. Our static analysis finds these good characters so that our learning algorithm may consider examples coming from a reduced alphabet containing a single representative good character, thus allowing to produce smaller models while using fewer examples than had the full alphabet been used. We then utilize the learned pseudo-inverse function to invert specific desired outputs by translating a given query to and from the reduced alphabet.

We implemented our approach using two machine learning algorithms and show that indeed our string inverters can find inputs that can drive a selection of procedures taken from real-life software to produce desired outputs, whereas KLEE, a state-of-the-art symbolic execution engine, fails to find such inputs.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel Sharing Ghost Variables in a Collection of Abstract Domains

Nächstes Kapitel Synthesizing Environment Invariants for Modular Hardware Verification

The procedure is based on a GCC procedure which adds an escape character before tab and newline characters. For clarity, we replaced the whitespace characters with more visible characters. For simplicity, we removed code concerning array bound checking.

We remind the reader that our tool operates directly on LLVM bitcode.

The choice to block the program was done in the sake of simplicity. Alternatively, we could have designed the language to signal an error.

The transformers pertaining to read-last() and write-last() operations are similar to those of read-first() and write-first(), respectively, and are thus omitted.

An equally plausible alternative would be to make \( env (v)\) bad.

http://www.dataparksearch.org/

Cadar, C., Dunbar, D., Engler, D.: Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI 2008, pp. 209–224. USENIX Association, Berkeley (2008)

Checn, W., Duding, J.T.: Program inversion: More than fun! Sci. Comput. Program. 15(1), 1–13 (1990)

de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24CrossRef

Dijkstra, E.W.: Program inversion. In: Bauer, F.L., et al. (eds.) Program Construction. LNCS, vol. 69, pp. 54–57. Springer, Heidelberg (1979). https://doi.org/10.1007/BFb0014657. http://dl.acm.org/citation.cfm?id=647639.733360CrossRef

Ganesh, V.: Decision procedures for bit-vectors, arrays and integers. Ph.D. thesis, Stanford, CA, USA (2007). aAI3281841

Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, 20–22 January 2016, pp. 499–512 (2016)

Glück, R., Kawabe, M.: A method for automatic program inversion based on LR(0) parsing. Fundam. Inform. 66, 367–395 (2005)MathSciNetMATH

Grigore, R., Yang, H.: Abstraction refinement guided by a learnt probabilistic model. SIGPLAN Not. 51(1), 485–498 (2016). https://doi.org/10.1145/2914770.2837663CrossRefMATH

10.

Gulwani, S.: Automating string processing in spreadsheets using input-output examples. SIGPLAN Not. 46(1), 317–330 (2011)CrossRef

11.

Gulwani, S.: Programming by examples: applications, algorithms, and ambiguity resolution. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS (LNAI), vol. 9706, pp. 9–14. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40229-1_2CrossRef

12.

Heo, K., Oh, H., Yang, H.: Learning a variable-clustering strategy for octagon from labeled data generated by a static analysis. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 237–256. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_12CrossRefMATH

13.

Hu, Q., D’Antoni, L.: Automatic program inversion using symbolic transducers. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 376–389. ACM, New York (2017). https://doi.org/10.1145/3062341.3062345

14.

Jose Oncina, P.G., Vidal, E.: Learning subsequential transducers for pattern recognition interpretation tasks. IEEE Trans. Pattern Anal. Mach. Intell. 15(5), 448–458 (1993)CrossRef

15.

Kanade, A., Alur, R., Rajamani, S., Ramanlingam, G.: Representation dependence testing using program inversion. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2010, pp. 277–286. ACM, New York (2010). https://doi.org/10.1145/1882291.1882332

16.

Kawabe, M., Glück, R.: The program inverter LRinv and its structure. In: Hermenegildo, M.V., Cabeza, D. (eds.) PADL 2005. LNCS, vol. 3350, pp. 219–234. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30557-6_17CrossRef

17.

Kiezun, A., Ganesh, V., Guo, P.J., Hooimeijer, P., Ernst, M.D.: HAMPI: a solver for string constraints. In: Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ISSTA 2009, pp. 105–116. ACM, New York (2009). https://doi.org/10.1145/1572272.1572286

18.

Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO 2004, p. 75. IEEE Computer Society, Washington, DC (2004)

19.

Miltner, A., Fisher, K., Pierce, B.C., Walker, D., Zdancewic, S.: Synthesizing bijective lenses. Proc. ACM Program. Lang. 2(POPL), 1:1–1:30 (2017). https://doi.org/10.1145/3158089CrossRef

20.

Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)CrossRef

21.

Nori, A.V., Rajamani, S.K.: Program analysis and machine learning: a win-win deal. In: Yahav, E. (ed.) SAS 2011. LNCS, vol. 6887, pp. 2–3. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23702-7_2CrossRef

22.

Nori, A.V., Sharma, R.: Termination proofs from tests. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 246–256. ACM, New York (2013). https://doi.org/10.1145/2491411.2491413

23.

Octeau, D., et al.: Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pp. 469–484. ACM, New York (2016). https://doi.org/10.1145/2837614.2837661

24.

Oh, H., Yang, H., Yi, K.: Learning a strategy for adapting a program analysis via Bayesian optimisation. In: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pp. 572–588. ACM, New York (2015). https://doi.org/10.1145/2814270.2814309

25.

Raychev, V., Bielik, P., Vechev, M., Krause, A.: Learning programs from noisy data. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pp. 761–774. ACM, New York (2016). https://doi.org/10.1145/2837614.2837671

26.

Ross, B.J.: Running programs backwards: the logical inversion of imperative computation. Formal Aspects Comput. 9(3), 331–348 (1997)CrossRef

27.

Sankaranarayanan, S., Chaudhuri, S., Ivančić, F., Gupta, A.: Dynamic inference of likely data preconditions over predicates by tree learning. In: Proceedings of the 2008 International Symposium on Software Testing and Analysis, ISSTA 2008, pp. 295–306. ACM, New York (2008). https://doi.org/10.1145/1390630.1390666

28.

Sankaranarayanan, S., Ivančić, F., Gupta, A.: Mining library specifications using inductive logic programming. In: Proceedings of the 30th International Conference on Software Engineering, ICSE 2008, pp. 131–140. ACM, New York (2008). https://doi.org/10.1145/1368088.1368107

29.

Schoenmakers, B.: Inorder traversal of a binary heap and its inversion in optimal time and space. In: Bird, R.S., Morgan, C.C., Woodcock, J.C.P. (eds.) MPC 1992. LNCS, vol. 669, pp. 291–301. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-56625-2_19CrossRef

30.

Singh, R., Gulwani, S.: Learning semantic string transformations from examples. Proc. VLDB Endow. 5(8), 740–751 (2012). https://doi.org/10.14778/2212351.2212356CrossRef

31.

Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, Quebec, Canada, 8–13 December 2014, pp. 3104–3112 (2014)

32.

Yi, K., Choi, H., Kim, J., Kim, Y.: An empirical study on classification methods for alarms from a bug-finding static C analyzer. Inf. Process. Lett. 102(2–3), 118–123 (2007). https://doi.org/10.1016/j.ipl.2006.11.004MathSciNetCrossRefMATH

33.

Zaremba, W., Sutskever, I.: Learning to execute. CoRR abs/1410.4615 (2014)

34.

Zheng, Y., Ganesh, V., Subramanian, S., Tripp, O., Dolby, J., Zhang, X.: Effective search-space pruning for solvers of string equations, regular expressions and length constraints. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 235–254. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_14CrossRef

Titel: Harnessing Static Analysis to Help Learn Pseudo-Inverses of String Manipulating Procedures for Automatic Test Generation
verfasst von: Oren Ish-Shalom
Shachar Itzhaky
Roman Manevich
Noam Rinetzky
Verlag: Springer International Publishing
Buch: Verification, Model Checking, and Abstract Interpretation
Print ISBN: 978-3-030-39321-2

Electronic ISBN: 978-3-030-39322-9

Copyright-Jahr: 2020
DOI: https://doi.org/10.1007/978-3-030-39322-9_9

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"