Skip to main content

2020 | OriginalPaper | Buchkapitel

Harnessing Static Analysis to Help Learn Pseudo-Inverses of String Manipulating Procedures for Automatic Test Generation

verfasst von : Oren Ish-Shalom, Shachar Itzhaky, Roman Manevich, Noam Rinetzky

Erschienen in: Verification, Model Checking, and Abstract Interpretation

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We present a novel approach based on supervised machine-learning for inverting String Manipulating Procedures (SMPs), i.e., given an SMP \(p:\bar{\Sigma }\rightarrow \bar{\Sigma }\), we compute a partial pseudo-inverse function \(p^{-1}\) such that given a target string \(t\in \overline{\Sigma }\), if \(p^{-1}(t)\ne \bot \) then \(p(p^{-1}(t))=t\). The motivation for addressing this problem is the difficulties faced by modern symbolic execution tools, e.g., KLEE, to find ways to execute loops inside SMPs in a way which produces specific outputs required to enter a specific branch. Thus, we find ourselves in a pleasant situation where program analysis assists machine learning to help program analysis.
Our basic attack on the problem is to train a machine learning algorithm using (output, input) pairs generated by executing p on random inputs. Unfortunately, naively applying this technique is extremely expensive due to the size of the alphabet. To remedy this situation, we present a specialized static analysis algorithm that can drastically reduce the size of the alphabet \(\Sigma \) from which examples are drawn without sacrificing the ability to cover all the behaviors of the analyzed procedure. Our key observation is that often a procedure treats many characters in a particular uniform way: it only copies them from the input to the output in an order-preserving fashion. Our static analysis finds these good characters so that our learning algorithm may consider examples coming from a reduced alphabet containing a single representative good character, thus allowing to produce smaller models while using fewer examples than had the full alphabet been used. We then utilize the learned pseudo-inverse function to invert specific desired outputs by translating a given query to and from the reduced alphabet.
We implemented our approach using two machine learning algorithms and show that indeed our string inverters can find inputs that can drive a selection of procedures taken from real-life software to produce desired outputs, whereas KLEE, a state-of-the-art symbolic execution engine, fails to find such inputs.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The procedure is based on a GCC procedure which adds an escape character before tab and newline characters. For clarity, we replaced the whitespace characters with more visible characters. For simplicity, we removed code concerning array bound checking.
 
2
We remind the reader that our tool operates directly on LLVM bitcode.
 
3
The choice to block the program was done in the sake of simplicity. Alternatively, we could have designed the language to signal an error.
 
4
The transformers pertaining to read-last() and write-last() operations are similar to those of read-first() and write-first(), respectively, and are thus omitted.
 
5
An equally plausible alternative would be to make \( env (v)\) bad.
 
Literatur
2.
Zurück zum Zitat Cadar, C., Dunbar, D., Engler, D.: Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI 2008, pp. 209–224. USENIX Association, Berkeley (2008) Cadar, C., Dunbar, D., Engler, D.: Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI 2008, pp. 209–224. USENIX Association, Berkeley (2008)
3.
Zurück zum Zitat Checn, W., Duding, J.T.: Program inversion: More than fun! Sci. Comput. Program. 15(1), 1–13 (1990) Checn, W., Duding, J.T.: Program inversion: More than fun! Sci. Comput. Program. 15(1), 1–13 (1990)
6.
Zurück zum Zitat Ganesh, V.: Decision procedures for bit-vectors, arrays and integers. Ph.D. thesis, Stanford, CA, USA (2007). aAI3281841 Ganesh, V.: Decision procedures for bit-vectors, arrays and integers. Ph.D. thesis, Stanford, CA, USA (2007). aAI3281841
7.
Zurück zum Zitat Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, 20–22 January 2016, pp. 499–512 (2016) Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, 20–22 January 2016, pp. 499–512 (2016)
8.
Zurück zum Zitat Glück, R., Kawabe, M.: A method for automatic program inversion based on LR(0) parsing. Fundam. Inform. 66, 367–395 (2005)MathSciNetMATH Glück, R., Kawabe, M.: A method for automatic program inversion based on LR(0) parsing. Fundam. Inform. 66, 367–395 (2005)MathSciNetMATH
10.
Zurück zum Zitat Gulwani, S.: Automating string processing in spreadsheets using input-output examples. SIGPLAN Not. 46(1), 317–330 (2011)CrossRef Gulwani, S.: Automating string processing in spreadsheets using input-output examples. SIGPLAN Not. 46(1), 317–330 (2011)CrossRef
13.
Zurück zum Zitat Hu, Q., D’Antoni, L.: Automatic program inversion using symbolic transducers. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 376–389. ACM, New York (2017). https://doi.org/10.1145/3062341.3062345 Hu, Q., D’Antoni, L.: Automatic program inversion using symbolic transducers. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pp. 376–389. ACM, New York (2017). https://​doi.​org/​10.​1145/​3062341.​3062345
14.
Zurück zum Zitat Jose Oncina, P.G., Vidal, E.: Learning subsequential transducers for pattern recognition interpretation tasks. IEEE Trans. Pattern Anal. Mach. Intell. 15(5), 448–458 (1993)CrossRef Jose Oncina, P.G., Vidal, E.: Learning subsequential transducers for pattern recognition interpretation tasks. IEEE Trans. Pattern Anal. Mach. Intell. 15(5), 448–458 (1993)CrossRef
15.
Zurück zum Zitat Kanade, A., Alur, R., Rajamani, S., Ramanlingam, G.: Representation dependence testing using program inversion. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2010, pp. 277–286. ACM, New York (2010). https://doi.org/10.1145/1882291.1882332 Kanade, A., Alur, R., Rajamani, S., Ramanlingam, G.: Representation dependence testing using program inversion. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2010, pp. 277–286. ACM, New York (2010). https://​doi.​org/​10.​1145/​1882291.​1882332
17.
Zurück zum Zitat Kiezun, A., Ganesh, V., Guo, P.J., Hooimeijer, P., Ernst, M.D.: HAMPI: a solver for string constraints. In: Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ISSTA 2009, pp. 105–116. ACM, New York (2009). https://doi.org/10.1145/1572272.1572286 Kiezun, A., Ganesh, V., Guo, P.J., Hooimeijer, P., Ernst, M.D.: HAMPI: a solver for string constraints. In: Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ISSTA 2009, pp. 105–116. ACM, New York (2009). https://​doi.​org/​10.​1145/​1572272.​1572286
18.
Zurück zum Zitat Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO 2004, p. 75. IEEE Computer Society, Washington, DC (2004) Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO 2004, p. 75. IEEE Computer Society, Washington, DC (2004)
20.
Zurück zum Zitat Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)CrossRef Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)CrossRef
23.
Zurück zum Zitat Octeau, D., et al.: Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pp. 469–484. ACM, New York (2016). https://doi.org/10.1145/2837614.2837661 Octeau, D., et al.: Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pp. 469–484. ACM, New York (2016). https://​doi.​org/​10.​1145/​2837614.​2837661
24.
Zurück zum Zitat Oh, H., Yang, H., Yi, K.: Learning a strategy for adapting a program analysis via Bayesian optimisation. In: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pp. 572–588. ACM, New York (2015). https://doi.org/10.1145/2814270.2814309 Oh, H., Yang, H., Yi, K.: Learning a strategy for adapting a program analysis via Bayesian optimisation. In: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pp. 572–588. ACM, New York (2015). https://​doi.​org/​10.​1145/​2814270.​2814309
25.
Zurück zum Zitat Raychev, V., Bielik, P., Vechev, M., Krause, A.: Learning programs from noisy data. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pp. 761–774. ACM, New York (2016). https://doi.org/10.1145/2837614.2837671 Raychev, V., Bielik, P., Vechev, M., Krause, A.: Learning programs from noisy data. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pp. 761–774. ACM, New York (2016). https://​doi.​org/​10.​1145/​2837614.​2837671
26.
Zurück zum Zitat Ross, B.J.: Running programs backwards: the logical inversion of imperative computation. Formal Aspects Comput. 9(3), 331–348 (1997)CrossRef Ross, B.J.: Running programs backwards: the logical inversion of imperative computation. Formal Aspects Comput. 9(3), 331–348 (1997)CrossRef
27.
Zurück zum Zitat Sankaranarayanan, S., Chaudhuri, S., Ivančić, F., Gupta, A.: Dynamic inference of likely data preconditions over predicates by tree learning. In: Proceedings of the 2008 International Symposium on Software Testing and Analysis, ISSTA 2008, pp. 295–306. ACM, New York (2008). https://doi.org/10.1145/1390630.1390666 Sankaranarayanan, S., Chaudhuri, S., Ivančić, F., Gupta, A.: Dynamic inference of likely data preconditions over predicates by tree learning. In: Proceedings of the 2008 International Symposium on Software Testing and Analysis, ISSTA 2008, pp. 295–306. ACM, New York (2008). https://​doi.​org/​10.​1145/​1390630.​1390666
28.
Zurück zum Zitat Sankaranarayanan, S., Ivančić, F., Gupta, A.: Mining library specifications using inductive logic programming. In: Proceedings of the 30th International Conference on Software Engineering, ICSE 2008, pp. 131–140. ACM, New York (2008). https://doi.org/10.1145/1368088.1368107 Sankaranarayanan, S., Ivančić, F., Gupta, A.: Mining library specifications using inductive logic programming. In: Proceedings of the 30th International Conference on Software Engineering, ICSE 2008, pp. 131–140. ACM, New York (2008). https://​doi.​org/​10.​1145/​1368088.​1368107
31.
Zurück zum Zitat Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, Quebec, Canada, 8–13 December 2014, pp. 3104–3112 (2014) Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, Quebec, Canada, 8–13 December 2014, pp. 3104–3112 (2014)
33.
Zurück zum Zitat Zaremba, W., Sutskever, I.: Learning to execute. CoRR abs/1410.4615 (2014) Zaremba, W., Sutskever, I.: Learning to execute. CoRR abs/1410.4615 (2014)
34.
Zurück zum Zitat Zheng, Y., Ganesh, V., Subramanian, S., Tripp, O., Dolby, J., Zhang, X.: Effective search-space pruning for solvers of string equations, regular expressions and length constraints. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 235–254. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_14CrossRef Zheng, Y., Ganesh, V., Subramanian, S., Tripp, O., Dolby, J., Zhang, X.: Effective search-space pruning for solvers of string equations, regular expressions and length constraints. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 235–254. Springer, Cham (2015). https://​doi.​org/​10.​1007/​978-3-319-21690-4_​14CrossRef
Metadaten
Titel
Harnessing Static Analysis to Help Learn Pseudo-Inverses of String Manipulating Procedures for Automatic Test Generation
verfasst von
Oren Ish-Shalom
Shachar Itzhaky
Roman Manevich
Noam Rinetzky
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-39322-9_9