2015 | OriginalPaper | Chapter

Programming with “Big Code”

Author: Eran Yahav

Published in: Programming Languages and Systems

Publisher: Springer International Publishing

Abstract

The amount of code available on the web grows daily. Open-source hosting sites such as GitHub contain billions of lines of code. Community question-answering sites provide millions of code snippets with corresponding text and metadata. The amount of code available in executable binaries is even greater. In this talk, I will cover recent research trends on leveraging such “big code” for program analysis, program synthesis, and reverse engineering. We will consider a range of semantic representations based on symbolic automata [11, 15], tracelets [3], numerical abstractions [13, 14], and textual descriptions [1, 22], as well as different notions of code similarity based on these representations.
To leverage these semantic representations, we will consider a number of prediction techniques, including statistical language models [19, 20], variable order Markov models [2], and other distance-based and model-based sequence classification techniques.
Finally, I will show applications of these techniques including semantic code search in both source code and stripped binaries, code completion and reverse engineering.
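
To make the idea of a statistical language model for code completion concrete, here is a minimal sketch, not the method of [19] or [2]. It assumes code has already been abstracted into sequences of API call names and uses a plain bigram model (a fixed-order Markov model of order 1). The function names `train_bigram_model` and `complete`, the `"<s>"` start marker, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(sequences):
    """Count how often each call follows each preceding call."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # "<s>" is an assumed start marker so the first call also has a context.
        prev = "<s>"
        for call in seq:
            counts[prev][call] += 1
            prev = call
    return counts


def complete(counts, prev_call, k=3):
    """Return up to k likely next calls with relative-frequency estimates."""
    followers = counts.get(prev_call)
    if not followers:
        return []  # context never seen in the training data
    total = sum(followers.values())
    return [(call, n / total) for call, n in followers.most_common(k)]


# Hypothetical toy corpus of API call sequences (not taken from the talk).
corpus = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
]
model = train_bigram_model(corpus)
suggestions = complete(model, "open")
```

A variable order Markov model would differ from this sketch by choosing the context length adaptively instead of always conditioning on exactly one previous call.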

Literature
2. Begleiter, R., El-Yaniv, R., Yona, G.: On prediction using variable order Markov models. J. Artif. Intell. Res. 22, 385–421 (2004)
3. David, Y., Yahav, E.: Tracelet-based code search in executables. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pp. 349–360 (2014)
4. Faktor, A., Irani, M.: Clustering by composition: unsupervised discovery of image categories. IEEE Trans. Pattern Anal. Mach. Intell. 36(6), 1092–1106 (2014)
5. Halevy, A., Norvig, P., Pereira, F.: The unreasonable effectiveness of data. IEEE Intell. Syst. 24(2), 8–12 (2009)
6. Hays, J., Efros, A.A.: Scene completion using millions of photographs. In: ACM SIGGRAPH 2007 Papers, SIGGRAPH ’07, New York, NY, USA (2007)
7. Horwitz, S.: Identifying the semantic and textual differences between two versions of a program, vol. 25. ACM (1990)
8. Jagadish, H.V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J.M., Ramakrishnan, R., Shahabi, C.: Big data and its technical challenges. Commun. ACM 57(7), 86–94 (2014)
9. Kang, H., Hebert, M., Efros, A.A., Kanade, T.: Data-driven objectness. IEEE Trans. Pattern Anal. Mach. Intell. 37(1), 189–195 (2015)
10. Katz, O.: Type prediction using variable order Markov models. Master’s thesis, Technion (2015)
11. Mishne, A., Shoham, S., Yahav, E.: Typestate-based semantic code search over partial programs. In: OOPSLA ’12 (2012)
12. Necula, G.C.: Translation validation for an optimizing compiler. In: Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI ’00, pp. 83–94, New York, NY, USA (2000)
13. Partush, N., Yahav, E.: Abstract semantic differencing for numerical programs. In: Logozzo, F., Fähndrich, M. (eds.) Static Analysis. LNCS, vol. 7935, pp. 238–258. Springer, Heidelberg (2013)
14. Partush, N., Yahav, E.: Abstract semantic differencing via speculative correlation. In: Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications, OOPSLA ’14 (2014)
15. Peleg, H., Shoham, S., Yahav, E., Yang, H.: Symbolic automata for representing big code. In: International Journal on Software Tools for Technology Transfer, STTT ’15 (2015)
16. Pnueli, A., Siegel, M.D., Singerman, E.: Translation validation. In: Steffen, B. (ed.) TACAS 1998. LNCS, vol. 1384, pp. 151–166. Springer, Heidelberg (1998)
17. Ramos, D.A., Engler, D.R.: Practical, low-effort equivalence verification of real code. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 669–685. Springer, Heidelberg (2011)
18. Raychev, V., Vechev, M., Krause, A.: Predicting program properties from “big code”. In: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’15, pp. 111–124 (2015)
19. Raychev, V., Vechev, M., Yahav, E.: Code completion with statistical language models. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, p. 44 (2014)
20. Rosenfeld, R.: Two decades of statistical language modeling: where do we go from here? Proc. IEEE 88, 1270–1278 (2000)
22. Sinai, M.B., Yahav, E.: Code similarity via natural language descriptions. In: POPL Off the Beaten Track, OBT ’15 (2014)
Metadata
Title: Programming with “Big Code”
Author: Eran Yahav
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-26529-2_1