
2017 | OriginalPaper | Book chapter

Learning Types for Binaries

Written by: Zhiwu Xu, Cheng Wen, Shengchao Qin

Published in: Formal Methods and Software Engineering

Publisher: Springer International Publishing


Abstract

Type inference for binary code is a challenging problem, partly because much type-related information is lost during compilation from high-level source code. Most existing research on binary code type inference resorts to program analysis techniques, which can be too conservative to infer types with high accuracy or too heavyweight to be viable in practice. In this paper, we propose a new approach to learning types for recovered variables from their related representative instructions. Our idea is motivated by “duck typing”, where the type of a variable is determined by its features and properties. Our approach first learns a classifier from existing binaries with debug information and then uses this classifier to predict types for new, unseen binaries. We have implemented our approach in a tool called BITY and used it to conduct experiments on the well-known benchmark coreutils (v8.4). The results show that our tool is more precise than the commercial tool Hex-Rays, both in terms of correct types and compatible types.
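
The learn-then-predict idea in the abstract can be illustrated with a minimal sketch. It is not BITY's implementation: the training samples, the bag-of-mnemonics features, and the nearest-centroid classifier with cosine similarity are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter
import math

# Hypothetical training data: (type label, mnemonics of instructions that
# touch the variable). In the approach described above, the labels would
# come from binaries that still carry debug information.
TRAINING = [
    ("int",    ["mov", "add", "cmp", "imul"]),
    ("int",    ["mov", "sub", "cmp"]),
    ("char",   ["movsx", "movzx", "cmp"]),
    ("char",   ["movzx", "mov"]),
    ("double", ["fld", "fstp", "fmul"]),
    ("double", ["fld", "fadd", "fstp"]),
]

def features(instructions):
    """Bag-of-mnemonics feature vector for one variable."""
    return Counter(instructions)

def train(samples):
    """Sum the feature vectors of each type into one centroid per type."""
    centroids = {}
    for label, instructions in samples:
        centroids.setdefault(label, Counter()).update(features(instructions))
    return centroids

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(centroids, instructions):
    """Return the type whose centroid is most similar to the variable's features."""
    vec = features(instructions)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

model = train(TRAINING)
print(predict(model, ["fld", "fmul", "fstp"]))  # variable from an unseen binary
print(predict(model, ["movzx", "cmp"]))
```

The "duck typing" intuition shows up in `predict`: a variable is assigned the type whose typical instruction usage it most resembles, without any data-flow analysis.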


Footnotes
1
In FASTCALL convention, the first two parameters are passed in ECX and EDX.
 
2
Hex-Rays makes use of debug information, so we run both our tool and Hex-Rays on stripped binaries.
 
3
Theoretically, we can use the ratio of the number of common levels to the maximum number of levels between \(t_1\) and \(t_2\) [4]. Since we consider 3 levels in practice, we use one half here.
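
The ratio mentioned in footnote 3 can be sketched as follows. This is one reading of the footnote, not the paper's definition: it assumes each type is written as a path of levels from the root of a type hierarchy, and the level names below are hypothetical.

```python
def level_ratio(t1, t2):
    """Number of shared leading levels divided by the longer path's length.

    t1 and t2 are tuples of level names, ordered from the root of an
    assumed type hierarchy down to the concrete type.
    """
    longest = max(len(t1), len(t2))
    if longest == 0:
        return 1.0  # two empty paths are treated as identical
    common = 0
    for a, b in zip(t1, t2):
        if a != b:
            break
        common += 1
    return common / longest

# Hypothetical three-level paths, for illustration only.
INT = ("number", "integer", "int")
UINT = ("number", "integer", "unsigned int")
DOUBLE = ("number", "float", "double")

print(level_ratio(INT, INT))     # identical types
print(level_ratio(INT, UINT))    # differ only at the last level
print(level_ratio(INT, DOUBLE))  # share only the root level
```

With three levels this ratio takes only the values 0, 1/3, 2/3, and 1, which may be why the footnote settles on a single fixed value of one half.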
 
References
1.
Lin, Z., Zhang, X., Xu, D.: Automatic reverse engineering of data structures from binary execution. In: Network and Distributed System Security Symposium (2010)
2.
Lee, J.H., Avgerinos, T., Brumley, D.: Tie: principled reverse engineering of types in binary programs. In: Network and Distributed System Security Symposium (2011)
3.
Fokin, A., Derevenetc, E., Chernov, A., Troshina, K.: SmartDec: approaching C++ decompilation. In: Reverse Engineering, pp. 347–356 (2011)
4.
Elwazeer, K., Anand, K., Kotha, A., Smithson, M., Barua, R.: Scalable variable and data type detection in a binary rewriter. In: ACM Sigplan Conference on Programming Language Design and Implementation, pp. 51–60 (2013)
5.
Noonan, M., Loginov, A., Cok, D.: Polymorphic type inference for machine code. In: ACM Sigplan Conference on Programming Language Design and Implementation, pp. 27–41 (2016)
7.
Balakrishnan, G., Reps, T.: Analyzing memory accesses in x86 binary executables. University of Wisconsin-Madison Department of Computer Sciences (2012)
8.
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998)
9.
Smola, A.J., Schölkopf, B.: On a kernel-based method for pattern recognition, regression, approximation, and operator inversion. Algorithmica 22(1), 211–231 (1998)
10.
Intel Corporation: Intel 64 and IA-32 Architectures Software Developer Manuals, December 2016
11.
Crnic, J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
13.
Kang, S., Cho, S., Kang, P.: Constructing a multi-class classifier using one-against-one approach with different binary classifiers. Neurocomputing 149(PB), 677–682 (2015)
16.
Xu, S.: Commonly Used Algorithm Assembly (C Language Description). Tsinghua University Press, Beijing (2004). (in Chinese)
17.
Robbins, E., Howe, J.M., King, A.: Theory propagation and rational-trees. In: Symposium on Principles and Practice of Declarative Programming, pp. 193–204 (2013)
18.
Caballero, J., Lin, Z.: Type inference on executables. ACM Comput. Surv. 48(4), 65 (2016)
19.
Zhang, M., Prakash, A., Li, X., Liang, Z., Yin, H.: Identifying and analyzing pointer misuses for sophisticated memory-corruption exploit diagnosis. Proc. West. Pharmacol. Soc. 47(47), 46–49 (2013)
20.
Yan, Q., McCamant, S.: Conservative signed/unsigned type inference for binaries using minimum cut. Technical report, University of Minnesota (2014)
21.
Slowinska, A., Stancescu, T., Bos, H.: Howard: a dynamic excavator for reverse engineering data structures. In: Network and Distributed System Security Symposium (2011)
22.
Elwazeer, K., Anand, K., Kotha, A., Smithson, M., Barua, R.: Artiste: automatic generation of hybrid data structure signatures from binary code executions. Technical report TRIMDEA-SW-2012-001, IMDEA Software Institute (2012)
23.
Haller, I., Slowinska, A., Bos, H.: MemPick: high-level data structure detection in C/C++ binaries. In: Reverse Engineering, pp. 32–41 (2013)
24.
Jin, W., Cohen, C., Gennari, J., Hines, C., Chaki, S., Gurfinkel, A., Havrilla, J., Narasimhan, P.: Recovering C++ objects from binaries using inter-procedural data-flow analysis. In: ACM Sigplan on Program Protection and Reverse Engineering Workshop, p. 1 (2014)
25.
Yoo, K., Barua, R.: Recovery of object oriented features from C++ binaries. In: Asia-Pacific Software Engineering Conference, pp. 231–238 (2014)
26.
Katz, O., El-Yaniv, R., Yahav, E.: Estimating types in binaries using predictive modeling. In: ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 313–326 (2016)
27.
Raychev, V., Vechev, M., Krause, A.: Predicting program properties from “big code”. In: The ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 111–124 (2015)
Metadata
Title
Learning Types for Binaries
Written by
Zhiwu Xu
Cheng Wen
Shengchao Qin
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-68690-5_26
