
2017 | OriginalPaper | Chapter

Learning Types for Binaries

Authors : Zhiwu Xu, Cheng Wen, Shengchao Qin

Published in: Formal Methods and Software Engineering

Publisher: Springer International Publishing


Abstract

Type inference for binary code is a challenging problem, partly because much type-related information is lost during compilation from high-level source code. Most existing research on binary code type inference resorts to program analysis techniques, which can be too conservative to infer types with high accuracy or too heavyweight to be viable in practice. In this paper, we propose a new approach to learning types for recovered variables from their related representative instructions. Our idea is motivated by “duck typing”, where the type of a variable is determined by its features and properties. Our approach first learns a classifier from existing binaries with debug information and then uses this classifier to predict types for new, unseen binaries. We have implemented our approach in a tool called BITY and used it to conduct experiments on the well-known benchmark coreutils (v8.4). The results show that our tool is more precise than the commercial tool Hex-Rays, in terms of both correct types and compatible types.
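The learn-then-predict workflow described in the abstract can be illustrated with a minimal sketch. It is not BITY's implementation: the training samples, the mnemonic-frequency features, and the nearest-centroid classifier with cosine similarity are all assumptions chosen to keep the example self-contained, and the names `TRAINING`, `featurize`, `train`, and `predict` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training data: the instructions that touch a variable,
# paired with the type that debug information would report for it.
TRAINING = [
    (["movss", "addss", "mulss"], "float"),
    (["movss", "cvtss2sd"], "float"),
    (["movsx", "cmp", "add"], "int"),
    (["mov", "add", "imul"], "int"),
    (["movzx", "cmp", "test"], "char"),
    (["movzx", "mov", "cmp"], "char"),
]


def featurize(instructions):
    """Map a list of mnemonics to a normalized frequency vector."""
    counts = Counter(instructions)
    total = sum(counts.values())
    return {mnemonic: n / total for mnemonic, n in counts.items()}


def train(samples):
    """Average the feature vectors of each type into one centroid per type."""
    sums, sizes = {}, Counter()
    for instructions, label in samples:
        sizes[label] += 1
        acc = sums.setdefault(label, Counter())
        for mnemonic, weight in featurize(instructions).items():
            acc[mnemonic] += weight
    return {
        label: {m: w / sizes[label] for m, w in acc.items()}
        for label, acc in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(m, 0.0) for m, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(model, instructions):
    """Return the type whose centroid is most similar to the variable's usage."""
    vector = featurize(instructions)
    return max(model, key=lambda label: cosine(vector, model[label]))


MODEL = train(TRAINING)
print(predict(MODEL, ["movss", "mulss", "addss"]))  # prints "float"
```

The sketch mirrors the duck-typing intuition: a variable that is used only by scalar floating-point instructions is labeled by how it behaves, without any data-flow analysis. A real system would need far richer features and training data than this toy table.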


Footnotes
1
In FASTCALL convention, the first two parameters are passed in ECX and EDX.
 
2
Hex-Rays makes use of debug information, so we run both our tool and Hex-Rays on stripped binaries.
 
3
Theoretically, we can use the ratio of the number of common levels to the maximum number of levels between \(t_1\) and \(t_2\) [4]. Since we consider 3 levels in practice, we use one half here.
 
Literature
1.
Lin, Z., Zhang, X., Xu, D.: Automatic reverse engineering of data structures from binary execution. In: Network and Distributed System Security Symposium (2010)
2.
Lee, J.H., Avgerinos, T., Brumley, D.: Tie: principled reverse engineering of types in binary programs. In: Network and Distributed System Security Symposium (2011)
3.
Fokin, A., Derevenetc, E., Chernov, A., Troshina, K.: SmartDec: approaching C++ decompilation. In: Reverse Engineering, pp. 347–356 (2011)
4.
Elwazeer, K., Anand, K., Kotha, A., Smithson, M., Barua, R.: Scalable variable and data type detection in a binary rewriter. In: ACM Sigplan Conference on Programming Language Design and Implementation, pp. 51–60 (2013)
5.
Noonan, M., Loginov, A., Cok, D.: Polymorphic type inference for machine code. In: ACM Sigplan Conference on Programming Language Design and Implementation, pp. 27–41 (2016)
7.
Balakrishnan, G., Reps, T.: Analyzing memory accesses in x86 binary executables. University of Wisconsin-Madison Department of Computer Sciences (2012)
8.
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998)
9.
Smola, A.J., Schölkopf, B.: On a kernel-based method for pattern recognition, regression, approximation, and operator inversion. Algorithmica 22(1), 211–231 (1998)
10.
Intel Corporation: Intel 64 and IA-32 Architectures Software Developer Manuals, December 2016
11.
Crnic, J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
13.
Kang, S., Cho, S., Kang, P.: Constructing a multi-class classifier using one-against-one approach with different binary classifiers. Neurocomputing 149(PB), 677–682 (2015)
16.
Xu, S.: Commonly Used Algorithm Assembly (C Language Description). Tsinghua University Press, Beijing (2004). (in Chinese)
17.
Robbins, E., Howe, J.M., King, A.: Theory propagation and rational-trees. In: Symposium on Principles and Practice of Declarative Programming, pp. 193–204 (2013)
18.
Caballero, J., Lin, Z.: Type inference on executables. ACM Comput. Surv. 48(4), 65 (2016)
19.
Zhang, M., Prakash, A., Li, X., Liang, Z., Yin, H.: Identifying and analyzing pointer misuses for sophisticated memory-corruption exploit diagnosis. Proc. West. Pharmacol. Soc. 47(47), 46–49 (2013)
20.
Yan, Q., McCamant, S.: Conservative signed/unsigned type inference for binaries using minimum cut. Technical report, University of Minnesota (2014)
21.
Slowinska, A., Stancescu, T., Bos, H.: Howard: a dynamic excavator for reverse engineering data structures. In: Network and Distributed System Security Symposium (2011)
22.
Elwazeer, K., Anand, K., Kotha, A., Smithson, M., Barua, R.: Artiste: automatic generation of hybrid data structure signatures from binary code executions. Technical report TRIMDEA-SW-2012-001, IMDEA Software Institute (2012)
23.
Haller, I., Slowinska, A., Bos, H.: MemPick: high-level data structure detection in C/C++ binaries. In: Reverse Engineering, pp. 32–41 (2013)
24.
Jin, W., Cohen, C., Gennari, J., Hines, C., Chaki, S., Gurfinkel, A., Havrilla, J., Narasimhan, P.: Recovering C++ objects from binaries using inter-procedural data-flow analysis. In: ACM Sigplan on Program Protection and Reverse Engineering Workshop, p. 1 (2014)
25.
Yoo, K., Barua, R.: Recovery of object oriented features from C++ binaries. In: Asia-Pacific Software Engineering Conference, pp. 231–238 (2014)
26.
Katz, O., El-Yaniv, R., Yahav, E.: Estimating types in binaries using predictive modeling. In: ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 313–326 (2016)
27.
Raychev, V., Vechev, M., Krause, A.: Predicting program properties from “big code”. In: The ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 111–124 (2015)
Metadata
Title
Learning Types for Binaries
Authors
Zhiwu Xu
Cheng Wen
Shengchao Qin
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-68690-5_26
