2020 | OriginalPaper | Chapter

RouAlign: Cross-Version Function Alignment and Routine Recovery with Graphlet Edge Embedding

Authors: Can Yang, Jian Liu, Mengxia Luo, Xiaorui Gong, Baoxu Liu

Published in: ICT Systems Security and Privacy Protection

Publisher: Springer International Publishing

Abstract

Reverse engineering is the labor-intensive work of understanding the internal implementation of a program, and it is necessary for tasks such as malware analysis and vulnerability hunting. Cross-version function identification and subroutine matching can save considerable manual effort by pointing out the parts of a binary that are already known from other binaries. Existing approaches focus mainly on function recognition and ignore the recovery of relationships between functions, which makes it hard for researchers to locate the calling routine they are interested in.
In this paper, we propose a method that uses graphlet edge embedding to abstract high-level topological features of function call graphs and to recover the relationships between functions. Having recovered these relationships, we reconstruct the calling routine of the program and then infer the specific functions in it. We implement a prototype called RouAlign, which automatically aligns the trunk routine of assembly code. We evaluated RouAlign on 65 groups of real-world programs containing over two million functions. RouAlign outperforms state-of-the-art binary comparison solutions by over 35%, with a high average precision of 92% in pairwise function recognition.
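
To give a rough idea of what edge-level graphlet features of a call graph can look like, the following minimal sketch counts the 3-node graphlets that touch each call edge. It is an illustration only and not the RouAlign implementation, whose details are not given in this abstract. The function names, the toy call graphs, and the choice of 3-node graphlets on an undirected view of the graph are all assumptions made for the example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from (caller, callee) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-recursion is ignored in this sketch
            adj[u].add(v)
            adj[v].add(u)
    return adj


def edge_signature(adj, u, v):
    """Return (triangles, open_paths) for the edge (u, v).

    triangles:  3-node cliques containing the edge (common neighbours).
    open_paths: 3-node paths containing the edge that are not closed.
    """
    triangles = len(adj[u] & adj[v])
    open_paths = (len(adj[u]) - 1 - triangles) + (len(adj[v]) - 1 - triangles)
    return (triangles, open_paths)


def edge_signatures(edges):
    """Map every call edge to its graphlet-count signature."""
    adj = build_adjacency(edges)
    return {(u, v): edge_signature(adj, u, v) for u, v in edges if u != v}


# Hypothetical call graph of an "old" version with known function names.
old_calls = [("main", "init"), ("main", "run"), ("run", "step"),
             ("run", "log"), ("step", "log")]
# The same structure in a stripped "new" version with placeholder names.
new_calls = [("sub_1", "sub_2"), ("sub_1", "sub_3"), ("sub_3", "sub_4"),
             ("sub_3", "sub_5"), ("sub_4", "sub_5")]

old_sigs = edge_signatures(old_calls)
new_sigs = edge_signatures(new_calls)
```

In this toy case the edge `("main", "run")` and the edge `("sub_1", "sub_3")` are the only ones with the signature `(0, 3)`, so the topology alone suggests that they correspond, independent of function names. Real call graphs would need larger graphlets and a learned embedding to separate edges that share a signature, such as `("run", "step")` and `("run", "log")` here.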

Metadata
Title
RouAlign: Cross-Version Function Alignment and Routine Recovery with Graphlet Edge Embedding
Authors
Can Yang
Jian Liu
Mengxia Luo
Xiaorui Gong
Baoxu Liu
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-58201-2_11
