Published in: Arabian Journal for Science and Engineering 9/2021

16-04-2021 | Research Article-Computer Engineering and Computer Science

Multi-Level Cross-Architecture Binary Code Similarity Metric

Authors: Meng Qiao, Xiaochuan Zhang, Huihui Sun, Zheng Shan, Fudong Liu, Wenjie Sun, Xingwei Li



Abstract

A cross-architecture binary code similarity metric is a fundamental technique in many machine learning-based binary program analysis methods. Recent studies use graph embedding methods to generate binary code embeddings and take the Euclidean distance between two embeddings as the similarity measure. However, these studies rely on manually selected features and do not make full use of the structural information in binary code, so part of that information is lost. To address these problems, we propose a multi-level neural network model that generates binary code embeddings incorporating both control flow graph (CFG) structure information and basic block information. Cross-architecture similarity can then be measured by the Euclidean distance between binary code embeddings. We conduct a series of experiments comparing the similarity of cross-architecture binary code, and the results demonstrate that our model overcomes the limitations described above and outperforms state-of-the-art methods.
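
As a rough illustration of the distance-based comparison described in the abstract, the following sketch ranks candidate functions by the Euclidean distance between their embeddings. It does not reproduce the authors' model: the embedding vectors, the function names `euclidean_distance` and `rank_by_similarity`, and the candidate labels are all hypothetical, and in practice the vectors would come from a trained embedding network.

```python
import math
from typing import Dict, List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the Euclidean (L2) distance between two embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[str]:
    """Sort candidate names from most to least similar (smallest distance first)."""
    return sorted(candidates, key=lambda name: euclidean_distance(query, candidates[name]))


# Hypothetical embeddings (assumed values, not taken from the paper).
x86_func = [0.0, 3.0, 4.0]
candidates = {
    "arm_same_source": [0.0, 3.0, 3.0],
    "arm_unrelated": [6.0, 3.0, 12.0],
}

ranking = rank_by_similarity(x86_func, candidates)
print(ranking)
```

A smaller distance is read as higher similarity, so the candidate compiled from the same source is expected to rank first when the embeddings are well trained.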


Metadata
Title
Multi-Level Cross-Architecture Binary Code Similarity Metric
Authors
Meng Qiao
Xiaochuan Zhang
Huihui Sun
Zheng Shan
Fudong Liu
Wenjie Sun
Xingwei Li
Publication date
16-04-2021
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 9/2021
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-021-05630-7
