
2017 | OriginalPaper | Book chapter

BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape

Written by: Paria Shirani, Lingyu Wang, Mourad Debbabi

Published in: Detection of Intrusions and Malware, and Vulnerability Assessment

Publisher: Springer International Publishing

Abstract

Identifying library functions in program binaries is important to many security applications, such as threat analysis, digital forensics, software infringement, and malware detection. Today’s program binaries normally contain a significant number of third-party library functions taken from standard libraries or free open-source software packages. The ability to automatically identify such library functions not only enhances the quality and the efficiency of threat analysis and reverse engineering tasks, but also improves their accuracy by avoiding false correlations between irrelevant code bases. Existing methods either lack efficiency or are not robust enough to identify different versions of the same library function that result from the use of different compilers, different compilation settings, or obfuscation techniques. To address these limitations, we present a scalable and robust system called BinShape to identify standard library functions in binaries. The key idea of BinShape is twofold. First, we derive a robust signature for each library function based on heterogeneous features covering CFGs, instruction-level characteristics, statistical characteristics, and function-call graphs. Second, we design a novel data structure to store such signatures and facilitate efficient matching against a target function. We evaluate BinShape on a diverse set of C/C++ binaries, compiled with GCC and Visual Studio compilers on x86-x64 CPU architectures, at optimization levels \(O0-O3\). Our experiments show that BinShape is able to identify library functions in real binaries both efficiently and accurately, with an average accuracy of \(89\%\) and taking about 0.14 s to identify one function out of three million candidates. We also show that BinShape remains robust when the code is subjected to different compilers, slight modification, or some obfuscation techniques.
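
The two-part idea in the abstract (a per-function signature built from numeric features, then a lookup against stored signatures) can be illustrated with a minimal sketch. This is not BinShape's actual signature or data structure, which the abstract does not specify. The `Signature` class, the three feature values per function, the `identify` helper, the distance threshold, and the linear scan are all hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature: a function name plus a numeric feature vector."""
    name: str
    features: tuple  # assumed layout: (basic blocks, instructions, callees)


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(target, library, threshold):
    """Return the name of the closest stored signature, or None if no
    signature lies within `threshold` of the target feature vector.

    This is a plain linear scan; a scalable system would use an index.
    """
    best = min(library, key=lambda s: distance(s.features, target), default=None)
    if best is None or distance(best.features, target) > threshold:
        return None
    return best.name


# Made-up feature values, for illustration only.
LIBRARY = [
    Signature("memcpy", (3.0, 12.0, 0.0)),
    Signature("strlen", (2.0, 8.0, 0.0)),
    Signature("qsort", (9.0, 40.0, 2.0)),
]

match = identify((2.0, 9.0, 0.0), LIBRARY, threshold=2.0)
no_match = identify((100.0, 100.0, 100.0), LIBRARY, threshold=2.0)
```

Here `match` resolves to the nearest stored entry and `no_match` is `None` because nothing lies within the threshold. A linear scan costs time proportional to the number of stored signatures, which is why matching against millions of candidates calls for a purpose-built index of the kind the abstract mentions.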

Metadata
Title
BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape
Written by
Paria Shirani
Lingyu Wang
Mourad Debbabi
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-60876-1_14