2017 | Original Paper | Book Chapter

On the Feasibility of Malware Authorship Attribution

Authors: Saed Alrabaee, Paria Shirani, Mourad Debbabi, Lingyu Wang

Published in: Foundations and Practice of Security

Publisher: Springer International Publishing

Abstract

The security community often needs to determine the authorship of malware binaries, whether for digital forensic analysis of malware corpora or for thwarting live malware threats. Such attribution may be possible because of stylistic features inherent in code written by human programmers. Existing studies of authorship attribution for general-purpose software focus mainly on source code and typically rely on features of programming style and development environment. However, those features critically depend on the availability of the program source code, which is usually not the case when dealing with malware binaries. Moreover, program binaries often lose many semantic and stylistic features during compilation. Authorship attribution for malware binaries, which must rely on features and styles that survive compilation, is therefore challenging. This paper reviews the state of the art in this area and analyzes the features used by existing techniques. Through a case study, we identify features that can survive the compilation process. Finally, we analyze existing work on binary authorship attribution and study its applicability to real malware binaries.

Metadata
Title
On the Feasibility of Malware Authorship Attribution
Authors
Saed Alrabaee
Paria Shirani
Mourad Debbabi
Lingyu Wang
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-51966-1_17