Published in: International Journal of Information Security 1/2024

14.08.2023 | Regular Contribution

Short- versus long-term performance of detection models for obfuscated MSOffice-embedded malware

Authors: Silviu Viţel, Marilena Lupaşcu, Dragoş Teodor Gavriluţ, Henri Luchian

Abstract

This paper analyzes the efficiency of various machine learning models (artificial neural networks, random forest, decision tree, AdaBoost and XGBoost) against the evolution of VBA-based (Visual Basic for Applications) malware over a long period of time (1995–2021). The file set used in our research is comprehensive: approximately 1.9 million files, of which 944,595 are malicious and the rest are benign. This allowed us to gain insights into the resilience of various machine learning models against the diversity and the evolution of file features that reflect obfuscation techniques in VBA-based malware. In studying the detection of VBA-based malware, we focus on characteristics of both the classifiers, namely proactivity (short-term detection efficiency against future malware) and endurance (long-term detection robustness), and of the detection-relevant file features, namely feature perishability (the dynamics of feature relevance). As a prerequisite of the study, we also describe in some detail various obfuscation techniques used by the malware under investigation during the last decade.

Footnotes
2
This information is no longer available on the Microsoft website; it can still be found at https://web.archive.org/web/20170412001248/news.microsoft.com/bythenumbers/planet-office.
 
5
That is, the degree to which a human reader is confused.
 
6
\(\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}\); \(\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}\), where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives.
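As a small worked illustration of these two definitions, the sketch below computes both metrics from raw counts. The function name `precision_recall` and the example counts are hypothetical and are not taken from the paper; returning 0.0 when a denominator is zero is also an assumption made here for convenience.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) computed from confusion-matrix counts."""
    # Precision: share of files flagged as malicious that really are malicious.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly malicious files that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.90 recall=0.75
```

With 90 true positives, 10 false positives and 30 false negatives, precision is 90/100 and recall is 90/120.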
 
7
Term Frequency.
 
8
Term Frequency − Inverse Document Frequency.
 
9
Bag of Words.
 
10
Latent Semantic Indexing.
 
11
Sparse composite document vectors.
 
12
Office document files use the Open XML file format: a ZIP archive containing data structured in separate XML files.
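Because an Open XML document is an ordinary ZIP archive, its parts can be listed with a standard ZIP reader. The following is a minimal sketch using Python's `zipfile` module. The helpers `make_toy_package`, `list_parts` and `has_vba_project` are hypothetical names introduced here, and the toy archive only mimics the part layout of a macro-enabled Word file; it is not a valid document and does not reflect the feature extraction used in the paper.

```python
import io
import zipfile


def make_toy_package() -> bytes:
    """Build an in-memory ZIP that imitates the part names of a .docm file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        # In macro-enabled Word files the VBA project is stored as a binary part.
        archive.writestr("word/vbaProject.bin", b"\x00placeholder")
    return buffer.getvalue()


def list_parts(package: bytes) -> list[str]:
    """Return the names of all parts stored in the ZIP container."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return archive.namelist()


def has_vba_project(part_names: list[str]) -> bool:
    """Heuristic check: does any part look like an embedded VBA project?"""
    return any(name.lower().endswith("vbaproject.bin") for name in part_names)


parts = list_parts(make_toy_package())
print(parts)
print("contains VBA project:", has_vba_project(parts))
```

For a real file, the same `list_parts` call could be given the bytes read from disk; legacy binary formats such as .doc are not ZIP archives and would need a different reader.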
 
20
Despite the fact that Microsoft introduced security measures aimed at preventing the execution of malicious macros, attackers often managed to convince unsuspecting users to open infected documents, by disguising their origin or describing the enabling of macros as a necessary step to access a document’s data.
 
26
In the whole database D, ignoring the time stamps.
 
Metadata
Title
Short- versus long-term performance of detection models for obfuscated MSOffice-embedded malware
Authors
Silviu Viţel
Marilena Lupaşcu
Dragoş Teodor Gavriluţ
Henri Luchian
Publication date
14.08.2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Information Security / Issue 1/2024
Print ISSN: 1615-5262
Electronic ISSN: 1615-5270
DOI
https://doi.org/10.1007/s10207-023-00736-5