
2023 | OriginalPaper | Book chapter

Machine Learning Methodologies for Preventing Malware Obfuscation

Written by: Vincenzo Carletti, Alessia Saggese, Pasquale Foggia, Antonio Greco, Mario Vento

Published in: Security, Trust and Privacy Models, and Architectures in IoT Environments

Publisher: Springer International Publishing


Abstract

Malware is a serious threat in a world where IoT devices are becoming more and more pervasive; every day, new and more sophisticated malware can rely on an attack surface that grows with the number of new devices coming to market. There is a constant competition between malware detection systems, which have to adapt their knowledge base and heuristics day by day, and malware writers, who have to find new techniques to evade these systems. In this scenario, machine learning methods are the best candidates to face the continuous evolution of malware; this justifies the increasing interest in such approaches for building antimalware systems that are able to learn and adapt. However, a still open question is how robust machine learning-based systems are against obfuscation techniques: methods whose effectiveness depends on what they learn from a training set are potentially vulnerable to code modifications that alter the probability distribution of the features observed during the training phase. In this paper we propose a comparison of seven different methods trained to classify malware, paying specific attention to the recent image-based approaches. The comparison has been conducted using one of the largest malware datasets publicly released to date, SOREL-20M, composed of more than 20 million samples divided into 11 malware families. In the proposed analysis, we have considered four basic obfuscation techniques based on appending a sequence of bytes to the end of the executable; they are very easy for a malware writer to implement. All the tested methods achieved very high accuracy on unmodified test samples, but only a few of them proved able to withstand the considered obfuscation techniques.
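The robustness concern raised in the abstract can be illustrated with a small sketch. It is not the chapter's method: the two padding modes (`zeros`, `random`), the byte-histogram feature, the fixed image width, and all function names below are assumptions chosen for illustration, and the abstract does not say which four techniques or which features the authors actually used. The sketch only shows that appending bytes to a buffer leaves the original content untouched while shifting a simple byte-level feature distribution and changing the shape of a byte-to-image layout.

```python
import random
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Normalized frequency of each of the 256 possible byte values."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    return [counts.get(value, 0) / len(data) for value in range(256)]


def append_padding(data: bytes, n: int, mode: str = "zeros", seed: int = 0) -> bytes:
    """Return a copy of data with n bytes appended at the end.

    The modes are hypothetical examples, not the chapter's four techniques.
    """
    if mode == "zeros":
        pad = bytes(n)
    elif mode == "random":
        rng = random.Random(seed)
        pad = bytes(rng.randrange(256) for _ in range(n))
    else:
        raise ValueError(f"unknown padding mode: {mode}")
    return data + pad


def l1_distance(p: list[float], q: list[float]) -> float:
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


def to_image_rows(data: bytes, width: int = 16) -> list[list[int]]:
    """Lay bytes out row by row as grayscale pixels (assumed fixed width).

    The last row is zero-filled if the data does not fill it completely.
    """
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] += [0] * (width - len(rows[-1]))
    return rows


# Stand-in for a file's bytes: a synthetic buffer with a uniform histogram.
original = bytes(range(256)) * 4
padded = append_padding(original, 1024, mode="zeros")

# The feature distribution a classifier saw at training time has moved.
shift = l1_distance(byte_histogram(original), byte_histogram(padded))
print(f"L1 shift of byte histogram after padding: {shift:.4f}")
print(f"image rows before/after: {len(to_image_rows(original))} / {len(to_image_rows(padded))}")
```

A classifier that relies on whole-file statistics or on a resized image of the whole file sees a different input after padding, even though the original bytes are unchanged; this is the kind of distribution shift the abstract describes.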

Metadata
Title
Machine Learning Methodologies for Preventing Malware Obfuscation
Written by
Vincenzo Carletti
Alessia Saggese
Pasquale Foggia
Antonio Greco
Mario Vento
Copyright year
2023
DOI
https://doi.org/10.1007/978-3-031-21940-5_6