Skip to main content

2016 | OriginalPaper | Buchkapitel

Deep Learning for Classification of Malware System Call Sequences

verfasst von : Bojan Kolosnjaji, Apostolis Zarras, George Webster, Claudia Eckert

Erschienen in: AI 2016: Advances in Artificial Intelligence

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The increase in number and variety of malware samples amplifies the need for improvement in automatic detection and classification of the malware variants. Machine learning is a natural choice to cope with this increase, because it addresses the need of discovering underlying patterns in large-scale datasets. Nowadays, neural network methodology has been grown to the state that can surpass limitations of previous machine learning methods, such as Hidden Markov Models and Support Vector Machines. As a consequence, neural networks can now offer superior classification accuracy in many domains, such as computer vision or natural language processing. This improvement comes from the possibility of constructing neural networks with a higher number of potentially diverse layers and is known as Deep Learning.
In this paper, we attempt to transfer these performance improvements to model the malware system call sequences for the purpose of malware classification. We construct a neural network based on convolutional and recurrent network layers in order to obtain the best features for classification. This way we get a hierarchical feature extraction architecture that combines convolution of n-grams with full sequential modeling. Our evaluation results demonstrate that our approach outperforms previously used methods in malware classification, being able to achieve an average of 85.6% on precision and 89.4% on recall using this combined neural network architecture.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
3.
4.
Zurück zum Zitat Attaluri, S., McGhee, S., Stamp, M.: Profile Hidden Markov Models and metamorphic virus detection. J. Comput. Virol. 5(2), 151–169 (2009)CrossRef Attaluri, S., McGhee, S., Stamp, M.: Profile Hidden Markov Models and metamorphic virus detection. J. Comput. Virol. 5(2), 151–169 (2009)CrossRef
5.
Zurück zum Zitat Bailey, M., Oberheide, J., Andersen, J., Mao, Z.M., Jahanian, F., Nazario, J.: Automated classification and analysis of internet malware. In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 178–197. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74320-0_10CrossRef Bailey, M., Oberheide, J., Andersen, J., Mao, Z.M., Jahanian, F., Nazario, J.: Automated classification and analysis of internet malware. In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 178–197. Springer, Heidelberg (2007). doi:10.​1007/​978-3-540-74320-0_​10CrossRef
6.
Zurück zum Zitat Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: ISOC Network and Distributed System Security Symposium (NDSS) (2009) Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: ISOC Network and Distributed System Security Symposium (NDSS) (2009)
8.
Zurück zum Zitat Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Python for Scientific Computing Conference (SciPy) (2010) Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Python for Scientific Computing Conference (SciPy) (2010)
9.
Zurück zum Zitat Bu, Z., et al.: McAfee Threats Report: Second Quarter 2012 (2012) Bu, Z., et al.: McAfee Threats Report: Second Quarter 2012 (2012)
10.
Zurück zum Zitat Dahl, G.E., Stokes, J.W., Deng, L., Yu, D.: Large-scale malware classification using random projections and neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013) Dahl, G.E., Stokes, J.W., Deng, L., Yu, D.: Large-scale malware classification using random projections and neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013)
11.
Zurück zum Zitat Dietterich, T.G.: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 10(7), 1895–1923 (1998)CrossRef Dietterich, T.G.: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 10(7), 1895–1923 (1998)CrossRef
12.
Zurück zum Zitat Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996) Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)
13.
Zurück zum Zitat Guarnieri, C., Tanasi, A., Bremer, J., Schloesser, M.: The Cuckoo Sandbox (2012) Guarnieri, C., Tanasi, A., Bremer, J., Schloesser, M.: The Cuckoo Sandbox (2012)
14.
Zurück zum Zitat Heller, K., Svore, K., Keromytis, A.D., Stolfo, S.: One class support vector machines for detecting anomalous windows registry accesses. In: Workshop on Data Mining for Computer Security (DMSEC) (2003) Heller, K., Svore, K., Keromytis, A.D., Stolfo, S.: One class support vector machines for detecting anomalous windows registry accesses. In: Workshop on Data Mining for Computer Security (DMSEC) (2003)
15.
Zurück zum Zitat Huang, W., Stokes, J.W.: MtNet: a multi-task neural network for dynamic malware classification. In: Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA) (2016) Huang, W., Stokes, J.W.: MtNet: a multi-task neural network for dynamic malware classification. In: Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA) (2016)
16.
Zurück zum Zitat Kolosnjaji, B., Zarras, A., Lengyel, T., Webster, G., Eckert, C.: Adaptive semantics-aware malware classification. In: Caballero, J., Zurutuza, U., Rodríguez, R.J. (eds.) DIMVA 2016. LNCS, vol. 9721, pp. 419–439. Springer, Heidelberg (2016). doi:10.1007/978-3-319-40667-1_21CrossRef Kolosnjaji, B., Zarras, A., Lengyel, T., Webster, G., Eckert, C.: Adaptive semantics-aware malware classification. In: Caballero, J., Zurutuza, U., Rodríguez, R.J. (eds.) DIMVA 2016. LNCS, vol. 9721, pp. 419–439. Springer, Heidelberg (2016). doi:10.​1007/​978-3-319-40667-1_​21CrossRef
17.
Zurück zum Zitat Lengyel, T.K., Maresca, S., Payne, B.D., Webster, G.D., Vogl, S., Kiayias, A.: Scalability, fidelity and stealth in the DRAKVUF dynamic malware analysis system. In: Annual Computer Security Applications Conference (ACSAC) (2014) Lengyel, T.K., Maresca, S., Payne, B.D., Webster, G.D., Vogl, S., Kiayias, A.: Scalability, fidelity and stealth in the DRAKVUF dynamic malware analysis system. In: Annual Computer Security Applications Conference (ACSAC) (2014)
19.
Zurück zum Zitat Pascanu, R., Stokes, J.W., Sanossian, H., Marinescu, M., Thomas, A.: Malware classification with recurrent networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015) Pascanu, R., Stokes, J.W., Sanossian, H., Marinescu, M., Thomas, A.: Malware classification with recurrent networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
20.
Zurück zum Zitat Perdisci, R., ManChon, U.: VAMO: towards a fully automated malware clustering validity analysis. In: Annual Computer Security Applications Conference (ACSAC) (2012) Perdisci, R., ManChon, U.: VAMO: towards a fully automated malware clustering validity analysis. In: Annual Computer Security Applications Conference (ACSAC) (2012)
21.
Zurück zum Zitat Pfoh, J., Schneider, C., Eckert, C.: Leveraging string kernels for malware detection. In: Lopez, J., Huang, X., Sandhu, R. (eds.) NSS 2013. LNCS, vol. 7873, pp. 206–219. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38631-2_16CrossRef Pfoh, J., Schneider, C., Eckert, C.: Leveraging string kernels for malware detection. In: Lopez, J., Huang, X., Sandhu, R. (eds.) NSS 2013. LNCS, vol. 7873, pp. 206–219. Springer, Heidelberg (2013). doi:10.​1007/​978-3-642-38631-2_​16CrossRef
22.
Zurück zum Zitat Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P.: Learning and classification of malware behavior. In: Zamboni, D. (ed.) DIMVA 2008. LNCS, vol. 5137, pp. 108–125. Springer, Heidelberg (2008). doi:10.1007/978-3-540-70542-0_6CrossRef Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P.: Learning and classification of malware behavior. In: Zamboni, D. (ed.) DIMVA 2008. LNCS, vol. 5137, pp. 108–125. Springer, Heidelberg (2008). doi:10.​1007/​978-3-540-70542-0_​6CrossRef
24.
Zurück zum Zitat Saxe, J., Berlin, K.: Features, deep neural network based malware detection using two dimensional binary program arXiv preprint arXiv:1508.03096 (2015) Saxe, J., Berlin, K.: Features, deep neural network based malware detection using two dimensional binary program arXiv preprint arXiv:​1508.​03096 (2015)
25.
Zurück zum Zitat Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: IEEE Symposium on Security and Privacy (2001) Schultz, M.G., Eskin, E., Zadok, E., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: IEEE Symposium on Security and Privacy (2001)
26.
Zurück zum Zitat Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)MathSciNetMATH Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)MathSciNetMATH
27.
Zurück zum Zitat Tegeler, F., Fu, X., Vigna, G., Kruegel, C.: Botfinder: finding bots in network traffic without deep packet inspection. In International Conference on Emerging Networking Experiments and Technologies (CoNEXT) (2012) Tegeler, F., Fu, X., Vigna, G., Kruegel, C.: Botfinder: finding bots in network traffic without deep packet inspection. In International Conference on Emerging Networking Experiments and Technologies (CoNEXT) (2012)
28.
Zurück zum Zitat Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008). 85MATH Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008). 85MATH
30.
Zurück zum Zitat Wagner, D., Soto, P.: Mimicry attacks on host-based intrusion detection systems. In: Conference on Computer and Communications Security (CCS) (2002) Wagner, D., Soto, P.: Mimicry attacks on host-based intrusion detection systems. In: Conference on Computer and Communications Security (CCS) (2002)
31.
Zurück zum Zitat Warrender, C., Forrest, S., Pearlmutter, B.: Detecting intrusions using system calls: alternative data models. In: IEEE Symposium on Security and Privacy (1999) Warrender, C., Forrest, S., Pearlmutter, B.: Detecting intrusions using system calls: alternative data models. In: IEEE Symposium on Security and Privacy (1999)
32.
Zurück zum Zitat Webster, G.D., Hanif, Z.D., Ludwig, A.L.P., Lengyel, T.K., Zarras, A., Eckert, C.: SKALD: a scalable architecture for feature extraction, multi-user analysis, and real-time information sharing. In: Bishop, M., Nascimento, A.C.A. (eds.) ISC 2016. LNCS, vol. 9866, pp. 231–249. Springer, Heidelberg (2016). doi:10.1007/978-3-319-45871-7_15CrossRef Webster, G.D., Hanif, Z.D., Ludwig, A.L.P., Lengyel, T.K., Zarras, A., Eckert, C.: SKALD: a scalable architecture for feature extraction, multi-user analysis, and real-time information sharing. In: Bishop, M., Nascimento, A.C.A. (eds.) ISC 2016. LNCS, vol. 9866, pp. 231–249. Springer, Heidelberg (2016). doi:10.​1007/​978-3-319-45871-7_​15CrossRef
33.
Zurück zum Zitat Xiao, H., Eckert, C.: Efficient online sequence prediction with side information. In: IEEE International Conference on Data Mining (ICDM) (2013) Xiao, H., Eckert, C.: Efficient online sequence prediction with side information. In: IEEE International Conference on Data Mining (ICDM) (2013)
34.
Zurück zum Zitat Xiao, H., Stibor, T.: A supervised topic transition model for detecting malicious system call sequences. In: Workshop on Knowledge Discovery, Modeling and Simulation (2011) Xiao, H., Stibor, T.: A supervised topic transition model for detecting malicious system call sequences. In: Workshop on Knowledge Discovery, Modeling and Simulation (2011)
Metadaten
Titel
Deep Learning for Classification of Malware System Call Sequences
verfasst von
Bojan Kolosnjaji
Apostolis Zarras
George Webster
Claudia Eckert
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-50127-7_11

Premium Partner