Published in: Knowledge and Information Systems 2/2018

9 May 2017 | Regular Paper

DeepAM: a heterogeneous deep learning framework for intelligent malware detection

Authors: Yanfang Ye, Lingwei Chen, Shifu Hou, William Hardy, Xin Li

Abstract

With computers and the Internet now essential to everyday life, malware poses serious and evolving threats to their security, making malware detection a matter of utmost concern. Accordingly, there has been much research on intelligent malware detection that applies data mining and machine learning techniques. Although these methods have achieved strong results, most of them are built on shallow learning architectures. Owing to its superior ability to learn features through multilayer deep architectures, deep learning is starting to be leveraged in industrial and academic research for a range of applications. In this paper, based on the Windows application programming interface (API) calls extracted from portable executable (PE) files, we study how a deep learning architecture can be designed for intelligent malware detection. We propose a heterogeneous deep learning framework composed of an AutoEncoder stacked with multilayer restricted Boltzmann machines (RBMs) and a layer of associative memory to detect new, previously unknown malware. The proposed deep learning model performs greedy layer-wise training for unsupervised feature learning, followed by supervised parameter fine-tuning. Unlike existing works, which used only files with class labels (either malicious or benign) during the training phase, we use both labeled and unlabeled file samples to pre-train multiple layers of the heterogeneous deep learning framework from the bottom up for feature learning. A comprehensive experimental study on a large, real file collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed deep learning framework can further improve overall malware detection performance compared with traditional shallow learning methods, deep learning methods with a homogeneous framework, and other existing anti-malware scanners. The proposed heterogeneous deep learning framework can also be readily applied to other malware detection tasks.
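
The greedy layer-wise pre-training idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes each file is represented as a binary vector marking which Windows API calls it uses (an assumed encoding), and it stacks only RBM layers trained with one-step contrastive divergence (CD-1). The AutoEncoder layer, the associative memory layer, and the supervised fine-tuning step are left out for brevity. The names `RBM`, `pretrain_stack`, and `encode` are hypothetical, as are the toy data and hyperparameters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # W[i][j] connects visible unit i to hidden unit j.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_update(self, v0, lr):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        # Negative phase: one reconstruction step.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_stack(samples, layer_sizes, epochs=20, lr=0.1, seed=0):
    """Train one RBM per layer, bottom up; labels are never used here."""
    rng = random.Random(seed)
    stack, data = [], samples
    for n_hidden in layer_sizes:
        rbm = RBM(len(data[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in data:
                rbm.cd1_update(v, lr)
        stack.append(rbm)
        # The hidden activations of this layer become the next layer's input.
        data = [rbm.hidden_probs(v) for v in data]
    return stack


def encode(stack, v):
    """Pass one sample through every pre-trained layer."""
    for rbm in stack:
        v = rbm.hidden_probs(v)
    return v


# Toy data: 4 files x 6 hypothetical API-call indicators (labeled or unlabeled alike).
samples = [[1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0],
           [0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 1, 1]]
stack = pretrain_stack(samples, layer_sizes=[4, 2])
features = encode(stack, samples[0])
```

Because pre-training needs no class labels, unlabeled files can be added to `samples` directly. In a full pipeline, the learned features would then feed a supervised stage that fine-tunes the parameters on the labeled subset.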


Metadata
Title
DeepAM: a heterogeneous deep learning framework for intelligent malware detection
Authors
Yanfang Ye
Lingwei Chen
Shifu Hou
William Hardy
Xin Li
Publication date
9 May 2017
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 2/2018
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-017-1058-9
