Published in: The Journal of Supercomputing 14/2023

17-04-2023

BejaGNN: behavior-based Java malware detection via graph neural network

Authors: Pengbin Feng, Li Yang, Di Lu, Ning Xi, Jianfeng Ma


Abstract

As a popular platform-independent language, Java is widely used in enterprise applications. In the past few years, Java malware that exploits language vulnerabilities has become increasingly prevalent, posing threats across multiple platforms. Security researchers continue to propose approaches for combating Java malware. The low code path coverage and poor execution efficiency of dynamic analysis limit the large-scale application of dynamic Java malware detection methods. Researchers have therefore turned to extracting abundant static features to implement efficient malware detection. In this paper, we explore capturing malware semantic information with graph learning algorithms and present BejaGNN (Behavior-based Java malware detection via Graph Neural Network), a novel behavior-based Java malware detection method using static analysis, word embedding techniques, and a graph neural network. Specifically, BejaGNN leverages static analysis techniques to extract ICFGs (Inter-procedural Control Flow Graphs) from Java program files and then prunes these ICFGs to remove noisy instructions. Word embedding techniques are then adopted to learn semantic representations for Java bytecode instructions. Finally, BejaGNN builds a graph neural network classifier to determine the maliciousness of Java programs. Experimental results on a public Java bytecode benchmark demonstrate that BejaGNN achieves a high F1 score of 98.8% and is superior to existing Java malware detection approaches, which verifies the promise of graph neural networks in Java malware detection.
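The stages named in the abstract (ICFG pruning, instruction embedding, graph-level representation) can be traced on a toy graph; this is an added example, not code from the paper or its authors. The noisy-opcode set, the hash-based vectors standing in for a learned word embedding, the fixed untrained message-passing weights, and the two sample graphs are all assumptions constructed for illustration.

```python
import hashlib

# Assumption: opcodes treated as noise; the abstract does not give the pruning rule.
NOISY_OPS = {"nop", "goto"}
DIM = 8  # embedding width, chosen arbitrarily for this sketch


def prune_icfg(nodes, edges):
    """Remove noisy nodes, wiring each one's predecessors to its successors."""
    succ = {n: set() for n in nodes}
    pred = {n: set() for n in nodes}
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)
    noisy = [n for n, ins in nodes.items() if ins.split()[0] in NOISY_OPS]
    for n in noisy:
        preds, succs = pred[n] - {n}, succ[n] - {n}
        for p in preds:
            succ[p].discard(n)
            succ[p] |= succs
        for s in succs:
            pred[s].discard(n)
            pred[s] |= preds
        del succ[n], pred[n]
    kept = {n: ins for n, ins in nodes.items() if n in succ}
    return kept, {(a, b) for a in succ for b in succ[a]}


def embed(instruction):
    """Stand-in for a learned word embedding: mean of hashed token vectors."""
    tokens = instruction.split()
    vec = [0.0] * DIM
    for tok in tokens:
        digest = hashlib.sha256(tok.encode()).digest()
        for i in range(DIM):
            vec[i] += (digest[i] / 255.0 - 0.5) / len(tokens)
    return vec


def graph_embedding(nodes, edges, rounds=2):
    """Mean-aggregation message passing along edges, then a mean readout.

    The 0.5/0.5 mixing weights are fixed and untrained (assumption); a real
    classifier learns them and adds a decision layer on top of the readout.
    """
    if not nodes:
        return [0.0] * DIM
    h = {n: embed(ins) for n, ins in nodes.items()}
    incoming = {n: [] for n in nodes}
    for a, b in edges:
        incoming[b].append(a)
    for _ in range(rounds):
        nxt = {}
        for n in nodes:
            msgs = [h[m] for m in sorted(incoming[n])]
            agg = ([sum(c) / len(msgs) for c in zip(*msgs)]
                   if msgs else [0.0] * DIM)
            nxt[n] = [max(0.0, 0.5 * x + 0.5 * a) for x, a in zip(h[n], agg)]
        h = nxt
    return [round(sum(h[n][i] for n in h) / len(h), 6) for i in range(DIM)]


# Constructed sample: a two-method ICFG with call and return edges.
CLEAN_NODES = {
    "main:0": "ldc os.name",
    "main:1": "invokestatic java.lang.System.getProperty",
    "main:2": "ifeq",
    "main:3": "invokestatic Loader.fetch",
    "main:4": "return",
    "fetch:0": "new java.net.URL",
    "fetch:1": "invokevirtual java.net.URL.openStream",
    "fetch:2": "areturn",
}
CLEAN_EDGES = {
    ("main:0", "main:1"), ("main:1", "main:2"), ("main:2", "main:3"),
    ("main:2", "main:4"), ("main:3", "fetch:0"), ("fetch:0", "fetch:1"),
    ("fetch:1", "fetch:2"), ("fetch:2", "main:4"),
}

# Constructed sample: the same program with junk instructions spliced in.
PADDED_NODES = dict(CLEAN_NODES, **{"pad:0": "nop", "pad:1": "goto L1",
                                    "pad:2": "nop"})
PADDED_EDGES = (CLEAN_EDGES - {("main:0", "main:1"), ("main:2", "main:3")}) | {
    ("main:0", "pad:0"), ("pad:0", "main:1"),
    ("main:2", "pad:1"), ("pad:1", "pad:2"), ("pad:2", "main:3"),
}

if __name__ == "__main__":
    pruned = prune_icfg(PADDED_NODES, PADDED_EDGES)
    print("pruned graph equals clean graph:", pruned == (CLEAN_NODES, CLEAN_EDGES))
    print("graph vector after pruning:", graph_embedding(*pruned))
    print("graph vector without pruning:", graph_embedding(PADDED_NODES, PADDED_EDGES))
```

Under these assumptions, pruning collapses the padded graph back to the clean one, so both yield the same graph vector, whereas the unpruned padded graph does not.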


Metadata
Title
BejaGNN: behavior-based Java malware detection via graph neural network
Authors
Pengbin Feng
Li Yang
Di Lu
Ning Xi
Jianfeng Ma
Publication date
17-04-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 14/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05243-x
