nach oben

Neural Computing and Applications

Erschienen in:

17.05.2021 | Original Article

Deep neural-based vulnerability discovery demystified: data, model and performance

verfasst von: Guanjun Lin, Wei Xiao, Leo Yu Zhang, Shang Gao, Yonghang Tai, Jun Zhang

Erschienen in: Neural Computing and Applications | Ausgabe 20/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Detecting source-code level vulnerabilities at the development phase is a cost-effective solution to prevent potential attacks from happening at the software deployment stage. Many machine learning, including deep learning-based solutions, have been proposed to aid the process of vulnerability discovery. However, these approaches were mainly evaluated on self-constructed/-collected datasets. It is difficult to evaluate the effectiveness of proposed approaches due to lacking a unified baseline dataset. To bridge this gap, we construct a function-level vulnerability dataset from scratch, providing in source-code-label pairs. To evaluate the constructed dataset, a function-level vulnerability detection framework is built to incorporate six mainstream neural network models as vulnerability detectors. We perform experiments to investigate the performance behaviors of the neural model-based detectors using source code as raw input with continuous Bag-of-Words neural embeddings. Empirical results reveal that the variants of recurrent neural networks and convolutional neural network perform well on our dataset, as the former is capable of handling contextual information and the latter learns features from small context windows. In terms of generalization ability, the fully connected network outperforms the other network architectures. The performance evaluation can serve as a reference benchmark for neural model-based vulnerability detection at function-level granularity. Our dataset can serve as ground truth for ML-based function-level vulnerability detection and a baseline for evaluating relevant approaches.

Vorheriger Artikel Multi-chromosomal CGP-evolved RNN for signal reconstruction

Nächster Artikel Image retrieval based on texture using latent space representation of discrete Fourier transformed maps

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

https://www.nist.gov/.

https://cve.mitre.org/.

https://nvd.nist.gov/.

https://github.com/Seahymn2019/Function-level-Vulnerability-Dataset.

https://developer.nvidia.com/cudnn.

Equifax had patch 2 months before hack and didn’t install it, security group says. https://www.usatoday.com/story/money/2017/09/14/equifax-identity-theft-hackers-apache-struts/665100001/ September 2017. Accessed 8 June 2019

Guanjun L, Sheng W, QingLong H, Jun Z, Yang X (2020) Software vulnerability detection using deep neural networks: a survey. Proc IEEE 1080(10):1825–1848

David A (2016) Wheeler. Flawfinder. https://www.dwheeler.com/flawfinder/ Accessed 20 May 2018

Cadar C, Dunbar D, Engler DR, et al (2008) Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: OSDI, vol 8, pp 209–224

Sutton M, Greene A, Amini P (2007) Fuzzing: brute force vulnerability discovery. Pearson Education, London

Newsome J, Song D (2005) Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. Citeseer, Princeton

Yamaguchi F, Lindner F, Rieck K (2011) Vulnerability extrapolation: assisted discovery of vulnerabilities using machine learning. In: Proceedings of the 5th USENIX conference on Offensive technologies. USENIX Association

Nan S, Jun Z, Paul R, Shang G, Zhang Leo Yu, Yang X (2019) Data-driven cybersecurity incident prediction: a survey. IEEE Commun Surv Tutor 210(2):1744–1772

Coulter R, Han Q-L, Pan L, Zhang J, Xiang Y (2019) Data-driven cyber security in perspective-intelligent traffic analysis. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2019.2940940CrossRef

10.

Jun Z, Yang X, Wang Yu, Wanlei Z, Yong X, Yong G (2013) Network traffic classification using correlation information. IEEE Trans Parallel Distrib Syst 240(1):104–117

11.

Mohammad GS, Reza SH (2017) Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey. ACM Comput Surv 500(4):56

12.

Yonghee S, Andrew M, Laurie W, Osborne Jason A (2011) Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities. TSE 370(6):772–787

13.

Liu L, De Vel O, Han Q-L, Zhang J, Xiang Y (2018) Detecting and preventing cyber insider threats: a survey. IEEE Commun Surv Tutor 200(2):1397–1417CrossRef

14.

Yamaguchi F, Golde N, Arp D, Rieck K (2014) Modeling and discovering vulnerabilities with code property graphs. In: 2014 IEEE symposium on security and privacy (SP), pp 590–604. IEEE

15.

Yamaguchi F, Lottmann M, Rieck K (2012) Generalized vulnerability extrapolation using abstract syntax trees. In: Proceedings of the 28th ACSAC, pp 359–368. ACM

16.

Chen X, Li C, Wang D, Wen S, Zhang J, Nepal S, Xiang Y, Ren K (2020) Android HIV: A study of repackaging malware for evading machine-learning detection. IEEE Trans Inf Forensics Secur 15:987–1001CrossRef

17.

Perl H, Dechand S, Smith M, Arp D, Yamaguchi F, Rieck K, Fahl S, Acar Y (2015) Vccfinder: finding potential vulnerabilities in open-source projects to assist code audits. In: Proceedings of the 22nd SIGSAC conference on CCS, pp 426–437. ACM

18.

Guanjun L, Jun Z, Wei L, Lei P, Yang X, De Vel O, Paul M (2018) Cross-project transfer representation learning for vulnerable function discovery. IEEE Trans Ind Inf 140(7):3289–3297

19.

Lin G, Zhang J, Luo W, Pan L, Xiang Y (2017) Poster: vulnerability discovery with function representation learning from unlabeled projects. In: Proceedings of the 2017 SIGSAC Conference on CCS, pp 2539–2541. ACM

20.

Lin G, Zhang J, Luo W, Pan L, De VO, Montague P, Xiang Y (2019) Software vulnerability discovery via learning multi-domain knowledge bases. IEEE Trans Depend Secure Comput. https://doi.org/10.1109/TDSC.2019.2954088CrossRef

21.

Scandariato R, Walden J, Hovsepyan A, Joosen W (2014) Predicting vulnerable software components via text mining. TSE 400(10):993–1006

22.

Choi M, Jeong S, Oh H, Choo J (2017) End-to-end prediction of buffer overruns from raw source code via neural memory networks. arXiv preprint arXiv:1703.02458

23.

Sestili CD, Snavely WS, VanHoudnos NM (2018) Towards security defect prediction with AI. arXiv preprint arXiv:1808.09897

24.

Peng H, Mou L, Li G, Liu Y, Zhang L, Jin Z (2015) Building program vector representations for deep learning. In: International conference on knowledge science, engineering and management, pp 547–553. Springer

25.

Black PE (2018) A software assurance reference dataset: Thousands of programs with known bugs. J Res Natl Inst Stand Technol 123

26.

Black PE, Black PE (2018) Juliet 1.3 Test Suite: Changes From 1.2. US Department of Commerce, National Institute of Standards and Technology

27.

Ramsundar B, Zadeh RB (2018) TensorFlow for deep learning: from linear regression to reinforcement learning. O’Reilly Media Inc., Newton

28.

Shar LK, Tan HBK (2012) Predicting common web application vulnerabilities from input validation and sanitization code patterns. In: 2012 Proceedings of the 27th IEEE/ACM international conference on automated software engineering, pp 310–313. IEEE

29.

Grieco Gustavo, Grinblat Guillermo Luis, Uzal Lucas, Rawat Sanjay, Feist Josselin, Mounier Laurent (2016) Toward large-scale vulnerability discovery using machine learning. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pages 85–96. ACM

30.

Feng D, Wang LQ, Guoai X, Shaodong Z (2018) Defect prediction in android binary executables using deep neural network. Wireless Pers Commun 1020(3):2261–2285

31.

Lee YJ, Choi S-H, Kim C, Lim S-H, Park K-W (2017) Learning binary code with deep learning to detect software weakness. In: KSII the 9th international conference on internet (ICONI) 2017 symposium

32.

Harer JA, Kim LY, Russell RL, Ozdemir O, Kosta LR, Rangamani A, Hamilton LH, Centeno GI, Key JR, Ellingwood PM et al (2018) Automated software vulnerability detection with machine learning. arXiv preprint arXiv:1803.04497

33.

Russell R, Kim L, Hamilton L, Lazovich T, Harer J, Ozdemir O, Ellingwood P, McConley M (2018) Automated vulnerability detection in source code using deep representation learning. In: 2018 17th IEEE international conference on machine learning and applications (ICMLA), pp 757–762. IEEE

34.

Li Z, Zou D, Xu S, Jin H, Qi H, Hu J (2016) Vulpecker: an automated vulnerability detection system based on code similarity analysis. In: Proceedings of the 32nd ACCSA, pp 201–213. ACM

35.

Sepp H, Jürgen S (1997) Long short-term memory. Neural Comput 90(8):1735–1780

36.

Dam HK, Tran T, Pham T, Ng SW, Grundy J, Ghose A (2017) Automatic feature learning for vulnerability prediction. arXiv preprint arXiv:1708.02368

37.

Li Z, Zou D, Xu S, Jin H, Zhu Y, Chen Z, Wang S, Wang J (2018) Sysevr: a framework for using deep learning to detect software vulnerabilities. arXiv preprint arXiv:1807.06756

38.

Kostadinov S (2019) Understanding GRU networks. https://www.Towardsdatascience.com (December 2017). Accessed 30 Apr 2019

39.

Wu F, Wang J, Liu J, Wang W (2017) Vulnerability detection with deep learning. In: 2017 3rd IEEE international conference on computer and communications (ICCC), pp 1298–1302. IEEE

40.

Le T, Nguyen T, Le T, Phung D, Montague P, De Olivier V, Qu L (2018) Maximal divergence sequential autoencoder for binary software vulnerability detection

41.

Sukhbaatar S, Weston J, Fergus R et al (2015) End-to-end memory networks. In: Advances in neural information processing systems, pp 2440–2448

42.

Weston J, Chopra S, Bordes A (2014) Memory networks. arXiv preprint arXiv:1410.3916

43.

Yonghee S, Laurie W (2013) Can traditional fault prediction models be used for vulnerability prediction? ESE 180(1):25–59

44.

Wang M, Zhu T, Zhang T, Zhang J, Yu S, Zhou W (2020) Security and privacy in 6G networks: new areas and new challenges. Digit Commun Netw 6(3):281–291CrossRef

45.

Vivienne S, Yu-Hsin C, Tien-Ju Y, Emer Joel S (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE 1050(12):2295–2329

46.

Miltiadis A, Barr Earl T, Premkumar D, Charles S (2018) A survey of machine learning for big code and naturalness. ACM Comput Surv 510(4):81

47.

Li Z, Zou D, Xu S, Ou X, Jin H, Wang S, Deng Z, Zhong Y (2018) Vuldeepecker: a deep learning-based system for vulnerability detection. In: Proceedings of NDSS

48.

Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019

49.

Olah C (2015) Understanding LSTM networks. GITHUB blog. Accessed 30 Apr 2019

50.

Nguyen M (2018) Illustrated guide to LSTM’s and GRU’s: a step by step explanation. https://www.Towardsdatascience.com. Accessed 30 Apr 2019

51.

Britz D (2015) Recurrent neural network tutorial, part 4 - implementing a GRU/LSTM RNN with python and theano. https://www.Wildml.com. Accessed 30 Apr 2019

52.

Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882

53.

Zhang Y, Wallace B (2015) A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820

54.

Yih W-T, He X, Meek C (2014) Semantic parsing for single-relation question answering. In: Proceedings of the 52nd annual meeting of the association for computational linguistics, vol 2, pp 643–648

55.

Junyang Q, Jun Z, Wei L, Lei P, Surya N, Yang X (2020) A survey of android malware detection with deep neural models. ACM Comput Surv (CSUR) 530(6):1–36

56.

Chollet F et al (2015) Keras. https://github.com/fchollet/keras

57.

Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International conference on machine learning, pp 1050–1059

58.

Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

59.

Christopher PR, Manning D, Schütze H (2009) Introduction to information retrieval. Cambridge University Press, CambridgeMATH

60.

Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. OSDI 16:265–283

61.

Radim R, Petr S (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks, pp 45–50

62.

Xu ZJ (2018) Understanding training and generalization in deep learning by Fourier analysis. arXiv preprint arXiv:1808.04295

Titel: Deep neural-based vulnerability discovery demystified: data, model and performance
verfasst von: Guanjun Lin
Wei Xiao
Leo Yu Zhang
Shang Gao
Yonghang Tai
Jun Zhang
Publikationsdatum: 17.05.2021
Verlag: Springer London
Erschienen in: Neural Computing and Applications / Ausgabe 20/2021
Print ISSN: 0941-0643
Elektronische ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-021-05954-3

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Weitere Artikel der Ausgabe 20/2021

Image retrieval based on texture using latent space representation of discrete Fourier transformed maps

New exponential operation laws and operators for interval-valued q-rung orthopair fuzzy sets in group decision making process

A novel context-aware feature extraction method for convolutional neural network-based intrusion detection systems

U-net based analysis of MRI for Alzheimer’s disease diagnosis

New extension of data encryption standard over 128-bit key for digital images

A multi-branch deep convolutional fusion architecture for 3D microwave inverse scattering: stored grain application

Premium Partner