ABSTRACT
Software Vulnerabilities (SVs) are security flaws that are exploitable in cyber-attacks. Delay in the detection and assessment of SVs might cause serious consequences due to the unknown impacts on the attacked systems. The state-of-the-art approaches have been proposed to work directly on the committed code changes for early detection. However, none of them could provide both commit-level vulnerability detection and assessment at once. Moreover, the assessment approaches still suffer low accuracy due to limited representations for code changes and surrounding contexts.
We propose a Context-aware, Graph-based, Commit-level Vulnerability Detection and Assessment Model, VDA, that evaluates a code change, detects any vulnerability and provides the CVSS assessment grades. To build VDA, we have key novel components. First, we design a novel context-aware, graph-based, representation learning model to learn the contextualized embeddings for the code changes that integrate program dependencies and the surrounding contexts of code changes, facilitating the automated vulnerability detection and assessment. Second, VDA considers the mutual impact of learning to detect vulnerability and learning to assess each vulnerability assessment type. To do so, it leverages multi-task learning among the vulnerability detection and vulnerability assessment tasks, improving all the tasks at the same time. Our empirical evaluation shows that on a C vulnerability dataset, VDA achieves 25.5% and 26.9% relatively higher than the baselines in vulnerability assessment regarding F-score and MCC, respectively. In a Java dataset, it achieves 31% and 33.3% relatively higher than the baselines in F-score and MCC, respectively. VDA also relatively improves the vulnerability detection over the baselines from 13.4–322% in F-score.
Supplemental Material
- [n. d.]. Silhouette (clustering). https://en.wikipedia.org/wiki/Silhouette_(clustering). Last Accessed March 15, 2022 Google Scholar
- [n. d.]. T-SNE. https://scikit-learn.org/stable/modules/generated/ sklearn.manifold.TSNE.html. Last Accessed March 9, 2022 Google Scholar
- 2021. Common Vulnerability Scoring System. https://www.first.org/cvss/ Google Scholar
- 2021. The NNI autoML tool. https://github.com/microsoft/nni Google Scholar
- 2023. CAT. https://github.com/vulnerability-assessment-cat/vulnerability-assessment-cat Google Scholar
- Luca Allodi and Fabio Massacci. 2014. Comparing vulnerability severity and exploits using case-control studies. ACM Transactions on Information and System Security (TISSEC), 17, 1 (2014), 1–20. Google ScholarDigital Library
- Claudio Bellei, Hussain Alattas, and Nesrine Kaaniche. 2021. Label-GCN: An Effective Method for Adding Label Propagation to Graph Convolutional Networks. CoRR, abs/2104.02153 (2021), arXiv:2104.02153. arxiv:2104.02153 Google Scholar
- Mehran Bozorgi, Lawrence K Saul, Stefan Savage, and Geoffrey M Voelker. 2010. Beyond heuristics: learning to classify vulnerabilities and predict exploits. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. 105–114. Google ScholarDigital Library
- Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, and Baishakhi Ray. 2021. Deep learning based vulnerability detection: Are we there yet. IEEE Transactions on Software Engineering. Google ScholarCross Ref
- Xiang Chen, Yingquan Zhao, Zhanqi Cui, Guozhu Meng, Yang Liu, and Zan Wang. 2019. Large-scale empirical studies on effort-aware security vulnerability prediction methods. IEEE Transactions on Reliability, 69, 1 (2019), 70–87. Google ScholarCross Ref
- Davide Falessi, Jacky Huang, Likhita Narayana, Jennifer Fong Thai, and Burak Turhan. 2020. On the need of preserving order of data when validating within-project defect classifiers. Empirical Software Engineering, 25, 6 (2020), 4805–4830. Google ScholarDigital Library
- Jiahao Fan, Yi Li, Shaohua Wang, and Tien N. Nguyen. 2020. A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. In Proceedings of the 17th International Conference on Mining Software Repositories. Association for Computing Machinery, New York, NY, USA. 508–512. isbn:9781450375177 https://doi.org/10.1145/3379597.3387501 Google ScholarDigital Library
- Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. 1987. The Program Dependence Graph and Its Use in Optimization. ACM Trans. Program. Lang. Syst., 9, 3 (1987), jul, 319–349. issn:0164-0925 https://doi.org/10.1145/24039.24041 Google ScholarDigital Library
- Michael Fu and Chakkrit Tantithamthavorn. 2022. LineVul: A Transformer-based Line-Level Vulnerability Prediction. In 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR). 608–620. https://doi.org/10.1145/3524842.3528452 Google ScholarDigital Library
- J. Gorodkin. 2004. Comparing Two K-Category Assignments by a K-Category Correlation Coefficient. Comput. Biol. Chem., 28, 5–6 (2004), dec, 367–374. issn:1476-9271 https://doi.org/10.1016/j.compbiolchem.2004.09.006 Google ScholarDigital Library
- Zhuobing Han, Xiaohong Li, Zhenchang Xing, Hongtao Liu, and Zhiyong Feng. 2017. Learning to predict severity of software vulnerability using only vulnerability description. In 2017 IEEE International conference on software maintenance and evolution (ICSME). 125–136. Google ScholarCross Ref
- David Hin, Andrey Kan, Huaming Chen, and M Ali Babar. 2022. LineVD: Statement-level Vulnerability Detection using Graph Neural Networks. In International Conference on Mining Software Repositories (MSR’22). Google ScholarDigital Library
- Thong Hoang, Hong Jin Kang, David Lo, and Julia Lawall. 2020. CC2Vec: Distributed Representations of Code Changes. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (ICSE ’20). Association for Computing Machinery, New York, NY, USA. 518–529. isbn:9781450371216 https://doi.org/10.1145/3377811.3380361 Google ScholarDigital Library
- Alex Kendall, Yarin Gal, and Roberto Cipolla. 2018. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7482–7491. Google Scholar
- Saad Khan and Simon Parkinson. 2018. Review into state-of-the-art of vulnerability assessment using artificial intelligence. In Guide to Vulnerability Analysis for Computer Networks and Systems. Springer, 3–32. Google Scholar
- Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=SJU4ayYgl Google Scholar
- Ahmed Lamkanfi, Serge Demeyer, Emanuel Giger, and Bart Goethals. 2010. Predicting the severity of a reported bug. In 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010). 1–10. Google ScholarCross Ref
- Triet Huynh Minh Le, Bushra Sabir, and Muhammad Ali Babar. 2019. Automated software vulnerability assessment with concept drift. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). 371–382. Google Scholar
- Xin Li, Lu Wang, Yang Xin, Yixian Yang, and Yuling Chen. 2020. Automated vulnerability detection in source code using minimum intermediate representation learning. Applied Sciences, 10, 5 (2020), 1692. Google ScholarCross Ref
- Yi Li, Shaohua Wang, and Tien N. Nguyen. 2021. Vulnerability Detection with Fine-Grained Interpretations. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA. 292–303. isbn:9781450385626 https://doi.org/10.1145/3468264.3468597 Google ScholarDigital Library
- Zhen Li, Deqing Zou, Shouhuai Xu, Zhaoxuan Chen, Yawei Zhu, and Hai Jin. 2021. Vuldeelocator: a deep learning-based fine-grained vulnerability detector. IEEE Transactions on Dependable and Secure Computing. Google Scholar
- Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Yawei Zhu, and Zhaoxuan Chen. 2021. Sysevr: A framework for using deep learning to detect software vulnerabilities. IEEE Transactions on Dependable and Secure Computing. Google Scholar
- Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. Vuldeepecker: A deep learning-based system for vulnerability detection. arXiv preprint arXiv:1801.01681. Google Scholar
- Rocio Lozoya, Arnaud Baumann, Antonino Sabetta, and Michele Bezzi. 2019. commit2vec: Distributed Representation of Code Changes. Google Scholar
- Triet Huynh Minh Le, David Hin, Roland Croft, and M. Ali Babar. 2021. DeepCVA: Automated Commit-level Vulnerability Assessment with Deep Multi-task Learning. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 717–729. https://doi.org/10.1109/ASE51524.2021.9678622 Google ScholarDigital Library
- Kartik Nayak, Daniel Marino, Petros Efstathopoulos, and Tudor Dumitraş. 2014. Some vulnerabilities are different than others. In International Workshop on Recent Advances in Intrusion Detection. 426–446. Google ScholarCross Ref
- Profir-Petru Pârundefinedachi, Santanu Kumar Dash, Miltiadis Allamanis, and Earl T. Barr. 2020. Flexeme: Untangling Commits Using Lexical Flows. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020). Association for Computing Machinery, New York, NY, USA. 63–74. isbn:9781450370431 https://doi.org/10.1145/3368089.3409693 Google ScholarDigital Library
- Henning Perl, Sergej Dechand, Matthew Smith, Daniel Arp, Fabian Yamaguchi, Konrad Rieck, Sascha Fahl, and Yasemin Acar. 2015. Vccfinder: Finding potential vulnerabilities in open-source projects to assist code audits. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 426–437. Google ScholarDigital Library
- Serena Elisa Ponta, Henrik Plate, and Antonino Sabetta. 2018. Beyond metadata: Code-centric and usage-based analysis of known vulnerabilities in open-source software. In 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). 449–460. Google ScholarCross Ref
- Serena Elisa Ponta, Henrik Plate, and Antonino Sabetta. 2020. Detection, assessment and mitigation of vulnerabilities in open source dependencies. Empirical Software Engineering, 25, 5 (2020), 3175–3215. Google ScholarDigital Library
- Rebecca Russell, Louis Kim, Lei Hamilton, Tomo Lazovich, Jacob Harer, Onur Ozdemir, Paul Ellingwood, and Marc McConley. 2018. Automated vulnerability detection in source code using deep representation learning. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). 757–762. Google ScholarCross Ref
- Georgios Spanos and Lefteris Angelis. 2018. A multi-target approach to estimate software vulnerability characteristics and severity scores. Journal of Systems and Software, 146 (2018), 152–166. Google ScholarCross Ref
- Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. 2019. GNNExplainer: Generating Explanations for Graph Neural Networks. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d' Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 9244–9255. Google Scholar
- Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. Advances in neural information processing systems, 32 (2019). Google Scholar
- Yaqin Zhou and Asankhaya Sharma. 2017. Automated identification of security issues from commit messages and bug reports. In Proceedings of the 2017 11th joint meeting on foundations of software engineering. 914–919. Google ScholarDigital Library
Index Terms
- Commit-Level, Neural Vulnerability Detection and Assessment
Recommendations
Automatic software vulnerability assessment by extracting vulnerability elements
AbstractSoftware vulnerabilities take threats to software security. When faced with multiple software vulnerabilities, the most urgent ones need to be fixed first. Therefore, it is critical to assess the severity of vulnerabilities in advance. ...
Highlights- Using vulnerability elements extracted from vulnerability description with BERT-MRC for vulnerability assessment.
Detecting Blind Cross-Site Scripting Attacks Using Machine Learning
SPML '18: Proceedings of the 2018 International Conference on Signal Processing and Machine LearningCross-site scripting (XSS) is a scripting attack targeting web applications by injecting malicious scripts into web pages. Blind XSS is a subset of stored XSS, where an attacker blindly deploys malicious payloads in web pages that are stored in a ...
Towards More Practical Automation of Vulnerability Assessment
ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software EngineeringIt is increasingly suggested to identify emerging software vulnerabilities (SVs) through relevant development activities (e.g., issue reports) to allow early warnings to open source software (OSS) users. However, the support for the following assessment ...
Comments