
Empirical Software Engineering

Published: 01-12-2022

SPVF: security property assisted vulnerability fixing via attention-based models

Authors: Zhou Zhou, Lili Bo, Xiaoxue Wu, Xiaobing Sun, Tao Zhang, Bin Li, Jiale Zhang, Sicong Cao

Published in: Empirical Software Engineering | Issue 7/2022


Abstract

The past few years have seen machine learning models widely applied to fix vulnerabilities automatically. However, existing approaches fail to capture vulnerability characteristics that could improve the effectiveness of automated vulnerability fixing. In this paper, we propose SPVF, a novel approach to automated vulnerability fixing. SPVF extracts security properties from the textual description of each vulnerability. Built on the attention mechanism, SPVF takes both the abstract syntax tree of the vulnerable code and these security properties as input, and integrates them with a pointer-generator network. Experimental results on two public datasets show that SPVF outperforms the state-of-the-art approaches by 13% for C/C++ and 47% for Python. Moreover, SPVF successfully fixes 153 C/C++ vulnerabilities and 276 Python vulnerabilities.
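The abstract says that SPVF combines its inputs with a pointer generator. As a rough illustration only, the sketch below implements the standard pointer-generator output distribution of See et al. (2017) in plain Python: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention weights a_i over source positions i whose token is w). The abstract does not specify how SPVF computes p_gen, the attention scores, or the vocabulary distribution, or how it arranges AST and security-property tokens in the source sequence. The function names `softmax` and `pointer_generator_dist` and the example tokens below are hypothetical.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_generator_dist(p_gen, vocab_probs, source_tokens, attn_scores):
    """One decoding step of a generic pointer-generator network.

    p_gen         -- probability of generating from the fixed vocabulary (0..1)
    vocab_probs   -- dict token -> P_vocab(token), summing to 1
    source_tokens -- input tokens the decoder may copy from (hypothetically,
                     AST tokens followed by security-property tokens)
    attn_scores   -- one raw attention score per source token
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(source_tokens) != len(attn_scores):
        raise ValueError("need one attention score per source token")

    attn = softmax(attn_scores)
    final = defaultdict(float)
    # Generation path: scale the vocabulary distribution by p_gen.
    for token, p in vocab_probs.items():
        final[token] += p_gen * p
    # Copy path: each source position adds its attention weight, scaled by
    # (1 - p_gen), to the token it holds. Repeated tokens accumulate mass, and
    # out-of-vocabulary source tokens can still be emitted.
    for token, a in zip(source_tokens, attn):
        final[token] += (1.0 - p_gen) * a
    return dict(final)
```

For example, with `p_gen = 0.7`, a vocabulary distribution `{"strncpy": 0.6, "buf": 0.3, "<unk>": 0.1}`, source tokens `["strcpy", "buf", "src", "sizeof"]`, and equal attention scores, the identifier `src` receives probability 0.075 through copying alone, even though it is not in the vocabulary. This copy path is the usual motivation for pointer generators in code repair, where fixes often reuse identifiers from the input.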


https://zenodo.org/record/3559203#.XeRoytVG2Hs

https://zenodo.org/record/6324846

Note that a small number of vulnerabilities cannot be classified into the classifications listed due to the absence of CWE id information in the dataset.

https://github.com/SPVF/SPVF-for-vulnerability-fixing


Publisher: Springer US
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-022-10216-4
