Data Mining and Knowledge Discovery

Published: 13 October 2023

Design and evaluation of highly accurate smart contract code vulnerability detection framework

Authors: Sowon Jeon, Gilhee Lee, Hyoungshick Kim, Simon S. Woo

Published in: Data Mining and Knowledge Discovery | Issue 3/2024


Abstract

Smart contracts are self-executing programs stored and executed on a blockchain platform. However, previous studies have demonstrated that developing secure smart contracts is not easy, and the use of insecure smart contracts can result in significant financial loss for service providers or customers. Identifying security vulnerabilities in smart contracts is therefore essential on blockchain platforms that use them. In this paper, we present SmartConDetect, a tool for detecting security vulnerabilities in Solidity smart contracts. SmartConDetect is a static analysis tool that extracts code fragments from Solidity smart contracts and uses a pre-trained BERT model to find susceptible code patterns. To demonstrate the performance of SmartConDetect, we use two public datasets and our own dataset (SmartConDataset), collected from the real-world Ethereum blockchain network. Our experimental results show that SmartConDetect significantly outperforms all state-of-the-art methods, achieving a 90.9% F1-score on our own dataset. In addition, SmartConDetect is about 2 times faster than SmartCheck in detection. We also conduct a real-world case study to analyze the distribution of detected vulnerabilities.
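The abstract does not specify how code fragments are extracted, so the following is only an illustrative sketch of what function-level fragment extraction from Solidity source might look like. The helper name `extract_function_fragments`, the regular expression, and the sample contract are all assumptions made for this example and are not taken from SmartConDetect. The sketch finds each `function` header and then counts braces to locate the end of its body; it deliberately ignores braces inside strings and comments, which a real parser would have to handle.

```python
import re
from typing import List

# Hypothetical pattern: a `function` keyword followed by a header that
# reaches an opening brace before any semicolon (so body-less interface
# declarations such as `function f() external;` are skipped).
FUNC_HEADER = re.compile(r"\bfunction\b[^{;]*\{")


def extract_function_fragments(source: str) -> List[str]:
    """Return the text of each function definition found in `source`.

    Simplifying assumption: braces inside string literals or comments
    are not treated specially.
    """
    fragments = []
    pos = 0
    while True:
        match = FUNC_HEADER.search(source, pos)
        if match is None:
            break
        depth = 1          # the header's opening brace is already consumed
        i = match.end()
        while i < len(source) and depth > 0:
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
            i += 1
        fragments.append(source[match.start():i])
        pos = i            # continue scanning after this function body
    return fragments


# Made-up sample contract, used only to exercise the helper.
EXAMPLE = """
contract Wallet {
    mapping(address => uint) balances;

    function deposit() public payable {
        balances[msg.sender] += msg.value;
    }

    function withdraw(uint amount) public {
        if (balances[msg.sender] >= amount) {
            msg.sender.transfer(amount);
            balances[msg.sender] -= amount;
        }
    }
}
"""

fragments = extract_function_fragments(EXAMPLE)
for fragment in fragments:
    print(fragment.splitlines()[0])
```

In a pipeline of the kind the abstract describes, fragments like these would then be tokenized and passed to a pre-trained language model for classification; that step is omitted here because the abstract gives no details about it.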



Title: Design and evaluation of highly accurate smart contract code vulnerability detection framework
Authors: Sowon Jeon, Gilhee Lee, Hyoungshick Kim, Simon S. Woo
Publication date: 13 October 2023
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery / Issue 3/2024
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-023-00981-1
