Published in: Data Mining and Knowledge Discovery 3/2024

13 October 2023

Design and evaluation of highly accurate smart contract code vulnerability detection framework

Authors: Sowon Jeon, Gilhee Lee, Hyoungshick Kim, Simon S. Woo


Abstract

Smart contracts are self-executing programs stored and executed on a blockchain platform. Previous studies have shown, however, that developing secure smart contracts is difficult, and insecure smart contracts can cause significant financial losses for service providers and customers. Identifying security vulnerabilities in smart contracts is therefore essential for blockchain platforms that rely on them. In this paper, we present SmartConDetect, a tool for detecting security vulnerabilities in Solidity smart contracts. SmartConDetect is a static analysis tool that extracts code fragments from Solidity smart contracts and uses a pre-trained BERT model to find susceptible code patterns. To evaluate SmartConDetect, we use two public datasets and our own dataset (SmartConDataset), collected from the real-world Ethereum blockchain network. Our experimental results show that SmartConDetect significantly outperforms all state-of-the-art methods, achieving a 90.9% F1-score on our own dataset. SmartConDetect is also about two times faster than SmartCheck in detection. Furthermore, we conduct a real-world case study to analyze the distribution of detected vulnerabilities.
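The abstract states that code fragments are extracted from Solidity contracts before classification but does not say how. As a rough illustration only, the sketch below splits a contract into function-level fragments by brace matching. The function name `extract_function_fragments`, the sample contract, and the choice of function-level granularity are all assumptions made for this example, not the authors' method. The sketch also ignores braces inside strings and comments, which a real parser would handle.

```python
import re
from typing import List

# Hypothetical sample contract, used only to exercise the sketch below.
SAMPLE = """
contract Wallet {
    mapping(address => uint) balances;

    function deposit() public payable {
        balances[msg.sender] += msg.value;
    }

    function withdraw(uint amount) public {
        if (balances[msg.sender] >= amount) {
            msg.sender.call.value(amount)("");
            balances[msg.sender] -= amount;
        }
    }
}
"""


def extract_function_fragments(source: str) -> List[str]:
    """Return the text of each `function ... { ... }` block in `source`.

    Assumption: one fragment per function body. Braces inside string
    literals or comments are not handled by this simplified sketch.
    """
    fragments = []
    for match in re.finditer(r"\bfunction\b", source):
        start = match.start()
        open_idx = source.find("{", start)
        semi_idx = source.find(";", start)
        # Skip declarations without a body (e.g. interface functions).
        if open_idx == -1 or (semi_idx != -1 and semi_idx < open_idx):
            continue
        depth = 0
        for i in range(open_idx, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    # Matching close brace found: keep the whole function.
                    fragments.append(source[start:i + 1])
                    break
    return fragments


fragments = extract_function_fragments(SAMPLE)
for fragment in fragments:
    print(fragment.splitlines()[0])
```

Each resulting fragment could then be tokenized and passed to a sequence classifier such as a fine-tuned BERT model. That step is omitted here because the abstract gives no details about the model configuration.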

Metadata
Title: Design and evaluation of highly accurate smart contract code vulnerability detection framework
Authors: Sowon Jeon, Gilhee Lee, Hyoungshick Kim, Simon S. Woo
Publication date: 13 October 2023
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery, Issue 3/2024
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-023-00981-1
