2024 | OriginalPaper | Chapter

Automatic Fixation of Decompilation Quirks Using Pre-trained Language Model

Authors: Ryunosuke Kaichi, Shinsuke Matsumoto, Shinji Kusumoto

Published in: Product-Focused Software Process Improvement

Publisher: Springer Nature Switzerland

Abstract

A decompiler is a system for recovering the original source code from bytecode. A critical challenge for decompilers is that the decompiled code differs from the original code. These differences, or quirks, not only reduce the readability of the source code but may also change the program's behavior. In this study, we propose a deep learning-based quirk fixation method that adopts techniques from grammatical error correction. One advantage of the proposed method is that it can be applied to any decompiler and programming language. Our experimental results show that the proposed method removes 55% of identifier quirks and 91% of structural quirks. In some cases, however, the proposed method injected a small number of new quirks.
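The abstract reports quirk removal rates but does not say how quirks are counted. As a rough illustration only, the sketch below treats each differing token block between the original and a candidate version as one quirk, then compares the count before and after fixation. The function names `count_quirks` and `removal_rate`, the whitespace tokenization, and the toy snippets are all assumptions made for this example and are not taken from the chapter.

```python
import difflib


def count_quirks(original: str, candidate: str) -> int:
    """Count differing token blocks between two code strings.

    Assumption: one non-equal diff block stands in for one quirk.
    Tokens are split on whitespace, which is a simplification.
    """
    a, b = original.split(), candidate.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


def removal_rate(original: str, decompiled: str, fixed: str) -> float:
    """Fraction of quirks that disappear after fixation (hypothetical proxy).

    This net figure does not separate removed quirks from newly
    injected ones, so it can understate removal or go negative.
    """
    before = count_quirks(original, decompiled)
    after = count_quirks(original, fixed)
    if before == 0:
        return 0.0
    return (before - after) / before


# Toy example: the decompiler lost the identifier name `total`,
# and the hypothetical fixer restored only its first occurrence.
original = "int total = a + b ; return total ;"
decompiled = "int var1 = a + b ; return var1 ;"
fixed = "int total = a + b ; return var1 ;"

print(count_quirks(original, decompiled))
print(count_quirks(original, fixed))
print(removal_rate(original, decompiled, fixed))
```

A token-level diff is only one possible choice. A tree-based differencing tool would be better at separating identifier quirks from structural ones, which a flat token diff cannot do.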

Metadata
Copyright Year: 2024
DOI: https://doi.org/10.1007/978-3-031-49266-2_18
