2024 | OriginalPaper | Chapter

Automatic Fixation of Decompilation Quirks Using Pre-trained Language Model

Authors: Ryunosuke Kaichi, Shinsuke Matsumoto, Shinji Kusumoto

Published in: Product-Focused Software Process Improvement

Publisher: Springer Nature Switzerland


Abstract

A decompiler is a tool for recovering the original source code from bytecode. A critical challenge for decompilers is that the decompiled code differs from the original code. These differences, called quirks, not only reduce the readability of the source code but may also change the program’s behavior. In this study, we propose a deep learning-based quirk fixation method that adopts a grammatical error correction approach. One advantage of the proposed method is that it can be applied to any decompiler and any programming language. Our experimental results show that the proposed method removes 55% of identifier quirks and 91% of structural quirks. In some cases, however, the proposed method introduced a small number of new quirks.
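The removal rates quoted above compare how many quirks remain before and after fixing. As an illustration only, the sketch below counts differing token regions between original and decompiled code with Python's `difflib` and derives a removal rate from those counts. The tokenizer, the function names `count_quirks` and `removal_rate`, and the token-level diff itself are assumptions made for this example; the abstract does not state how the study actually measures quirks.

```python
import difflib
import re

# Hypothetical tokenizer: identifiers, integer literals, or any single
# non-whitespace character. This is a simplification, not a real Java lexer.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    """Split source code into a flat list of tokens."""
    return TOKEN.findall(code)


def count_quirks(original: str, candidate: str) -> int:
    """Count contiguous token regions where candidate differs from original.

    Assumption: one differing region stands in for one quirk.
    """
    matcher = difflib.SequenceMatcher(
        a=tokenize(original), b=tokenize(candidate), autojunk=False
    )
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


def removal_rate(original: str, decompiled: str, fixed: str) -> float:
    """Fraction of quirks in the decompiled code that no longer appear after fixing."""
    before = count_quirks(original, decompiled)
    after = count_quirks(original, fixed)
    if before == 0:
        return 0.0  # nothing to remove
    return (before - after) / before


if __name__ == "__main__":
    original = "int total = a + b;"
    decompiled = "int n = a + b;"      # identifier renamed by the decompiler
    fixed = "int total = a + b;"       # hypothetical output of a fixing model
    print(count_quirks(original, decompiled))
    print(removal_rate(original, decompiled, fixed))
```

Because the rate is computed against the original code, newly introduced quirks lower it, and it becomes negative if a fix adds more differences than it removes.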



Title: Automatic Fixation of Decompilation Quirks Using Pre-trained Language Model
Authors: Ryunosuke Kaichi, Shinsuke Matsumoto, Shinji Kusumoto
Publisher: Springer Nature Switzerland
Book: Product-Focused Software Process Improvement
Print ISBN: 978-3-031-49265-5
Electronic ISBN: 978-3-031-49266-2

Copyright Year: 2024
DOI: https://doi.org/10.1007/978-3-031-49266-2_18
