2025 | OriginalPaper | Chapter

Evaluating AI-Based Code Segmentation for ABAP Programs in an Industrial Use Case

Authors: Richard Mayer, Michael Moser, Niklas Greif, Florian Schnitzhofer, Verena Geist, Martin Pinzger

Published in: Product-Focused Software Process Improvement. Industry-, Workshop-, and Doctoral Symposium Papers

Publisher: Springer Nature Switzerland

Abstract

Many maintenance and evolution tasks in software engineering depend on the availability of logical code segments; AI- and data-driven approaches in particular rely on segmented source code. Obtaining logical segments typically requires a manual step. However, manual segmentation requires a basic understanding of the source code and delays the application of code analysis and refactoring tools. Automatic code segmentation provides an efficient way of extracting code snippets for further analysis, giving developers actionable insights into software products and processes. Rule-based approaches rely on syntactic boundaries and do not readily apply to multiple languages. In this article, we present our approach to learning logical code snippets using a BiLSTM neural network model. Driven by the requirements of an industrial use case, we train two models: one on the programming languages Go, Java, JavaScript, Python, PHP, and Ruby, and the other additionally on ABAP (“Advanced Business Application Programming”) snippets. To evaluate the performance of the models, we use real-world samples from the SAP applications of our industry partner. We also compare the predictions of the model, which reaches an accuracy of >98%, with the segmentations produced by human experts for ABAP code, to evaluate whether AI-based code segmentation is perceived as effective by practitioners in our industrial use case. The study shows that only 42–51% of the predicted ABAP code snippets match the manual segmentation of the experts.
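The abstract does not say how a predicted snippet is judged to "match" an expert segment. As a minimal sketch, assuming a strict criterion in which a predicted segment counts only if its first and last line both coincide with an expert segment, the comparison could look like the following. The function names, the 12-line program, and the boundary positions are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (first_line, last_line), 1-based


def boundaries_to_segments(num_lines: int, boundaries: List[int]) -> List[Segment]:
    """Turn segment start lines into inclusive (first, last) line ranges.

    `boundaries` holds the 1-based line numbers at which a new segment starts;
    line 1 always starts a segment, whether or not it is listed.
    """
    starts = sorted(set(boundaries) | {1})
    segments = []
    for i, start in enumerate(starts):
        # A segment ends one line before the next start, or at the last line.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else num_lines
        segments.append((start, end))
    return segments


def exact_match_rate(predicted: List[Segment], expert: List[Segment]) -> float:
    """Fraction of predicted segments identical to some expert segment.

    Assumption: exact matching of both boundaries. The paper may use a
    different, more tolerant criterion.
    """
    if not predicted:
        return 0.0
    expert_set = set(expert)
    hits = sum(1 for seg in predicted if seg in expert_set)
    return hits / len(predicted)


# Hypothetical 12-line program; boundary positions are made up for illustration.
predicted = boundaries_to_segments(12, [1, 4, 7, 10])
expert = boundaries_to_segments(12, [1, 4, 9])
rate = exact_match_rate(predicted, expert)
print(f"{rate:.0%} of predicted segments match the expert segmentation")
```

Under this strict criterion, a single misplaced boundary invalidates the segments on both sides of it, which is one possible reason why a segment-level match rate can be far lower than a per-line accuracy figure.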

Metadata

Title: Evaluating AI-Based Code Segmentation for ABAP Programs in an Industrial Use Case
Authors: Richard Mayer, Michael Moser, Niklas Greif, Florian Schnitzhofer, Verena Geist, Martin Pinzger
Copyright Year: 2025
DOI: https://doi.org/10.1007/978-3-031-78392-0_9
