2025 | OriginalPaper | Book chapter

Evaluating AI-Based Code Segmentation for ABAP Programs in an Industrial Use Case

Written by: Richard Mayer, Michael Moser, Niklas Greif, Florian Schnitzhofer, Verena Geist, Martin Pinzger

Published in: Product-Focused Software Process Improvement. Industry-, Workshop-, and Doctoral Symposium Papers

Publisher: Springer Nature Switzerland

Abstract

Many maintenance and evolution tasks in software engineering depend on the availability of logical code segments; AI- and data-driven approaches in particular rely on segmented source code. Obtaining logical segments typically requires a manual step. However, manual segmentation requires a basic understanding of the source code and delays the application of code analysis and refactoring tools. Automatic code segmentation provides an efficient way of extracting code snippets for further analysis, giving developers actionable insights into software products and processes. Rule-based approaches rely on syntactic boundaries and are hard to apply across multiple languages. In this article, we present our approach to learning logical code snippets using a BiLSTM neural network model. Driven by the requirements of an industrial use case, we train two models: one on the programming languages Go, Java, JavaScript, Python, PHP, and Ruby, and another that is additionally trained on ABAP (“Advanced Business Application Programming”) snippets. To evaluate the performance of the models, we use real-world samples from the SAP applications of our industry partner. We also compare the predictions of the model, which has an accuracy above 98%, with the results of human experts segmenting ABAP code, to evaluate whether AI-based code segmentation is perceived as effective by practitioners in our industrial use case. The study shows that only 42–51% of the predicted ABAP code snippets match the manual segmentation of the experts.
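
The comparison between predicted and expert segmentations can be illustrated with a small sketch. The abstract does not state how a "match" is computed, so the example below assumes the strictest reading: a predicted segment matches only if an expert segment covers exactly the same lines. It also assumes that each segmentation is given as a list of segment start lines. The function names `boundaries_to_segments` and `exact_match_rate` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Segment = Tuple[int, int]  # (first_line, last_line), inclusive, 0-based


def boundaries_to_segments(starts: List[int], n_lines: int) -> List[Segment]:
    """Turn segment start lines into inclusive (first, last) line ranges."""
    if n_lines <= 0:
        return []
    # Line 0 always opens a segment; duplicates and out-of-range starts are dropped.
    cuts = sorted({0, *[s for s in starts if 0 <= s < n_lines]})
    ends = cuts[1:] + [n_lines]
    return [(a, b - 1) for a, b in zip(cuts, ends)]


def exact_match_rate(predicted: List[Segment], expert: List[Segment]) -> float:
    """Fraction of predicted segments identical to some expert segment.

    Assumption: 'match' means exact equality of line ranges.
    """
    if not predicted:
        return 0.0
    expert_set: Set[Segment] = set(expert)
    return sum(seg in expert_set for seg in predicted) / len(predicted)


# Toy 12-line program: the two segmentations agree on the first segment only.
predicted = boundaries_to_segments([0, 4, 9], n_lines=12)
expert = boundaries_to_segments([0, 4, 7], n_lines=12)
rate = exact_match_rate(predicted, expert)
print(predicted, expert, rate)
```

Under this strict definition a single shifted boundary invalidates both neighbouring segments, which is one way that high per-line accuracy can coexist with a much lower segment-level match rate.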

Metadata
Title
Evaluating AI-Based Code Segmentation for ABAP Programs in an Industrial Use Case
Written by
Richard Mayer
Michael Moser
Niklas Greif
Florian Schnitzhofer
Verena Geist
Martin Pinzger
Copyright year
2025
DOI
https://doi.org/10.1007/978-3-031-78392-0_9