Published in: Empirical Software Engineering 4/2023

1 July 2023

A graph-based code representation method to improve code readability classification

Authors: Qing Mi, Yi Zhan, Han Weng, Qinghang Bao, Longjie Cui, Wei Ma

Abstract

Context

Code readability is crucial for developers because it is closely tied to code maintenance and affects their work efficiency. Code readability classification assigns source code to one of several predefined levels according to its readability. Many code readability classification models have been proposed in existing studies, including deep learning networks that have achieved relatively high accuracy.

Objective

In terms of representation, however, these methods do not effectively preserve the syntactic and semantic structure of the source code. To capture this structure, we propose a graph-based code representation method.

Method

First, the source code is parsed into a graph that contains its abstract syntax tree (AST) combined with control and data flow edges, which preserves the semantic structural information. We then convert the source code and type information of the graph nodes into vectors. Finally, we train a graph neural network model composed of Graph Convolutional Network (GCN), DMoNPooling, and k-dimensional Graph Neural Network (k-GNN) layers to extract these features from the program graph.
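As a rough illustration of the pipeline described in this section, the following Python sketch builds a small program graph and applies a single GCN propagation step to it. It is not the authors' implementation, and it rests on several assumptions: it parses Python rather than Java (using the standard `ast` module), approximates control flow by linking consecutive statements, approximates data flow by linking each variable read to the latest preceding write in straight-line top-level code, and uses one-hot node-type vectors in place of learned embeddings. The names `build_program_graph`, `one_hot`, and `gcn_layer` are hypothetical, and the DMoNPooling and k-GNN layers are omitted.

```python
import ast
import math

# Singleton helper nodes (Load/Store, operators) are skipped so every graph node is unique.
_SKIP = (ast.expr_context, ast.operator, ast.unaryop, ast.boolop, ast.cmpop)


def build_program_graph(source):
    """Return (labels, edges); each edge is (src, dst, kind). Illustrative only."""
    tree = ast.parse(source)
    nodes = [n for n in ast.walk(tree) if not isinstance(n, _SKIP)]
    index = {id(n): i for i, n in enumerate(nodes)}
    labels = [type(n).__name__ for n in nodes]
    edges = []
    for parent in nodes:
        # Syntactic structure: parent -> child AST edges.
        for child in ast.iter_child_nodes(parent):
            if not isinstance(child, _SKIP):
                edges.append((index[id(parent)], index[id(child)], "ast"))
        # Simplified control flow: consecutive statements of the same body.
        body = getattr(parent, "body", None)
        if isinstance(body, list):
            for first, second in zip(body, body[1:]):
                edges.append((index[id(first)], index[id(second)], "control"))
    # Simplified data flow over top-level statements: last write -> later read.
    last_write = {}
    for stmt in tree.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        for name in names:  # reads happen before the statement's own writes
            if isinstance(name.ctx, ast.Load) and name.id in last_write:
                edges.append((last_write[name.id], index[id(name)], "data"))
        for name in names:
            if isinstance(name.ctx, ast.Store):
                last_write[name.id] = index[id(name)]
    return labels, edges


def one_hot(labels):
    """Encode node types as one-hot vectors (a stand-in for learned embeddings)."""
    vocab = sorted(set(labels))
    return [[1.0 if label == v else 0.0 for v in vocab] for label in labels]


def gcn_layer(features, edges, weight):
    """One GCN step, ReLU(D^-1/2 (A + I) D^-1/2 X W), with edges treated as undirected."""
    n = len(features)
    neighbours = [{i} for i in range(n)]  # start with self-loops
    for src, dst, _kind in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    degree = [len(s) for s in neighbours]
    in_dim, out_dim = len(weight), len(weight[0])
    result = []
    for i in range(n):
        # Aggregate symmetrically normalised neighbour features.
        agg = [0.0] * in_dim
        for j in neighbours[i]:
            norm = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(in_dim):
                agg[k] += norm * features[j][k]
        # Linear transform followed by ReLU.
        result.append([max(0.0, sum(agg[k] * weight[k][o] for k in range(in_dim)))
                       for o in range(out_dim)])
    return result


source = "a = 1\nb = a + 2\nc = a * b\n"
labels, edges = build_program_graph(source)
features = one_hot(labels)
dim = len(features[0])
identity = [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
hidden = gcn_layer(features, edges, identity)

counts = {}
for _src, _dst, kind in edges:
    counts[kind] = counts.get(kind, 0) + 1
print(sorted(counts.items()))  # [('ast', 13), ('control', 2), ('data', 3)]
```

In a full model, the edge kinds would typically be kept distinct or encoded as edge features, and the node vectors would come from an embedding of both the node type and its source text rather than from a one-hot type code.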

Result

We evaluate our approach on the task of code readability classification using a Java dataset provided by Scalabrino et al. (2016). The results show that our method achieves an accuracy of 72.5% in three-class classification and 88% in two-class classification.

Conclusion

We are the first to introduce graph-based representation into code readability classification. Our method outperforms state-of-the-art readability models, which suggests that the graph-based code representation method is effective in extracting syntactic and semantic information from source code, and ultimately improves code readability classification.

References
Alawad DM, Panta M, Zibran MF, et al (2019) An empirical study of the relationships between code readability and software complexity. arXiv:1909.01760
Allamanis M, Barr ET, Sutton C (2014) Learning natural coding conventions. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering
Cao S, Sun X, Bo L et al (2021) Bgnn4vd: Constructing bidirectional graph neural-network for vulnerability detection. Inf Softw Technol 136:106576
Devlin J, Chang MW, Lee K, et al (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Fakhoury S, Roy D, Hassan SA, et al (2019) Improving source code readability: Theory and practice. In: 2019 IEEE/ACM 27th international conference on program comprehension (ICPC). pp 2–12
Feng Z, Guo D, Tang D, et al (2020) Codebert: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155
Johnson J, Lubo S, Yedla N, et al (2019) An empirical study assessing source code readability in comprehension. In: 2019 IEEE International conference on software maintenance and evolution (ICSME). pp 513–523
LeClair A, Haque S, Wu LL, et al (2020) Improved code summarization via a graph neural network. In: Proceedings of the 28th international conference on program comprehension
Ling C, Huang J, Zhang H (2003) Auc: a statistically consistent and more discriminating measure than accuracy. In: Proc 18th Int’l joint conf artificial intelligence (IJCAI)
Li Y, Tarlow D, Brockschmidt M, et al (2016) Gated graph sequence neural networks. In: Bengio Y, LeCun Y (eds) 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. http://arxiv.org/abs/1511.05493
Maddison CJ, Tarlow D (2014) Structured generative models of natural source code. ArXiv abs/1401.0514
Mannan UA, Ahmed I, Sarma A (2018) Towards understanding code readability and its impact on design quality. In: Proceedings of the 4th ACM SIGSOFT international workshop on NLP for software engineering. pp 18–21
Ma Y, Wang S, Aggarwal CC, et al (2019) Graph convolutional networks with eigenpooling. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. pp 723–731
Mi Q, Hao Y, Wu M, et al (2022b) An enhanced data augmentation approach to support multi-class code readability classification. In: International conference on software engineering and knowledge engineering
Morris C, Ritzert M, Fey M, et al (2019) Weisfeiler and leman go neural: Higher-order graph neural networks. In: Proceedings of the AAAI conference on artificial intelligence. pp 4602–4609
Pantiuchina J, Lanza M, Bavota G (2018) Improving code: The (mis) perception of quality metrics. In: 2018 IEEE international conference on software maintenance and evolution (ICSME). pp 80–91
Piantadosi V, Fierro F, Scalabrino S et al (2020) How does code readability change during software evolution? Empir Softw Eng 25:5374–5412
Posnett D, Hindle A, Devanbu P (2011) A simpler model of software readability. In: Proceedings of the 8th working conference on mining software repositories. pp 73–82
Raychev V, Bielik P, Vechev MT (2016) Probabilistic model for code with decision trees. In: Visser E, Smaragdakis Y (eds) Proceedings of the 2016 ACM SIGPLAN international conference on object-oriented programming, systems, languages, and applications, OOPSLA 2016, part of SPLASH 2016, Amsterdam, The Netherlands, October 30 - November 4, 2016. ACM, pp 731–747. https://doi.org/10.1145/2983990.2984041
Scalabrino S, Linares-Vasquez M, Poshyvanyk D, et al (2016) Improving code readability models with textual features. In: 2016 IEEE 24th International conference on program comprehension (ICPC), IEEE, pp 1–10
Sedano T (2016) Code readability testing, an empirical study. In: 2016 IEEE 29th International conference on software engineering education and training (CSEET). pp 111–117
Tsitsulin A, Palowitch J, Perozzi B, et al (2020) Graph clustering with graph neural networks. arXiv preprint arXiv:2006.16904
Vagavolu D, Swarna KC, Chimalakonda S (2021) A mocktail of source code representations. In: 2021 36th IEEE/ACM International conference on automated software engineering (ASE). pp 1296–1300
Wang X, Ji H, Shi C, et al (2019) Heterogeneous graph attention network. In: The world wide web conference. pp 2022–2032
Wang W, Li G, Ma B, et al (2020a) Detecting code clones with graph neural network and flow-augmented abstract syntax tree. In: Kontogiannis K, Khomh F, Chatzigeorgiou A, et al (eds) 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, London, ON, Canada, February 18-21, 2020. IEEE, pp 261–271. https://doi.org/10.1109/SANER48275.2020.9054857
Xia X, Bao L, Lo D et al (2017) Measuring program comprehension: A large-scale field study with professionals. IEEE Trans Softw Eng 44(10):951–976
Yamaguchi F, Golde N, Arp D, et al (2014) Modeling and discovering vulnerabilities with code property graphs. In: 2014 IEEE Symposium on security and privacy, SP 2014, Berkeley, CA, USA, May 18-21, 2014. IEEE Computer Society, pp 590–604, https://doi.org/10.1109/SP.2014.44
Zhang C, Song D, Huang C, et al (2019) Heterogeneous graph neural network. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. pp 793–803
Zhou Y, Liu S, Siow J, Du X, Liu Y (2019) Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. Adv Neural Inf Process Syst 915:11
Metadata
Title
A graph-based code representation method to improve code readability classification
Authors
Qing Mi
Yi Zhan
Han Weng
Qinghang Bao
Longjie Cui
Wei Ma
Publication date
1 July 2023
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 4/2023
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-023-10319-6
