2020 | Original Paper | Book Chapter

On Data Analysis of Software Repositories

Written by: Dmitry Namiot, Vladimir Romanov

Published in: Convergent Cognitive Information Technologies

Publisher: Springer International Publishing

Abstract

This article discusses the analysis of software repositories using data analysis methods. It reviews methods for analyzing programs based on information retrieved from the program code stored in code repositories, and surveys work that applies classification, clustering, and deep learning to software development. Applications include classifying and predicting errors, tracking how the properties of code change as it evolves, detecting design flaws and technical debt, and assisting with code refactoring. The ultimate goal of all these models is, of course, the automation of programming. In practice, the tasks addressed are simpler: information retrieval over program code, error prediction, clone detection, link analysis, evolution analysis, and so on. We first discuss recurrent neural networks and their use in analyzing software repositories. In the simplest case, a recurrent network models a programming language as a sequence of characters. The paper also covers clustering and topic modeling.
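
To make the character-sequence view concrete, the following is a minimal sketch of an Elman-style recurrent cell that reads source code one character at a time and emits a probability distribution over the next character. It is an illustration only, not the method of any work reviewed in the chapter: the names `build_vocab`, `CharRNN`, and `next_char_distribution` are hypothetical, the hidden size is arbitrary, and the weights are random and untrained, so the output probabilities carry no meaning beyond showing the data flow.

```python
import math
import random


def build_vocab(source):
    """Map each distinct character in the source text to an integer id."""
    chars = sorted(set(source))
    return {ch: i for i, ch in enumerate(chars)}


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CharRNN:
    """Tiny Elman-style recurrent cell with random (untrained) weights."""

    def __init__(self, vocab_size, hidden_size=8, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_size = hidden_size
        self.w_xh = matrix(hidden_size, vocab_size)   # input -> hidden
        self.w_hh = matrix(hidden_size, hidden_size)  # hidden -> hidden
        self.w_hy = matrix(vocab_size, hidden_size)   # hidden -> output

    def step(self, char_id, hidden):
        """Consume one character id; return (new hidden state, next-char probabilities)."""
        # A one-hot input times w_xh reduces to selecting column char_id.
        new_hidden = [
            math.tanh(
                self.w_xh[j][char_id]
                + sum(self.w_hh[j][k] * hidden[k]
                      for k in range(self.hidden_size))
            )
            for j in range(self.hidden_size)
        ]
        scores = [
            sum(row[k] * new_hidden[k] for k in range(self.hidden_size))
            for row in self.w_hy
        ]
        return new_hidden, softmax(scores)


def next_char_distribution(source):
    """Feed the source through the cell character by character."""
    vocab = build_vocab(source)
    model = CharRNN(len(vocab))
    hidden = [0.0] * model.hidden_size
    probs = None
    for ch in source:
        hidden, probs = model.step(vocab[ch], hidden)
    return vocab, probs


if __name__ == "__main__":
    vocab, probs = next_char_distribution("int x = 0;")
    for ch, idx in vocab.items():
        print(repr(ch), round(probs[idx], 4))
```

A trained model would fit the three weight matrices to a corpus of code so that likely continuations receive higher probability; here the hidden state simply carries context from one character to the next, which is the property that distinguishes a recurrent model from a fixed-window one.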

Metadata
Title
On Data Analysis of Software Repositories
Written by
Dmitry Namiot
Vladimir Romanov
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-37436-5_24