Skip to main content
Erschienen in: Innovations in Systems and Software Engineering 2/2022

07.03.2021 | S.I. : ACITSEP

On the use of textual feature extraction techniques to support the automated detection of refactoring documentation

verfasst von: Licelot Marmolejos, Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Christian Newman, Ali Ouni

Erschienen in: Innovations in Systems and Software Engineering | Ausgabe 2/2022

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Refactoring is the art of improving the internal structure of a program without altering its external behavior, and it is an important task when it comes to software maintainability. While existing studies have focused on the detection of refactoring operations by mining software repositories, little was done to understand how developers document their refactoring activities. Therefore, there is recent trend trying to detect developers documentation of refactoring, by manually analyzing their internal and external software documentation. However, these techniques are limited by their manual process, which hinders their scalability. Hence, in this study, we tackle the detection of refactoring documentation as binary classification problem. We focus on the automatic detection of refactoring activities in commit messages by relying on text-mining, natural language preprocessing, and supervised machine learning techniques. We design our tool to overcome the limitation of the manual process, previously proposed by existing studies, through exploring the transformation of commit messages into features that are used to train various models. For our evaluation, we use and compare five different binary classification algorithms, and we test the effectiveness of these models using an existing dataset of manually curated messages that are known to be documenting refactoring activities in the source code. The experiments are carried out with different data sizes and number of bits. As per our results, the combination of Chi-Squared with Bayes point machine and Fisher score with Bayes point machine could be the most efficient when it comes to automatically identifying refactoring text patterns in commit messages, with an accuracy of 0.96, and an F-score of 0.96.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat AlOmar EA, Mkaouer MW, Ouni A (2019) Can refactoring be self-affirmed? an exploratory study on how developers document their refactoring activities in commit messages. In: Proceedings of the 3nd international workshop on refactoring-accepted. IEEE AlOmar EA, Mkaouer MW, Ouni A (2019) Can refactoring be self-affirmed? an exploratory study on how developers document their refactoring activities in commit messages. In: Proceedings of the 3nd international workshop on refactoring-accepted. IEEE
3.
Zurück zum Zitat AlOmar EA, Mkaouer MW, Ouni A, Kessentini M (2019) On the impact of refactoring on the relationship between quality attributes and design metrics. In: 2019 ACM/IEEE international symposium on empirical software engineering and measurement (ESEM), pp 1–11. IEEE AlOmar EA, Mkaouer MW, Ouni A, Kessentini M (2019) On the impact of refactoring on the relationship between quality attributes and design metrics. In: 2019 ACM/IEEE international symposium on empirical software engineering and measurement (ESEM), pp 1–11. IEEE
4.
Zurück zum Zitat AlOmar EA, Peruma A, Mkaouer MW, Newman C, Ouni A, Kessentini M (2020) How we refactor and how we document it? On the use of supervised machine learning algorithms to classify refactoring documentation. Expert Syst Appl 167:114176CrossRef AlOmar EA, Peruma A, Mkaouer MW, Newman C, Ouni A, Kessentini M (2020) How we refactor and how we document it? On the use of supervised machine learning algorithms to classify refactoring documentation. Expert Syst Appl 167:114176CrossRef
5.
Zurück zum Zitat AlOmar EA, Rodriguez PT, Bowman J, Wang T, Adepoju B, Lopez K, Newman CD, Ouni A, Mkaouer MW (2020) How do developers refactor code to improve code reusability? In: International conference on software and systems reuse. Springer AlOmar EA, Rodriguez PT, Bowman J, Wang T, Adepoju B, Lopez K, Newman CD, Ouni A, Mkaouer MW (2020) How do developers refactor code to improve code reusability? In: International conference on software and systems reuse. Springer
6.
Zurück zum Zitat Alrubaye H, Mkaouer MW, Ouni A (2019) Migrationminer: an automated detection tool of third-party java library migration at the method level. In: 2019 IEEE international conference on software maintenance and evolution (ICSME), pp 414–417. IEEE Alrubaye H, Mkaouer MW, Ouni A (2019) Migrationminer: an automated detection tool of third-party java library migration at the method level. In: 2019 IEEE international conference on software maintenance and evolution (ICSME), pp 414–417. IEEE
7.
Zurück zum Zitat Alrubaye H, Mkaouer MW, Ouni A (2019) On the use of information retrieval to automate the detection of third-party java library migration at the method level. In: Proceedings of the 27th international conference on program comprehension, pp 347–357. IEEE Press Alrubaye H, Mkaouer MW, Ouni A (2019) On the use of information retrieval to automate the detection of third-party java library migration at the method level. In: Proceedings of the 27th international conference on program comprehension, pp 347–357. IEEE Press
8.
Zurück zum Zitat Bavota G, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) An empirical study on the developers’ perception of software coupling. In: Proceedings of the 2013 international conference on software engineering, pp 692–701. IEEE Press Bavota G, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) An empirical study on the developers’ perception of software coupling. In: Proceedings of the 2013 international conference on software engineering, pp 692–701. IEEE Press
9.
Zurück zum Zitat Bavota G, Panichella S, Tsantalis N, Di Penta M, Oliveto R, Canfora G (2014)Recommending refactorings based on team co-maintenance patterns. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering, pp 337–342. ACM Bavota G, Panichella S, Tsantalis N, Di Penta M, Oliveto R, Canfora G (2014)Recommending refactorings based on team co-maintenance patterns. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering, pp 337–342. ACM
10.
Zurück zum Zitat Chávez A, Ferreira I, Fernandes E, Cedrim D, Garcia A (2017) How does refactoring affect internal quality attributes?: a multi-project study. In: Proceedings of the 31st Brazilian symposium on software engineering, pp 74–83. ACM Chávez A, Ferreira I, Fernandes E, Cedrim D, Garcia A (2017) How does refactoring affect internal quality attributes?: a multi-project study. In: Proceedings of the 31st Brazilian symposium on software engineering, pp 74–83. ACM
11.
Zurück zum Zitat Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Soft Eng 20(6):476–493CrossRef Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Soft Eng 20(6):476–493CrossRef
12.
Zurück zum Zitat Demeyer S, Ducasse S, Nierstrasz O (2000) Finding refactorings via change metrics. In: ACM SIGPLAN notices, vol 35, pp 166–177. ACM Demeyer S, Ducasse S, Nierstrasz O (2000) Finding refactorings via change metrics. In: ACM SIGPLAN notices, vol 35, pp 166–177. ACM
13.
Zurück zum Zitat Di Z, Li B, Li Z, Liang P (2018) A preliminary investigation of self-admitted refactorings in open source software (S). In: The 30th international conference on software engineering and knowledge engineering, Hotel Pullman, Redwood City, California, USA, July 1–3, [13], pp 165–164. https://doi.org/10.18293/SEKE2018-081 Di Z, Li B, Li Z, Liang P (2018) A preliminary investigation of self-admitted refactorings in open source software (S). In: The 30th international conference on software engineering and knowledge engineering, Hotel Pullman, Redwood City, California, USA, July 1–3, [13], pp 165–164. https://​doi.​org/​10.​18293/​SEKE2018-081
14.
Zurück zum Zitat Dig D, Comertoglu C, Marinov D, Johnson R (2006) Automated detection of refactorings in evolving components. In: European conference on object-oriented programming, pp 404–428. Springer Dig D, Comertoglu C, Marinov D, Johnson R (2006) Automated detection of refactorings in evolving components. In: European conference on object-oriented programming, pp 404–428. Springer
15.
Zurück zum Zitat Freund Y, Schapire RE (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296CrossRef Freund Y, Schapire RE (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296CrossRef
16.
Zurück zum Zitat Hattori LP, Lanza M (2008) On the nature of commits. In: Proceedings of the 23rd IEEE/ACM international conference on automated software engineering, pp III–63. IEEE Press Hattori LP, Lanza M (2008) On the nature of commits. In: Proceedings of the 23rd IEEE/ACM international conference on automated software engineering, pp III–63. IEEE Press
17.
Zurück zum Zitat Hayashi S, Tsuda Y, Saeki M (2010) Search-based refactoring detection from source code revisions. IEICE Trans Inf Syst 93(4):754–762CrossRef Hayashi S, Tsuda Y, Saeki M (2010) Search-based refactoring detection from source code revisions. IEICE Trans Inf Syst 93(4):754–762CrossRef
18.
19.
Zurück zum Zitat Hindle A, German DM, Holt R (2008) What do large commits tell us?: a taxonomical study of large commits. In: Proceedings of the 2008 international working conference on Mining software repositories, pp 99–108. ACM Hindle A, German DM, Holt R (2008) What do large commits tell us?: a taxonomical study of large commits. In: Proceedings of the 2008 international working conference on Mining software repositories, pp 99–108. ACM
20.
Zurück zum Zitat Howe NR, Rath TM, Manmatha R (2005) Boosted decision trees for word recognition in handwritten document retrieval. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, pp 377–383. ACM Howe NR, Rath TM, Manmatha R (2005) Boosted decision trees for word recognition in handwritten document retrieval. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, pp 377–383. ACM
21.
Zurück zum Zitat Kehrer T, Kelter U, Taentzer G (2011) A rule-based approach to the semantic lifting of model differences in the context of model versioning. In: Proceedings of the 2011 26th IEEE/ACM international conference on automated software engineering, pp 163–172. IEEE Computer Society Kehrer T, Kelter U, Taentzer G (2011) A rule-based approach to the semantic lifting of model differences in the context of model versioning. In: Proceedings of the 2011 26th IEEE/ACM international conference on automated software engineering, pp 163–172. IEEE Computer Society
22.
Zurück zum Zitat Kim M, Gee M, Loh A, Rachatasumrit N (2010) Ref-finder: a refactoring reconstruction tool based on logic query templates. In: Proceedings of the 18th ACM SIGSOFT international symposium on foundations of software engineering, pp 371–372. ACM Kim M, Gee M, Loh A, Rachatasumrit N (2010) Ref-finder: a refactoring reconstruction tool based on logic query templates. In: Proceedings of the 18th ACM SIGSOFT international symposium on foundations of software engineering, pp 371–372. ACM
23.
Zurück zum Zitat Kowsari K, Jafari Meimandi K, Heidarysafa M, Mendu S, Barnes L, Brown D (2019) Text classification algorithms: a survey. Information 10(4):150CrossRef Kowsari K, Jafari Meimandi K, Heidarysafa M, Mendu S, Barnes L, Brown D (2019) Text classification algorithms: a survey. Information 10(4):150CrossRef
24.
Zurück zum Zitat Mahouachi R, Kessentini M, Cinnéide MÓ (2013) Search-based refactoring detection using software metrics variation. In: International symposium on search based software engineering, pp 126–140. Springer Mahouachi R, Kessentini M, Cinnéide MÓ (2013) Search-based refactoring detection using software metrics variation. In: International symposium on search based software engineering, pp 126–140. Springer
25.
Zurück zum Zitat Mansouri MM (2018) Detection of rename local variable refactoring instances in commit history. PhD thesis, Concordia University Mansouri MM (2018) Detection of rename local variable refactoring instances in commit history. PhD thesis, Concordia University
26.
Zurück zum Zitat Mkaouer MW, Kessentini M, Bechikh S, Deb K, Ó Cinnéide M (2014) Recommendation system for software refactoring using innovization and interactive dynamic optimization. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering, pp 331–336. ACM Mkaouer MW, Kessentini M, Bechikh S, Deb K, Ó Cinnéide M (2014) Recommendation system for software refactoring using innovization and interactive dynamic optimization. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering, pp 331–336. ACM
27.
Zurück zum Zitat Mkaouer MW, Kessentini M, Cinnéide MÓ, Hayashi S, Deb K (2017) A robust multi-objective approach to balance severity and importance of refactoring opportunities. Emp Softw Eng 22(2):894–927CrossRef Mkaouer MW, Kessentini M, Cinnéide MÓ, Hayashi S, Deb K (2017) A robust multi-objective approach to balance severity and importance of refactoring opportunities. Emp Softw Eng 22(2):894–927CrossRef
28.
Zurück zum Zitat Mund S (2015) Microsoft azure machine learning. Packt Publishing Ltd Mund S (2015) Microsoft azure machine learning. Packt Publishing Ltd
29.
Zurück zum Zitat Murphy-Hill E, Parnin C, Black AP (2011) How we refactor, and how we know it. IEEE Trans Softw Eng 38(1):5–18CrossRef Murphy-Hill E, Parnin C, Black AP (2011) How we refactor, and how we know it. IEEE Trans Softw Eng 38(1):5–18CrossRef
30.
Zurück zum Zitat Opdyke WF (1992) Refactoring object-oriented frameworks. University of Illinois at Urbana-Champaign, Champaign Opdyke WF (1992) Refactoring object-oriented frameworks. University of Illinois at Urbana-Champaign, Champaign
31.
Zurück zum Zitat Pan B, Tian Y, Zhou TS, Wang F, Li JS (2015) Study on image encryption method in clinical data exchange. In: 2015 7th international conference on information technology in medicine and education (ITME), pp 252–255. IEEE Pan B, Tian Y, Zhou TS, Wang F, Li JS (2015) Study on image encryption method in clinical data exchange. In: 2015 7th international conference on information technology in medicine and education (ITME), pp 252–255. IEEE
32.
Zurück zum Zitat Ratzinger J, Sigmund T, Gall HC (2008) On the relation of refactorings and software defect prediction. In: Proceedings of the 2008 international working conference on mining software repositories, pp 35–38. ACM Ratzinger J, Sigmund T, Gall HC (2008) On the relation of refactorings and software defect prediction. In: Proceedings of the 2008 international working conference on mining software repositories, pp 35–38. ACM
34.
Zurück zum Zitat Shi Q, Petterson J, Dror G, Langford J, Smola A, Vishwanathan S (2009) Hash kernels for structured data. J Mach Learn Res 10:2615–2637MathSciNetMATH Shi Q, Petterson J, Dror G, Langford J, Smola A, Vishwanathan S (2009) Hash kernels for structured data. J Mach Learn Res 10:2615–2637MathSciNetMATH
35.
Zurück zum Zitat Silva D, Valente MT (2017) Refdiff: detecting refactorings in version histories. In: Proceedings of the 14th international conference on mining software repositories, pp 269–279. IEEE Press Silva D, Valente MT (2017) Refdiff: detecting refactorings in version histories. In: Proceedings of the 14th international conference on mining software repositories, pp 269–279. IEEE Press
36.
Zurück zum Zitat Soares G, Gheyi R, Serey D, Massoni T (2010) Making program refactoring safer. IEEE Softw 27(4):52–57CrossRef Soares G, Gheyi R, Serey D, Massoni T (2010) Making program refactoring safer. IEEE Softw 27(4):52–57CrossRef
37.
Zurück zum Zitat Soetens QD, Perez J, Demeyer S (2013) An initial investigation into change-based reconstruction of floss-refactorings. In: 2013 IEEE international conference on software maintenance, pp 384–387. IEEE Soetens QD, Perez J, Demeyer S (2013) An initial investigation into change-based reconstruction of floss-refactorings. In: 2013 IEEE international conference on software maintenance, pp 384–387. IEEE
38.
Zurück zum Zitat Stroggylos K, Spinellis D (2007) Refactoring–does it improve software quality? In: 15th international workshop on software quality (WoSQ’07: ICSE workshops 2007), p 10. IEEE Stroggylos K, Spinellis D (2007) Refactoring–does it improve software quality? In: 15th international workshop on software quality (WoSQ’07: ICSE workshops 2007), p 10. IEEE
39.
Zurück zum Zitat Taneja K, Dig D, Xie T (2007) Automated detection of api refactorings in libraries. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering, pp 377–380. ACM Taneja K, Dig D, Xie T (2007) Automated detection of api refactorings in libraries. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering, pp 377–380. ACM
40.
Zurück zum Zitat Thangthumachit S, Hayashi S, Saeki M (2011) Understanding source code differences by separating refactoring effects. In: 2011 18th Asia-Pacific software engineering conference, pp 339–347. IEEE Thangthumachit S, Hayashi S, Saeki M (2011) Understanding source code differences by separating refactoring effects. In: 2011 18th Asia-Pacific software engineering conference, pp 339–347. IEEE
41.
Zurück zum Zitat Tsantalis N, Chaikalis T, Chatzigeorgiou A (2008) Jdeodorant: identification and removal of type-checking bad smells. In: 2008 12th European conference on software maintenance and reengineering, pp 329–331. IEEE Tsantalis N, Chaikalis T, Chatzigeorgiou A (2008) Jdeodorant: identification and removal of type-checking bad smells. In: 2008 12th European conference on software maintenance and reengineering, pp 329–331. IEEE
42.
Zurück zum Zitat Tsantalis N, Mansouri M, Eshkevari L, Mazinanian D, Dig D (2018) Accurate and efficient refactoring detection in commit history. In: 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pp 483–494. IEEE Tsantalis N, Mansouri M, Eshkevari L, Mazinanian D, Dig D (2018) Accurate and efficient refactoring detection in commit history. In: 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pp 483–494. IEEE
43.
Zurück zum Zitat Weinberger K, Dasgupta A, Attenberg J, Langford J, Smola A (2009) Feature hashing for large scale multitask learning. ArXiv preprint arXiv:0902.2206 Weinberger K, Dasgupta A, Attenberg J, Langford J, Smola A (2009) Feature hashing for large scale multitask learning. ArXiv preprint arXiv:​0902.​2206
44.
Zurück zum Zitat Weissgerber P, Diehl S (2006) Identifying refactorings from source-code changes. In: 21st IEEE/ACM international conference on automated software engineering (ASE’06), pp 231–240. IEEE Weissgerber P, Diehl S (2006) Identifying refactorings from source-code changes. In: 21st IEEE/ACM international conference on automated software engineering (ASE’06), pp 231–240. IEEE
45.
Zurück zum Zitat Xing Z, Stroulia E (2005) Umldiff: an algorithm for object-oriented design differencing. In: Proceedings of the 20th IEEE/ACM international conference on automated software engineering, pp 54–65. ACM Xing Z, Stroulia E (2005) Umldiff: an algorithm for object-oriented design differencing. In: Proceedings of the 20th IEEE/ACM international conference on automated software engineering, pp 54–65. ACM
46.
Zurück zum Zitat Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. In: ICML, vol 97, p 35 Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. In: ICML, vol 97, p 35
Metadaten
Titel
On the use of textual feature extraction techniques to support the automated detection of refactoring documentation
verfasst von
Licelot Marmolejos
Eman Abdullah AlOmar
Mohamed Wiem Mkaouer
Christian Newman
Ali Ouni
Publikationsdatum
07.03.2021
Verlag
Springer London
Erschienen in
Innovations in Systems and Software Engineering / Ausgabe 2/2022
Print ISSN: 1614-5046
Elektronische ISSN: 1614-5054
DOI
https://doi.org/10.1007/s11334-021-00388-5

Weitere Artikel der Ausgabe 2/2022

Innovations in Systems and Software Engineering 2/2022 Zur Ausgabe

Premium Partner