Published in: Empirical Software Engineering, Issue 2/2023

Published: 1 March 2023

Refactoring practices in the context of data-intensive systems

Authors: Biruk Asmare Muse, Foutse Khomh, Giuliano Antoniol

Abstract

Developers often refactor code to improve the maintainability and comprehensibility of software. Many studies have examined refactoring activities in traditional software systems; however, refactoring in data-intensive systems has not been well explored. Understanding developers' refactoring practices is important for building efficient tool support. We conducted a longitudinal study of refactoring activities in data-access classes using 29 data-intensive systems based on SQL and NoSQL databases. We investigated the prevalence, co-occurrence, and evolution of data-access refactorings, as well as the association between data-access refactorings and data-access smells. We also manually analyzed a sample of 500 data-access refactoring instances to identify the functionalities of the code targeted by such refactorings. Furthermore, we analyzed a sample of 500 data-access refactoring commits to understand the context behind the applied refactorings, and explored the characteristics and contributions of the developers involved in them. Finally, we conducted a developer survey to complement our analysis of the subject systems. Our results show that data-access refactorings are prevalent and of different types. Most data-access refactorings target code that implements data fetching and insertion, but they mostly do not modify data-access queries. Most data-access refactorings are performed when adding or modifying features and during bug fixes. Data-access refactoring is often performed by developers with more development and refactoring experience. Overall, the results show that data-access refactorings focus on improving code quality rather than on optimizing the underlying data-access operations by fixing data-access smells. Hence, more work is needed from the research community to make practitioners aware of, and support them in realizing, the benefits of addressing data-access smells with refactorings.
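The abstract mentions analyzing how data-access refactorings co-occur. A common way to quantify co-occurrence of items across transactions is association-rule mining, which uses support, confidence, and lift. The sketch below treats each commit as a transaction whose items are the refactoring types detected in it. This framing is an assumption for illustration and is not necessarily the authors' exact procedure. The function name `cooccurrence_rules` and the sample commits are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_rules(commits):
    """Return {(A, B): (support, confidence, lift)} for every rule A -> B
    between refactoring types that appear together in at least one commit.

    Assumption (illustrative only): each commit is one transaction and its
    items are the refactoring types detected in that commit.
    """
    n = len(commits)
    type_counts = Counter()  # number of commits containing each refactoring type
    pair_counts = Counter()  # number of commits containing both types of an unordered pair
    for commit in commits:
        types = set(commit)
        type_counts.update(types)
        for a, b in combinations(sorted(types), 2):
            pair_counts[(a, b)] += 1

    rules = {}
    for (a, b), both in pair_counts.items():
        support = both / n  # P(A and B)
        for x, y in ((a, b), (b, a)):  # evaluate the rule in both directions
            confidence = both / type_counts[x]  # P(Y | X)
            lift = confidence / (type_counts[y] / n)  # P(Y | X) / P(Y)
            rules[(x, y)] = (support, confidence, lift)
    return rules


# Hypothetical data: each set holds the refactoring types detected in one commit.
commits = [
    {"Rename Method", "Extract Method"},
    {"Rename Method", "Extract Method", "Move Method"},
    {"Rename Variable"},
    {"Extract Method"},
]

for (x, y), (s, c, l) in sorted(cooccurrence_rules(commits).items()):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 suggests that two refactoring types appear in the same commit more often than they would if they were independent. Support guards against drawing conclusions from pairs that co-occur only rarely.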

Metadata
Publisher: Springer US
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-022-10271-x
