Abstract
Refactoring and, in particular, remodularization can repair the design of a software system and counter the erosion caused by software evolution. Various approaches have been proposed to support developers during the remodularization of a software system. Most of them rest on the assumption that developers pursue an optimal balance between cohesion and coupling when modularizing the classes of their systems. Accordingly, a remodularization recommender proposes a solution that implicitly provides a (near) optimal balance between these two quality attributes. However, there is still no empirical evidence that developers actually desire such a balance. This article examines that assumption both objectively and subjectively. Specifically, we present the results of (1) a large study analyzing the modularization quality, in terms of package cohesion and coupling, of 100 open-source systems, and (2) a survey of 29 developers aimed at understanding the factors that drive their modularization decisions. From the results we distill a set of lessons learned that could inform the design of more effective remodularization recommenders.
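To make the notions of package cohesion and coupling concrete, the following is a minimal sketch that computes one simple variant of each from a class-level dependency graph. It is not the metric suite used in the study: the function name `package_metrics`, the input shapes, and the specific definitions (cohesion as the density of intra-package dependencies, coupling as the count of dependencies crossing the package boundary) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def package_metrics(packages, dependencies):
    """Illustrative (hypothetical) cohesion/coupling measures per package.

    packages:     dict mapping class name -> package name
    dependencies: iterable of (source_class, target_class) pairs
    """
    sizes = defaultdict(int)      # number of classes per package
    internal = defaultdict(int)   # dependencies staying inside a package
    external = defaultdict(int)   # dependencies crossing a package boundary

    for pkg in packages.values():
        sizes[pkg] += 1

    # Deduplicate edges and ignore self-dependencies.
    for src, dst in set(dependencies):
        if src == dst:
            continue
        src_pkg, dst_pkg = packages[src], packages[dst]
        if src_pkg == dst_pkg:
            internal[src_pkg] += 1
        else:
            # A crossing edge counts toward the coupling of both packages.
            external[src_pkg] += 1
            external[dst_pkg] += 1

    metrics = {}
    for pkg, n in sizes.items():
        possible = n * (n - 1)  # ordered pairs of distinct classes
        cohesion = internal[pkg] / possible if possible else 0.0
        metrics[pkg] = {"cohesion": cohesion, "coupling": external[pkg]}
    return metrics


# Toy system: two packages with two classes each (made-up names).
toy_packages = {"A": "ui", "B": "ui", "C": "core", "D": "core"}
toy_dependencies = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")]
toy_metrics = package_metrics(toy_packages, toy_dependencies)
```

Under these assumed definitions, a recommender that optimizes only such numbers would favor moving classes until intra-package density rises and boundary-crossing dependencies fall, which is exactly the behavior whose desirability the article questions.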
Index Terms
- Using Cohesion and Coupling for Software Remodularization: Is It Enough?