Abstract
Code cloning is a controversial software engineering practice due to contradictory claims regarding its effect on software maintenance. Code stability is a recently introduced measurement technique that has been used to determine the impact of code cloning by quantifying the changeability of a code region. Although most existing stability analysis studies agree that cloned code is more stable than non-cloned code, the studies have two major flaws: (i) each study only considered a single stability measurement (e.g., lines of code changed, frequency of change, age of change); and, (ii) only a small number of subject systems were analyzed and these were of limited variety.
In this paper, we present a comprehensive empirical study on code stability using four different stability measuring methods. We use a recently introduced hybrid clone detection tool, NiCAD, to detect the clones and analyze their stability in different dimensions: by clone type, by measuring method, by programming language, and by system size and age. Our in-depth investigation of 12 diverse subject systems, written in three programming languages and covering three types of clones, reveals that: (i) cloned code is generally less stable than non-cloned code, and more specifically both Type-1 and Type-2 clones show higher instability than Type-3 clones; (ii) clones in both Java and C systems exhibit higher instability than the clones in C# systems; (iii) a system's development strategy might play a key role in defining its comparative code stability scenario; and (iv) cloned and non-cloned regions of a subject system do not follow any consistent change pattern.