Published in: Software Quality Journal 2/2011

1 June 2011

An extended assessment of type-3 clones as detected by state-of-the-art tools

Authors: Rebecca Tiarks, Rainer Koschke, Raimar Falke


Abstract

Code reuse through copying and pasting leads to so-called software clones. These clones can be roughly categorized into identical fragments (type-1 clones), fragments with parameter substitution (type-2 clones), and similar fragments that differ through modified, deleted, or added statements (type-3 clones). Although there has been extensive research on detecting clones, detection of type-3 clones is still an open research issue due to the inherent vagueness in their definition. In this paper, we analyze type-3 clones detected by state-of-the-art tools and investigate type-3 clones in terms of their syntactic differences. Then, we derive their underlying semantic abstractions from their syntactic differences. Finally, we investigate whether there are code characteristics that indicate that a tool-suggested clone candidate is a real type-3 clone from a human’s perspective. Our findings can help developers of clone detectors and clone refactoring tools to improve their tools.
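As an illustration of the three categories named in the abstract, the following Python sketch classifies a pair of fragments by comparing their token sequences. It is not the procedure used in the paper or by any of the evaluated tools: the `tokens` and `classify` helpers, the sample fragments, and the 0.7 similarity threshold are all assumptions chosen for this example, and the type-2 check does not verify that identifiers are renamed consistently.

```python
import difflib
import io
import keyword
import tokenize

# Layout tokens are kept by kind so that indentation width does not matter.
LAYOUT = {tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE, tokenize.ENDMARKER}


def tokens(src, abstract=False):
    """Token strings of src without comments and blank lines.

    With abstract=True, identifiers become "ID" and literals become "LIT",
    which hides the parameter substitutions that distinguish type-2 clones.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue
        if tok.type in LAYOUT:
            out.append(tokenize.tok_name[tok.type])
        elif abstract and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif abstract and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def classify(a, b, threshold=0.7):
    """Label a fragment pair; the threshold is an arbitrary illustrative cut-off."""
    if tokens(a) == tokens(b):
        return "type-1"            # identical up to comments and layout
    na, nb = tokens(a, abstract=True), tokens(b, abstract=True)
    if na == nb:
        return "type-2"            # identical after abstracting names and literals
    ratio = difflib.SequenceMatcher(None, na, nb).ratio()
    return "type-3 candidate" if ratio >= threshold else "no clone"


ORIGINAL = """
total = 0
for x in items:
    total += x
print(total)
"""

TYPE1 = """
total = 0
for x in items:   # sum up
    total += x
print(total)
"""

TYPE2 = """
acc = 0
for v in values:
    acc += v
print(acc)
"""

TYPE3 = """
acc = 0
for v in values:
    if v < 0:
        continue
    acc += v
print(acc)
"""

for name, fragment in [("TYPE1", TYPE1), ("TYPE2", TYPE2), ("TYPE3", TYPE3)]:
    print(name, "->", classify(ORIGINAL, fragment))
```

The fixed threshold is exactly where the vagueness mentioned in the abstract shows up: any cut-off on a similarity score is a judgment call, which is why the result is labeled a candidate rather than a clone.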


Footnotes
1. The semantic information is not needed here.
 
Metadata
Title: An extended assessment of type-3 clones as detected by state-of-the-art tools
Authors: Rebecca Tiarks, Rainer Koschke, Raimar Falke
Publication date: 1 June 2011
Publisher: Springer US
Published in: Software Quality Journal, Issue 2/2011
Print ISSN: 0963-9314
Electronic ISSN: 1573-1367
DOI: https://doi.org/10.1007/s11219-010-9115-6
