2021 | OriginalPaper | Chapter

Source Code Clone Search

Authors: Iman Keivanloo, Juergen Rilling

Published in: Code Clone Analysis

Publisher: Springer Singapore

Abstract

Identifying similarities in source code is the main challenge in code reuse, plagiarism detection, and code clone detection. Code clone search has emerged as a new research branch in clone detection, aiming to provide similarity search functionality for code snippets. While clone search shares its fundamentals with clone detection, its objectives and requirements differ significantly. Clone search focuses on search engines that are designed to find clones of a single input code snippet (i.e., the query) in a large set of code snippets (i.e., the corpus). Scalability, short response time, and the ability to rank result sets are among the major challenges that a clone search engine has to deal with. In this chapter, we identify and define major concepts related to clone search. We then present a framework that summarizes the architecture of a clone search engine and enables us to provide a systematic view of the internals of such an engine. Finally, we discuss how to benchmark and evaluate the performance of clone search engines. The discussion includes a set of measures that are helpful in evaluating clone search engines.
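
To make the query/corpus setting concrete, the following is a minimal Python sketch of a clone search loop: index a corpus once, then answer a single query with a ranked result list and score that list. It is an illustration only and not the framework presented in the chapter. The token 3-gram representation, the Jaccard ranking, the choice of precision at k as the measure, and all names (`CloneIndex`, `shingles`, `precision_at_k`) are assumptions made for this example.

```python
import re
from collections import defaultdict


def tokens(snippet):
    # Crude lexer (assumption): identifiers, integer literals, then any
    # other single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", snippet)


def shingles(snippet, n=3):
    # Set of overlapping token n-grams used as the snippet's fingerprint.
    toks = tokens(snippet)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


class CloneIndex:
    """Hypothetical inverted index from token n-grams to snippet ids."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of snippets containing it
        self.sizes = {}                   # snippet id -> number of distinct n-grams

    def add(self, snippet_id, snippet):
        grams = shingles(snippet, self.n)
        self.sizes[snippet_id] = len(grams)
        for gram in grams:
            self.postings[gram].add(snippet_id)

    def search(self, query, k=5):
        # Only snippets sharing at least one n-gram with the query are scored,
        # so the cost depends on the posting lists, not on the corpus size.
        grams = shingles(query, self.n)
        shared = defaultdict(int)
        for gram in grams:
            for snippet_id in self.postings.get(gram, ()):
                shared[snippet_id] += 1
        # Rank candidates by Jaccard similarity of their n-gram sets.
        ranked = [
            (snippet_id, hits / (len(grams) + self.sizes[snippet_id] - hits))
            for snippet_id, hits in shared.items()
        ]
        ranked.sort(key=lambda pair: (-pair[1], pair[0]))
        return ranked[:k]


def precision_at_k(result_ids, relevant_ids, k):
    # Fraction of the top-k results that are true clones of the query.
    if k <= 0:
        return 0.0
    return sum(1 for rid in result_ids[:k] if rid in relevant_ids) / k


index = CloneIndex()
index.add("s1", "int total = a + b;")
index.add("s2", "int total = a - b;")
index.add("s3", "while (x) { y(); }")

results = index.search("int total = a + b;")
print(results)
print(precision_at_k([rid for rid, _ in results], {"s1"}, k=2))
```

The inverted index is one common way to keep response time short, because the query touches only posting lists for its own n-grams. A production engine would likely also normalize identifiers and literals before indexing, which this sketch omits.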

Metadata
Title
Source Code Clone Search
Authors
Iman Keivanloo
Juergen Rilling
Copyright Year
2021
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-16-1927-4_9
