ABSTRACT
Golang (short for the Go programming language) is a fast, compiled language that is increasingly used in industry because of its strong support for concurrent programming. Golang introduces its own syntax for concurrent programming, which poses a challenge for traditional clone detection tools and techniques. However, few tools exist for detecting duplicates or copy-paste-related bugs in Golang. An effective and efficient code clone detector for Golang is therefore needed.
In this paper, we present Go-Clone, a learning-based clone detector for Golang. Go-Clone contains two modules: the training module and the user interaction module. In the training module, we first parse Golang source code into LLVM IR (Intermediate Representation). We then automatically compute an LSFG (labeled semantic flow graph) for each program function. Finally, Go-Clone trains a deep neural network model to encode LSFGs for similarity classification. In the user interaction module, users choose one or more Golang projects, and Go-Clone identifies and presents a list of function pairs that are most likely clones for user inspection. To evaluate Go-Clone's performance, we collect 6,110 commit versions from 48 GitHub projects to construct a Golang clone detection data set. On this data set, Go-Clone achieves an AUC (Area Under Curve) of 89.61% and an ACC (Accuracy) of 83.80% in clone detection. By testing several groups of unfamiliar data, we also demonstrate the generality of Go-Clone. A demo video is available at: https://youtu.be/o5DogtYGbeo
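The last step of the pipeline, turning per-function encodings into a ranked list of candidate clone pairs, can be illustrated with a minimal Go sketch. It assumes each function's LSFG has already been encoded as a fixed-length vector, and that pairs are scored by cosine similarity against a threshold. The abstract specifies neither the similarity measure nor a threshold, so `Embedding`, `Pair`, `Cosine`, `RankPairs`, the function names, and the value 0.8 are all hypothetical and are not taken from Go-Clone itself.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Embedding is a hypothetical fixed-length vector that a trained
// encoder might produce for one function's LSFG.
type Embedding []float64

// Pair holds two function names and their similarity score.
type Pair struct {
	A, B  string
	Score float64
}

// Cosine returns the cosine similarity of two vectors. It returns 0
// when the lengths differ or either vector has zero norm.
// Cosine similarity is an assumption of this sketch.
func Cosine(a, b Embedding) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / math.Sqrt(na*nb)
}

// RankPairs scores every unordered pair of functions and returns the
// pairs whose score is at or above threshold, highest score first.
func RankPairs(funcs map[string]Embedding, threshold float64) []Pair {
	// Sort names so the output does not depend on map iteration order.
	names := make([]string, 0, len(funcs))
	for name := range funcs {
		names = append(names, name)
	}
	sort.Strings(names)

	var out []Pair
	for i := 0; i < len(names); i++ {
		for j := i + 1; j < len(names); j++ {
			s := Cosine(funcs[names[i]], funcs[names[j]])
			if s >= threshold {
				out = append(out, Pair{A: names[i], B: names[j], Score: s})
			}
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Score > out[j].Score
	})
	return out
}

func main() {
	// Toy embeddings; real vectors would come from the trained model.
	funcs := map[string]Embedding{
		"pkg.CopyA": {1, 0, 1},
		"pkg.CopyB": {1, 0, 1},
		"pkg.Other": {0, 1, 0},
	}
	for _, p := range RankPairs(funcs, 0.8) {
		fmt.Printf("%s <-> %s  score=%.2f\n", p.A, p.B, p.Score)
	}
}
```

Comparing all pairs is quadratic in the number of functions. That is acceptable for a sketch, but a tool scanning several projects would likely need an index or a pre-filter before scoring.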
Index Terms
- Go-clone: graph-embedding based clone detector for Golang