research-article

MeCC: memory comparison-based clone detector

Authors:
Heejung Kim, Seoul National University, Seoul, South Korea
Yungbum Jung, Seoul National University, Seoul, South Korea
Sunghun Kim, The Hong Kong University of Science and Technology, Hong Kong, China
Kwankeun Yi, Seoul National University, Seoul, South Korea

ICSE '11: Proceedings of the 33rd International Conference on Software Engineering, May 2011, Pages 301–310, https://doi.org/10.1145/1985793.1985835

Published: 21 May 2011


ABSTRACT

In this paper, we propose a new semantic clone detection technique that compares programs' abstract memory states, which are computed by a semantics-based static analyzer.

Our experimental study using three large-scale open source projects shows that our technique can detect semantic clones that existing syntactic- or semantic-based clone detectors miss. Our technique can help developers identify inconsistent clone changes, find refactoring candidates, and understand software evolution related to semantic clones.
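The abstract does not say how two abstract memory states are scored against each other. Purely as an illustration, the sketch below assumes each procedure's abstract memory has already been computed and that location names have been normalized (for example, parameters renamed positionally to `arg0`, `arg1`, ...). It then scores two memories with a Dice-style measure weighted by per-location Jaccard similarity. The representation, the scoring formula, the `0.8` threshold, and every function name here are hypothetical; this is not MeCC's actual similarity metric.

```python
from typing import Dict, FrozenSet

# Hypothetical representation (not MeCC's): an abstract memory maps a
# normalized abstract location name to the set of symbolic values it may
# hold when the procedure returns.
AbstractMemory = Dict[str, FrozenSet[str]]


def value_similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets of symbolic values (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def memory_similarity(m1: AbstractMemory, m2: AbstractMemory) -> float:
    """Dice-style score in [0, 1]: locations present in both memories count
    in proportion to how similar their values are, normalized by the sizes
    of both memories."""
    if not m1 and not m2:
        return 1.0
    shared = m1.keys() & m2.keys()
    matched = sum(value_similarity(m1[loc], m2[loc]) for loc in shared)
    return 2.0 * matched / (len(m1) + len(m2))


def is_clone(m1: AbstractMemory, m2: AbstractMemory, threshold: float = 0.8) -> bool:
    """Report a clone pair when the memory similarity reaches the (hypothetical) threshold."""
    return memory_similarity(m1, m2) >= threshold
```

For instance, with `f = {"ret": frozenset({"arg0 + arg1"}), "*arg2": frozenset({"0"})}` and `g` identical except that `*arg2` may also hold `"arg0"`, the two memories score 0.75. Under the default threshold they are not reported, but they are reported at `threshold=0.7`. A real detector would also have to pair up locations whose names differ between the two procedures; this sketch assumes that pairing has already been done.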


Index Terms

MeCC: memory comparison-based clone detector
1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Software reverse engineering
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Program analysis
    2. Program semantics

Recommendations

CCAligner: a token based large-gap clone detector
ICSE '18: Proceedings of the 40th International Conference on Software Engineering

Copying code and then pasting with large number of edits is a common activity in software development, and the pasted code is a kind of complicated Type-3 clone. Due to large number of edits, we consider the clone as a large-gap clone. Large-gap clone ...
Survey on Software Clone Detection Research
ICMSS 2019: Proceedings of the 2019 3rd International Conference on Management Engineering, Software Engineering and Service Sciences

In order to improve the efficiency of software development, developers often copy-paste code. It is found that the clone code may affect the quality of the software system, especially the maintenance and comprehension of the software, so it is necessary ...
Pushdown control-flow analysis for free
POPL '16

Traditional control-flow analysis (CFA) for higher-order languages introduces spurious connections between callers and callees, and different invocations of a function may pollute each other's return flows. Recently, three distinct approaches have been ...


Published in
ICSE '11: Proceedings of the 33rd International Conference on Software Engineering
May 2011
1258 pages
ISBN: 9781450304450
DOI: 10.1145/1985793
General Chair:
Richard N. Taylor, UC Irvine, USA
Program Chairs:
Harald Gall, University of Zurich, Switzerland
Nenad Medvidović, University of Southern California, USA
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
abstract interpretation
clone detection
software maintenance
static analysis

Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%


Article Metrics
- 82 Total Citations
- 503 Total Downloads
- Downloads (Last 12 months): 18
- Downloads (Last 6 weeks): 4