research-article

High-confidence near-duplicate image detection

Authors:
Wei Dong, Independent Researcher, Ann Arbor, MI
Zhe Wang, Princeton University, Princeton, NJ
Moses Charikar, Princeton University, Princeton, NJ
Kai Li, Princeton University, Princeton, NJ

ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, June 2012, Article No.: 1, Pages 1–8
https://doi.org/10.1145/2324796.2324798

Published: 05 June 2012

ABSTRACT

In this paper, we propose two techniques for near-duplicate image detection at high confidence and large scale. First, we show that entropy-based filtering eliminates the ambiguous SIFT features that cause most false positives, and makes it possible to claim near-duplicity from a single match between the retained high-quality features. Second, we show that graph cut can be used for query expansion, with a duplicity graph computed offline, to substantially improve search quality. Evaluation with web images shows that, when combined with sketch embedding [6], our methods achieve a false positive rate orders of magnitude lower than the standard visual word approach. We demonstrate the proposed techniques with a large-scale image search engine which, using index data structures computed offline on a Hadoop cluster, is capable of serving more than 50 million web images from a single commodity server.
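
The abstract does not define how the entropy of a SIFT feature is measured, so the following is only a minimal sketch of the filtering idea under a stated assumption: the entropy is taken to be the Shannon entropy of the empirical distribution of a descriptor's component values. The function names and the `min_entropy` threshold are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def descriptor_entropy(descriptor):
    """Shannon entropy, in bits, of the empirical distribution of the
    descriptor's component values (an assumed definition, see lead-in)."""
    counts = Counter(descriptor)
    n = len(descriptor)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def filter_by_entropy(descriptors, min_entropy):
    """Keep only descriptors whose entropy is at least min_entropy.
    min_entropy is a hypothetical tuning parameter."""
    return [d for d in descriptors if descriptor_entropy(d) >= min_entropy]


# Two toy 128-dimensional descriptors: one constant, one with all-distinct values.
flat = [0] * 128
varied = list(range(128))

# The constant descriptor has zero entropy and is discarded by the filter.
kept = filter_by_entropy([flat, varied], min_entropy=4.0)
```

Under this assumption, a descriptor whose values are nearly uniform (for example, one extracted from a flat image region) carries little information and is dropped, while a descriptor with many distinct values is retained.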

References

[1] R. Andersen, F. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In FOCS, 2006.
[2] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In CVPR, 2008.
[3] O. Chum, J. Philbin, M. Isard, and A. Zisserman. Scalable near identical image and shot detection. In CIVR, 2007.
[4] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In ICCV, 2007.
[5] W. Dong, M. Charikar, and K. Li. Efficiently matching sets of features with random histograms. In MM '08: Proceedings of the 16th ACM International Conference on Multimedia, Vancouver, Canada, 2008.
[6] W. Dong, M. Charikar, and K. Li. High dimensional similarity search with sketches. In SIGIR, 2008.
[7] M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid. Evaluation of GIST descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval, CIVR '09, pages 19:1–19:8. ACM, 2009.
[8] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC, 1998.
[9] H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In IEEE Conference on Computer Vision & Pattern Recognition, pages 3304–3311, June 2010.
[10] Y. Ke, R. Sukthankar, and L. Huston. An efficient parts-based near-duplicate and sub-image retrieval system. In ACM MM, 2004.
[11] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, 2004.
[12] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In VLDB, 2007.
[13] G. S. Manku, A. Jain, and A. D. Sarma. Detecting near-duplicates for web crawling. In WWW, 2007.
[14] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In CVPR, 2006.
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In CVPR, 2007.
[16] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.
[17] D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC, 2004.
[18] P. Turcot and D. Lowe. Better matching with fewer features: The selection of useful features in large database recognition problems. In ICCV Workshop on Emergent Issues in Large Amounts of Visual Data, 2009.
[19] A. Vedaldi and B. Fulkerson. VLFeat: an open and portable library of computer vision algorithms. In Proceedings of the 18th Annual ACM International Conference on Multimedia, 2010.
[20] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2009.
[21] Z. Wu, Q. Ke, M. Isard, and J. Sun. Bundling features for large scale partial-duplicate web image search. In CVPR, 2009.
[22] D. Xu, T.-J. Cham, S. Yan, and S.-F. Chang. Near duplicate image identification with spatially aligned pyramid matching. In CVPR, 2008.
[23] S. Zhang, Q. Tian, G. Hua, Q. Huang, and S. Li. Descriptive visual words and visual phrases for image applications. In ACM MM, 2009.

Index Terms

High-confidence near-duplicate image detection
1. Information systems
  1. Information retrieval
  2. Information storage systems

Published in
ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
June 2012
489 pages
ISBN: 9781450313292
DOI: 10.1145/2324796
Conference Chairs:
Horace H. S. Ip, City University of Hong Kong
Yong Rui, Microsoft, China
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entropy
graph cut
near-duplicate
query expansion
Acceptance Rates
ICMR '12 Paper Acceptance Rate: 50 of 145 submissions, 34%
Overall Acceptance Rate: 254 of 830 submissions, 31%