DOI: 10.1145/2324796.2324798
research-article

High-confidence near-duplicate image detection

Published: 05 June 2012

ABSTRACT

In this paper, we propose two techniques for near-duplicate image detection at high confidence and large scale. First, we show that entropy-based filtering eliminates the ambiguous SIFT features that cause most false positives, and makes it possible to claim near-duplicity from a single match between the retained high-quality features. Second, we show that graph cut can be used for query expansion over a duplicity graph computed offline, substantially improving search quality. Evaluation with web images shows that, when combined with sketch embedding [6], our methods achieve a false positive rate orders of magnitude lower than the standard visual word approach. We demonstrate the proposed techniques with a large-scale image search engine which, using an index data structure computed offline on a Hadoop cluster, is capable of serving more than 50 million web images from a single commodity server.
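The abstract does not say how the entropy of a feature is computed, so the following is only a minimal sketch of the general idea of entropy-based filtering, under stated assumptions. It assumes each descriptor is a sequence of quantized integer values (as SIFT descriptors commonly are) and scores it by the Shannon entropy of its value histogram. The function names and the threshold are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence


def descriptor_entropy(descriptor: Sequence[int]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of the
    values in one descriptor. This definition is an assumption; the
    paper may define feature entropy differently."""
    n = len(descriptor)
    counts = Counter(descriptor)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def filter_by_entropy(
    descriptors: List[Sequence[int]], threshold: float
) -> List[Sequence[int]]:
    """Keep only descriptors whose entropy reaches the (hypothetical)
    threshold, discarding flat, low-information ones."""
    return [d for d in descriptors if descriptor_entropy(d) >= threshold]


# Toy 128-dimensional descriptors: one constant, one spread over 4 values.
flat = [0] * 128
varied = [0, 1, 2, 3] * 32
kept = filter_by_entropy([flat, varied], threshold=1.0)
```

In this toy case only the second descriptor survives the filter. The intuition is that a descriptor whose values are nearly constant carries little information and matches many unrelated patches, which is the kind of ambiguity the abstract attributes most false positives to.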

References

  1. R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, 2006.
  2. O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In CVPR, 2008.
  3. O. Chum, J. Philbin, M. Isard, and A. Zisserman. Scalable near identical image and shot detection. In CIVR, 2007.
  4. O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In ICCV, 2007.
  5. W. Dong, M. Charikar, and K. Li. Efficiently matching sets of features with random histograms. In MM'08: Proceedings of the 16th ACM International Conference on Multimedia, Vancouver, Canada, 2008.
  6. W. Dong, M. Charikar, and K. Li. High dimensional similarity search with sketches. In SIGIR, 2008.
  7. M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid. Evaluation of GIST descriptors for web-scale image search. In Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, pages 19:1--19:8. ACM, 2009.
  8. P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC, 1998.
  9. H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In IEEE Conference on Computer Vision & Pattern Recognition, pages 3304--3311, June 2010.
  10. Y. Ke, R. Sukthankar, and L. Huston. An efficient parts-based near-duplicate and sub-image retrieval system. In ACM MM, 2004.
  11. D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91--110, 2004.
  12. Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In VLDB, 2007.
  13. G. S. Manku, A. Jain, and A. D. Sarma. Detecting near-duplicates for web crawling. In WWW, 2007.
  14. D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In CVPR, 2006.
  15. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In CVPR, 2007.
  16. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.
  17. D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC, 2004.
  18. P. Turcot and D. Lowe. Better matching with fewer features: The selection of useful features in large database recognition problems. In ICCV Workshop on Emergent Issues in Large Amounts of Visual Data, 2009.
  19. A. Vedaldi and B. Fulkerson. VLFeat -- an open and portable library of computer vision algorithms. In Proceedings of the 18th annual ACM international conference on Multimedia, 2010.
  20. Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2009.
  21. Z. Wu, Q. Ke, M. Isard, and J. Sun. Bundling features for large scale partial-duplicate web image search. In CVPR, 2009.
  22. D. Xu, T.-J. Cham, S. Yan, and S.-F. Chang. Near duplicate image identification with spatially aligned pyramid matching. In CVPR, 2008.
  23. S. Zhang, Q. Tian, G. Hua, Q. Huang, and S. Li. Descriptive visual words and visual phrases for image applications. In ACM MM, 2009.

Published in

ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
June 2012
489 pages
ISBN: 9781450313292
DOI: 10.1145/2324796

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICMR '12 paper acceptance rate: 50 of 145 submissions, 34%. Overall acceptance rate: 254 of 830 submissions, 31%.
