Scalable Object Retrieval with Compact Image Representation from Generic Object Regions

Published: 20 October 2015

Abstract

In content-based visual object retrieval, image representation is one of the fundamental factors in retrieval performance. Existing works adopt either local SIFT-like features or holistic features, and may suffer from sensitivity to noise or poor discriminative power. In this article, we propose a compact representation for scalable object retrieval built from a few generic object regions. The regions are identified with a general object detector and are described with a fusion of learning-based features and aggregated SIFT features. Further, we compress the feature representation for large-scale image retrieval scenarios. We evaluate the proposed method on two public ground-truth datasets, with promising results. Experimental results on a million-scale image database demonstrate superior retrieval accuracy together with efficiency gains in both computation and memory usage.
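The abstract does not say how the two feature types are fused or how the representation is compressed. As a rough illustration only, the sketch below assumes one common recipe: L2-normalize each per-region descriptor, concatenate them, and reduce the dimensionality with PCA. The function names, the dimensions (64, 128, 32), and the random stand-in data are all hypothetical and are not taken from the article.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit Euclidean length."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def fuse_region_features(learned, aggregated):
    """Assumed fusion: normalize each descriptor, concatenate, renormalize."""
    fused = np.concatenate(
        [l2_normalize(learned), l2_normalize(aggregated)], axis=-1
    )
    return l2_normalize(fused)

def fit_pca(features, out_dim):
    """Return (mean, projection) for a PCA keeping out_dim components."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:out_dim].T

def compress(features, mean, projection):
    """Project centered features onto the PCA basis and renormalize."""
    return l2_normalize((features - mean) @ projection)

# Random stand-ins for per-region descriptors (hypothetical sizes).
rng = np.random.default_rng(0)
learned = rng.normal(size=(200, 64))      # e.g. learning-based features
aggregated = rng.normal(size=(200, 128))  # e.g. aggregated SIFT features

fused = fuse_region_features(learned, aggregated)
mean, proj = fit_pca(fused, out_dim=32)
codes = compress(fused, mean, proj)
print(fused.shape, codes.shape)
```

With unit-length codes, cosine similarity between two regions reduces to a dot product, which is one reason this kind of normalization is often paired with compact descriptors.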

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 12, Issue 2
      March 2016
      224 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/2837041

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 October 2015
      • Accepted: 1 May 2015
      • Revised: 1 March 2015
      • Received: 1 December 2014

      Qualifiers

      • research-article
      • Research
      • Refereed
