Scalable Object Retrieval with Compact Image Representation from Generic Object Regions

Authors:
Shaoyan Sun, CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei
Wengang Zhou, CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei
Qi Tian, University of Texas at San Antonio, San Antonio, TX
Houqiang Li, CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 12, Issue 2, Article No. 29, pp. 1–21. https://doi.org/10.1145/2818708

Published: 20 October 2015

Abstract

In content-based visual object retrieval, image representation is one of the fundamental issues in improving retrieval performance. Existing works adopt either local SIFT-like features or holistic features, and may suffer from sensitivity to noise or poor discriminative power. In this article, we propose a compact representation for scalable object retrieval built from a few generic object regions. The regions are identified with a general object detector and are described with a fusion of learning-based features and aggregated SIFT features. Further, we compress the feature representation for large-scale image retrieval scenarios. We evaluate the performance of the proposed method on two public ground-truth datasets, with promising results. Experimental results on a million-scale image database demonstrate superior retrieval accuracy with efficiency gains in both computation and memory usage.
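
The describe-then-compress idea in the abstract can be illustrated with a small sketch. The abstract does not say which aggregation or compression scheme is used, so everything here is an assumption made for illustration: VLAD-style aggregation stands in for "aggregated SIFT features", plain concatenation stands in for the fusion step, PCA stands in for the compression step, and random arrays stand in for real SIFT descriptors and learned features. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, eps)

def vlad(descriptors, centroids):
    """Aggregate local descriptors (n x d) into one VLAD-style vector (k*d,).

    Assumption: VLAD is used only as an example aggregator.
    """
    # Squared distance from every descriptor to every centroid.
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    k, d = centroids.shape
    out = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Sum of residuals between descriptors and their centroid.
            out[i] = (members - centroids[i]).sum(axis=0)
    return l2_normalize(out.ravel())

def fuse(learned, aggregated):
    """Concatenate two normalized region features, then renormalize (assumed fusion)."""
    return l2_normalize(np.concatenate([l2_normalize(learned), aggregated]))

def fit_pca(features, out_dim):
    """Learn a mean and a projection matrix from an (n x D) feature matrix."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:out_dim]

def compress(feature, mean, projection):
    """Project one feature onto the leading principal directions."""
    return l2_normalize(projection @ (feature - mean))

# Toy data: random stand-ins for real descriptors.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(4, 8))       # 4 visual words, 8-D local descriptors
regions = []
for _ in range(20):                       # 20 detected object regions
    local = rng.normal(size=(30, 8))      # stand-in for SIFT descriptors in a region
    learned = rng.normal(size=16)         # stand-in for a learning-based feature
    regions.append(fuse(learned, vlad(local, centroids)))
regions = np.stack(regions)               # shape (20, 48)

mean, proj = fit_pca(regions, out_dim=8)
codes = np.stack([compress(r, mean, proj) for r in regions])  # shape (20, 8)
```

In this sketch each region ends up as a short unit-length vector, so two regions can be compared with a dot product. A real system would replace the random inputs with detector output and trained features, and might use a different compression scheme entirely.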

Index Terms

1. Information systems
  1. Information retrieval

Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 12, Issue 2
March 2016
224 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2837041
Editor: Alberto Del Bimbo, University of Firenze, Italy
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 20 October 2015
- Accepted: 1 May 2015
- Revised: 1 March 2015
- Received: 1 December 2014
Author Tags
Image retrieval
compact image representation
Qualifiers
- research-article
- Research
- Refereed