Abstract
In content-based visual object retrieval, image representation is one of the fundamental factors in retrieval performance. Existing works adopt either local SIFT-like features or holistic features, and may suffer from sensitivity to noise or poor discriminative power. In this article, we propose a compact representation for scalable object retrieval built from a few generic object regions. The regions are identified with a general object detector and are described with a fusion of learning-based features and aggregated SIFT features. Further, we compress the feature representation for large-scale image retrieval scenarios. We evaluate the performance of the proposed method on two public ground-truth datasets, with promising results. Experimental results on a million-scale image database demonstrate superior retrieval accuracy, with efficiency gains in both computation and memory usage.
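As a rough illustration of the pipeline the abstract outlines (per-region descriptors fused from two feature types, then compressed for large-scale search), a minimal sketch follows. Everything in it is an assumption made for illustration and is not taken from the article: the function names, the normalize-and-concatenate fusion, the uniform scalar quantization used as a stand-in for compression, and the best-match scoring rule are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def fuse(learned_feat, aggregated_sift_feat):
    """Hypothetical fusion: normalize each descriptor, concatenate, renormalize."""
    fused = l2_normalize(learned_feat) + l2_normalize(aggregated_sift_feat)
    return l2_normalize(fused)


def quantize(vec, bits=8):
    """Stand-in compression: map each component in [-1, 1] to an unsigned integer code."""
    levels = (1 << bits) - 1
    return [round((min(1.0, max(-1.0, x)) + 1.0) / 2.0 * levels) for x in vec]


def dequantize(codes, bits=8):
    """Approximate inverse of quantize(): map integer codes back to [-1, 1]."""
    levels = (1 << bits) - 1
    return [c / levels * 2.0 - 1.0 for c in codes]


def image_similarity(query_regions, db_regions):
    """Assumed scoring rule: average, over query regions, of the best dot-product match."""
    if not query_regions or not db_regions:
        return 0.0
    best = [
        max(sum(a * b for a, b in zip(q, d)) for d in db_regions)
        for q in query_regions
    ]
    return sum(best) / len(best)
```

With each image reduced to a handful of region descriptors, storage grows with the number of regions rather than the number of local keypoints, which is the kind of memory saving the abstract points to. The toy quantizer here only shows where a compression step would sit in such a pipeline.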
Index Terms
- Scalable Object Retrieval with Compact Image Representation from Generic Object Regions