Abstract
Most effective approaches to particular-object and image retrieval are based on the bag-of-words (BoW) model, and state-of-the-art performance typically involves a query expansion procedure, which can significantly improve retrieval results. Convolutional neural networks (CNNs) are now widely applied in computer vision, including image classification, captioning, recognition, and retrieval. We introduce an extension to query expansion: an automatic method for selecting good candidate samples for interactive annotation, used in query expansion with both BoW and CNN features. In this work, we address the query expansion framework using active learning, with the main focus on the sample selection step of the query expansion process. More specifically, we propose an active sample selection algorithm based on binary relevance classification, built on the assumption that the samples most confusing to the classifier have a high probability of containing helpful true positives for query expansion, which significantly improves retrieval performance. The method makes full use of the multimodal information in the shortlist returned by the initial retrieval to train a binary relevance classifier, treating the top of the list as unlabeled data and the bottom as fake negatives; the classifier is then used to pick out the most confusing samples for human annotation. This achieves faster and better retrieval than the naive top-sample selection method. We also fuse the BoW vector and the CNN prediction in the retrieval system for better performance. To evaluate the proposed method, experiments are conducted on the standard Oxford (5K and 105K) and Paris (6K) datasets, and the experimental results and comparison with state-of-the-art methods demonstrate its effectiveness.
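The sample selection step described above can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a logistic-regression relevance classifier (the paper only specifies a binary relevance classifier), and the function name, feature shapes, and list-split parameters are invented for illustration. The shortlist is ranked by the initial retrieval score; the top of the list serves as pseudo-positives/unlabeled data and the bottom as fake negatives, and the samples whose predicted relevance lies closest to the decision boundary are returned as the "most confusing" candidates for human annotation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_confusing_samples(ranked_features, n_positive=5, n_negative=20, n_select=5):
    """Pick the most confusing shortlist samples for annotation.

    ranked_features: (N, D) array of image descriptors (e.g. BoW or CNN
    features), ordered by the initial retrieval score, best match first.
    The top n_positive entries are treated as pseudo-positives, the
    bottom n_negative as fake negatives; everything in between is the
    unlabeled pool from which annotation candidates are drawn.
    """
    X_pos = ranked_features[:n_positive]
    X_neg = ranked_features[-n_negative:]
    X_unlabeled = ranked_features[n_positive:-n_negative]

    # Train a binary relevance classifier on the pseudo-labels.
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
    clf = LogisticRegression().fit(X, y)

    # Samples with predicted relevance closest to 0.5 lie nearest the
    # decision boundary -- the most confusing ones.
    probs = clf.predict_proba(X_unlabeled)[:, 1]
    confusion = np.abs(probs - 0.5)
    chosen = np.argsort(confusion)[:n_select]

    # Return indices into the original ranked shortlist.
    return chosen + n_positive
```

After a human annotates the returned samples, the verified true positives would be fed back into the query expansion step, in contrast to naively expanding with the top-ranked results.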
Acknowledgments
This research was supported by the National Natural Science Foundation of China (Grant No. 61571269). The authors would like to thank the anonymous reviewers for their valuable comments.
Cite this article
Zhao, X., Ding, G. Query expansion for object retrieval with active learning using BoW and CNN feature. Multimed Tools Appl 76, 12133–12147 (2017). https://doi.org/10.1007/s11042-016-4142-3