Abstract
Most effective approaches to particular-object and image retrieval are based on the bag-of-words (BoW) model, and state-of-the-art performance typically involves a query expansion procedure, which can significantly improve retrieval results. Convolutional neural networks (CNNs) are now widely applied in computer vision, including image classification, captioning, recognition, and retrieval. We introduce an extension to query expansion: an automatic method for selecting good candidate samples for interactive annotation, used in query expansion with both BoW and CNN features. In this work, we address the query expansion framework using active learning, with the main focus on the sample selection step of the query expansion process. More specifically, we propose an active sample selection algorithm based on binary relevance classification, built on the assumption that the samples most confusing to the classifier have a high probability of containing helpful true positives for query expansion, which significantly improves retrieval performance. The method makes full use of the multimodal information in the shortlist returned by the initial retrieval to train a binary relevance classifier, treating the top of the list as unlabeled data and the bottom as fake negatives; the classifier is then used to pick out the most confusing samples for human annotation. This achieves faster and better retrieval than the naive top-sample selection method. We also fuse the BoW vector and the CNN prediction in the retrieval system for better performance. To evaluate the proposed method, experiments are conducted on the standard Oxford (5K and 105K) and Paris (6K) datasets, and the experimental results and comparison with state-of-the-art methods demonstrate its effectiveness.
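The sample selection step described above can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a logistic-regression relevance classifier (the paper only specifies a binary relevance classifier), and the function name, feature shapes, and list-split parameters are invented for illustration. The shortlist is ranked by the initial retrieval score; the top of the list serves as pseudo-positives/unlabeled data and the bottom as fake negatives, and the samples whose predicted relevance lies closest to the decision boundary are returned as the "most confusing" candidates for human annotation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_confusing_samples(ranked_features, n_positive=5, n_negative=20, n_select=5):
    """Pick the most confusing shortlist samples for annotation.

    ranked_features: (N, D) array of image descriptors (e.g. BoW or CNN
    features), ordered by the initial retrieval score, best match first.
    The top n_positive entries are treated as pseudo-positives, the
    bottom n_negative as fake negatives; everything in between is the
    unlabeled pool from which annotation candidates are drawn.
    """
    X_pos = ranked_features[:n_positive]
    X_neg = ranked_features[-n_negative:]
    X_unlabeled = ranked_features[n_positive:-n_negative]

    # Train a binary relevance classifier on the pseudo-labels.
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
    clf = LogisticRegression().fit(X, y)

    # Samples with predicted relevance closest to 0.5 lie nearest the
    # decision boundary -- the most confusing ones.
    probs = clf.predict_proba(X_unlabeled)[:, 1]
    confusion = np.abs(probs - 0.5)
    chosen = np.argsort(confusion)[:n_select]

    # Return indices into the original ranked shortlist.
    return chosen + n_positive
```

After a human annotates the returned samples, the verified true positives would be fed back into the query expansion step, in contrast to naively expanding with the top-ranked results.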
Acknowledgments
This research was supported by the National Natural Science Foundation of China (Grant No. 61571269). The authors would like to thank the anonymous reviewers for their valuable comments.
Cite this article
Zhao, X., Ding, G. Query expansion for object retrieval with active learning using BoW and CNN feature. Multimed Tools Appl 76, 12133–12147 (2017). https://doi.org/10.1007/s11042-016-4142-3