Query expansion for object retrieval with active learning using BoW and CNN feature

Published in: Multimedia Tools and Applications

Abstract

Most effective particular-object and image retrieval approaches are based on the bag-of-words (BoW) model, and nearly all state-of-the-art systems include a query expansion step, which significantly improves retrieval results. Convolutional neural networks (CNNs) are now widely applied in computer vision, including image classification, captioning, recognition, and retrieval. We introduce an extension to query expansion: an automatic method that selects good candidate samples for interactive annotation, using both the BoW model and CNN features. In this work we cast query expansion as an active learning problem, focusing on the sample selection step. Specifically, we propose an active sample selection algorithm based on binary relevance classification, built on the assumption that the samples that most confuse the classifier are likely to contain true positives that are helpful for query expansion. The method makes full use of the multimodal information in the shortlist returned by the initial retrieval: it trains a binary relevance classifier, treating the top of the ranked list as unlabeled data and the bottom as fake negatives, and then selects the most confusing samples for human annotation. This achieves faster and better retrieval than naive top-ranked sample selection. We also fuse the BoW vector and the CNN prediction in the retrieval system for further gains. To evaluate the proposed method, experiments are conducted on the standard Oxford (5K and 105K) and Paris (6K) datasets; the results and comparisons with state-of-the-art methods demonstrate its effectiveness.
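The confusion-based selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_confusing`, the use of plain logistic regression as the binary relevance classifier, and all parameter values are assumptions for the sketch. Positives are assumed to come from verified query-expansion matches, fake negatives from the bottom of the ranked shortlist, and the unlabeled pool from its top.

```python
import numpy as np

def select_confusing(pos_feats, shortlist_feats, n_fake_neg, n_select,
                     lr=0.1, epochs=500):
    """Train a binary relevance classifier (plain logistic regression here)
    on positives vs. fake negatives taken from the bottom of the ranked
    shortlist, then return indices into the top of the shortlist (the
    unlabeled pool) of the samples closest to the decision boundary --
    the "most confusing" ones to hand to a human annotator."""
    fake_neg = shortlist_feats[-n_fake_neg:]          # bottom of ranked list
    X = np.vstack([pos_feats, fake_neg])
    y = np.concatenate([np.ones(len(pos_feats)), np.zeros(n_fake_neg)])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):                           # batch gradient descent
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # sigmoid predictions
        g = p - y                                     # logistic-loss gradient
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    pool = shortlist_feats[:-n_fake_neg]              # unlabeled top of list
    p_pool = 1.0 / (1.0 + np.exp(-(pool @ w + b)))
    order = np.argsort(np.abs(p_pool - 0.5))          # nearest the boundary
    return order[:n_select]
```

Samples with predicted relevance near 0.5 are the ones the classifier cannot place on either side, which is exactly where annotation effort is most likely to uncover helpful true positives.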



Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant No. 61571269). The authors would like to thank the anonymous reviewers for their valuable comments.

Author information

Correspondence to Guiguang Ding.


Cite this article

Zhao, X., Ding, G. Query expansion for object retrieval with active learning using BoW and CNN feature. Multimed Tools Appl 76, 12133–12147 (2017). https://doi.org/10.1007/s11042-016-4142-3
