Published in: International Journal of Multimedia Information Retrieval 3/2019

29 April 2019 | Regular Paper

Video instance search via spatial fusion of visual words and object proposals

Authors: Vinh-Tiep Nguyen, Duy Dinh Le, Minh-Triet Tran, Tam V. Nguyen, Thanh Duc Ngo, Shin’ichi Satoh, Duc Anh Duong

Abstract

Most popular systems for object instance search are based on the bag-of-visual-words model. The inherent weaknesses of this standard model, such as quantization error, unstructured representation, and the burstiness phenomenon, have been solved to some extent. However, the model still has a serious problem when searching for small objects in a database with cluttered backgrounds. In many situations, even irrelevant objects that share the same texture or shape with a query object receive higher scores than relevant ones. To overcome this problem, we propose a novel fusion method to significantly boost the accuracy of instance search systems. First, we use a state-of-the-art object detector with denser features to find object bounding boxes and similarity scores. Second, to exploit the spatial relationship between each visual word and an object proposal (a detected area that might contain a query object), we define three categories of visual word pairs: discriminative, weak relevant, and context inferred. Finally, we propose a new re-ranking scheme with three weighting functions, one for each category of visual word pairs, to compute the final similarity score between a query topic and a video shot. To illustrate the efficiency of the proposed method, we conduct experiments on datasets that cover a wide variety of query object types. Experimental results on the TRECVID Instance Search datasets (INS2013 and INS2014) show the superiority of our proposed method over state-of-the-art approaches.
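
The re-ranking idea in the abstract can be pictured with a minimal Python sketch. The abstract does not say how the three categories are assigned or what the weighting functions look like, so everything below is an illustrative assumption rather than the authors' method: the `WordMatch` structure, the rule in `categorize` (a matched pair counts as discriminative when the query keypoint lies on the query object and the shot keypoint falls inside the proposal box), and the three functions in `ASSUMED_WEIGHTS` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class WordMatch:
    """One visual word shared by the query and a shot frame (hypothetical structure)."""
    query_on_object: bool          # query keypoint lies on the query object
    shot_xy: Tuple[float, float]   # matched keypoint position in the shot frame
    base_weight: float             # e.g. a tf-idf weight from the bag-of-visual-words model


def inside(box: Box, xy: Tuple[float, float]) -> bool:
    """True if point xy lies within the axis-aligned box (borders included)."""
    x, y = xy
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def categorize(match: WordMatch, proposal: Box) -> str:
    """Assumed rule for assigning a pair to one of the three categories."""
    if match.query_on_object and inside(proposal, match.shot_xy):
        return "discriminative"
    if match.query_on_object:
        return "weak_relevant"
    return "context_inferred"


# Assumed weighting functions: each maps (base weight, detector score) to a contribution.
ASSUMED_WEIGHTS: Dict[str, Callable[[float, float], float]] = {
    "discriminative": lambda w, s: w * (1.0 + s),  # boosted by detector confidence
    "weak_relevant": lambda w, s: 0.5 * w,
    "context_inferred": lambda w, s: 0.25 * w,
}


def rerank_score(matches: List[WordMatch], proposal: Box, detector_score: float) -> float:
    """Sum the per-category weighted contributions of all matched visual words."""
    return sum(
        ASSUMED_WEIGHTS[categorize(m, proposal)](m.base_weight, detector_score)
        for m in matches
    )
```

In this sketch, a shot whose matched words cluster inside a confident object proposal scores higher than one whose matches are scattered over the background, which is the behaviour the abstract motivates for small objects in cluttered scenes.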

Metadata
Title
Video instance search via spatial fusion of visual words and object proposals
Authors
Vinh-Tiep Nguyen
Duy Dinh Le
Minh-Triet Tran
Tam V. Nguyen
Thanh Duc Ngo
Shin’ichi Satoh
Duc Anh Duong
Publication date
29 April 2019
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 3/2019
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-019-00172-z
