International Journal of Multimedia Information Retrieval

29 April 2019 | Regular Paper

Video instance search via spatial fusion of visual words and object proposals

Authors: Vinh-Tiep Nguyen, Duy Dinh Le, Minh-Triet Tran, Tam V. Nguyen, Thanh Duc Ngo, Shin’ichi Satoh, Duc Anh Duong

Published in: International Journal of Multimedia Information Retrieval | Issue 3/2019

Abstract

Most popular systems for object instance search are based on the bag-of-visual-words model. The inherent weaknesses of this standard model, such as quantization error, unstructured representation, and the burstiness phenomenon, have been solved to some extent. However, the model still has a serious problem when searching for small objects in a database with cluttered backgrounds. In many situations, irrelevant objects that share the same texture or shape as the query object even receive higher scores than relevant ones. To overcome this problem, we propose a novel fusion method to significantly boost the accuracy of instance search systems. First, we use a state-of-the-art object detector with denser features to find the object bounding box and a similarity score. Second, to exploit the spatial relationship between each visual word and an object proposal (a detected area that might contain the query object), we define three categories of visual word pairs: discriminative, weak relevant, and context inferred. Finally, we propose a new re-ranking scheme with three weighting functions, one for each category of visual word pairs, to compute the final similarity score between a query topic and a video shot. To illustrate the efficiency of the proposed method, we conduct experiments on datasets that cover a wide variety of query object types. Experimental results on the TRECVID Instance Search datasets (INS2013 and INS2014) show the superiority of the proposed method over state-of-the-art approaches.
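
The re-ranking idea in the abstract can be illustrated with a small Python sketch. The abstract does not give the category definitions, the weighting functions, or the fusion formula, so everything concrete here is an assumption made for illustration only: the rule that assigns a matched visual word to a category by its position relative to the proposal box (`categorize`, `margin`), the constant weights in `WEIGHTS`, and the way the detector score is combined with the word score in `fused_score`. None of these names or values come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class MatchedWord:
    """A visual word matched between the query and a shot keyframe."""
    x: float      # keypoint position in the keyframe
    y: float
    base: float   # base matching weight, e.g. an idf-style value


# Placeholder constants standing in for the three weighting functions.
WEIGHTS = {
    "discriminative": 1.0,
    "weak_relevant": 0.5,
    "context_inferred": 0.25,
}


def categorize(word: MatchedWord, box: Box, margin: float) -> str:
    """Assumed rule: inside the box, near the box, or elsewhere."""
    x0, y0, x1, y1 = box
    if x0 <= word.x <= x1 and y0 <= word.y <= y1:
        return "discriminative"
    if (x0 - margin <= word.x <= x1 + margin
            and y0 - margin <= word.y <= y1 + margin):
        return "weak_relevant"
    return "context_inferred"


def fused_score(words: List[MatchedWord], box: Box,
                detector_score: float, margin: float = 20.0) -> float:
    """Weight each matched word by its category, then scale the sum
    by the detector similarity (an assumed fusion rule)."""
    word_score = sum(WEIGHTS[categorize(w, box, margin)] * w.base
                     for w in words)
    return word_score * (1.0 + detector_score)


if __name__ == "__main__":
    matches = [MatchedWord(50, 50, 2.0),     # inside the proposal
               MatchedWord(110, 50, 2.0),    # just outside it
               MatchedWord(300, 300, 2.0)]   # far away, background
    print(fused_score(matches, (0, 0, 100, 100), detector_score=1.0))
```

In this sketch, a shot whose matched words fall mostly inside a confident object proposal is scored higher than a shot with the same number of matches scattered over the background, which is the behaviour the abstract motivates for small query objects in cluttered scenes.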



Title: Video instance search via spatial fusion of visual words and object proposals
Authors: Vinh-Tiep Nguyen, Duy Dinh Le, Minh-Triet Tran, Tam V. Nguyen, Thanh Duc Ngo, Shin’ichi Satoh, Duc Anh Duong
Publication date: 29 April 2019
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 3/2019
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-019-00172-z

