
A Comprehensive Study on VLAD

Authors: Xin Li, Lei Zhang, Zhiping Jian, Liyun Zuo

Published in: Neural Processing Letters | Issue 3/2021

Published: 3 April 2021


Abstract

Recently, the vector of locally aggregated descriptors (VLAD) has proven highly effective in diverse computer vision tasks, including image retrieval, scene classification, and action recognition. Its success stems from its powerful representation ability and computational efficiency. However, its theoretical foundation remains unclear, as do its connections to basic yet important algorithms, e.g., the bag-of-words model and match kernels, and the way its performance is affected by parameter configurations, e.g., normalization and pooling, which are also widely used in state-of-the-art algorithms based on local features. In this paper, aiming to realize the full capacity of VLAD, we conduct a comprehensive, in-depth study from the perspectives of both theoretical analysis and experimental practice. As a theoretical contribution, we provide a new formulation of VLAD via match kernels, which connects VLAD with existing important encoding methods based on local features. As a contribution to the practical use of VLAD, we comprehensively investigate the roles and effects of two widely used operations in local feature encoding: normalization and pooling. To the best of our knowledge, our work provides the first comprehensive study on VLAD, which will not only enable a full understanding of it but also provide important guidance for state-of-the-art algorithms based on local features. We have conducted extensive experiments on three benchmark datasets, Scene-15, Caltech 101, and PPMI, covering both image classification and action recognition.
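To make the terms in the abstract concrete, the following is a minimal sketch of the commonly described VLAD recipe: hard assignment of each local descriptor to its nearest codeword, sum pooling of the residuals, and optional signed power and L2 normalization. It is not taken from the paper. The function names, the `power` and `l2` parameters, and the toy data are assumptions made for illustration. The last lines check numerically that the inner product of two unnormalized VLAD vectors equals a sum of residual dot products over descriptor pairs assigned to the same codeword, which is one way to read VLAD as a match kernel; the abstract does not say whether this is the formulation the authors derive.

```python
import math

def nearest_centroid(x, centroids):
    """Return the index of the centroid closest to x (squared Euclidean distance)."""
    def sqdist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(range(len(centroids)), key=lambda k: sqdist(centroids[k]))

def vlad_encode(descriptors, centroids, power=None, l2=False):
    """Sum-pool residuals per centroid and concatenate them into one vector.

    power: optional exponent for signed power normalization (e.g. 0.5).
    l2:    if True, scale the final vector to unit Euclidean length.
    """
    dim = len(centroids[0])
    blocks = [[0.0] * dim for _ in centroids]
    for x in descriptors:
        k = nearest_centroid(x, centroids)
        for j in range(dim):
            blocks[k][j] += x[j] - centroids[k][j]
    v = [value for block in blocks for value in block]
    if power is not None:
        # Signed power normalization: keep the sign, shrink the magnitude.
        v = [math.copysign(abs(t) ** power, t) for t in v]
    if l2:
        norm = math.sqrt(sum(t * t for t in v))
        if norm > 0:
            v = [t / norm for t in v]
    return v

def match_kernel(xs, ys, centroids):
    """Sum residual dot products over descriptor pairs that share a centroid."""
    total = 0.0
    for x in xs:
        kx = nearest_centroid(x, centroids)
        for y in ys:
            if nearest_centroid(y, centroids) == kx:
                c = centroids[kx]
                total += sum((a - m) * (b - m) for a, b, m in zip(x, y, c))
    return total

# Toy data (hypothetical values, chosen only for illustration).
centroids = [[0.0, 0.0], [4.0, 4.0]]
image_a = [[1.0, 0.0], [0.0, 2.0], [5.0, 4.0]]
image_b = [[0.5, 0.5], [3.0, 4.0], [4.0, 6.0]]

va = vlad_encode(image_a, centroids)
vb = vlad_encode(image_b, centroids)
dot = sum(p * q for p, q in zip(va, vb))

# The two quantities agree when no normalization is applied.
print(dot, match_kernel(image_a, image_b, centroids))
```

The identity holds only for the unnormalized, sum-pooled vectors; once power or L2 normalization is switched on, the inner product is no longer a plain sum over matching pairs, which is one reason the effect of normalization has to be studied separately.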



Title: A Comprehensive Study on VLAD
Authors: Xin Li, Lei Zhang, Zhiping Jian, Liyun Zuo
Publication date: 3 April 2021
Publisher: Springer US
Published in: Neural Processing Letters / Issue 3/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-021-10502-0
