
2014 | OriginalPaper | Book chapter

2. Bag-of-Words Image Representation: Key Ideas and Further Insight

Written by: Marc T. Law, Nicolas Thome, Matthieu Cord

Published in: Fusion in Computer Vision

Publisher: Springer International Publishing


Abstract

In the context of object and scene recognition, state-of-the-art performance is obtained with visual Bag-of-Words (BoW) models, which build mid-level representations from densely sampled local descriptors (e.g., Scale-Invariant Feature Transform (SIFT)). Several methods to combine low-level features and to set mid-level parameters have been evaluated recently for image classification. In this chapter, we study in detail the different components of the BoW model in the context of image classification. In particular, we focus on the coding and pooling steps and investigate the impact of the main parameters of the BoW pipeline. We show that an adequate combination of several low-level (sampling rate, multiscale) and mid-level (codebook size, normalization) parameters is decisive for reaching good performance. Based on this analysis, we propose a merging scheme that exploits the specificities of edge-based descriptors. Low- and high-contrast regions are pooled separately and combined to provide a powerful representation of images. We study how the contrast threshold, which determines whether a SIFT descriptor corresponds to a low-contrast or a high-contrast region, affects classification performance. Successful experiments are reported on the Caltech-101 and Scene-15 datasets.
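
As a rough illustration of the coding and pooling steps and of pooling low- and high-contrast descriptors separately, the following minimal sketch may help. It is not the chapter's implementation: the function names are hypothetical, hard assignment and sum pooling are simply the most basic choices, and measuring contrast by the L2 norm of the unnormalized descriptor is an assumption made here for illustration.

```python
import numpy as np

def hard_assign(descriptors, codebook):
    """Coding step: one-hot code of each descriptor over its nearest codeword."""
    # Squared Euclidean distances, shape (n_descriptors, n_codewords).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros(d2.shape)
    codes[np.arange(d2.shape[0]), d2.argmin(axis=1)] = 1.0
    return codes

def pool(codes, mode="sum"):
    """Pooling step: aggregate per-descriptor codes into a single vector."""
    if codes.shape[0] == 0:
        # No descriptor fell into this group: return an empty histogram.
        return np.zeros(codes.shape[1])
    return codes.max(axis=0) if mode == "max" else codes.sum(axis=0)

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (all-zero vectors stay zero)."""
    return v / max(np.linalg.norm(v), eps)

def contrast_split_bow(descriptors, codebook, threshold, mode="sum"):
    """Pool low- and high-contrast descriptors separately, then concatenate."""
    # Assumption: contrast is approximated by the descriptor's L2 norm
    # before any normalization; the chapter may define it differently.
    contrast = np.linalg.norm(descriptors, axis=1)
    high = contrast >= threshold
    codes = hard_assign(descriptors, codebook)
    low_hist = l2_normalize(pool(codes[~high], mode))
    high_hist = l2_normalize(pool(codes[high], mode))
    return np.concatenate([low_hist, high_hist])

# Usage with random stand-ins for SIFT descriptors and a learned codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))      # 8 codewords of dimension 16
descriptors = rng.normal(size=(50, 16))  # 50 local descriptors
feature = contrast_split_bow(descriptors, codebook, threshold=4.0)
print(feature.shape)  # (16,): two concatenated 8-bin histograms
```

The resulting vector is twice the codebook size, so the threshold controls how descriptors are split between the two halves without changing the dimensionality.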


Footnotes
2
Chatfield et al. [8] report that their re-implementation of Zhou et al. [51] performs 6 % below the published results. According to personal communication with the authors of Zhou et al. [51], the results reported in Chatfield et al. [8] are representative of the method's performance when the non-trivial modifications not discussed in the chapter are left out.
 
3
In the source code provided for evaluation, the sampling is sometimes set to lower values: e.g., \(6\) pixels in http://www.ifp.illinois.edu/~jyang29/ScSPM.htm for Yang et al. [49] or http://users.cecs.anu.edu.au/~lingqiao/ for Liu et al. [31]. Compared to the value of \(8\) pixels, the performance decreases by about 1–2 %, so some results reported in published papers are overestimated.
 
4
Note that, through personal communication with the authors, we learned that the performance of 74 % reported in Liu et al. [31] on the Caltech-101 dataset was obtained with a wrong evaluation metric. The level of performance that can be obtained with the setup described in Liu et al. [31] is about 70 % (see Sect. 2.5). However, the conclusion regarding the relative performance of LSC with respect to sparse coding remains valid.
 
5
Available on Svetlana Lazebnik’s professional homepage: http://www.cs.illinois.edu/homes/slazebni/.
 
References
1.
Avila S, Thome N, Cord M, Valle E, de Araujo A (2011) Bossa: extended bow formalism for image classification. In: Proceedings of the IEEE international conference on image processing (ICIP)
2.
Bach FR, Lanckriet GR, Jordan MI (2004) Multiple kernel learning, conic duality, and the SMO algorithm. In: Proceedings of the twenty-first international conference on machine learning (ICML)
3.
Bay H, Ess A, Tuytelaars T, van Gool L (2008) SURF: speeded up robust features. Comput Vis Image Underst (CVIU) 110(3):346–359
4.
Benois-Pineau J, Bugeau A, Karaman S, Mégret R (2012) Spatial and multi-resolution context in visual indexing. In: Visual Indexing and Retrieval, pp 41–63
5.
Boureau Y-L, Bach F, LeCun Y, Ponce J (2010) Learning mid-level features for recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
6.
Boureau Y-L, Le Roux N, Bach F, Ponce J, LeCun Y (2011) Ask the locals: multi-way local pooling for image recognition. In: Proceedings of the IEEE international conference on computer vision (ICCV)
7.
Boureau Y-L, Ponce J, LeCun Y (2010) A theoretical analysis of feature pooling in vision algorithms. In: Proceedings of the international conference on machine learning (ICML)
8.
Chatfield K, Lempitsky V, Vedaldi A, Zisserman A (2011) The devil is in the details: an evaluation of recent feature encoding methods. In: Proceedings of the British machine vision conference (BMVC)
9.
Coates A, Ng A (2011) The importance of encoding versus training with sparse coding and vector quantization. In: Proceedings of the 28th international conference on machine learning (ICML)
10.
Cord M, Cunningham P (2008) Machine learning techniques for multimedia: case studies on organization and retrieval. Cognitive technologies. Springer, Heidelberg
11.
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
12.
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
13.
Duchenne O, Joulin A, Ponce J (2011) A graph-matching kernel for object categorization. In: Proceedings of the IEEE international conference on computer vision (ICCV)
14.
Everingham M, Zisserman A, Williams C, Van Gool L (2007) The PASCAL visual object classes challenge 2007 (VOC2007) results. Technical Report, Pascal Challenge
15.
Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res (JMLR) 9:1871–1874
16.
de Avila Fontes SE, Thome N, Cord M, Valle E, de Albuquerque Araújo A (2013) Pooling in image representation: the visual codeword point of view. Comput Vis Image Underst (CVIU) 117(5):453–465
17.
Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
18.
Fei-Fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) workshop on GMBV
19.
Feng J, Ni B, Tian Q, Yan S (2011) Geometric \(\ell _p\)-norm feature pooling for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
20.
Gehler P, Nowozin S (2009) On feature combination for multiclass object classification. In: Proceedings of the IEEE international conference on computer vision (ICCV)
21.
van Gemert J, Veenman C, Smeulders A, Geusebroek JM (2010) Visual word ambiguity. IEEE Trans Pattern Anal Mach Intell (TPAMI) 32(7):1271–1283
22.
Goh H, Thome N, Cord M, Lim J-H (2012) Unsupervised and supervised visual codes with restricted Boltzmann machines. In: Proceedings of the European conference on computer vision (ECCV)
23.
González-Díaz I, Buso V, Benois-Pineau J, Bourmaud G, Mégret R (2013) Modeling instrumental activities of daily living in egocentric vision as sequences of active objects and context for Alzheimer disease research. In: ACM multimedia workshop on multimedia information indexing and retrieval for healthcare
24.
Grauman K, Darrell T (2005) The pyramid match kernel: discriminative classification with sets of image features. In: Proceedings of the IEEE international conference on computer vision (ICCV)
25.
Harris C, Stephens M (1988) A combined corner and edge detector. In: Proceedings of the 4th Alvey vision conference, pp 147–151
26.
Jégou H, Douze M, Schmid C, Pérez P (2010) Aggregating local descriptors into a compact image representation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
27.
Karaman S, Benois-Pineau J, Mégret R, Bugeau A (2012) Multi-layer local graph words for object recognition. In: Proceedings of the international conference on multimedia modeling
28.
Kavukcuoglu K, Sermanet P, Boureau Y-L, Gregor K, Mathieu M, LeCun Y (2010) Learning convolutional feature hierarchies for visual recognition. In: Proceedings of advances in neural information processing systems (NIPS), pp 1090–1098
29.
Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. In: Proceedings of advances in neural information processing systems (NIPS), pp 1106–1114
30.
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
31.
Liu L, Wang L, Liu X (2011) In defense of soft-assignment coding. In: Proceedings of the IEEE international conference on computer vision (ICCV)
32.
Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis (IJCV) 60:91–110
33.
Mikolajczyk K, Schmid C (2004) Scale and affine invariant interest point detectors. Int J Comput Vis (IJCV) 60(1):63–86
34.
Mironica I, Uijlings J, Rostamzadeh N, Ionescu B, Sebe N (2013) Time matters! capturing variation in time in video using fisher kernels. In: Proceedings of the 21st ACM international conference on multimedia
35.
Perronnin F, Dance CR (2007) Fisher kernels on visual vocabularies for image categorization. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
36.
Perronnin F, Sánchez J, Mensink T (2010) Improving the fisher kernel for large-scale image classification. In: Proceedings of the European conference on computer vision (ECCV)
37.
Serre T, Wolf L, Bileschi S, Riesenhuber M, Poggio T (2007) Robust object recognition with cortex-like mechanisms. IEEE Trans Pattern Anal Mach Intell (TPAMI) 29:411–426
38.
Sharma G, Jurie F, Schmid C (2012) Discriminative spatial saliency for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
39.
Sivic J, Zisserman A (2003) Video google: a text retrieval approach to object matching in videos. In: Proceedings of the IEEE international conference on computer vision (ICCV)
40.
Smith JR, Chang S-F (1997) VisualSEEk: a fully automated content-based image query system. In: Proceedings of the fourth ACM international conference on Multimedia, ACM, pp 87–98
41.
Snoek C, Worring M, Hauptmann A (2006) Learning rich semantics from news video archives by style analysis. ACM Trans Multimedia Comput Commun Appl (TOMCCAP) 2(2):91–108
42.
Thériault C, Thome N, Cord M (2013) Extended coding and pooling in the HMAX model. IEEE Trans Image Process 22(2):764–777
43.
van de Sande KEA, Gevers T, Snoek CGM (2010) Evaluating color descriptors for object and scene recognition. IEEE Trans Pattern Anal Mach Intell (TPAMI) 32(9):1582–1596
45.
Vedaldi A, Gulshan V, Varma M, Zisserman A (2009) Multiple kernels for object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV)
46.
Vedaldi A, Zisserman A (2011) Efficient additive kernels via explicit feature maps. IEEE Trans Pattern Anal Mach Intell (TPAMI) 34:480–492
47.
Vig E, Dorr M, Cox DD (2012) Space-variant descriptor sampling for action recognition based on saliency and eye movements. In: Proceedings of the European conference on computer vision (ECCV)
48.
Wang J, Yang J, Yu K, Lv F, Huang T, Gong Y (2010) Locality-constrained linear coding for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
49.
Yang J, Yu K, Gong Y, Huang T (2009) Linear spatial pyramid matching using sparse coding for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
50.
Zhang H, Berg AC, Maire M, Malik J (2006) SVM-KNN: discriminative nearest neighbor classification for visual category recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
51.
Zhou X, Yu K, Zhang T, Huang TS (2010) Image classification using super-vector coding of local image descriptors. In: Proceedings of the European conference on computer vision (ECCV)
Metadata
Title
Bag-of-Words Image Representation: Key Ideas and Further Insight
Written by
Marc T. Law
Nicolas Thome
Matthieu Cord
Copyright year
2014
DOI
https://doi.org/10.1007/978-3-319-05696-8_2