Published in: International Journal of Computer Vision 3/2013

01-12-2013

Image Classification with the Fisher Vector: Theory and Practice

Authors: Jorge Sánchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek


Abstract

A standard approach to describing an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular bag-of-visual-words representation. In this work, we propose to use the Fisher kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a “universal” generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
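The encoding step described in the abstract can be illustrated with a minimal sketch. It assumes a GMM with diagonal covariances and uses the commonly cited closed-form gradients with respect to the means and standard deviations, followed by \(\ell _2\) normalization; these formulas are not given on this page, and the function names and toy GMM parameters below are hypothetical. Other steps of the full method, such as product-quantization compression, are omitted.

```python
import math

def gmm_posteriors(x, weights, means, sigmas):
    """Soft-assign one descriptor x to each diagonal Gaussian (posterior per component)."""
    log_p = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xd, md, sdd in zip(x, mu, sd):
            ll += -0.5 * math.log(2 * math.pi * sdd * sdd) - 0.5 * ((xd - md) / sdd) ** 2
        log_p.append(ll)
    m = max(log_p)                      # subtract the max for numerical stability
    exp_p = [math.exp(l - m) for l in log_p]
    z = sum(exp_p)
    return [e / z for e in exp_p]

def fisher_vector(descriptors, weights, means, sigmas):
    """Pool T local descriptors into one 2*K*D signature (assumed formulas, see lead-in)."""
    K, D, T = len(weights), len(means[0]), len(descriptors)
    g_mu = [[0.0] * D for _ in range(K)]
    g_sd = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = gmm_posteriors(x, weights, means, sigmas)
        for k in range(K):
            for d in range(D):
                u = (x[d] - means[k][d]) / sigmas[k][d]   # whitened deviation from the model
                g_mu[k][d] += gamma[k] * u
                g_sd[k][d] += gamma[k] * (u * u - 1.0)
    fv = []
    for k in range(K):
        fv += [g / (T * math.sqrt(weights[k])) for g in g_mu[k]]
    for k in range(K):
        fv += [g / (T * math.sqrt(2.0 * weights[k])) for g in g_sd[k]]
    norm = math.sqrt(sum(v * v for v in fv))              # l2 normalization
    return [v / norm for v in fv] if norm > 0 else fv

# Hypothetical toy "universal" GMM: K = 2 components, D = 2 dimensions.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
descriptors = [[0.1, -0.2], [2.9, 3.3], [0.5, 0.4]]

fv = fisher_vector(descriptors, weights, means, sigmas)
print(len(fv))  # 2 * K * D = 8 dimensions
```

The signature length depends only on the GMM size (2KD here), not on the number of patches, which is what makes it usable as a fixed-length input to a linear classifier.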

Footnotes
3
Normalizing by any \(\ell _p\)-norm would cancel out the effect of \(\omega \). Perronnin et al. (2010c) chose the \(\ell _2\)-norm because it is the natural norm associated with the dot-product. In Sect. 3.2 we experiment with different \(\ell _p\)-norms.
 
4
See Appendix A.2 in the extended version of Jaakkola and Haussler (1998), which is available at: http://people.csail.mit.edu/tommi/papers/gendisc.ps
 
5
Xiao et al. (2010) also report results with one training sample per class. However, a single sample does not provide any way to perform cross-validation, which is why we do not report results in this setting.
 
7
Actually, any continuous distribution can be approximated with arbitrary precision by a GMM with isotropic covariance matrices.
 
8
Note that since \(q\) draws values in a finite set, we could replace the \(\int \nolimits _q\) by \(\sum _q\) in the following equations but we will keep the integral notation for simplicity.
 
9
While it is standard practice to report per-class accuracy on this dataset (see Deng et al. 2010; Sánchez and Perronnin 2011), Krizhevsky et al. (2012) and Le et al. (2012) report a per-image accuracy. This results in a more optimistic number, since those classes which are over-represented in the test data also have more training samples and therefore have (on average) a higher accuracy than those classes which are under-represented. This was clarified through personal correspondence with the first authors of Krizhevsky et al. (2012) and Le et al. (2012).
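The difference between the two measures can be seen on a small, made-up example. The sketch below uses hypothetical labels and helper names and only illustrates the arithmetic: an over-represented class with high accuracy pulls the per-image figure above the per-class average.

```python
from collections import defaultdict

def per_image_accuracy(y_true, y_pred):
    """Fraction of test images classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed within each class, then averaged over classes."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical test set: class "a" has 8 images (7 correct), class "b" has 2 (1 correct).
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 7 + ["b"] + ["b", "a"]

print(per_image_accuracy(y_true, y_pred))  # 8 of 10 images correct: 0.8
print(per_class_accuracy(y_true, y_pred))  # (7/8 + 1/2) / 2 = 0.6875
```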
 
Literature
Amari, S., & Nagaoka, H. (2000). Methods of information geometry, translations of mathematical monographs (Vol. 191). Oxford: Oxford University Press.
Bergamo, A., & Torresani, L. (2012). Meta-class features for large-scale object categorization on a budget. In CVPR.
Bishop, C. (1995). Training with noise is equivalent to Tikhonov regularization. In Neural computation (Vol. 7).
Bo, L., & Sminchisescu, C. (2009). Efficient match kernels between sets of features for visual recognition. In NIPS.
Bo, L., Ren, X., & Fox, D. (2012). Multipath sparse coding using hierarchical matching pursuit. In NIPS workshop on deep learning.
Boiman, O., Shechtman, E., & Irani, M. (2008). In defense of nearest-neighbor based image classification. In CVPR.
Bottou, L., & Bousquet, O. (2007). The tradeoffs of large scale learning. In NIPS.
Boureau, Y. L., Bach, F., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In CVPR.
Boureau, Y. L., LeRoux, N., Bach, F., Ponce, J., & LeCun, Y. (2011). Ask the locals: Multi-way local pooling for image recognition. In ICCV.
Burrascano, P. (1991). A norm selection criterion for the generalized delta rule. IEEE Transactions on Neural Networks, 2(1), 125–30.
Chatfield, K., Lempitsky, V., Vedaldi, A., & Zisserman, A. (2011). The devil is in the details: An evaluation of recent feature encoding methods. In BMVC.
Cinbis, G., Verbeek, J., & Schmid, C. (2012). Image categorization using Fisher kernels of non-iid image models. In CVPR.
Clinchant, S., Csurka, G., Perronnin, F., & Renders, J. M. (2007). XRCE's participation to imageval. In ImageEval workshop at CVIR.
Csurka, G., Dance, C., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In ECCV SLCV workshop.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In CVPR.
Deng, J., Berg, A., Li, K., & Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us? In ECCV.
Everingham, M., Gool, L.V., Williams, C., Winn, J., & Zisserman, A. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results.
Everingham, M., Gool, L.V., Williams, C., Winn, J., & Zisserman, A. (2008). The PASCAL visual object classes challenge 2008 (VOC2008) results.
Everingham, M., van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Farquhar, J., Szedmak, S., Meng, H., & Shawe-Taylor, J. (2005). Improving “bag-of-keypoints” image categorisation. Technical report. Southampton: University of Southampton.
Feng, J., Ni, B., Tian, Q., & Yan, S. (2011). Geometric \(\ell _p\)-norm feature pooling for image classification. In CVPR.
Gehler, P., & Nowozin, S. (2009). On feature combination for multiclass object classification. In ICCV.
Guillaumin, M., Verbeek, J., & Schmid, C. (2010). Multimodal semi-supervised learning for image classification. In CVPR.
Harzallah, H., Jurie, F., & Schmid, C. (2009). Combining efficient object localization and image classification. In ICCV.
Haussler, D. (1999). Convolution kernels on discrete structures. Technical report. Santa Cruz: UCSC.
Jaakkola, T., & Haussler, D. (1998). Exploiting generative models in discriminative classifiers. In NIPS.
Jégou, H., Douze, M., & Schmid, C. (2009). On the burstiness of visual elements. In CVPR.
Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In CVPR.
Jégou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search. In IEEE PAMI.
Jégou, H., Perronnin, F., Douze, M., Sánchez, J., Pérez, P., & Schmid, C. (2012). Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1704–1716.
Krapac, J., Verbeek, J., & Jurie, F. (2011). Modeling spatial layout with fisher vectors for image categorization. In ICCV.
Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). Image classification with deep convolutional neural networks. In NIPS.
Kulkarni, N., & Li, B. (2011). Discriminative affine sparse codes for image classification. In CVPR.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR.
Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., et al. (2012). Building high-level features using large scale unsupervised learning. In ICML.
Lin, Y., Lv, F., Zhu, S., Yu, K., Yang, M., & Cour, T. (2011). Large-scale image classification: Fast feature extraction and svm training. In CVPR.
Liu, Y., & Perronnin, F. (2008). A similarity measure between unordered vector sets with application to image categorization. In CVPR.
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Lyu, S. (2005). Mercer kernels for object recognition with local features. In CVPR.
Maji, S., & Berg, A. (2009). Max-margin additive classifiers for detection. In ICCV.
Maji, S., Berg, A., & Malik, J. (2008). Classification using intersection kernel support vector machines is efficient. In CVPR.
Mensink, T., Verbeek, J., Csurka, G., & Perronnin, F. (2012). Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In ECCV.
Perronnin, F., & Dance, C. (2007). Fisher kernels on visual vocabularies for image categorization. In CVPR.
Perronnin, F., Dance, C., Csurka, G., & Bressan, M. (2006). Adapted vocabularies for generic visual categorization. In ECCV.
Perronnin, F., Liu, Y., Sánchez, J., & Poirier, H. (2010a). Large-scale image retrieval with compressed Fisher vectors. In CVPR.
Perronnin, F., Sánchez, J., & Liu, Y. (2010b). Large-scale image categorization with explicit data embedding. In CVPR.
Perronnin, F., Sánchez, J., & Mensink, T. (2010c). Improving the Fisher kernel for large-scale image classification. In ECCV.
Perronnin, F., Akata, Z., Harchaoui, Z., & Schmid, C. (2012). Towards good practice in large-scale learning for image classification. In CVPR.
Sabin, M., & Gray, R. (1984). Product code vector quantizers for waveform and voice coding. IEEE Transactions on Acoustics, Speech and Signal Processing, 32(3), 474–488.
Sánchez, J., & Perronnin, F. (2011). High-dimensional signature compression for large-scale image classification. In CVPR.
Sánchez, J., Perronnin, F., & de Campos, T. (2012). Modeling the spatial layout of images beyond spatial pyramids. Pattern Recognition Letters, 33(16), 2216–2223.
Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal estimate sub-gradient solver for SVM. In ICML.
Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV.
Smith, N., & Gales, M. (2001). Speech recognition using SVMs. In NIPS.
Spruill, M. (2007). Asymptotic distribution of coordinates on high dimensional spheres. In Electronic communications in probability (Vol. 12).
Sreekanth, V., Vedaldi, A., Jawahar, C., & Zisserman, A. (2010). Generalized rbf feature maps for efficient detection. In BMVC.
Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions. New York: John Wiley.
Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In CVPR.
Uijlings, J., Smeulders, A., & Scha, R. (2009). What is the spatial extent of an object? In CVPR.
van de Sande, K., Gevers, T., & Snoek, C. (2010). Evaluating color descriptors for object and scene recognition. IEEE PAMI, 32(9), 1582–1596.
VanGemert, J., Veenman, C., Smeulders, A., & Geusebroek, J. (2010). Visual word ambiguity. In IEEE TPAMI.
Vedaldi, A., & Zisserman, A. (2010). Efficient additive kernels via explicit feature maps. In CVPR.
Vedaldi, A., & Zisserman, A. (2012). Sparse kernel approximations for efficient classification and detection. In CVPR.
Wallraven, C., Caputo, B., & Graf, A. (2003). Recognition with local features: the kernel recipe. In ICCV.
Wang, G., Hoiem, D., & Forsyth, D. (2009). Learning image similarity from flickr groups using stochastic intersection kernel machines. In ICCV.
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In CVPR.
Winn, J., Criminisi, A., & Minka, T. (2005). Object categorization by learned visual dictionary. In ICCV.
Xiao, J., Hays, J., Ehinger, K., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In CVPR.
Yan, S., Zhou, X., Liu, M., Hasegawa-Johnson, M., & Huang, T. (2008). Regression from patch-kernel. In CVPR.
Yang, J., Li, Y., Tian, Y., Duan, L., & Gao, W. (2009). Group sensitive multiple kernel learning for object categorization. In ICCV.
Yang, J., Yu, K., Gong, Y., & Huang, T. (2009b). Linear spatial pyramid matching using sparse coding for image classification. In CVPR.
Young, S., Evermann, G., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, S., Valtchev, V., & Woodland, P. (2002). The HTK book (version 3.2.1). Cambridge: Cambridge University Engineering Department.
Zhang, J., Marszalek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 123–138.
Zhou, Z., Yu, K., Zhang, T., & Huang, T. (2010). Image classification using super-vector coding of local image descriptors. In ECCV.
Metadata
Title
Image Classification with the Fisher Vector: Theory and Practice
Authors
Jorge Sánchez
Florent Perronnin
Thomas Mensink
Jakob Verbeek
Publication date
01-12-2013
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2013
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-013-0636-x
