Published in: International Journal of Computer Vision, Issue 3/2013

1 December 2013

Image Classification with the Fisher Vector: Theory and Practice

Authors: Jorge Sánchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek


Abstract

A standard approach to describing an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector, and pool them into an image-level signature. The most common patch encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular bag-of-visual-words representation. In this work, we propose to use the Fisher kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a “universal” generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
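
As a rough illustration of the encoding summarized in the abstract, the sketch below computes, for a diagonal-covariance Gaussian mixture model, the averaged gradients of the log-likelihood with respect to the component means and standard deviations, scaled by the usual approximate Fisher-information normalization. It is a sketch under assumptions: the function name `fisher_vector`, the argument layout, and the omission of mixture-weight gradients and of any subsequent power or ℓ2 normalization are choices made here for brevity, not details taken from this page.

```python
import math


def fisher_vector(descriptors, weights, means, sigmas):
    """Encode a set of local descriptors under a diagonal GMM (hypothetical helper).

    descriptors: list of D-dimensional lists (one per patch).
    weights:     K mixture weights summing to 1.
    means:       K lists of D per-dimension means.
    sigmas:      K lists of D per-dimension standard deviations.
    Returns a list of length 2*K*D: mean gradients first, then variance gradients.
    """
    n, k_count, dim = len(descriptors), len(weights), len(descriptors[0])
    g_mu = [[0.0] * dim for _ in range(k_count)]
    g_sigma = [[0.0] * dim for _ in range(k_count)]

    for x in descriptors:
        # Log of w_k * N(x | mu_k, diag(sigma_k^2)) for each component k.
        log_p = []
        for k in range(k_count):
            lp = math.log(weights[k])
            for d in range(dim):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                lp += -0.5 * z * z - math.log(sigmas[k][d]) - 0.5 * math.log(2 * math.pi)
            log_p.append(lp)

        # Soft assignment (posterior) via a numerically stable softmax.
        m = max(log_p)
        exp_p = [math.exp(lp - m) for lp in log_p]
        total = sum(exp_p)

        for k in range(k_count):
            gamma = exp_p[k] / total
            for d in range(dim):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                g_mu[k][d] += gamma * z                # deviation from the mean
                g_sigma[k][d] += gamma * (z * z - 1.0)  # deviation from the variance

    fv = []
    for k in range(k_count):
        fv += [g / (n * math.sqrt(weights[k])) for g in g_mu[k]]
    for k in range(k_count):
        fv += [g / (n * math.sqrt(2.0 * weights[k])) for g in g_sigma[k]]
    return fv
```

With K components and D-dimensional descriptors the result has 2KD entries, which is why the signature becomes high-dimensional even for modest K and why a compression step becomes attractive.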

Footnotes
3
Normalizing by any \(\ell _p\)-norm would cancel out the effect of \(\omega \). Perronnin et al. (2010c) chose the \(\ell _2\)-norm because it is the natural norm associated with the dot-product. In Sect. 3.2 we experiment with different \(\ell _p\)-norms.
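
The cancellation claimed in this footnote can be checked numerically. The minimal sketch below assumes that \(\omega \) acts as a positive scalar multiplying the whole signature, which is how the footnote reads (the body text defining \(\omega \) is not reproduced on this page); the helper name `lp_normalize` and the toy values are illustrative only.

```python
def lp_normalize(vec, p=2.0):
    """Divide a vector by its l_p norm (hypothetical helper)."""
    norm = sum(abs(v) ** p for v in vec) ** (1.0 / p)
    if norm == 0.0:
        return list(vec)  # nothing to normalize for the zero vector
    return [v / norm for v in vec]


signature = [0.5, -1.5, 2.0]   # toy image signature
omega = 0.3                    # assumed positive scale factor
scaled = [omega * v for v in signature]

# For every p tried, the scaled and unscaled signatures normalize to the same vector.
for p in (1.0, 2.0, 3.0):
    a = lp_normalize(signature, p)
    b = lp_normalize(scaled, p)
    assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
```

Any positive scale factor drops out because the \(\ell _p\)-norm is homogeneous: scaling the vector by \(\omega \) scales its norm by the same amount.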
 
4
See Appendix A.2 in the extended version of Jaakkola and Haussler (1998), which is available at: http://people.csail.mit.edu/tommi/papers/gendisc.ps
 
5
Xiao et al. (2010) also report results with one training sample per class. However, a single sample does not provide any way to perform cross-validation, which is why we do not report results in this setting.
 
7
Actually, any continuous distribution can be approximated with arbitrary precision by a GMM with isotropic covariance matrices.
 
8
Note that since \(q\) draws values in a finite set, we could replace the \(\int \nolimits _q\) by \(\sum _q\) in the following equations but we will keep the integral notation for simplicity.
 
9
While it is standard practice to report per-class accuracy on this dataset (see Deng et al. 2010; Sánchez and Perronnin 2011), Krizhevsky et al. (2012) and Le et al. (2012) report a per-image accuracy. This results in a more optimistic number, since those classes which are over-represented in the test data also have more training samples and therefore have (on average) a higher accuracy than those classes which are under-represented. This was clarified through personal correspondence with the first authors of Krizhevsky et al. (2012) and Le et al. (2012).
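
The gap between the two measures is easy to reproduce on toy data. The sketch below is illustrative only: the function names and the ten-sample test set are made up here and are not taken from any of the cited evaluations.

```python
from collections import defaultdict


def per_image_accuracy(y_true, y_pred):
    """Fraction of test images classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Mean over classes of the fraction of that class classified correctly."""
    hits, counts = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Toy test set: class "a" is over-represented and classified well,
# class "b" is rare and classified poorly.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 8 + ["a", "b"]

print(per_image_accuracy(y_true, y_pred))  # 0.9
print(per_class_accuracy(y_true, y_pred))  # 0.75
```

The per-image figure is higher because the well-classified, over-represented class dominates the average, which is the effect described in the footnote.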
 
References
Amari, S., & Nagaoka, H. (2000). Methods of information geometry, translations of mathematical monographs (Vol. 191). Oxford: Oxford University Press.
Bergamo, A., & Torresani, L. (2012). Meta-class features for large-scale object categorization on a budget. In CVPR.
Bishop, C. (1995). Training with noise is equivalent to Tikhonov regularization. In Neural computation (Vol. 7).
Bo, L., & Sminchisescu, C. (2009). Efficient match kernels between sets of features for visual recognition. In NIPS.
Bo, L., Ren, X., & Fox, D. (2012). Multipath sparse coding using hierarchical matching pursuit. In NIPS workshop on deep learning.
Boiman, O., Shechtman, E., & Irani, M. (2008). In defense of nearest-neighbor based image classification. In CVPR.
Bottou, L., & Bousquet, O. (2007). The tradeoffs of large scale learning. In NIPS.
Boureau, Y. L., Bach, F., LeCun, Y., & Ponce, J. (2010). Learning mid-level features for recognition. In CVPR.
Boureau, Y. L., LeRoux, N., Bach, F., Ponce, J., & LeCun, Y. (2011). Ask the locals: Multi-way local pooling for image recognition. In ICCV.
Burrascano, P. (1991). A norm selection criterion for the generalized delta rule. IEEE Transactions on Neural Networks, 2(1), 125–30.
Chatfield, K., Lempitsky, V., Vedaldi, A., & Zisserman, A. (2011). The devil is in the details: An evaluation of recent feature encoding methods. In BMVC.
Cinbis, G., Verbeek, J., & Schmid, C. (2012). Image categorization using Fisher kernels of non-iid image models. In CVPR.
Clinchant, S., Csurka, G., Perronnin, F., & Renders, J. M. (2007). XRCEs participation to imageval. In ImageEval workshop at CVIR.
Csurka, G., Dance, C., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In ECCV SLCV workshop.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In CVPR.
Deng, J., Berg, A., Li, K., & Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us? In ECCV.
Everingham, M., Gool, L. V., Williams, C., Winn, J., & Zisserman, A. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results.
Everingham, M., Gool, L. V., Williams, C., Winn, J., & Zisserman, A. (2008). The PASCAL visual object classes challenge 2008 (VOC2008) results.
Everingham, M., van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Farquhar, J., Szedmak, S., Meng, H., & Shawe-Taylor, J. (2005). Improving “bag-of-keypoints” image categorisation. Technical report. Southampton: University of Southampton.
Feng, J., Ni, B., Tian, Q., & Yan, S. (2011). Geometric \(\ell _p\)-norm feature pooling for image classification. In CVPR.
Gehler, P., & Nowozin, S. (2009). On feature combination for multiclass object classification. In ICCV.
Guillaumin, M., Verbeek, J., & Schmid, C. (2010). Multimodal semi-supervised learning for image classification. In CVPR.
Harzallah, H., Jurie, F., & Schmid, C. (2009). Combining efficient object localization and image classification. In ICCV.
Haussler, D. (1999). Convolution kernels on discrete structures. Technical report. Santa Cruz: UCSC.
Jaakkola, T., & Haussler, D. (1998). Exploiting generative models in discriminative classifiers. In NIPS.
Jégou, H., Douze, M., & Schmid, C. (2009). On the burstiness of visual elements. In CVPR.
Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In CVPR.
Jégou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search. In IEEE PAMI.
Jégou, H., Perronnin, F., Douze, M., Sánchez, J., Pérez, P., & Schmid, C. (2012). Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1704–1716.
Krapac, J., Verbeek, J., & Jurie, F. (2011). Modeling spatial layout with fisher vectors for image categorization. In ICCV.
Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). Image classification with deep convolutional neural networks. In NIPS.
Kulkarni, N., & Li, B. (2011). Discriminative affine sparse codes for image classification. In CVPR.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR.
Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., et al. (2012). Building high-level features using large scale unsupervised learning. In ICML.
Lin, Y., Lv, F., Zhu, S., Yu, K., Yang, M., & Cour, T. (2011). Large-scale image classification: Fast feature extraction and svm training. In CVPR.
Liu, Y., & Perronnin, F. (2008). A similarity measure between unordered vector sets with application to image categorization. In CVPR.
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Lyu, S. (2005). Mercer kernels for object recognition with local features. In CVPR.
Maji, S., & Berg, A. (2009). Max-margin additive classifiers for detection. In ICCV.
Maji, S., Berg, A., & Malik, J. (2008). Classification using intersection kernel support vector machines is efficient. In CVPR.
Mensink, T., Verbeek, J., Csurka, G., & Perronnin, F. (2012). Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In ECCV.
Perronnin, F., & Dance, C. (2007). Fisher kernels on visual vocabularies for image categorization. In CVPR.
Perronnin, F., Dance, C., Csurka, G., & Bressan, M. (2006). Adapted vocabularies for generic visual categorization. In ECCV.
Perronnin, F., Liu, Y., Sánchez, J., & Poirier, H. (2010a). Large-scale image retrieval with compressed Fisher vectors. In CVPR.
Perronnin, F., Sánchez, J., & Liu, Y. (2010b). Large-scale image categorization with explicit data embedding. In CVPR.
Perronnin, F., Sánchez, J., & Mensink, T. (2010c). Improving the Fisher kernel for large-scale image classification. In ECCV.
Perronnin, F., Akata, Z., Harchaoui, Z., & Schmid, C. (2012). Towards good practice in large-scale learning for image classification. In CVPR.
Sabin, M., & Gray, R. (1984). Product code vector quantizers for waveform and voice coding. IEEE Transactions on Acoustics, Speech and Signal Processing, 32(3), 474–488.
Sánchez, J., & Perronnin, F. (2011). High-dimensional signature compression for large-scale image classification. In CVPR.
Sánchez, J., Perronnin, F., & de Campos, T. (2012). Modeling the spatial layout of images beyond spatial pyramids. Pattern Recognition Letters, 33(16), 2216–2223.
Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal estimate sub-gradient solver for SVM. In ICML.
Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV.
Smith, N., & Gales, M. (2001). Speech recognition using SVMs. In NIPS.
Spruill, M. (2007). Asymptotic distribution of coordinates on high dimensional spheres. In Electronic communications in probability (Vol. 12).
Sreekanth, V., Vedaldi, A., Jawahar, C., & Zisserman, A. (2010). Generalized rbf feature maps for efficient detection. In BMVC.
Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions. New York: John Wiley.
Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In CVPR.
Uijlings, J., Smeulders, A., & Scha, R. (2009). What is the spatial extent of an object? In CVPR.
van de Sande, K., Gevers, T., & Snoek, C. (2010). Evaluating color descriptors for object and scene recognition. IEEE PAMI, 32(9), 1582–1596.
VanGemert, J., Veenman, C., Smeulders, A., & Geusebroek, J. (2010). Visual word ambiguity. In IEEE TPAMI.
Vedaldi, A., & Zisserman, A. (2010). Efficient additive kernels via explicit feature maps. In CVPR.
Vedaldi, A., & Zisserman, A. (2012). Sparse kernel approximations for efficient classification and detection. In CVPR.
Wallraven, C., Caputo, B., & Graf, A. (2003). Recognition with local features: the kernel recipe. In ICCV.
Wang, G., Hoiem, D., & Forsyth, D. (2009). Learning image similarity from flickr groups using stochastic intersection kernel machines. In ICCV.
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In CVPR.
Winn, J., Criminisi, A., & Minka, T. (2005). Object categorization by learned visual dictionary. In ICCV.
Xiao, J., Hays, J., Ehinger, K., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In CVPR.
Yan, S., Zhou, X., Liu, M., Hasegawa-Johnson, M., & Huang, T. (2008). Regression from patch-kernel. In CVPR.
Yang, J., Li, Y., Tian, Y., Duan, L., & Gao, W. (2009). Group sensitive multiple kernel learning for object categorization. In ICCV.
Yang, J., Yu, K., Gong, Y., & Huang, T. (2009b). Linear spatial pyramid matching using sparse coding for image classification. In CVPR.
Young, S., Evermann, G., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, S., Valtchev, V., & Woodland, P. (2002). The HTK book (version 3.2.1). Cambridge: Cambridge University Engineering Department.
Zhang, J., Marszalek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 123–138.
Zhou, Z., Yu, K., Zhang, T., & Huang, T. (2010). Image classification using super-vector coding of local image descriptors. In ECCV.
Metadata
Title
Image Classification with the Fisher Vector: Theory and Practice
Authors
Jorge Sánchez
Florent Perronnin
Thomas Mensink
Jakob Verbeek
Publication date
1 December 2013
Publisher
Springer US
Published in
International Journal of Computer Vision, Issue 3/2013
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-013-0636-x
