Published in: International Journal of Multimedia Information Retrieval 1/2015

1 March 2015 | Regular Paper

Video classification with Densely extracted HOG/HOF/MBH features: an evaluation of the accuracy/computational efficiency trade-off

Authors: J. Uijlings, I. C. Duta, E. Sangineto, Nicu Sebe

Abstract

The current state of the art in video classification is based on Bag-of-Words using local visual descriptors. Most commonly these are histogram of oriented gradients (HOG), histogram of optical flow (HOF) and motion boundary histogram (MBH) descriptors. While such an approach is very powerful for classification, it is also computationally expensive. This paper addresses the problem of computational efficiency. Specifically: (1) we propose several speed-ups for densely sampled HOG, HOF and MBH descriptors and release Matlab code; (2) we investigate the trade-off between accuracy and computational efficiency of descriptors in terms of frame sampling rate and type of optical flow method; (3) we investigate the trade-off between accuracy and computational efficiency for computing the feature vocabulary, using and comparing most of the commonly adopted vector quantization techniques: \(k\)-means, hierarchical \(k\)-means, Random Forests, Fisher Vectors and VLAD.
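
To make the vector quantization step concrete, the following is a minimal Python sketch of two of the encodings named above: a hard-assignment Bag-of-Words histogram and an unnormalised VLAD vector. It is not the authors' released Matlab code. The function names, the toy 2-D descriptors and the two-word vocabulary are assumptions chosen purely for illustration; a real vocabulary would be learned (for example with \(k\)-means) from HOG, HOF or MBH descriptors.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (Euclidean distance)."""
    return min(range(len(vocabulary)), key=lambda k: dist(descriptor, vocabulary[k]))


def bow_histogram(descriptors, vocabulary):
    """Hard-assign each descriptor to its nearest word; return an L1-normalised histogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


def vlad(descriptors, vocabulary):
    """Sum the residuals (descriptor - word) per word and concatenate them.

    Normalisation, which is usually applied in practice, is left out
    to keep the sketch short.
    """
    dim = len(vocabulary[0])
    sums = [[0.0] * dim for _ in vocabulary]
    for d in descriptors:
        k = nearest_word(d, vocabulary)
        for i in range(dim):
            sums[k][i] += d[i] - vocabulary[k][i]
    return [value for word_sum in sums for value in word_sum]


# Hypothetical toy data: two visual words and four local descriptors.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (0.5, 0.5)]

print(bow_histogram(descriptors, vocabulary))  # [0.75, 0.25]
print(vlad(descriptors, vocabulary))           # [1.5, 1.5, -1.0, 0.0]
```

The histogram has one bin per visual word, whereas the VLAD vector has one block of descriptor-length values per word, which is one reason the encodings differ in both accuracy and cost.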

Metadata
Title: Video classification with Densely extracted HOG/HOF/MBH features: an evaluation of the accuracy/computational efficiency trade-off
Authors: J. Uijlings, I. C. Duta, E. Sangineto, Nicu Sebe
Publication date: 1 March 2015
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval, Issue 1/2015
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-014-0069-5