Published in: Machine Vision and Applications 7/2013

1 October 2013 | Original Paper

Classifying web videos using a global video descriptor

Authors: Berkan Solmaz, Shayan Modiri Assari, Mubarak Shah


Abstract

Computing descriptors for videos is a crucial task in computer vision. In this paper, we propose a global video descriptor for video classification. Our method bypasses the detection of interest points, the extraction of local video descriptors, and the quantization of descriptors into a codebook; it represents each video sequence as a single feature vector. Our global descriptor is computed by applying a bank of 3-D spatio-temporal filters to the frequency spectrum of a video sequence; hence, it integrates information about both motion and scene structure. We tested our approach on three datasets, KTH (Schuldt et al., Proceedings of the 17th international conference on pattern recognition (ICPR’04), vol. 3, pp. 32–36, 2004), UCF50 (http://vision.eecs.ucf.edu/datasetsActions.html), and HMDB51 (Kuehne et al., HMDB: a large video database for human motion recognition, 2011), and obtained promising results that demonstrate the robustness and discriminative power of our global video descriptor for classifying videos of various actions. In addition, combining our global descriptor with a local descriptor resulted in the highest classification accuracies on the UCF50 and HMDB51 datasets.
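
The core idea of the abstract, filtering the 3-D frequency spectrum of a clip and summarizing it as one vector, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian band-pass filters, the layout of filter centers, the pooling by total filter energy, and every name and parameter below (`filter_centers`, `global_descriptor`, `radii`, `sigma`, and so on) are assumptions chosen only to make the pipeline concrete.

```python
import numpy as np


def frequency_grid(shape):
    """Return normalized frequency coordinates (ft, fy, fx) for a T x H x W volume."""
    axes = [np.fft.fftfreq(n) for n in shape]  # cycles per sample, in [-0.5, 0.5)
    return np.meshgrid(*axes, indexing="ij")


def filter_centers(radii=(0.1, 0.25), n_azimuth=4, n_elevation=3):
    """Hypothetical bank layout: filter centers on half-spheres in frequency space.

    Half of the sphere is enough because the magnitude spectrum of a
    real-valued video is symmetric about the origin.
    """
    centers = []
    for r in radii:
        for i in range(n_elevation):
            phi = np.pi * (i + 0.5) / n_elevation  # angle from the temporal axis
            for j in range(n_azimuth):
                theta = np.pi * j / n_azimuth      # spatial orientation
                centers.append((
                    r * np.cos(phi),                  # temporal frequency
                    r * np.sin(phi) * np.sin(theta),  # vertical frequency
                    r * np.sin(phi) * np.cos(theta),  # horizontal frequency
                ))
    return centers


def global_descriptor(video, sigma=0.05):
    """Map a grayscale video (T, H, W) to one vector with one entry per filter."""
    video = np.asarray(video, dtype=np.float64)
    video = video - video.mean()           # remove the DC component
    spectrum = np.abs(np.fft.fftn(video))  # magnitude of the 3-D spectrum
    ft, fy, fx = frequency_grid(video.shape)

    features = []
    for ct, cy, cx in filter_centers():
        # Assumed filter shape: Gaussian band-pass centered at (ct, cy, cx).
        dist2 = (ft - ct) ** 2 + (fy - cy) ** 2 + (fx - cx) ** 2
        response = spectrum * np.exp(-dist2 / (2.0 * sigma ** 2))
        features.append(np.sum(response ** 2))  # pooled energy of this filter

    features = np.asarray(features)
    norm = np.linalg.norm(features)
    return features / norm if norm > 0 else features


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((16, 32, 32))  # synthetic stand-in for a video clip
    print(global_descriptor(clip).shape)  # (24,)
```

Because each filter responds to a band of temporal and spatial frequencies at once, a single vector of this kind reflects both how the scene is structured and how it changes over time, with no interest-point detection or codebook step. Pooling each filter to one number is the simplest possible choice; a finer spatial or temporal pooling would give a longer vector.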

References
1. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local svm approach. In: Proceedings of the 17th international conference on pattern recognition (ICPR’04), vol. 3, pp. 32–36 (2004)
3. Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: ICCV (2011)
4. Poppe, R.: A survey on vision-based human action recognition. Image Vision Comput. 28, 976–990 (2010)
5. Weinland, D., Ronfard, R., Boyer, E.: A survey of vision-based methods for action representation, segmentation and recognition. Comput. Vis. Image Underst. 115, 224–241 (2011)
6. Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Computer Society conference on computer vision and pattern recognition (CVPR ’08) (2008)
7. Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos in the wild. In: IEEE conference on computer vision and pattern recognition (CVPR ’09), pp. 1996–2003 (2009)
8. Bobick, A., Davis, J.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 23, 257–267 (2001)
9. Yilmaz, A., Shah, M.: A differential geometric approach to representing the human actions. Comput. Vis. Image Underst. 109, 335–351 (2008)
10. Black, M.: Explaining optical flow events with parameterized spatio-temporal models. In: IEEE Computer Society conference on computer vision and pattern recognition (CVPR ’99), vol. 1, pp. 326–332 (1999)
11. Polana, R., Nelson, R.C.: Detection and recognition of periodic, non-rigid motion. Int. J. Comput. Vision 23, 261–282 (1997)
12. Wu, S., Oreifej, O., Shah, M.: Action recognition in videos acquired by a moving camera using motion decomposition of lagrangian particle trajectories. In: IEEE international conference on computer vision (ICCV ’11), pp. 1419–1426 (2011)
13. Wang, H., Klaser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: IEEE conference on computer vision and pattern recognition (CVPR ’11), pp. 3169–3176 (2011)
14. Laptev, I.: On space-time interest points. Int. J. Comput. Vision 64, 107–123 (2005)
15. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proceedings of fourth Alvey vision conference, pp. 147–151 (1988)
16. Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: VS-PETS, pp. 65–72 (2005)
17. Klaser, A., Marszalek, M., Schmid, C.: A spatio-temporal descriptor based on 3d-gradients. In: BMVC (2008)
18. Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: ACM multimedia, pp. 357–360 (2007)
19. Ikizler-Cinbis, N., Sclaroff, S.: Object, scene and actions: combining multiple features for human action recognition. In: Proceedings of the 11th European conference on computer vision (ECCV ’10), pp. 494–507 (2010)
20. Oliva, A., Torralba, A.B., Guerin-Dugue, A., Herault, J.: Global semantic classification of scenes using power spectrum templates. Challenge of image retrieval, pp. 1–12 (1999)
21. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vision 42, 145–175 (2001)
23. Maaten, L.V.D., Postma, E.O., Herik, H.J.V.D.: Dimensionality reduction: a comparative review (2008)
24. Wang, H., Ullah, M.M., Klaser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition. In: BMVC, p. 127 (2009)
25. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)
26. Jhuang, H., Serre, T., Wolf, L., Poggio, T.: A biologically inspired system for action recognition. In: IEEE 11th international conference on computer vision (ICCV’07), pp. 1–8 (2007)
27. Gilbert, A., Illingworth, J., Bowden, R.: Action recognition using mined hierarchical compound features. IEEE Trans. Pattern Anal. Mach. Intell. 33, 883–897 (2011)
Metadata
Title
Classifying web videos using a global video descriptor
Authors
Berkan Solmaz
Shayan Modiri Assari
Mubarak Shah
Publication date
1 October 2013
Publisher
Springer Berlin Heidelberg
Published in
Machine Vision and Applications / Issue 7/2013
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI
https://doi.org/10.1007/s00138-012-0449-x
