
2018 | Original Paper | Book Chapter

Classification of Human Actions Using 3-D Convolutional Neural Networks: A Hierarchical Approach

Authors: Shaival Thakkar, M. V. Joshi

Published in: Computer Vision, Pattern Recognition, Image Processing, and Graphics

Publisher: Springer Singapore


Abstract

In this paper, we present a hierarchical approach to human action classification using 3-D convolutional neural networks (3-D CNNs). In general, human actions involve the positioning and movement of the hands and legs, and can therefore be grouped into those performed by the hands, those performed by the legs, or, in some cases, both. This observation motivates our hierarchical classification scheme. In this work, we treat actions as tasks performed by hand or leg movements. Instead of using a single 3-D CNN to classify the given actions, we use multiple networks and perform the classification hierarchically: we first perform a binary classification to separate hand actions from leg actions, and then use two separate networks, one for hand actions and one for leg actions, to classify among the target action categories. For example, for the KTH dataset we train three networks to classify six actions, three performed by the hands and three by the legs. The novelty of our approach lies in separating hand and leg actions first, so that each subsequent classifier receives features corresponding only to hands or only to legs, which leads to better classification accuracy. Moreover, the 3-D CNN automatically extracts features in both the spatial and temporal domains, avoiding the need for hand-crafted features and making it well suited to video classification. We evaluate our method on the KTH, Weizmann, and UCF-Sports datasets, and a comparison with state-of-the-art methods shows that our approach outperforms most of them.
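The following is a minimal sketch of the two-stage routing idea described in the abstract, written in PyTorch. The layer counts, kernel sizes, input clip shape, and the specific three-way splits per branch are illustrative assumptions (loosely modelled on KTH), not the chapter's actual architecture; only the hierarchy itself, a binary hand-vs-leg network followed by two branch-specific 3-D CNNs, comes from the abstract.

```python
# Hierarchical 3-D CNN routing: a sketch under assumed layer sizes and input shape.
import torch
import torch.nn as nn


def make_3d_cnn(num_classes: int) -> nn.Module:
    """A small 3-D CNN: spatio-temporal convolution blocks plus a classifier head."""
    return nn.Sequential(
        nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool3d(kernel_size=2),
        nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1),   # pool over time and space
        nn.Flatten(),
        nn.Linear(32, num_classes),
    )


# Stage 1: binary hand-vs-leg classifier. Stage 2: one network per branch.
binary_net = make_3d_cnn(num_classes=2)
hand_net = make_3d_cnn(num_classes=3)   # e.g. KTH hand actions: boxing, hand waving, hand clapping
leg_net = make_3d_cnn(num_classes=3)    # e.g. KTH leg actions: walking, jogging, running


def classify(clip: torch.Tensor) -> tuple[str, int]:
    """Route a video clip (batch, channel, frames, H, W) through the hierarchy."""
    branch = binary_net(clip).argmax(dim=1).item()   # 0 = hand action, 1 = leg action
    if branch == 0:
        return "hand", hand_net(clip).argmax(dim=1).item()
    return "leg", leg_net(clip).argmax(dim=1).item()


# Example: one grayscale clip of 16 frames at 60x80 resolution (KTH-like input).
dummy_clip = torch.randn(1, 1, 16, 60, 80)
print(classify(dummy_clip))
```

In this setup the branch classifiers only ever see clips routed to them, which is the property the abstract credits for the improved accuracy: each second-stage network specializes in either hand or leg features.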


Metadata
Title
Classification of Human Actions Using 3-D Convolutional Neural Networks: A Hierarchical Approach
Authors
Shaival Thakkar
M. V. Joshi
Copyright year
2018
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-0020-2_2