Published in: Machine Vision and Applications 7/2014

01.10.2014 | Original Paper

Realistic human action recognition by Fast HOG3D and self-organization feature map

Authors: Nijun Li, Xu Cheng, Suofei Zhang, Zhenyang Wu


Abstract

Local features are currently very popular in vision-based human action recognition, especially for "wild" or unconstrained videos. This paper proposes a novel framework that combines Fast HOG3D with a self-organizing feature map (SOM) network for action recognition in unconstrained videos, bypassing demanding preprocessing steps such as human detection, tracking, or contour extraction. Our contributions lie not only in creating a local feature descriptor that is more compact and computationally efficient than the original HOG3D, but also in being the first to successfully apply SOM to a realistic action recognition task and to study the influence of its training parameters. We mainly evaluate our approach on the UCF-YouTube dataset with 11 realistic sport actions, achieving promising results that outperform a local feature-based support vector machine and are comparable with bag-of-words. Experiments are also carried out on the KTH and UT-Interaction datasets for comparison. Results on all three datasets confirm that our approach performs comparably with, if not better than, the state of the art.
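The SOM training at the heart of the framework can be illustrated with a minimal sketch. The grid size, decay schedules, and defaults below are illustrative assumptions, not the paper's settings; local descriptors (e.g., Fast HOG3D vectors extracted from video cuboids) would be supplied as the rows of `features`.

```python
import numpy as np

def train_som(features, grid_w=10, grid_h=10, epochs=20,
              lr0=0.5, sigma0=3.0, seed=0):
    """Train a 2-D self-organizing map on local feature descriptors.

    features: (N, D) array of descriptors. All hyperparameters here
    are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    weights = rng.standard_normal((grid_h, grid_w, d)) * 0.1
    # Grid coordinates, used to compute the neighborhood function.
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    coords = np.stack([ys, xs], axis=-1).astype(float)
    total = epochs * n
    step = 0
    for _ in range(epochs):
        for x in features[rng.permutation(n)]:
            t = step / total
            lr = lr0 * (1.0 - t)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - t) + 0.5     # shrinking neighborhood width
            # Best-matching unit (BMU): node whose weight vector is nearest x.
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            # Gaussian neighborhood around the BMU on the map grid.
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1)
                       / (2 * sigma ** 2))
            # Pull the BMU and its neighbors toward the input.
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights

def map_to_bmu(weights, x):
    """Return the grid coordinates of the best-matching unit for x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)
```

After training, each descriptor is mapped to its BMU, so a video becomes a distribution of activations over the map grid, which can then be classified.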


Metadata
Title
Realistic human action recognition by Fast HOG3D and self-organization feature map
Authors
Nijun Li
Xu Cheng
Suofei Zhang
Zhenyang Wu
Publication date
01.10.2014
Publisher
Springer Berlin Heidelberg
Published in
Machine Vision and Applications / Issue 7/2014
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI
https://doi.org/10.1007/s00138-014-0639-9
