Neural Computing and Applications

30-11-2019 | Emerging Trends of Applied Neural Computation - E_TRAINCO

Spatiotemporal neural networks for action recognition based on joint loss

Authors: Chao Jing, Ping Wei, Hongbin Sun, Nanning Zheng

Published in: Neural Computing and Applications | Issue 9/2020

Abstract

Action recognition is a challenging and important problem in many fields, such as intelligent robots and video surveillance. In recent years, deep learning and neural network techniques have been widely applied to action recognition and have attained remarkable results. However, recognizing actions in complicated scenes remains difficult because of varying illumination conditions, similar motions, and background noise. In this paper, we present a spatiotemporal neural network model with a joint loss to recognize human actions from videos. The network comprises two connected substructures. The first is a two-stream network that extracts optical flow and appearance features from each video frame, characterizing the human actions in the spatial dimension. The second is a group of Long Short-Term Memory (LSTM) structures that follows the spatial network and describes the temporal and transition information in the videos. We also present a joint loss function for training the spatiotemporal neural network model; introducing this loss function improves action recognition performance. The proposed method was tested with video samples from two challenging datasets. The experiments demonstrate that our approach outperforms the baseline comparison methods.
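
The abstract does not state which terms the joint loss combines. As a rough illustration only, one common way to build a joint loss is to add a weighted feature-compactness term to the usual softmax cross-entropy. The sketch below assumes that form; the function names, the `weight` parameter, and the toy class centers are all hypothetical and are not taken from the paper.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_distance(feature, center):
    """Half the squared Euclidean distance between a feature and a class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits, feature, label, centers, weight=0.1):
    """Assumed joint loss: classification term plus a weighted compactness term.

    `centers` maps each class label to a (hypothetical) per-class feature center.
    """
    classification = softmax_cross_entropy(logits, label)
    compactness = center_distance(feature, centers[label])
    return classification + weight * compactness


# Toy example with two classes and two-dimensional features.
centers = {0: [0.0, 0.0], 1: [1.0, 1.0]}
loss = joint_loss(
    logits=[2.0, 0.0], feature=[0.0, 1.0], label=0, centers=centers, weight=0.1
)
print(loss)
```

In a setup like this, `weight` trades off classification accuracy against how tightly features of the same class cluster; setting it to zero recovers plain cross-entropy training.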

Title: Spatiotemporal neural networks for action recognition based on joint loss
Authors: Chao Jing, Ping Wei, Hongbin Sun, Nanning Zheng
Publication date: 30-11-2019
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 9/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-019-04615-w
