Published in: Neural Computing and Applications 9/2020

30.11.2019 | Emerging Trends of Applied Neural Computation - E_TRAINCO

Spatiotemporal neural networks for action recognition based on joint loss

Authors: Chao Jing, Ping Wei, Hongbin Sun, Nanning Zheng



Abstract

Action recognition is a challenging and important problem in many significant fields, such as intelligent robotics and video surveillance. In recent years, deep learning and neural network techniques have been widely applied to action recognition and have attained remarkable results. However, recognizing actions in complicated scenes remains difficult because of varying illumination conditions, similar motions, and background noise. In this paper, we present a spatiotemporal neural network model with a joint loss to recognize human actions in videos. The network comprises two connected substructures. The first is a two-stream network that extracts optical flow and appearance features from each video frame, characterizing the human actions in the spatial dimension. The second is a group of Long Short-Term Memory (LSTM) structures following the spatial network, which capture the temporal and transition information in the videos. We also present a joint loss function for training the spatiotemporal model; introducing this loss function improves action recognition performance. The proposed method was tested on video samples from two challenging datasets, and the experiments demonstrate that our approach outperforms the baseline comparison methods.
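The abstract does not spell out the form of the joint loss. A common joint formulation for recognition networks combines a softmax cross-entropy term with a feature-clustering (center-loss-style) term weighted by a scalar. The sketch below is a minimal pure-Python illustration under that assumption; the function names and the weighting parameter `lam` are hypothetical, not taken from the paper.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of raw scores
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    # negative log-probability of the ground-truth class
    return -math.log(softmax(logits)[label])

def center_loss(feature, center):
    # half the squared Euclidean distance from a feature
    # vector to its class center (pulls features together)
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))

def joint_loss(logits, label, feature, centers, lam=0.1):
    # classification term plus a weighted clustering term
    return cross_entropy(logits, label) + lam * center_loss(feature, centers[label])
```

When the feature coincides with its class center, the clustering term vanishes and the joint loss reduces to plain cross-entropy; increasing `lam` trades classification accuracy against tighter per-class feature clusters.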


Metadata
Title
Spatiotemporal neural networks for action recognition based on joint loss
Authors
Chao Jing
Ping Wei
Hongbin Sun
Nanning Zheng
Publication date
30.11.2019
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 9/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-019-04615-w
