Published in: Neural Processing Letters 2/2020

30.07.2020

A Review of Dynamic Maps for 3D Human Motion Recognition Using ConvNets and Its Improvement

Authors: Zhimin Gao, Pichao Wang, Huogen Wang, Mingliang Xu, Wanqing Li

Abstract

RGB-D based action recognition is attracting increasing attention in both the research and industrial communities. However, because training data are scarce, methods based on pre-training are popular in this field. This paper reviews the concept of dynamic maps for RGB-D based human motion recognition using models pretrained in the image domain. Dynamic maps recursively and simultaneously encode the spatial, temporal and structural information contained in a video sequence into dynamic motion images. They enable the use of convolutional neural networks (ConvNets) and their ImageNet-pretrained models for 3D human motion recognition. This simple, compact and effective representation achieves state-of-the-art results on various gesture, action and activity recognition datasets. Building on a review of previous methods that apply this concept to different modalities (depth, skeleton or RGB-D data), the paper develops and presents a novel encoding scheme. The improved method generates effective flow-guided dynamic maps that can select the high-motion window and distinguish the order of frames with small motion. These flow-guided dynamic maps achieve state-of-the-art results on the large-scale ChaLearn LAP IsoGD and NTU RGB+D datasets.
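The abstract does not spell out the encoding itself. As a hedged illustration only: dynamic maps of this kind are typically based on the dynamic images of Bilen et al. (CVPR 2016), which are commonly computed with approximate rank pooling, a fixed weighted sum of the frames whose coefficients sum to zero (negative for early frames, positive for later ones), so the pooled image reflects how appearance changes over time. The Python sketch below implements that weighting on toy frames flattened to lists of floats. The window-selection helpers `motion_energy` and `high_motion_window` are hypothetical: they use the mean absolute difference between consecutive frames as a stand-in for optical-flow magnitude, and only illustrate the idea of picking a high-motion window before pooling. They are not the flow-guided scheme proposed in the paper.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # one frame, flattened to a vector of pixel/depth values


def arp_coefficients(T: int) -> List[float]:
    """Approximate rank pooling weights (Bilen et al., CVPR 2016):
    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), H_t = sum_{i<=t} 1/i.
    The weights sum to zero; early frames get negative weights, later ones positive."""
    harmonic = [0.0]  # harmonic[t] holds H_t, with H_0 = 0
    for i in range(1, T + 1):
        harmonic.append(harmonic[-1] + 1.0 / i)
    return [2.0 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])
            for t in range(1, T + 1)]


def dynamic_image(frames: Sequence[Frame]) -> Frame:
    """Collapse a non-empty clip into one 'dynamic image' (weighted sum of frames)."""
    alphas = arp_coefficients(len(frames))
    size = len(frames[0])
    return [sum(a * f[k] for a, f in zip(alphas, frames)) for k in range(size)]


def motion_energy(frames: Sequence[Frame]) -> List[float]:
    """HYPOTHETICAL stand-in for optical-flow magnitude: mean absolute
    difference between consecutive frames (len(frames) - 1 values)."""
    return [sum(abs(b - a) for a, b in zip(prev, cur)) / len(cur)
            for prev, cur in zip(frames, frames[1:])]


def high_motion_window(frames: Sequence[Frame], width: int) -> Tuple[int, int]:
    """HYPOTHETICAL: return [start, end) of the `width`-frame window with the
    largest total inter-frame motion energy."""
    if width >= len(frames):
        return 0, len(frames)
    energy = motion_energy(frames)
    best_start, best_score = 0, float("-inf")
    for start in range(len(frames) - width + 1):
        # A window of `width` frames spans width - 1 consecutive differences.
        score = sum(energy[start:start + width - 1])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_start + width


if __name__ == "__main__":
    # Toy clip: 6 frames of 4 pixels; motion is concentrated in frames 2-4.
    clip = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [3.0] * 4, [6.0] * 4, [6.0] * 4]
    start, end = high_motion_window(clip, width=3)
    print(start, end)  # prints "2 5"
    print(dynamic_image(clip[start:end]))
```

Because the weights sum to zero, a static clip pools to an all-zero image, so only temporal change survives in the output; selecting a window first simply restricts pooling to the frames where that change is largest.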

Metadata
Title
A Review of Dynamic Maps for 3D Human Motion Recognition Using ConvNets and Its Improvement
Authors
Zhimin Gao
Pichao Wang
Huogen Wang
Mingliang Xu
Wanqing Li
Publication date
30.07.2020
Publisher
Springer US
Published in
Neural Processing Letters / Issue 2/2020
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-020-10320-w