
14.02.2021 | Short Paper

Advanced skeleton-based action recognition via spatial–temporal rotation descriptors

Authors: Zhongwei Shen, Xiao-Jun Wu, Josef Kittler

Published in: Pattern Analysis and Applications | Issue 3/2021


Abstract

As human action is a spatial–temporal process, modern action recognition research has focused on exploring more effective motion representations, rather than taking only human poses as input. To better model a motion pattern, in this paper we exploit rotation information to describe the spatial–temporal variation, thereby enhancing the dynamic appearance and forming a component complementary to the static joint coordinates. Specifically, we represent the movement of the human body with joint units, each formed by regrouping a human joint together with its two adjacent bones. The resulting rotation descriptors therefore reduce the impact of static values and focus on the dynamic movement. The proposed general features can be applied directly to existing CNN-based action recognition methods. Experimental results on the NTU-RGB+D and ICL First Person Handpose datasets demonstrate the advantages of the proposed method.
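The abstract does not spell out how a rotation descriptor is computed from a joint unit, so the following is only a minimal sketch of one plausible formulation: for each joint, the two adjacent bone directions are tracked across consecutive frames, and the axis-angle rotation of each bone between frames is used as the feature, which depends on the motion rather than on absolute coordinates. The function names, the (T, J, 3) array layout, and the parent/child indices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bone_directions(joints, j, parent, child):
    """Normalised directions of the two bones adjacent to joint j.

    joints : float array of shape (T, J, 3), 3D joint coordinates over T frames.
    Returns two (T, 3) arrays: the joint->parent and joint->child directions.
    """
    b1 = joints[:, parent] - joints[:, j]
    b2 = joints[:, child] - joints[:, j]
    b1 = b1 / (np.linalg.norm(b1, axis=-1, keepdims=True) + 1e-8)
    b2 = b2 / (np.linalg.norm(b2, axis=-1, keepdims=True) + 1e-8)
    return b1, b2

def rotation_descriptor(joints, j, parent, child):
    """Axis-angle rotation of each adjacent bone between consecutive frames.

    Returns an array of shape (T-1, 2, 4): for each frame pair and each bone,
    the rotation axis (3 values) and rotation angle (1 value). The result
    reflects only how the joint unit rotates, not where it is located.
    """
    feats = []
    for b in bone_directions(joints, j, parent, child):
        u, v = b[:-1], b[1:]                       # bone direction at frame t and t+1
        axis = np.cross(u, v)                      # rotation axis (unnormalised)
        axis = axis / (np.linalg.norm(axis, axis=-1, keepdims=True) + 1e-8)
        cos = np.clip(np.sum(u * v, axis=-1, keepdims=True), -1.0, 1.0)
        angle = np.arccos(cos)                     # rotation angle between frames
        feats.append(np.concatenate([axis, angle], axis=-1))
    return np.stack(feats, axis=1)
```

Stacking such descriptors over all joints of a sequence would give a (T-1, J, 2, 4) tensor that could be fed to a CNN alongside the raw coordinates, in the spirit of the complementary static/dynamic components described above.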

Metadata
Title
Advanced skeleton-based action recognition via spatial–temporal rotation descriptors
Authors
Zhongwei Shen
Xiao-Jun Wu
Josef Kittler
Publication date
14.02.2021
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2021
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-020-00952-y
