
14.02.2021 | Short Paper

Advanced skeleton-based action recognition via spatial–temporal rotation descriptors

Authors: Zhongwei Shen, Xiao-Jun Wu, Josef Kittler

Published in: Pattern Analysis and Applications | Issue 3/2021


Abstract

As human action is a spatial–temporal process, modern action recognition research has focused on exploring more effective motion representations, rather than taking only human poses as input. To better model a motion pattern, in this paper we exploit rotation information to describe the spatial–temporal variation, thereby enhancing the dynamic appearance and forming a component complementary to the static joint coordinates. Specifically, we represent the movement of the human body with joint units, each formed by regrouping a human joint together with its two adjacent bones. The resulting rotation descriptors therefore reduce the impact of static values and focus on the dynamic movement. The proposed general features can be applied directly to existing CNN-based action recognition methods. Experimental results on the NTU-RGB+D and ICL First Person Handpose datasets demonstrate the advantages of the proposed method.
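The abstract does not spell out how a rotation descriptor is computed from a joint unit, so the following is only a minimal sketch of one plausible formulation: for each joint, the two adjacent bone directions are tracked across consecutive frames, and the axis-angle rotation of each bone between frames is used as the feature, which depends on the motion rather than on absolute coordinates. The function names, the (T, J, 3) array layout, and the parent/child indices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bone_directions(joints, j, parent, child):
    """Normalised directions of the two bones adjacent to joint j.

    joints : float array of shape (T, J, 3), 3D joint coordinates over T frames.
    Returns two (T, 3) arrays: the joint->parent and joint->child directions.
    """
    b1 = joints[:, parent] - joints[:, j]
    b2 = joints[:, child] - joints[:, j]
    b1 = b1 / (np.linalg.norm(b1, axis=-1, keepdims=True) + 1e-8)
    b2 = b2 / (np.linalg.norm(b2, axis=-1, keepdims=True) + 1e-8)
    return b1, b2

def rotation_descriptor(joints, j, parent, child):
    """Axis-angle rotation of each adjacent bone between consecutive frames.

    Returns an array of shape (T-1, 2, 4): for each frame pair and each bone,
    the rotation axis (3 values) and rotation angle (1 value). The result
    reflects only how the joint unit rotates, not where it is located.
    """
    feats = []
    for b in bone_directions(joints, j, parent, child):
        u, v = b[:-1], b[1:]                       # bone direction at frame t and t+1
        axis = np.cross(u, v)                      # rotation axis (unnormalised)
        axis = axis / (np.linalg.norm(axis, axis=-1, keepdims=True) + 1e-8)
        cos = np.clip(np.sum(u * v, axis=-1, keepdims=True), -1.0, 1.0)
        angle = np.arccos(cos)                     # rotation angle between frames
        feats.append(np.concatenate([axis, angle], axis=-1))
    return np.stack(feats, axis=1)
```

Stacking such descriptors over all joints of a sequence would give a (T-1, J, 2, 4) tensor that could be fed to a CNN alongside the raw coordinates, in the spirit of the complementary static/dynamic components described above.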

Metadata
Title
Advanced skeleton-based action recognition via spatial–temporal rotation descriptors
Authors
Zhongwei Shen
Xiao-Jun Wu
Josef Kittler
Publication date
14.02.2021
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2021
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-020-00952-y
