
2018 | Original Paper | Book Chapter

An Attention-Aware Model for Human Action Recognition on Tree-Based Skeleton Sequences

Authors: Runwei Ding, Chang Liu, Hong Liu

Published in: Social Robotics

Publisher: Springer International Publishing


Abstract

Skeleton-based human action recognition (HAR) has attracted considerable research attention because of its robustness to variations in location and appearance. However, most existing methods treat the whole skeleton as a fixed pattern and do not consider the differing importance of individual skeleton joints for action recognition. In this paper, a novel CNN-based attention-aware network is proposed. First, to describe the semantic meaning of skeletons and learn the discriminative joints over time, an attention-generating network named Global Attention Network (GAN) is proposed to produce attention masks. Then, to encode the spatial structure of skeleton sequences, we design a tree-based traversal (TTTM) rule, which can represent the skeleton structure, as a convolution unit of the main network. Finally, the GAN and the main network are cascaded into a whole network that is trained in an end-to-end manner. Experiments show that the TTTM and GAN complement each other, and the whole network achieves a clear improvement over the state of the art, e.g., its classification accuracy reaches 83.6% and 89.5% on the NTU RGB+D CV and CS benchmarks, respectively, outperforming other methods.
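To make the two components of the abstract more concrete, here is a minimal PyTorch sketch, not the paper's implementation: the joint layout, the traversal rule, and the mask generator below are illustrative assumptions standing in for the TTTM ordering and the GAN attention masks. It reorders joints by a depth-first tree walk that keeps consecutive entries physically adjacent in the body, then reweights the resulting feature map with a learned per-joint, per-frame attention mask.

import torch
import torch.nn as nn

# Hypothetical 9-joint skeleton tree (joint index -> child joint indices);
# this is NOT the paper's exact NTU RGB+D layout.
SKELETON_TREE = {0: [1, 4, 6], 1: [2], 2: [3], 4: [5], 6: [7], 7: [8]}

def tree_traversal(tree, root=0):
    # Depth-first walk that re-emits a joint when backtracking, so the
    # output sequence never "jumps" between physically distant joints.
    order = [root]
    for child in tree.get(root, []):
        order += tree_traversal(tree, child)
        order.append(root)  # return to the parent before the next branch
    return order

class AttentionMask(nn.Module):
    # Toy stand-in for an attention-mask generator: scores each joint at
    # each frame with a 1x1 convolution and reweights the input features.
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                    # x: (batch, C, frames, joints)
        mask = torch.sigmoid(self.score(x))  # (batch, 1, frames, joints)
        return x * mask                      # emphasize discriminative joints

order = tree_traversal(SKELETON_TREE)  # [0, 1, 2, 3, 2, 1, 0, 4, 5, 4, ...]
x = torch.randn(2, 3, 30, 9)           # (batch, xyz, frames, joints)
x = x[..., order]                      # reorder joints along the tree walk
print(AttentionMask(3)(x).shape)       # torch.Size([2, 3, 30, 17])

Because the walk revisits joints on backtracking, the reordered sequence is longer than the joint count (17 entries for 9 joints here); that redundancy is what lets a 1-D convolution over the joint axis see only physically connected neighbors.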


Metadata
Title
An Attention-Aware Model for Human Action Recognition on Tree-Based Skeleton Sequences
Authors
Runwei Ding
Chang Liu
Hong Liu
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-05204-1_56
