Published in: Pattern Analysis and Applications 4/2019

24.07.2018 | Original Article

Human action recognition in videos with articulated pose information by deep networks

Authors: M. Farrajota, João M. F. Rodrigues, J. M. H. du Buf



Abstract

Action recognition is of great importance in understanding human motion from video. It is an important topic in computer vision due to its many applications, such as video surveillance, human–machine interaction and video retrieval. One key problem is to automatically recognize low-level actions and high-level activities of interest. This paper proposes a way to cope with low-level actions by combining information about human body joints to aid action recognition. This is achieved by using high-level features computed by a convolutional neural network, pre-trained on ImageNet, together with articulated body joints as low-level features. These features are then fed to a Long Short-Term Memory network to learn the temporal dependencies of an action. For pose prediction, we focus on articulated relations between body joints. We employ a series of residual auto-encoders to produce multiple predictions, which are then combined to provide a likelihood map of body joints. In the network topology, features are processed across all scales, capturing the various spatial relationships associated with the body. Repeated bottom-up and top-down processing with intermediate supervision of each auto-encoder network is applied. We demonstrate state-of-the-art results on the popular FLIC, LSP and UCF Sports datasets.
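The pipeline described in the abstract — per-frame CNN features concatenated with articulated joint coordinates, fed to an LSTM whose final state is classified into an action — can be illustrated with a minimal NumPy sketch. This is not the authors' actual architecture (which uses a pre-trained ImageNet CNN and stacked residual auto-encoders for pose); all dimensions, names and the toy LSTM cell here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinimalLSTMCell:
    """Toy single-layer LSTM cell (hypothetical, for illustration only)."""
    def __init__(self, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        # One stacked weight matrix for the four gates (input, forget, cell, output).
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)        # update cell state
        h = o * np.tanh(c)                # new hidden state
        return h, c

def classify_sequence(cnn_feats, joint_coords, lstm, W_out):
    """cnn_feats: (T, D) per-frame CNN features; joint_coords: (T, 2J) pose (x, y) pairs."""
    h = np.zeros(lstm.hidden_dim)
    c = np.zeros(lstm.hidden_dim)
    for t in range(cnn_feats.shape[0]):
        # High-level appearance features and low-level joint positions are concatenated.
        x = np.concatenate([cnn_feats[t], joint_coords[t]])
        h, c = lstm.step(x, h, c)
    logits = W_out @ h
    return np.exp(logits) / np.exp(logits).sum()  # softmax over action classes

# Toy example: 16 frames, 512-D CNN features, 13 joints, 10 action classes.
T, D, J, H, K = 16, 512, 13, 128, 10
lstm = MinimalLSTMCell(D + 2 * J, H)
W_out = rng.normal(0, 0.1, (K, H))
probs = classify_sequence(rng.normal(size=(T, D)), rng.normal(size=(T, 2 * J)), lstm, W_out)
```

The sketch only conveys the data flow: in the paper, the joint coordinates themselves come from the residual auto-encoder stages, whose per-stage predictions are combined into a likelihood map per joint before the temporal model sees them.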


Metadata
Title
Human action recognition in videos with articulated pose information by deep networks
Authors
M. Farrajota
João M. F. Rodrigues
J. M. H. du Buf
Publication date
24.07.2018
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2019
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-018-0727-y
