
2018 | OriginalPaper | Chapter

Part-Activated Deep Reinforcement Learning for Action Prediction

Authors : Lei Chen, Jiwen Lu, Zhanjie Song, Jie Zhou

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

In this paper, we propose a part-activated deep reinforcement learning (PA-DRL) method for action prediction. Most existing methods for action prediction model actions from the evolution of whole frames, which cannot avoid the noise of the current action, especially in early prediction. Moreover, the loss of the structural information of the human body diminishes the capacity of features to describe actions. To address these issues, we design PA-DRL to exploit the structure of the human body by extracting skeleton proposals under a deep reinforcement learning framework. Specifically, we extract features from different parts of the human body individually and activate the action-related parts in the features to enhance the representation. Our method not only exploits the structural information of the human body but also considers the salient parts for expressing actions. We evaluate our method on three popular action prediction datasets: UT-Interaction, BIT-Interaction and UCF101. Our experimental results demonstrate that our method achieves performance competitive with the state of the art.
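To make the part-activation idea concrete, here is a minimal, self-contained sketch. It is not the paper's actual pipeline (PA-DRL uses deep features from skeleton proposals and a learned policy network); instead it uses a REINFORCE-style bandit with a running baseline to learn which of several hypothetical body-part feature slots should be activated, with a toy reward standing in for prediction accuracy. All names (`NUM_PARTS`, `activate`, `sample_mask`, the `relevant` ground truth) are illustrative assumptions, not the authors' API.

```python
import random

NUM_PARTS = 5   # hypothetical number of skeleton parts
FEAT_DIM = 4    # hypothetical per-part feature dimension


def activate(features, mask):
    """Keep features of activated parts; zero out deactivated ones."""
    return [f if m else [0.0] * len(f) for f, m in zip(features, mask)]


def sample_mask(probs, rng):
    """Sample a binary activation mask from per-part Bernoulli probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def reward(mask, relevant):
    """Toy reward: +1 per activated action-related part, -1 per activated noise part."""
    return sum(1 if r else -1 for m, r in zip(mask, relevant) if m)


def train(relevant, episodes=2000, lr=0.05, seed=0):
    """REINFORCE-style update of per-part activation probabilities."""
    rng = random.Random(seed)
    probs = [0.5] * NUM_PARTS
    baseline = 0.0
    for _ in range(episodes):
        mask = sample_mask(probs, rng)
        r = reward(mask, relevant)
        baseline += 0.1 * (r - baseline)   # running-average reward baseline
        adv = r - baseline
        for i, m in enumerate(mask):
            # push each part's probability toward actions with positive advantage
            grad = (m - probs[i]) * adv
            probs[i] = min(0.99, max(0.01, probs[i] + lr * grad))
    return probs


relevant = [1, 1, 0, 0, 1]   # assumed ground-truth "action-related" parts
probs = train(relevant)

# Use the learned policy to activate parts of a dummy feature set.
feats = [[1.0] * FEAT_DIM for _ in range(NUM_PARTS)]
mask = sample_mask(probs, random.Random(1))
part_activated = activate(feats, mask)
```

The baseline subtraction matters here: without it, a noise part activated in an otherwise-good mask would share the positive reward and its probability would drift upward; the advantage term isolates each part's marginal contribution.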


Metadata
Title
Part-Activated Deep Reinforcement Learning for Action Prediction
Authors
Lei Chen
Jiwen Lu
Zhanjie Song
Jie Zhou
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01219-9_26