Published in: Cluster Computing 1/2019

07-11-2017

More efficient and effective tricks for deep action recognition

Authors: Zheyuan Liu, Xiaoteng Zhang, Lei Song, Zhengyan Ding, Huixian Duan


Abstract

Deep convolutional networks have achieved great success in visual recognition of static images, yet they are less advantageous than traditional methods for action recognition in videos. Although two-stream-style convolutional networks achieve the best performance in human action recognition, obstacles remain, such as selecting pre-trained models and hyper-parameters, and high computational cost. In this paper, we propose two efficient and effective methods for action recognition based on the two-stream convolutional network: (1) reducing the computational cost of the temporal stream while achieving the same accuracy, and (2) providing techniques, such as selection of the optical flow algorithm, the pre-training datasets/architectures, and the hyper-parameters, for assembly in the action recognition task. Experimental results show that we obtain performance on a par with the state of the art on the HMDB51 (70.9%) and UCF101 (95.4%) datasets.
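As an illustration of the two-stream idea the abstract builds on, the sketch below shows late fusion of per-class scores from a spatial stream (RGB frames) and a temporal stream (stacked optical flow). This is a minimal, hypothetical example, not the paper's exact pipeline: the function names, the toy logits, and the temporal weight of 1.5 (a common choice in two-stream work, since motion cues often dominate) are all assumptions for demonstration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_two_stream(spatial_logits, temporal_logits, temporal_weight=1.5):
    """Weighted average of the two streams' class probabilities.

    The temporal stream is up-weighted, reflecting the common finding
    that motion features carry more signal for action recognition.
    Returns the predicted class index and the fused distribution.
    """
    p_spatial = softmax(np.asarray(spatial_logits, dtype=float))
    p_temporal = softmax(np.asarray(temporal_logits, dtype=float))
    fused = (p_spatial + temporal_weight * p_temporal) / (1.0 + temporal_weight)
    return int(np.argmax(fused)), fused

# Toy 4-class example: the streams disagree on the top class,
# and the weighted fusion resolves the disagreement.
pred, probs = fuse_two_stream([2.0, 1.0, 0.1, 0.0], [0.5, 3.0, 0.2, 0.1])
```

Here the spatial stream favours class 0 while the temporal stream strongly favours class 1; with the temporal stream up-weighted, the fused prediction follows the motion cue.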


Metadata
Title
More efficient and effective tricks for deep action recognition
Authors
Zheyuan Liu
Xiaoteng Zhang
Lei Song
Zhengyan Ding
Huixian Duan
Publication date
07-11-2017
Publisher
Springer US
Published in
Cluster Computing / Special Issue 1/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1309-2
