
Neural Processing Letters

Publication date: 03-10-2018

Action Recognition Using Multiple Pooling Strategies of CNN Features

Authors: Haifeng Hu, Zhongke Liao, Xiang Xiao

Published in: Neural Processing Letters | Issue 1/2019


Abstract

Deep convolutional neural networks (CNNs) have shown great potential in human action recognition. To obtain a compact and discriminative feature representation, this paper proposes multiple pooling strategies over CNN features. We explore three pooling strategies: space-time feature pooling (STFP), time filter pooling (TFP), and spatio-temporal pyramid pooling (STPP). STFP combines the advantages of hand-crafted features and deep ConvNet features. TFP captures how the elements of each CNN feature map change over time. STPP exploits the spatial and temporal pyramid structure of the feature maps. We aggregate these pooled features to produce a new discriminative video descriptor. Experimental results on the challenging YouTube, UCF50, and UCF101 datasets show that the three strategies are complementary and that our video representation is comparable to previous state-of-the-art algorithms.
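The abstract does not say how the pyramid pooling is computed, so the following is only an illustrative sketch of the general idea behind pooling over a space-time pyramid, not the authors' STPP. It assumes a single feature-map channel stored as a nested list indexed `[t][y][x]`, max pooling inside each cell, and pyramid levels that split all three axes into the same number of bins. The names `pyramid_max_pool` and `_bins` are hypothetical.

```python
from typing import List, Sequence

# One CNN feature-map channel over time, indexed as volume[t][y][x].
Volume = List[List[List[float]]]


def _bins(size: int, n: int) -> List[range]:
    """Split range(size) into n contiguous, near-equal index ranges."""
    return [range(i * size // n, (i + 1) * size // n) for i in range(n)]


def pyramid_max_pool(volume: Volume, levels: Sequence[int] = (1, 2)) -> List[float]:
    """Max-pool one channel over a space-time pyramid (assumed scheme).

    A level n splits the time, height, and width axes into n bins each,
    giving n**3 cells. The maximum of every cell is appended to the output,
    so the descriptor length is sum(n**3 for n in levels) regardless of
    the input size.
    """
    t_len, h_len, w_len = len(volume), len(volume[0]), len(volume[0][0])
    pooled: List[float] = []
    for n in levels:
        if n < 1 or n > min(t_len, h_len, w_len):
            raise ValueError(f"level {n} does not fit a {t_len}x{h_len}x{w_len} volume")
        for ts in _bins(t_len, n):
            for ys in _bins(h_len, n):
                for xs in _bins(w_len, n):
                    # Strongest activation inside this space-time cell.
                    pooled.append(
                        max(volume[t][y][x] for t in ts for y in ys for x in xs)
                    )
    return pooled
```

For a 2x2x2 volume whose entries are `4*t + 2*y + x`, calling `pyramid_max_pool(volume)` with the default levels yields one global maximum followed by the eight single-element cells, that is `[7.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]`. In a multi-channel setting, the per-channel outputs would be concatenated to form a fixed-length video descriptor.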



Title: Action Recognition Using Multiple Pooling Strategies of CNN Features
Authors: Haifeng Hu, Zhongke Liao, Xiang Xiao
Publication date: 03-10-2018
Publisher: Springer US
Published in: Neural Processing Letters / Issue 1/2019
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-018-9932-3

