International Journal of Computer Vision

Published in:

01-01-2015

Pose Adaptive Motion Feature Pooling for Human Action Analysis

Authors: Bingbing Ni, Pierre Moulin, Shuicheng Yan

Published in: International Journal of Computer Vision | Issue 2/2015


Abstract

Ineffective spatial–temporal motion feature pooling has been a fundamental bottleneck for human action recognition/detection for decades. Previous pooling schemes such as global, spatial–temporal pyramid, or human- and object-centric pooling fail to capture discriminative motion patterns because informative movements occur only in specific regions of the human body that depend on the type of action being performed. Global (holistic) motion feature pooling methods therefore often result in an action representation with limited discriminative capability. To address this fundamental limitation, we propose an adaptive motion feature pooling scheme that utilizes human poses as side information. Such poses can be detected, for instance, in assisted living and indoor smart surveillance scenarios. Taking both video sub-volumes for pooling and human pose types as hidden variables, we formulate the motion feature pooling problem as a latent structural learning problem where the relationship between the discriminative pooling video sub-volumes and the pose types is learned. The resulting pose adaptive motion feature pooling scheme is extensively tested on assisted living and smart surveillance datasets and on general action recognition benchmarks. Improved action recognition and detection performance is demonstrated.
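As a rough illustration of what treating sub-volumes and pose types as hidden variables can mean at inference time, the sketch below scores every (sub-volume, pose type) pair with a linear model and keeps the best pair, as in a generic latent SVM. It is not the formulation of the paper: the function name `infer_latent`, the array shapes, and the use of one weight vector per pose type are assumptions made only for this example.

```python
import numpy as np

def infer_latent(features, weights):
    """Pick the best (sub-volume, pose type) pair under a linear score.

    features: array of shape (V, D), one pooled feature vector per
              candidate video sub-volume (hypothetical input).
    weights:  array of shape (P, D), one weight vector per pose type
              (hypothetical model parameters).
    Returns (best_score, sub_volume_index, pose_type_index).
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # scores[v, p] = <weights[p], features[v]> for every latent assignment
    scores = features @ weights.T
    # Maximise over both hidden variables jointly
    v, p = np.unravel_index(np.argmax(scores), scores.shape)
    return float(scores[v, p]), int(v), int(p)

# Toy usage: two candidate sub-volumes, two pose types, 2-D features
score, v, p = infer_latent([[1.0, 0.0], [0.0, 2.0]],
                           [[1.0, 1.0], [0.0, 3.0]])
```

Exhaustive search over all pairs is only practical for small candidate sets; it is used here purely to keep the example short.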


We also tested other spatial partition schemes offline, including: (1) a vertical-4region-overlap scheme; (2) a vertical-3region-nonoverlap scheme; and (3) a vertical-3region-overlap/horizontal-2region-nonoverlap scheme (obtained by simply performing a horizontal cut in the middle of the partition scheme used in this work). The results show that the overlapping partition scheme is better than the non-overlapping version, and that the six-region partition scheme (i.e., the vertical-3region-overlap/horizontal-2region-nonoverlap scheme) only slightly outperforms the three-region partition scheme used here, at a much higher computational cost. Therefore, in this work we use the vertical-3region-overlap partition scheme, which also corresponds naturally to the head-upper torso, torso, and lower torso-leg regions.
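A minimal sketch of an overlapping three-region partition and per-region pooling is given below. It assumes that the three regions are equal-height bands stacked along the vertical axis of a person bounding box (suggested by the head, torso, and leg correspondence) and that neighbouring bands share half of their height. The overlap ratio, the function names, and the bag-of-words histogram pooling are illustrative assumptions, not values or procedures taken from the paper.

```python
import numpy as np

def vertical_overlap_bands(top, bottom, n_regions=3, overlap=0.5):
    """Split [top, bottom] into n_regions equal-height bands.

    Consecutive bands share a fraction `overlap` of the band height
    (assumed value; the paper's overlap ratio is not stated here).
    """
    height = bottom - top
    # n bands of height h with (n - 1) overlaps of r * h span the box:
    # height = h * (n - (n - 1) * r)
    band = height / (n_regions - (n_regions - 1) * overlap)
    stride = band * (1.0 - overlap)
    return [(top + i * stride, top + i * stride + band)
            for i in range(n_regions)]

def pool_by_band(ys, codewords, bands, vocab_size):
    """Concatenate one codeword histogram per band.

    ys:        vertical coordinate of each local motion feature.
    codewords: visual-word index of each feature (hypothetical input).
    A feature lying in an overlap zone is counted in both bands.
    """
    ys = np.asarray(ys, dtype=float)
    codewords = np.asarray(codewords, dtype=int)
    hists = []
    for y0, y1 in bands:
        inside = (ys >= y0) & (ys <= y1)
        hists.append(np.bincount(codewords[inside], minlength=vocab_size))
    return np.concatenate(hists)

# Toy usage: a 120-pixel-tall box, three features, vocabulary of size 2
bands = vertical_overlap_bands(0, 120)
pooled = pool_by_band([10, 45, 100], [0, 1, 1], bands, vocab_size=2)
```

With overlapping bands, a feature near a band boundary contributes to two histograms, so a small vertical shift of the bounding box changes the pooled vector less abruptly than with a hard, non-overlapping split.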

We tested our implementation of poselet key-framing offline on the UT-Interaction dataset. Our recognition result (accuracy on half videos) on that dataset is \(71.5\,\%\), which is comparable to the \(73.3\,\%\) reported in the original work (Raptis and Sigal 2013). Note that the manual annotations in Raptis and Sigal (2013) are not available.

Andrews, S., & Tsochantaridis, I. (2003). Support vector machines for multiple instance learning. In: Advances in neural information processing systems (pp. 561–568). MIT Press.

Bourdev, L., & Malik, J. (2009). Poselets: Body part detectors trained using 3d human pose annotations. In: International conference on computer vision, URL http://www.eecs.berkeley.edu/~lbourdev/poselets

Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(27), 1–27.

Chen, Q., Song, Z., Hua, Y., Huang, Z., & Yan, S. (2011). Hierarchical matching with side information for image classification. In: International conference on computer vision and pattern recognition.

Choi, J., Jeon, W.J., & Lee, S.C. (2008). Spatio-temporal pyramid matching for sports videos. In: ACM multimedia information retrieval.

Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In: International conference on computer vision and pattern recognition (pp. 886–893).

Dollar, P., Rabaud, V., Cottrell, G., & Belongie, S. (2005). Behavior recognition via sparse spatio-temporal features. In: VS-PETS.

Duchenne, O., Laptev, I., Sivic, J., Bach, F., & Ponce, J. (2009). Automatic annotation of human actions in video. In: International conference on computer vision (pp. 1491–1498).

Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

Girshick, R.B., Felzenszwalb, P.F., & McAllester, D. (2012). Discriminatively trained deformable part models, release 5. http://people.cs.uchicago.edu/~rbg/latent-release5/

Yang, J., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In: International conference on computer vision and pattern recognition.

Jiang, Y., Yuan, J., & Yu, G. (2012). Randomized spatial partition for scene recognition. In: European conference on computer vision.

Kanan, C., & Cottrell, G. (2010). Robust classification of objects, faces, and flowers using natural image statistics. In: International conference on computer vision and pattern recognition.

Klaser, A., Marszalek, M., & Schmid, C. (2008). A spatio-temporal descriptor based on 3d gradients. In: British machine vision conference.

Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., & Serre, T. (2011). HMDB: A large video database for human motion recognition. In: International conference on computer vision.

Laptev, I., & Lindeberg, T. (2003). Space-time interest points. In: International conference on computer vision.

Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: International conference on computer vision and pattern recognition.

Lv, F., & Nevatia, R. (2007). Single view human action recognition using key pose matching and Viterbi path searching. In: International conference on computer vision and pattern recognition.

Marszalek, M., Laptev, I., & Schmid, C. (2009). Actions in context. In: International conference on computer vision and pattern recognition (pp. 2929–2936).

Ni, B., Wang, G., & Moulin, P. (2011). RGBD-HuDaAct: A color-depth video database for human daily activity recognition. In: ICCV workshops (pp. 1147–1153).

Niebles, J.C., Chen, C.W., & Fei-Fei, L. (2010). Modeling temporal structure of decomposable motion segments for activity classification. In: European conference on computer vision (pp. 392–405).

Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In: European conference on computer vision (pp. 143–156).

Raptis, M., & Sigal, L. (2013). Poselet key-framing: A model for human activity recognition. In: International conference on computer vision and pattern recognition (pp. 2650–2657).

Raptis, M., Kokkinos, I., & Soatto, S. (2012). Discovering discriminative action parts from mid-level video representations. In: International conference on computer vision and pattern recognition.

Russakovsky, O., Lin, Y., Yu, K., & Fei-Fei, L. (2012). Object-centric spatial pooling for image classification. In: European conference on computer vision.

Ryoo, M.S., & Aggarwal, J. (2009). Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. In: International conference on computer vision (pp. 1593–1600).

Satkin, S., & Hebert, M. (2010). Modeling the temporal extent of actions. In: European conference on computer vision (pp. 536–548).

Schuldt, C., Laptev, I., & Caputo, B. (2004). Recognizing human actions: A local SVM approach. In: International conference on pattern recognition.

Shi, Q., Wang, L., Cheng, L., & Smola, A. (2011). Discriminative human action segmentation and recognition using semi-Markov models. International Journal of Computer Vision, 93(1), 22–32.

Shimada, A., Kondo, K., Deguchi, D., Morin, G., & Stern, H. (2013). Kitchen scene context based gesture recognition: A contest in ICPR2012. In: Advances in depth image analysis and applications (vol. 7854, pp. 168–185), URL http://www.murase.m.is.nagoya-u.ac.jp/KSCGR/index.html

Tang, K., Fei-Fei, L., & Koller, D. (2012). Learning latent temporal structure for complex event detection. In: International conference on computer vision and pattern recognition.

Vahdat, A., Gao, B., Ranjbar, M., & Mori, G. (2011). A discriminative key pose sequence model for recognizing human interactions. In: ICCV workshop (pp. 1729–1736).

Wang, G., & Forsyth, D. (2009). Joint learning of visual attributes, object classes and visual saliency. In: International conference on computer vision.

Wang, H., & Schmid, C. (2013). Action recognition with improved trajectories. In: International conference on computer vision.

Wang, H., Kläser, A., Schmid, C., & Liu, C. L. (2011). Action recognition by dense trajectories. In: International conference on computer vision and pattern recognition (pp. 3169–3176).

Wang, H., Kläser, A., Schmid, C., & Liu, C. L. (2013). Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision, 103(1), 60–79.

Wang, J., Liu, Z., Wu, Y., & Yuan, J. (2012). Mining actionlet ensemble for action recognition with depth cameras. In: International conference on computer vision and pattern recognition (pp. 1290–1297).

Wang, Y., & Mori, G. (2011). Hidden part models for human action recognition: Probabilistic versus max margin. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7), 1310–1323.

Wolf, C., Mille, J., Lombardi, L., Celiktutan, O., Jiu, M., Baccouche, M., Dellandrea, E., Bichot, C., Garcia, C., & Sankur, B. (2012). The LIRIS human activities dataset and the ICPR 2012 human activities recognition and localization competition. Technical report RR-LIRIS-2012-004, LIRIS laboratory, URL http://liris.cnrs.fr/harl2012/evaluation.html

Yakhnenko, O., & Verbeek, J. (2011). Region-based image classification with a latent SVM model. Technical report, INRIA.

Yamato, J., Ohya, J., & Ishii, K. (1992). Recognizing human action in time-sequential images using hidden Markov model. In: International conference on computer vision and pattern recognition (pp. 379–385).

Yuan, J., Liu, Z., & Wu, Y. (2011). Discriminative video pattern search for efficient action detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(9), 1728–1743.

Title: Pose Adaptive Motion Feature Pooling for Human Action Analysis
Authors: Bingbing Ni, Pierre Moulin, Shuicheng Yan
Publication date: 01-01-2015
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 2/2015
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-014-0742-4
