2014 | OriginalPaper | Chapter

Fine-Grained Activity Recognition with Holistic and Pose Based Features

Authors: Leonid Pishchulin, Mykhaylo Andriluka, Bernt Schiele

Published in: Pattern Recognition

Publisher: Springer International Publishing

Abstract

Holistic methods based on dense trajectories [29, 30] are currently the de facto standard for recognition of human activities in video. Whether holistic representations will endure or be superseded by higher-level video encoding in terms of body pose and motion is the subject of an ongoing debate [12]. In this paper we aim to clarify the underlying factors responsible for the good performance of holistic and pose-based representations. To that end we build on our recent dataset [2], which leverages the existing taxonomy of human activities. This dataset includes 24,920 video snippets covering 410 human activities in total. Our analysis reveals that holistic and pose-based methods are highly complementary, and that their performance varies significantly depending on the activity. We find that holistic methods are mostly affected by the number and speed of trajectories, whereas pose-based methods are mostly influenced by the viewpoint of the person. We observe striking performance differences across activities: for certain activities, results with pose-based features are more than twice as accurate as those with holistic features, and vice versa. The best-performing approach in our comparison is based on the combination of holistic and pose-based approaches, which again underlines their complementarity.
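
The complementarity described in the abstract can be illustrated with a toy per-activity comparison. The sketch below is not the authors' method: the abstract does not say how the two representations are combined, so a simple weighted average of classifier scores (late fusion) is assumed here. The activity names, the scores, and the helper names `per_class_accuracy`, `argmax_preds` and `late_fusion` are all hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {activity: fraction of its samples that were predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {activity: correct[activity] / total[activity] for activity in total}


def argmax_preds(scores):
    """Pick the highest-scoring activity for each sample."""
    return [max(s, key=s.get) for s in scores]


def late_fusion(scores_a, scores_b, weight=0.5):
    """Assumed fusion rule: weighted average of two score dicts per sample."""
    fused = []
    for sa, sb in zip(scores_a, scores_b):
        fused.append({c: weight * sa[c] + (1 - weight) * sb[c] for c in sa})
    return argmax_preds(fused)


# Hypothetical ground truth and classifier scores for four video snippets.
y_true = ["cycling", "cycling", "yoga", "yoga"]
holistic_scores = [
    {"cycling": 0.9, "yoga": 0.1},
    {"cycling": 0.8, "yoga": 0.2},
    {"cycling": 0.6, "yoga": 0.4},  # holistic features miss this yoga snippet
    {"cycling": 0.3, "yoga": 0.7},
]
pose_scores = [
    {"cycling": 0.7, "yoga": 0.3},
    {"cycling": 0.4, "yoga": 0.6},  # pose features miss this cycling snippet
    {"cycling": 0.1, "yoga": 0.9},
    {"cycling": 0.2, "yoga": 0.8},
]

holistic_acc = per_class_accuracy(y_true, argmax_preds(holistic_scores))
pose_acc = per_class_accuracy(y_true, argmax_preds(pose_scores))
fused_acc = per_class_accuracy(y_true, late_fusion(holistic_scores, pose_scores))

for name, acc in [("holistic", holistic_acc), ("pose", pose_acc), ("fused", fused_acc)]:
    print(name, acc)
```

In this toy setup each representation is twice as accurate as the other on one activity, and the fused scores recover both activities. It only illustrates what complementary errors look like and says nothing about the results reported in the paper.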

Literature
1.
Ainsworth, B., Haskell, W., Herrmann, S., Meckes, N., Bassett, D., Tudor-Locke, C., Greer, J., Vezina, J., Whitt-Glover, M., Leon, A.: 2011 compendium of physical activities: a second update of codes and MET values. MSSE 43(8), 1575–1581 (2011)
2.
Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR’14
3.
Brendel, W., Todorovic, S.: Learning spatiotemporal graphs of human activities. In: ICCV’11
4.
Cardinaux, F., Bhowmik, D., Abhayaratne, C., Hawley, M.S.: Video based technology for ambient assisted living: a review of the literature. J. Ambient Intell. Smart Environ. 3(3), 253–269 (2011)
5.
Chakraborty, B., Holte, M.B., Moeslund, T.B., Gonzalez, J., Xavier Roca, F.: A selective spatio-temporal interest point detector for human action recognition in complex scenes. In: ICCV’11
6.
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR’05
7.
Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: ECCV’06
8.
Dantone, M., Gall, J., Leistner, C., Gool, L.V.: Human pose estimation using body parts dependent joint regressors. In: CVPR’13
9.
Duchenne, O., Laptev, I., Sivic, J., Bach, F., Ponce, J.: Automatic annotation of human actions in video. In: ICCV’09
11.
Ferrari, V., Marin, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: CVPR’08
12.
Jhuang, H., Gall, J., Zuffi, S., Schmid, C., Black, M.J.: Towards understanding action recognition. In: ICCV’13
13.
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: Proceedings of the International Conference on Computer Vision (ICCV) (2011)
14.
Laptev, I.: On space-time interest points. IJCV 64(2/3), 107–123 (2005)
15.
Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: CVPR’08
16.
Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos in the wild. In: CVPR’09
17.
Marszałek, M., Laptev, I., Schmid, C.: Actions in context. In: CVPR’09
18.
Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Poselet conditioned pictorial structures. In: CVPR’13
19.
Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Strong appearance and expressive spatial models for human pose estimation. In: ICCV’13
20.
Rodriguez, M.D., Ahmed, J., Shah, M.: Action MACH: a spatio-temporal maximum average correlation height filter for action recognition. In: CVPR’08
21.
Rohrbach, M., Amin, S., Andriluka, M., Schiele, B.: A database for fine grained activity detection of cooking activities. In: CVPR’12
22.
Rohrbach, M., Stark, M., Schiele, B.: Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In: CVPR’11
23.
Sadanand, S., Corso, J.J.: Action bank: a high-level representation of activity in video. In: ECCV’12
24.
Sapp, B., Taskar, B.: Multimodal decomposable models for human pose estimation. In: CVPR’13
25.
Singh, V.K., Nevatia, R.: Action recognition in cluttered dynamic scenes using pose-specific part models. In: ICCV’11
26.
Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human action classes from videos in the wild. Technical report CRCV-TR-12-01, UCF (2012)
27.
Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. In: CVPR’10
28.
Vishwakarma, S., Agrawal, A.: A survey on activity recognition and behavior understanding in video surveillance. VC 29(10), 983–1009 (2013)
29.
Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Dense trajectories and motion boundary descriptors for action recognition. IJCV 103(1), 60–79 (2013)
30.
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: ICCV’13
31.
Wang, H., Ullah, M.M., Kläser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition. In: BMVC’09
32.
Yang, Y., Ramanan, D.: Articulated human detection with flexible mixtures of parts. PAMI 61(1), 55–79 (2013)
Metadata
Title
Fine-Grained Activity Recognition with Holistic and Pose Based Features
Authors
Leonid Pishchulin
Mykhaylo Andriluka
Bernt Schiele
Copyright Year
2014
DOI
https://doi.org/10.1007/978-3-319-11752-2_56
