
Open Access 20.01.2022 | Short Paper

Action recognition by key trajectories

Authors: Fernando Camarena, Leonardo Chang, Miguel Gonzalez-Mendoza, Ricardo J Cuevas-Ascencio

Published in: Pattern Analysis and Applications | Issue 2/2022


Abstract

Human action recognition is an active field of research that aims to explain what a subject is doing in an input video. Cutting-edge approaches are based on deep learning architectures; recent research, however, indicates that hand-crafted features are complementary and, when combined with learned ones, can boost classification accuracy. We introduce the key trajectories approach, which builds on the popular hand-crafted method of improved dense trajectories. Our work explores how pose estimation can be used to find meaningful key points, reducing computational time and undesired noise while guaranteeing a stable frame-processing rate. Furthermore, we test how feature tracking behaves when using dense inverse search versus frame-to-frame subject key-point estimation. Our proposal was tested on the KTH and UCF11 datasets using bag-of-words and on the UCF50 and HMDB datasets using Fisher Vector, achieving accuracies of 95.71%, 84.88%, 92.9%, and 81.3%, respectively. In addition, our proposal recognizes subject actions in video eight times faster than its dense counterpart. To maximize the bag-of-words classification performance, we illustrate how the hyperparameters affect both accuracy and computation time; specifically, we explore the vocabulary size, the SVM hyperparameter, the descriptor's distinctiveness, and the subject body key points.
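
To make the pipeline concrete, below is a minimal Python sketch of the processing chain the abstract describes: pose-based key points are tracked with DIS (dense inverse search) optical flow to form short trajectories, trajectory descriptors are quantized into a bag-of-words histogram, and an SVM performs the final classification. It is illustrative only, not the authors' implementation: estimate_keypoints is a hypothetical pose-estimation hook (e.g., an OpenPose wrapper), and the descriptor is reduced to normalized displacement vectors rather than the full trajectory-aligned descriptor set.

import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def estimate_keypoints(frame):
    # Hypothetical pose-estimation hook: returns an (N, 2) float array of
    # subject body key points (e.g., from an OpenPose wrapper).
    raise NotImplementedError


def key_trajectories(frames, length=15):
    # Track pose key points over `length` frames with DIS optical flow and
    # return one displacement-based descriptor per trajectory.
    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_FAST)
    points = estimate_keypoints(frames[0]).astype(np.float32)   # (N, 2)
    tracks = [points.copy()]
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:length + 1]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = dis.calc(prev, gray, None)                        # (H, W, 2)
        xs = np.clip(points[:, 0].astype(int), 0, flow.shape[1] - 1)
        ys = np.clip(points[:, 1].astype(int), 0, flow.shape[0] - 1)
        points = points + flow[ys, xs]                           # advect key points
        tracks.append(points.copy())
        prev = gray
    disp = np.diff(np.stack(tracks), axis=0)                     # (length, N, 2)
    norm = np.linalg.norm(disp, axis=2).sum(axis=0) + 1e-8       # per-trajectory scale
    disp = disp / norm[None, :, None]
    return disp.transpose(1, 0, 2).reshape(len(points), -1)      # (N, 2 * length)


def bag_of_words(train_descs, test_descs, vocabulary_size=256):
    # Quantize local descriptors into a visual vocabulary and return one
    # L1-normalized histogram per video (bag-of-words encoding).
    vocab = KMeans(n_clusters=vocabulary_size, n_init=4).fit(np.vstack(train_descs))

    def encode(descs):
        hist = np.bincount(vocab.predict(descs), minlength=vocabulary_size)
        return hist / max(hist.sum(), 1)

    return (np.array([encode(d) for d in train_descs]),
            np.array([encode(d) for d in test_descs]))


# Usage sketch: per-video descriptors -> BoW histograms -> SVM.
# X_train, X_test = bag_of_words(train_video_descs, test_video_descs)
# clf = SVC(kernel="rbf", C=10.0).fit(X_train, y_train)
# print(clf.score(X_test, y_test))

In this reduced form, the vocabulary size and the SVM regularization parameter C correspond to the hyperparameters whose effect on accuracy and computation time the paper explores.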


Metadata
Title
Action recognition by key trajectories
Authors
Fernando Camarena
Leonardo Chang
Miguel Gonzalez-Mendoza
Ricardo J Cuevas-Ascencio
Publication date
20.01.2022
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 2/2022
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-021-01054-z
