
Open Access 20.01.2022 | Short Paper

Action recognition by key trajectories

Authors: Fernando Camarena, Leonardo Chang, Miguel Gonzalez-Mendoza, Ricardo J Cuevas-Ascencio

Published in: Pattern Analysis and Applications | Issue 2/2022


Abstract

Human action recognition is an active field of research that aims to explain what a subject is doing in an input video. Cutting-edge approaches are based on deep learning architectures; recent research, however, indicates that hand-crafted features are complementary and, when combined with learned ones, can boost classification accuracy. We introduce the key trajectories approach, which builds on the popular hand-crafted method of improved dense trajectories. Our work explores how pose estimation can be used to find meaningful key points, reducing computational time and undesired noise while guaranteeing a stable frame-processing rate. Furthermore, we test how feature tracking behaves when using dense inverse search versus frame-to-frame subject key-point estimation. Our proposal was tested on the KTH and UCF11 datasets using bag-of-words and on the UCF50 and HMDB datasets using Fisher Vector, achieving accuracies of 95.71%, 84.88%, 92.9%, and 81.3%, respectively. In addition, our proposal recognizes subject actions in video eight times faster than its dense counterpart. To maximize the bag-of-words classification performance, we illustrate how the hyperparameters affect both accuracy and computation time; specifically, we explore the vocabulary size, the SVM hyperparameter, the descriptor's distinctiveness, and the subject body key points.
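
To make the pipeline concrete, below is a minimal Python sketch of the processing chain the abstract describes: pose-based key points are tracked with DIS (dense inverse search) optical flow to form short trajectories, trajectory descriptors are quantized into a bag-of-words histogram, and an SVM performs the final classification. It is illustrative only, not the authors' implementation: estimate_keypoints is a hypothetical pose-estimation hook (e.g., an OpenPose wrapper), and the descriptor is reduced to normalized displacement vectors rather than the full trajectory-aligned descriptor set.

import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def estimate_keypoints(frame):
    # Hypothetical pose-estimation hook: returns an (N, 2) float array of
    # subject body key points (e.g., from an OpenPose wrapper).
    raise NotImplementedError


def key_trajectories(frames, length=15):
    # Track pose key points over `length` frames with DIS optical flow and
    # return one displacement-based descriptor per trajectory.
    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_FAST)
    points = estimate_keypoints(frames[0]).astype(np.float32)   # (N, 2)
    tracks = [points.copy()]
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:length + 1]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = dis.calc(prev, gray, None)                        # (H, W, 2)
        xs = np.clip(points[:, 0].astype(int), 0, flow.shape[1] - 1)
        ys = np.clip(points[:, 1].astype(int), 0, flow.shape[0] - 1)
        points = points + flow[ys, xs]                           # advect key points
        tracks.append(points.copy())
        prev = gray
    disp = np.diff(np.stack(tracks), axis=0)                     # (length, N, 2)
    norm = np.linalg.norm(disp, axis=2).sum(axis=0) + 1e-8       # per-trajectory scale
    disp = disp / norm[None, :, None]
    return disp.transpose(1, 0, 2).reshape(len(points), -1)      # (N, 2 * length)


def bag_of_words(train_descs, test_descs, vocabulary_size=256):
    # Quantize local descriptors into a visual vocabulary and return one
    # L1-normalized histogram per video (bag-of-words encoding).
    vocab = KMeans(n_clusters=vocabulary_size, n_init=4).fit(np.vstack(train_descs))

    def encode(descs):
        hist = np.bincount(vocab.predict(descs), minlength=vocabulary_size)
        return hist / max(hist.sum(), 1)

    return (np.array([encode(d) for d in train_descs]),
            np.array([encode(d) for d in test_descs]))


# Usage sketch: per-video descriptors -> BoW histograms -> SVM.
# X_train, X_test = bag_of_words(train_video_descs, test_video_descs)
# clf = SVC(kernel="rbf", C=10.0).fit(X_train, y_train)
# print(clf.score(X_test, y_test))

In this reduced form, the vocabulary size and the SVM regularization parameter C correspond to the hyperparameters whose effect on accuracy and computation time the paper explores.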


Metadata
Title
Action recognition by key trajectories
Authors
Fernando Camarena
Leonardo Chang
Miguel Gonzalez-Mendoza
Ricardo J Cuevas-Ascencio
Publication date
20.01.2022
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 2/2022
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-021-01054-z
