Published in: Neural Processing Letters 1/2020

2 August 2019

Action Recognition with Multiple Relative Descriptors of Trajectories

Authors: Zhongke Liao, Haifeng Hu, Yichu Liu

Abstract

Dense trajectories have become one of the most successful hand-crafted features for action recognition. However, most existing dense-trajectory-based methods ignore the relationships between trajectories. In this paper, we propose multiple relative descriptors of trajectories to model the relative information between pairs of trajectories. Specifically, we present relative motion descriptors and relative location descriptors, which capture relative motion and relative location information, respectively. Moreover, we present relative deep feature descriptors, which combine deep features with hand-crafted features. By aggregating these descriptors, we obtain a fixed-length representation regardless of the duration of the input video. Experimental results on three standard datasets demonstrate the superiority of our method.
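
The abstract does not state the formulas behind the descriptors, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each trajectory is a fixed-length list of (x, y) points, that relative motion is the per-step difference of displacement vectors, and that relative location is the per-frame offset between two trajectories. The function names are hypothetical, and plain mean pooling stands in for whatever aggregation the paper actually uses.

```python
from itertools import combinations


def displacements(traj):
    """Frame-to-frame displacement vectors of one trajectory [(x, y), ...]."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(traj, traj[1:])]


def relative_motion(traj_a, traj_b):
    """Assumed definition: per-step difference of the displacement vectors, flattened."""
    out = []
    for (ax, ay), (bx, by) in zip(displacements(traj_a), displacements(traj_b)):
        out.extend([ax - bx, ay - by])
    return out


def relative_location(traj_a, traj_b):
    """Assumed definition: per-frame offset between the two trajectories, flattened."""
    out = []
    for (ax, ay), (bx, by) in zip(traj_a, traj_b):
        out.extend([ax - bx, ay - by])
    return out


def video_representation(trajectories):
    """Mean-pool the pair descriptors (a stand-in aggregation, not the paper's).

    The output length depends only on the trajectory length, not on how
    many trajectories (and hence how many frames) the video contains.
    """
    pairs = list(combinations(trajectories, 2))
    if not pairs:
        raise ValueError("need at least two trajectories")
    descs = [relative_motion(a, b) + relative_location(a, b) for a, b in pairs]
    n = len(descs)
    return [sum(col) / n for col in zip(*descs)]
```

With trajectories of length 3, each pair yields 4 relative-motion values and 6 relative-location values, so a video with two trajectories and a video with twenty both map to a 10-dimensional vector.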

Metadata
Title
Action Recognition with Multiple Relative Descriptors of Trajectories
Authors
Zhongke Liao
Haifeng Hu
Yichu Liu
Publication date
2 August 2019
Publisher
Springer US
Published in
Neural Processing Letters / Issue 1/2020
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-019-10091-z
