Published in: Neural Computing and Applications 15/2021

09.02.2021 | Original Article

DM-CTSA: a discriminative multi-focused and complementary temporal/spatial attention framework for action recognition

Authors: Ming Tong, Kaibo Yan, Lei Jin, Xing Yue, Mingyang Li


Abstract

Video-based human action recognition remains a challenging task, with three main limitations in prior work: (1) most methods are restricted to modeling a single temporal scale; (2) although a few methods consider multilevel motion features, they disregard the fact that different features usually contribute differently; and (3) most attention mechanisms attend only to important regions in frames without considering the spatial structure information around them. To address these issues, a discriminative multi-focused and complementary temporal/spatial attention framework is presented, consisting of a multi-focused temporal attention network with multi-granularity loss (M2TEAN) and a complementary spatial attention network with co-classification loss (C2SPAN). First, M2TEAN not only focuses on discriminative multilevel motion features but also highlights the more discriminative features among them. Specifically, a short-term discriminative attention sub-network and a middle-term consistent attention sub-network are constructed to focus on discriminative short-term and middle-term features, respectively, and a long-term evolutive attention sub-network is proposed to capture long-term action evolution over time. A multi-focused temporal attention module then further highlights the more discriminative features across scales. Second, C2SPAN captures discriminative regions in frames while mining the spatial structure information around them. Experiments show that our methods produce state-of-the-art results.
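The core idea of the multi-focused temporal attention module, weighting per-scale motion features (short-, middle-, and long-term) so that the more discriminative scales contribute more to the fused representation, can be sketched with softmax-normalized attention weights. This is a minimal illustrative sketch, not the authors' implementation: the function names and the toy scores are assumptions, and in the actual network the scores would be learned.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_focused_fusion(features, scores):
    """Fuse per-scale feature vectors (e.g. short-, middle-, long-term)
    into one representation, weighted by softmax-normalized scores so
    that more discriminative scales contribute more."""
    weights = softmax(scores)
    dim = len(features[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, features):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused

# Toy example: three temporal scales with 4-dim (one-hot) features,
# where the short-term scale is scored as most discriminative.
short_f  = [1.0, 0.0, 0.0, 0.0]
middle_f = [0.0, 1.0, 0.0, 0.0]
long_f   = [0.0, 0.0, 1.0, 0.0]
fused = multi_focused_fusion([short_f, middle_f, long_f], [2.0, 1.0, 0.5])
```

Because the toy features are one-hot, each component of `fused` equals the attention weight of the corresponding scale, so the highest-scored (short-term) scale dominates the fused vector.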


Metadata
Title
DM-CTSA: a discriminative multi-focused and complementary temporal/spatial attention framework for action recognition
Authors
Ming Tong
Kaibo Yan
Lei Jin
Xing Yue
Mingyang Li
Publication date
09.02.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 15/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-021-05698-0
