Published in: Neural Processing Letters 4/2021

15-05-2021

Boundary Adjusted Network Based on Cosine Similarity for Temporal Action Proposal Generation

Authors: Jingye Zheng, Dihu Chen, Haifeng Hu


Abstract

Detecting temporal actions in long, untrimmed videos is a challenging and important problem in computer vision. Generating high-quality proposals is a key step in temporal action detection. A high-quality set of proposals has two main characteristics: the temporal overlap between proposals and action instances should be as large as possible, and the number of generated proposals should be as small as possible. Inspired by similarity comparison in face recognition and by the similarity of actions within the same action segment, we design a module that compares the similarity of visual features extracted by a visual feature encoder. We locate the time points where the feature similarity changes sharply and use them to generate candidate proposals. We then train a classifier to evaluate whether each candidate proposal contains an action instance. Experiments suggest that our method outperforms other temporal action proposal generation methods on the THUMOS-14 and ActivityNet-v1.3 datasets. In addition, our method still outperforms other methods when using visual features extracted by different networks.
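
The boundary idea in the abstract can be illustrated with a small sketch. This is not the authors' network: it only shows, under stated assumptions, how cosine similarity between adjacent snippet features could flag candidate boundaries, and how temporal overlap (temporal IoU) between a proposal and an action instance is commonly measured. The function names, the fixed threshold of 0.5, the pairing of every earlier boundary with every later one, and the toy 2-D features are all hypothetical choices made for illustration. The trained classifier that scores candidates in the paper is not shown.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_boundaries(features, threshold=0.5):
    """Indices t where similarity between features[t-1] and features[t]
    drops below `threshold` (an assumed, hand-picked value)."""
    return [
        t
        for t in range(1, len(features))
        if cosine_similarity(features[t - 1], features[t]) < threshold
    ]


def candidate_proposals(boundaries):
    """Pair every earlier boundary with every later one as (start, end).
    This exhaustive pairing is an assumption, not the paper's rule."""
    return list(combinations(boundaries, 2))


def temporal_iou(proposal, instance):
    """Temporal intersection-over-union of two (start, end) segments."""
    inter = max(0, min(proposal[1], instance[1]) - max(proposal[0], instance[0]))
    union = (proposal[1] - proposal[0]) + (instance[1] - instance[0]) - inter
    return inter / union if union > 0 else 0.0


# Toy sequence: background, then an "action" with different features, then background.
features = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 4 + [[1.0, 0.0]] * 3
boundaries = find_boundaries(features)
proposals = candidate_proposals(boundaries)
print(boundaries, proposals)
```

In this toy sequence the similarity between neighbouring snippets is 1 everywhere except at the two points where the features switch, so only those two indices are flagged and a single candidate segment results. Real features are noisier, which is one reason a learned evaluation stage is needed on top of any such rule.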


Metadata
Title
Boundary Adjusted Network Based on Cosine Similarity for Temporal Action Proposal Generation
Authors
Jingye Zheng
Dihu Chen
Haifeng Hu
Publication date
15-05-2021
Publisher
Springer US
Published in
Neural Processing Letters / Issue 4/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-021-10500-2
