Published in: Neural Computing and Applications 11/2022

07 March 2022 | Review

Weakly-supervised temporal action localization: a survey

Authors: AbdulRahman Baraka, Mohd Halim Mohd Noor

Abstract

Temporal Action Localization (TAL) is an important task in computer vision areas such as video understanding, summarization, and analysis. Real-world videos are long, untrimmed, and contain multiple actions, so the fully-supervised setting requires temporal boundary annotations for both classification and localization. Because this annotation is costly and time-consuming, the trend is moving toward the weakly-supervised setting, which relies only on video-level labels without any additional information; this approach is called Weakly-supervised Temporal Action Localization (WTAL). In this survey, we review the concepts, strategies, and techniques related to WTAL to clarify all aspects of the problem, and we review state-of-the-art WTAL frameworks according to the challenges they address. Furthermore, we present a comparison of model performance on benchmark datasets. Finally, we summarize directions for future work to help researchers improve model performance.


Metadata
Title
Weakly-supervised temporal action localization: a survey
Authors
AbdulRahman Baraka
Mohd Halim Mohd Noor
Publication date
07 March 2022
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 11/2022
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-022-07102-x
