Skip to main content
Top
Published in: International Journal of Machine Learning and Cybernetics 9/2023

14-03-2023 | Original Article

A static video summarization approach via block-based self-motivated visual attention scoring mechanism

Authors: Wen-lin Li, Tong Zhang, Xiao Liu

Published in: International Journal of Machine Learning and Cybernetics | Issue 9/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Since automatic visual semantic comprehension of video content is currently infeasible and unintelligent, key frames extracted from videos are inconsistent with human visual understanding. In this paper, a block-based self-motivated visual attention scoring mechanism named the BSVAS mechanism is proposed for extracting key frames. The approach described in this paper first reduces the dimensionality of the video by exploiting entropy as a static global characteristic measurement. Next, two block-based motion metrics are employed to express features from a spatiotemporal perspective, and a novel self-motivated strategy is applied to conduct feature fusion. Finally, a self-motivated scoring algorithm is performed to evaluate content attractiveness and frame importance to generate key frames. Experiments on gesture videos with various postures demonstrate that key frames extracted using the proposed method provide high-quality video summaries and cover the main content of the gesture videos as compared to several other excellent mechanisms in the literature.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Show more products
Literature
1.
go back to reference Corchs S, Fersini E, Gasparini F (2019) Ensemble learning on visual and textual data for social image emotion classification. Int J Mach Learn Cybern 10:2057–2070CrossRef Corchs S, Fersini E, Gasparini F (2019) Ensemble learning on visual and textual data for social image emotion classification. Int J Mach Learn Cybern 10:2057–2070CrossRef
2.
go back to reference Wu F, Duan J, Chen S, Ye Y, Ai P, Yang Z (2021) Multi-target recognition of bananas and automatic positioning for the inflorescence axis cutting point. Front Plant Sci 12:705021CrossRef Wu F, Duan J, Chen S, Ye Y, Ai P, Yang Z (2021) Multi-target recognition of bananas and automatic positioning for the inflorescence axis cutting point. Front Plant Sci 12:705021CrossRef
3.
go back to reference Ding W, Hu B, Liu H, Wang X, Huang X (2020) Human posture recognition based on multiple features and rule learning. Int J Mach Learn Cybern 11:2529–2540CrossRef Ding W, Hu B, Liu H, Wang X, Huang X (2020) Human posture recognition based on multiple features and rule learning. Int J Mach Learn Cybern 11:2529–2540CrossRef
4.
go back to reference Hussain T, Muhammad K, Ding W, Lloret J, Baik SW, Albuquerque VHC (2021) A comprehensive survey of multi-view video summarization. Pattern Recognit 109:107567CrossRef Hussain T, Muhammad K, Ding W, Lloret J, Baik SW, Albuquerque VHC (2021) A comprehensive survey of multi-view video summarization. Pattern Recognit 109:107567CrossRef
5.
go back to reference Yan J, Gao X (2018) Pornographic video detection with mapreduce. Int J Mach Learn Cybern 9:2105–2115CrossRef Yan J, Gao X (2018) Pornographic video detection with mapreduce. Int J Mach Learn Cybern 9:2105–2115CrossRef
6.
go back to reference Yasmin G, Chowdhury S, Nayak J, Das P, Das AK (2023) Key moment extraction for designing an agglomerative clustering algorithm-based video summarization framework. Neural Comput Appl 35(7):4881–4902CrossRef Yasmin G, Chowdhury S, Nayak J, Das P, Das AK (2023) Key moment extraction for designing an agglomerative clustering algorithm-based video summarization framework. Neural Comput Appl 35(7):4881–4902CrossRef
7.
go back to reference Hu W, Xie N, Li L, Zeng X, Maybank SJ (2011) A survey on visual content-based video indexing and retrieval. IEEE Trans Syst Man Cyberne Part C (Applications and Reviews) 41:797–819 Hu W, Xie N, Li L, Zeng X, Maybank SJ (2011) A survey on visual content-based video indexing and retrieval. IEEE Trans Syst Man Cyberne Part C (Applications and Reviews) 41:797–819
8.
go back to reference Bhuyan MK, Ramaraju VV, Iwahori Y (2014) Hand gesture recognition and animation for local hand motions. Int J Mach Learn Cybern 5:607–623CrossRef Bhuyan MK, Ramaraju VV, Iwahori Y (2014) Hand gesture recognition and animation for local hand motions. Int J Mach Learn Cybern 5:607–623CrossRef
9.
go back to reference Lu Z, Zhang G, Huang G, Yu Z, Pun C-M, Zhang W, Chen J, Ling W-K (2022) Video person re-identification using key frame screening with index and feature reorganization based on inter-frame relation. Int J Mach Learn Cybern 13(9):2745–2761CrossRef Lu Z, Zhang G, Huang G, Yu Z, Pun C-M, Zhang W, Chen J, Ling W-K (2022) Video person re-identification using key frame screening with index and feature reorganization based on inter-frame relation. Int J Mach Learn Cybern 13(9):2745–2761CrossRef
10.
go back to reference Tamilkodi R, Kumari GRN (2021) A novel framework for retrieval of image using weighted edge matching algorithm. Multimed Tools Appl 80:19625–19648CrossRef Tamilkodi R, Kumari GRN (2021) A novel framework for retrieval of image using weighted edge matching algorithm. Multimed Tools Appl 80:19625–19648CrossRef
11.
go back to reference Lee YJ, Ghosh J, Grauman K (2012) Discovering important people and objects for egocentric video summarization. In: 2012 IEEE conference on computer vision and pattern recognition, pp 1346–1353 Lee YJ, Ghosh J, Grauman K (2012) Discovering important people and objects for egocentric video summarization. In: 2012 IEEE conference on computer vision and pattern recognition, pp 1346–1353
12.
go back to reference Li W, Qi D, Zhang C, Guo J, Yao J (2020) Video summarization based on mutual information and entropy sliding window method. Entropy 22:1–16MathSciNetCrossRef Li W, Qi D, Zhang C, Guo J, Yao J (2020) Video summarization based on mutual information and entropy sliding window method. Entropy 22:1–16MathSciNetCrossRef
13.
go back to reference Hannane R, Elboushaki A, Afdel K, Nagabhushan P, Javed M (2016) An efficient method for video shot boundary detection and keyframe extraction using sift-point distribution histogram. Int J Multimed Inf Retr 5:89–104CrossRef Hannane R, Elboushaki A, Afdel K, Nagabhushan P, Javed M (2016) An efficient method for video shot boundary detection and keyframe extraction using sift-point distribution histogram. Int J Multimed Inf Retr 5:89–104CrossRef
14.
go back to reference Liu T, Kender JR (2007) Computational approaches to temporal sampling of video sequences. ACM Trans. Multimedia Comput. Commun. Appl. 3(2):7CrossRef Liu T, Kender JR (2007) Computational approaches to temporal sampling of video sequences. ACM Trans. Multimedia Comput. Commun. Appl. 3(2):7CrossRef
15.
go back to reference Yuan Y, Lu Z-q, Yang Z, Jian M, Wu L, Li Z, Liu X (2021) Key frame extraction based on global motion statistics for team-sport videos. Multimed Syst 28(2):387–401CrossRef Yuan Y, Lu Z-q, Yang Z, Jian M, Wu L, Li Z, Liu X (2021) Key frame extraction based on global motion statistics for team-sport videos. Multimed Syst 28(2):387–401CrossRef
16.
go back to reference Ejaz N, Baik SW, Majeed H, Chang H, Mehmood I (2018) Multi-scale contrast and relative motion-based key frame extraction. EURASIP J Image Video Process 2018(1):40CrossRef Ejaz N, Baik SW, Majeed H, Chang H, Mehmood I (2018) Multi-scale contrast and relative motion-based key frame extraction. EURASIP J Image Video Process 2018(1):40CrossRef
17.
go back to reference Hannane R, Elboushaki A, Afdel K (2018) Mskvs: adaptive mean shift-based keyframe extraction for video summarization and a new objective verification approach. J Vis Commun Image Represent 55:179–200CrossRef Hannane R, Elboushaki A, Afdel K (2018) Mskvs: adaptive mean shift-based keyframe extraction for video summarization and a new objective verification approach. J Vis Commun Image Represent 55:179–200CrossRef
18.
go back to reference Shi Y, Yang H, Gong M, Liu X, Xia Y (2017) A fast and robust key frame extraction method for video copyright protection. J Electr Comput Eng 2017:1–7 Shi Y, Yang H, Gong M, Liu X, Xia Y (2017) A fast and robust key frame extraction method for video copyright protection. J Electr Comput Eng 2017:1–7
19.
go back to reference Tang H, Liu H, Xiao W, Sebe N (2019) Fast and robust dynamic hand gesture recognition via key frames extraction and feature fusion. Neurocomputing 331:424–433CrossRef Tang H, Liu H, Xiao W, Sebe N (2019) Fast and robust dynamic hand gesture recognition via key frames extraction and feature fusion. Neurocomputing 331:424–433CrossRef
20.
go back to reference Yu L, Cao J, Chen M, Cui X-C (2018) Key frame extraction scheme based on sliding window and features. Peer-to-Peer Netw Appl 11:1141–1152CrossRef Yu L, Cao J, Chen M, Cui X-C (2018) Key frame extraction scheme based on sliding window and features. Peer-to-Peer Netw Appl 11:1141–1152CrossRef
21.
go back to reference Martins GB, Pereira DR, Almeida J, de Albuquerque VHC, Papa JP (2020) Opfsumm: on the video summarization using optimum-path forest. Multimed Tools Appl 79:11195–11211CrossRef Martins GB, Pereira DR, Almeida J, de Albuquerque VHC, Papa JP (2020) Opfsumm: on the video summarization using optimum-path forest. Multimed Tools Appl 79:11195–11211CrossRef
23.
go back to reference Ma L, Yang H, Tan X, Feng G (2018) Image keyframe-based visual-depth map establishing method. J Harbin Inst Technol 50(11):23–31 Ma L, Yang H, Tan X, Feng G (2018) Image keyframe-based visual-depth map establishing method. J Harbin Inst Technol 50(11):23–31
24.
go back to reference Guan G, Wang Z, Yu K, Mei S, He M, Feng DD (2012) Video summarization with global and local features. In: 2012 IEEE international conference on multimedia and expo workshops, pp 570–575 Guan G, Wang Z, Yu K, Mei S, He M, Feng DD (2012) Video summarization with global and local features. In: 2012 IEEE international conference on multimedia and expo workshops, pp 570–575
25.
go back to reference Kannan R, Ghinea G, Swaminathan S (2015) What do you wish to see? A summarization system for movies based on user preferences. Inf Process Manag 51:286–305CrossRef Kannan R, Ghinea G, Swaminathan S (2015) What do you wish to see? A summarization system for movies based on user preferences. Inf Process Manag 51:286–305CrossRef
26.
go back to reference Kuanar SK, Ranga KB, Chowdhury AS (2015) Multi-view video summarization using bipartite matching constrained optimum-path forest clustering. IEEE Trans Multimed 17:1166–1173CrossRef Kuanar SK, Ranga KB, Chowdhury AS (2015) Multi-view video summarization using bipartite matching constrained optimum-path forest clustering. IEEE Trans Multimed 17:1166–1173CrossRef
27.
go back to reference Zhang Y, Jin R, Zhou Z-H (2010) Understanding bag-of-words model: a statistical framework. Int J Mach Learn Cybern 1:43–52CrossRef Zhang Y, Jin R, Zhou Z-H (2010) Understanding bag-of-words model: a statistical framework. Int J Mach Learn Cybern 1:43–52CrossRef
28.
go back to reference Shao C, Li H, Ma L (2019) Visual cognitive mechanism guided video shot segmentation. In: ICCC Shao C, Li H, Ma L (2019) Visual cognitive mechanism guided video shot segmentation. In: ICCC
29.
go back to reference Wu L, Zhang S, Jian M, Zhao Z, Wang D (2018) Shot boundary detection with spatial-temporal convolutional neural networks. In: PRCV Wu L, Zhang S, Jian M, Zhao Z, Wang D (2018) Shot boundary detection with spatial-temporal convolutional neural networks. In: PRCV
30.
go back to reference Lai J, Yi Y (2012) Key frame extraction based on visual attention model. J Vis Commun Image Represent 23:114–125CrossRef Lai J, Yi Y (2012) Key frame extraction based on visual attention model. J Vis Commun Image Represent 23:114–125CrossRef
31.
go back to reference Traver VJ, Damen D (2022) Egocentric video summarisation via purpose-oriented frame scoring and selection. Expert Syst Appl 189:116079CrossRef Traver VJ, Damen D (2022) Egocentric video summarisation via purpose-oriented frame scoring and selection. Expert Syst Appl 189:116079CrossRef
32.
go back to reference Yu L, Cao J, Chen M, Cui X (2018) Key frame extraction scheme based on sliding window and features. Peer-to-Peer Netw Appl 11(5):1141–1152CrossRef Yu L, Cao J, Chen M, Cui X (2018) Key frame extraction scheme based on sliding window and features. Peer-to-Peer Netw Appl 11(5):1141–1152CrossRef
33.
go back to reference Rao PC, Das MM (2012) Keyframe extraction method using contourlet transform. In: Proceedings of the 2012 international conference on electronics, communications and control. IEEE Computer Society, pp 437–440 Rao PC, Das MM (2012) Keyframe extraction method using contourlet transform. In: Proceedings of the 2012 international conference on electronics, communications and control. IEEE Computer Society, pp 437–440
34.
go back to reference Zhang K, Chao W-L, Sha F, Grauman K (2016) Video summarization with long short-term memory. In: ECCV Zhang K, Chao W-L, Sha F, Grauman K (2016) Video summarization with long short-term memory. In: ECCV
35.
go back to reference Rochan M, Ye L, Wang Y (2018) Video summarization using fully convolutional sequence networks. In: ECCV Rochan M, Ye L, Wang Y (2018) Video summarization using fully convolutional sequence networks. In: ECCV
36.
go back to reference Liu T, Meng Q, Huang J, Vlontzos A, Rueckert D, Kainz B (2022) Video summarization through reinforcement learning with a 3d spatio-temporal u-net. IEEE Trans Image Process 31:1573–1586CrossRef Liu T, Meng Q, Huang J, Vlontzos A, Rueckert D, Kainz B (2022) Video summarization through reinforcement learning with a 3d spatio-temporal u-net. IEEE Trans Image Process 31:1573–1586CrossRef
37.
go back to reference Zhong S-H, Wu J, Jiang J (2019) Video summarization via spatio-temporal deep architecture. Neurocomputing 332:224–235CrossRef Zhong S-H, Wu J, Jiang J (2019) Video summarization via spatio-temporal deep architecture. Neurocomputing 332:224–235CrossRef
38.
go back to reference Lei J, Luan Q, Song X, Liu X, Tao D, Song M (2019) Action parsing-driven video summarization based on reinforcement learning. IEEE Trans Circuits Syst Video Technol 29:2126–2137CrossRef Lei J, Luan Q, Song X, Liu X, Tao D, Song M (2019) Action parsing-driven video summarization based on reinforcement learning. IEEE Trans Circuits Syst Video Technol 29:2126–2137CrossRef
39.
go back to reference Mohammad-Djafari A (2015) Entropy, information theory, information geometry and Bayesian inference in data, signal and image processing and inverse problems. Entropy 17(6):3989–4027MathSciNetCrossRef Mohammad-Djafari A (2015) Entropy, information theory, information geometry and Bayesian inference in data, signal and image processing and inverse problems. Entropy 17(6):3989–4027MathSciNetCrossRef
40.
go back to reference Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344:1492–1496CrossRef Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344:1492–1496CrossRef
41.
go back to reference Ejaz N, Baik S, Majeed H, Chang H, Mehmood I (2018) Multi-scale contrast and relative motion-based key frame extraction. EURASIP J Image Video Process 2018:1–11CrossRef Ejaz N, Baik S, Majeed H, Chang H, Mehmood I (2018) Multi-scale contrast and relative motion-based key frame extraction. EURASIP J Image Video Process 2018:1–11CrossRef
42.
go back to reference Mahmoud R, Belgacem S, Omri MN (2021) Towards wide-scale continuous gesture recognition model for in-depth and grayscale input videos. Int J Mach Learn Cybern 12:1173–1189CrossRef Mahmoud R, Belgacem S, Omri MN (2021) Towards wide-scale continuous gesture recognition model for in-depth and grayscale input videos. Int J Mach Learn Cybern 12:1173–1189CrossRef
43.
go back to reference Farnebäck G (2003) Two-frame motion estimation based on polynomial expansion. In: SCIA Farnebäck G (2003) Two-frame motion estimation based on polynomial expansion. In: SCIA
44.
go back to reference Chang C-W, Zhong Z-Q, Liou JJ (2019) A fpga implementation of farneback optical flow by high-level synthesis. In: Proceedings of the 2019 ACM/SIGDA international symposium on field-programmable gate arrays Chang C-W, Zhong Z-Q, Liou JJ (2019) A fpga implementation of farneback optical flow by high-level synthesis. In: Proceedings of the 2019 ACM/SIGDA international symposium on field-programmable gate arrays
45.
go back to reference Kim T-K, Wong S-F, Cipolla R (2007) Tensor canonical correlation analysis for action classification. In: 2007 IEEE conference on computer vision and pattern recognition, pp 1–8 Kim T-K, Wong S-F, Cipolla R (2007) Tensor canonical correlation analysis for action classification. In: 2007 IEEE conference on computer vision and pattern recognition, pp 1–8
Metadata
Title
A static video summarization approach via block-based self-motivated visual attention scoring mechanism
Authors
Wen-lin Li
Tong Zhang
Xiao Liu
Publication date
14-03-2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 9/2023
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01814-9

Other articles of this Issue 9/2023

International Journal of Machine Learning and Cybernetics 9/2023 Go to the issue