An automatic video annotation framework based on two level keyframe extraction mechanism

Multimedia Tools and Applications

Abstract

The rapid growth of audio, video, and other digital data on the internet underlines the importance of video annotation techniques. This paper develops a hybrid algorithm for automatic Video Annotation (VA), with the further aims of improving performance and precision and reducing the time required to obtain annotations. The overall process comprises efficient techniques for shot detection, followed by a two-level keyframe extraction and a saliency-based residual approach for feature extraction. Factors that improve performance are addressed at every stage of the VA pipeline: shot detection, keyframe extraction, and feature extraction. Shot detection combines the color histogram difference (CHD) and the edge change ratio (ECR), as these are two of the most promising shot-detection techniques. A new scheme is proposed to fine-tune keyframe extraction by working in two levels: at the first level, the first frame of each shot is taken as a keyframe; to remove redundancy, a second level then finds an optimal set of keyframes using fuzzy c-means clustering. Color and texture features are used for feature extraction. The video annotation process is divided into a training phase, in which a weight vector is learned, and a testing phase, in which this weight vector is used to compute a similarity array from which the final annotations are obtained. The proposed method is compared with OMG-SSL and MMT-MGO on the TRECVID dataset and yields better results. The significance of the weight vector is also demonstrated experimentally.
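Only the abstract is available on this page, so the two sketches below are minimal Python reconstructions of the pipeline it outlines, not the authors' published implementation. The first covers shot detection and the two-level keyframe extraction: a boundary is declared where both the color histogram difference and the edge change ratio exceed a threshold, and each shot's keyframes are then refined with fuzzy c-means (via scikit-fuzzy). The AND fusion rule, all threshold values, and the use of plain color histograms as clustering features (the paper also uses texture and a saliency-based residual) are illustrative assumptions.

```python
import cv2
import numpy as np
import skfuzzy as fuzz  # scikit-fuzzy, for fuzzy c-means


def color_hist_diff(f1, f2, bins=32):
    """Normalized color-histogram difference between two BGR frames."""
    h1 = cv2.calcHist([f1], [0, 1, 2], None, [bins] * 3, [0, 256] * 3).flatten()
    h2 = cv2.calcHist([f2], [0, 1, 2], None, [bins] * 3, [0, 256] * 3).flatten()
    h1 /= h1.sum() + 1e-9
    h2 /= h2.sum() + 1e-9
    return 0.5 * np.abs(h1 - h2).sum()  # 0 = identical, 1 = disjoint


def edge_change_ratio(f1, f2, dilate_px=5):
    """ECR: the larger of the fractions of edge pixels entering/exiting."""
    e1 = cv2.Canny(cv2.cvtColor(f1, cv2.COLOR_BGR2GRAY), 100, 200) > 0
    e2 = cv2.Canny(cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY), 100, 200) > 0
    k = np.ones((dilate_px, dilate_px), np.uint8)
    d1 = cv2.dilate(e1.astype(np.uint8), k) > 0
    d2 = cv2.dilate(e2.astype(np.uint8), k) > 0
    rho_out = (e1 & ~d2).sum() / max(e1.sum(), 1)  # edges that vanish
    rho_in = (e2 & ~d1).sum() / max(e2.sum(), 1)   # edges that appear
    return max(rho_in, rho_out)


def detect_shots(frames, chd_thresh=0.45, ecr_thresh=0.5):
    """Indices where a new shot starts; thresholds are illustrative."""
    starts = [0]
    for i in range(1, len(frames)):
        if (color_hist_diff(frames[i - 1], frames[i]) > chd_thresh
                and edge_change_ratio(frames[i - 1], frames[i]) > ecr_thresh):
            starts.append(i)
    return starts


def refine_keyframes(shot_frames, n_clusters=3, bins=16):
    """Second level: fuzzy c-means over per-frame color histograms,
    keeping the frame with the highest membership in each cluster."""
    if len(shot_frames) < 2:
        return [0]
    feats = np.stack([
        cv2.calcHist([f], [0, 1, 2], None, [bins] * 3, [0, 256] * 3).flatten()
        for f in shot_frames])
    feats /= feats.sum(axis=1, keepdims=True) + 1e-9
    c = min(n_clusters, len(shot_frames))
    # skfuzzy expects data shaped (n_features, n_samples)
    _, u, *_ = fuzz.cluster.cmeans(feats.T, c=c, m=2.0, error=1e-4, maxiter=200)
    return sorted({int(np.argmax(u[k])) for k in range(c)})
```

In this reading, the first frame of each shot is the level-one keyframe, and refine_keyframes plays the role of the second level, pruning redundant keyframes down to one representative per fuzzy cluster.

The training/testing split can be sketched in the same hedged way. The abstract does not say how the weight vector is learned, so the stand-in below weights each feature dimension by a simple per-feature Fisher score and then, at test time, builds the similarity array from weighted distances to the annotated training keyframes; both choices are assumptions made only to illustrate the flow.

```python
import numpy as np


def fisher_weights(X, y):
    """Hypothetical training stage: per-feature Fisher score
    (between-class over within-class variance) as the weight vector."""
    y = np.asarray(y)
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        between += len(Xc) * (mc - mu) ** 2
        within += ((Xc - mc) ** 2).sum(axis=0)
    w = between / (within + 1e-9)
    return w / (w.sum() + 1e-9)


def annotate(test_feat, train_feats, train_labels, w, k=5):
    """Testing stage: weighted similarity to every training keyframe;
    the test keyframe inherits the labels of its k nearest matches."""
    dist = np.sqrt(((w * (train_feats - test_feat)) ** 2).sum(axis=1))
    sim = 1.0 / (1.0 + dist)            # the 'similarity array'
    top = np.argsort(sim)[::-1][:k]
    return [train_labels[i] for i in top]
```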


Author information

Corresponding author

Correspondence to Shailendra S. Aote.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Aote, S.S., Potnurwar, A. An automatic video annotation framework based on two level keyframe extraction mechanism. Multimed Tools Appl 78, 14465–14484 (2019). https://doi.org/10.1007/s11042-018-6826-3

