Unsupervised, efficient and scalable key-frame selection for automatic summarization of surveillance videos

Multimedia Tools and Applications

Abstract

Recent years have witnessed dramatic growth in the deployment of vision-based surveillance in public spaces. Automatic summarization of surveillance videos (ASOSV) is hence becoming increasingly desirable in many real-world applications. For this purpose, this paper proposes a novel frame-selection framework with three properties: 1) unsupervised operation: it requires no supervised learning or training; 2) efficiency: it runs very fast, with experiments demonstrating faster-than-real-time processing; and 3) scalability: it supports a hierarchical analysis/overview of video content. The performance of the proposed framework is systematically evaluated and compared with various state-of-the-art frame-selection techniques on a set of collected video sequences and the publicly available ViSOR dataset. The experimental results demonstrate promising performance and good applicability to real-world problems.

Notes

  1. Here, we use simple frame differencing to extract objects of interest and do not consider any higher-level information such as cars or pedestrians. This is because: a) in this step of our framework, the only aim is to extract the data distribution in the video, which is used to measure the temporal fluctuation of video content; and b) the role of surveillance-video summarization is to give the user an efficient and concise way to view a given video, so it can be regarded as a pre-processing step for subsequent high-level tasks such as pedestrian detection, as mentioned in Section 1. Note, however, that our framework can also work with video data represented by other features, as long as those features effectively describe the data distribution in the video.

  2. M(1) is initialized as 1.

  3. http://www.openvisor.org/video_categories.asp

  4. For the three videos Traffic monitoring, CVC_Zebra and Hermes_Outdoor_cam, we chose only a number of the beginning frames, as shown in the table, for testing; this is convenient for showing examples of selected frames, as will be seen in Fig. 9.
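
The simple frame differencing mentioned in Note 1 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes grayscale frames stored as nested lists of integer intensities, and the function names, the threshold of 25 and the "fraction of changed pixels" summary are all hypothetical choices made for illustration. The preview does not state which statistic the paper actually derives from the differences.

```python
from typing import List, Sequence

# Assumption: a grayscale frame is a sequence of rows of integer intensities.
Frame = Sequence[Sequence[int]]


def frame_difference_mask(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[int]]:
    """Mark pixels whose absolute intensity change exceeds `threshold`.

    The default threshold is a hypothetical value, not taken from the paper.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_signal(frames: Sequence[Frame], threshold: int = 25) -> List[float]:
    """Return the fraction of changed pixels for each pair of consecutive frames.

    The result is a 1-D time series that fluctuates when scene content changes.
    """
    signal = []
    for prev, curr in zip(frames, frames[1:]):
        mask = frame_difference_mask(prev, curr, threshold)
        changed = sum(sum(row) for row in mask)
        total = sum(len(row) for row in mask)
        signal.append(changed / total if total else 0.0)
    return signal


# Three tiny 2x2 frames: nothing changes, then one pixel brightens.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[200, 0], [0, 0]],
]
print(motion_signal(frames))  # [0.0, 0.25]
```

A series like this uses no object-level information (cars, pedestrians), which matches the note's point that only the temporal fluctuation of the content matters at this step.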

Acknowledgments

This work is financially supported in part by the National Natural Science Foundation of China (61403232), the Natural Science Foundation of Shandong Province, China (ZR2014FQ025), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry of China, and the Open Projects Program of the National Laboratory of Pattern Recognition (NLPR) of China (201407346).

Author information

Corresponding author

Correspondence to Guoliang Lu.

Cite this article

Lu, G., Zhou, Y., Li, X. et al. Unsupervised, efficient and scalable key-frame selection for automatic summarization of surveillance videos. Multimed Tools Appl 76, 6309–6331 (2017). https://doi.org/10.1007/s11042-016-3263-z

  • DOI: https://doi.org/10.1007/s11042-016-3263-z
