A Spatiotemporal Saliency Model for Video Surveillance

Abstract

A video sequence is more than a sequence of still images: it contains strong spatiotemporal correlation between the regions of consecutive frames. The most important characteristic of video is the perceived motion of foreground objects across frames. This motion dramatically changes the importance of objects in a scene and therefore the saliency map of the frame representing that scene, which makes saliency analysis of videos far more complicated than that of still images. In this paper, we investigate saliency in video sequences and propose a novel spatiotemporal saliency model designed for video surveillance applications. Compared with classical saliency models based on still images, such as Itti’s model, and with existing space–time saliency models, the proposed model correlates more closely with visual saliency perception of surveillance videos. Both bottom-up and top-down attention mechanisms are involved, and stationary saliency and motion saliency are analyzed separately. First, a new method for background subtraction and foreground extraction is developed based on content analysis of the scene in the video surveillance domain. A stationary saliency model is then set up from multiple features computed on the foreground: each feature is analyzed with a multi-scale Gaussian pyramid, and the resulting conspicuity maps are combined with different weights. The stationary model integrates faces as a supplementary feature alongside low-level features such as color, intensity, and orientation. Second, a motion saliency map is calculated from the statistics of the motion vector field. Third, the motion and stationary saliency maps are merged within a center-surround framework defined by an approximated Gaussian function. The video saliency maps computed by our model have been compared with gaze maps obtained from subjective experiments with an SMI eye tracker on surveillance video sequences. The results show strong correlation between the output of the proposed spatiotemporal saliency model and the experimental gaze maps.
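
To make the processing pipeline described above concrete, the following is a minimal Python/OpenCV sketch of the same stages, not the authors' implementation. It uses OpenCV's MOG2 background subtractor and Farneback optical flow as stand-ins for the paper's own foreground-extraction and motion-vector-field methods, and the feature weights, pyramid depth, and Gaussian parameters are illustrative placeholders rather than values from the paper.

```python
# Minimal sketch (not the authors' code) of the stages described in the abstract:
# foreground extraction, multi-feature stationary saliency with Gaussian pyramids,
# motion saliency from a dense motion field, and a center-surround Gaussian fusion.
# MOG2 and Farneback optical flow are stand-ins for the paper's own methods;
# weights, pyramid depth, and sigma are illustrative, not values from the paper.
import cv2
import numpy as np

bg_sub = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)
face_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def center_surround(feature, levels=4):
    """Across-scale contrast: |finest level - upsampled coarser levels|, averaged."""
    pyr = [feature.astype(np.float32)]
    for _ in range(levels):
        pyr.append(cv2.pyrDown(pyr[-1]))
    h, w = feature.shape[:2]
    diffs = [np.abs(pyr[0] - cv2.resize(p, (w, h))) for p in pyr[1:]]
    return sum(diffs) / len(diffs)

def stationary_saliency(frame, fg_mask):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    b, g, r = cv2.split(frame.astype(np.float32))
    intensity = center_surround(gray)
    color = center_surround(r - g) + center_surround(0.5 * (r + g) - b)
    orientation = center_surround(np.abs(cv2.Laplacian(gray, cv2.CV_32F)))
    faces = np.zeros_like(intensity)
    for (x, y, w, h) in face_det.detectMultiScale(gray, 1.1, 4):
        faces[y:y + h, x:x + w] = 1.0            # face regions as an extra feature
    maps = [intensity, color, orientation, faces]
    weights = [0.2, 0.2, 0.2, 0.4]               # illustrative weights only
    sal = sum(wt * m / (m.max() + 1e-6) for wt, m in zip(weights, maps))
    return sal * (fg_mask > 0)                   # keep saliency on foreground only

def motion_saliency(prev_gray, gray):
    # Dense motion field; saliency = deviation of local motion from the global
    # motion statistics (mean and standard deviation of the magnitude field).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    return np.abs(mag - mag.mean()) / (mag.std() + 1e-6)

def fuse(stat_sal, mot_sal, sigma_ratio=0.3):
    # Center-surround merging with an approximated Gaussian weight: motion cues
    # emphasized toward the frame center, stationary cues elsewhere
    # (an assumed form of the fusion, not taken from the paper).
    h, w = stat_sal.shape
    yy, xx = np.mgrid[0:h, 0:w]
    sigma = sigma_ratio * min(h, w)
    g = np.exp(-((xx - w / 2) ** 2 + (yy - h / 2) ** 2) / (2 * sigma ** 2))
    return g * mot_sal + (1 - g) * stat_sal

# Per-frame usage: fg_mask = bg_sub.apply(frame); gray and prev_gray are
# consecutive grayscale frames; the final map is
# fuse(stationary_saliency(frame, fg_mask), motion_saliency(prev_gray, gray)).
```

The Gaussian weight in fuse() is one plausible reading of the "approximated Gaussian" center-surround merging; the fusion rule actually used in the paper may differ, as may its feature weighting, which here simply gives the face map the largest share.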

References

  1. Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans PAMI. 1998;20(11):1254–9.

  2. Rajashekar U, van der Linde I, Bovik AC, Cormack LK. GAFFE: a gaze-attentive fixation finding engine. IEEE Trans Image Process. 2008; 17(4):564–73.

  3. Achanta R, Hemami S, Estrada F, Susstrunk S. Frequency-tuned saliency detection model. In: Proceedings of IEEE CVPR; 2009. p. 1597–604.

  4. Guo CL, Ma Q, Zhang LM. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. In: Proceedings of IEEE CVPR; 2008. p. 1–8.

  5. Cerf M, Paxon Frady E, Koch C. Faces and text attract gaze independent of the task: Experimental data and computer model. J Vis. 2009;9(12):1–15.

  6. Cerf M, Harel J, Einhäuser W, Koch C. Predicting human gaze using low-level saliency combined with face detection. In: Platt JC, Koller D, Singer Y, Roweis S, editors. Adv Neural Inf Process Syst. 2007;20.

  7. Li L-J, Fei-Fei L. What, where and who? Classifying events by scene and object recognition. IEEE Int Conf Comput Vis (ICCV); 2007.

  8. Scassellati B. Theory of mind for a humanoid robot. Autonom Robots. 2002;12(1):13–24.

  9. Marat S, Ho Phuoc T. Spatio-temporal saliency model to predict eye movements in video free viewing. In: 16th European Signal Processing Conference (EUSIPCO 2008), Lausanne, Switzerland; 2008.

  10. Ma Y, Zhang H. A model of motion attention for video skimming. Proceedings of IEEE, ICIP, Vol. 1, pp. 22–25; 2002.

  11. Shan L, Lee MC. Fast visual tracking using motion saliency in video. Proceedings of IEEE, ICASSP. Vol. 1, pp. 1073–1076; 2007.

  12. Peters RJ, Itti L. Beyond bottom-up: incorporating task-dependent influences into a computational model of spatial attention. In: Proceedings of IEEE, CVPR; 2007, p. 1–8.

  13. Schütz AC, Braun DI, Gegenfurtner KR. Object recognition during foveating eye movement. Vis Res. 2009;49:2241–53.

  14. Zhang L, Tong M, Cottrell G. SUNDAy: Saliency using natural statistics for dynamic analysis of scenes. In: Proceedings of the 31st annual cognitive science conference, Amsterdam, Netherlands; 2009.

  15. Itti L, Baldi P. Bayesian surprise attracts human attention. Vis Res. 2009;49(10):1295–306.

  16. Seo HJ, Milanfar P. Static and space-time visual saliency detection by self-resemblance. J Vis. 2009;9(12):1–27.

  17. Mahadevan V, Vasconcelos N. Spatiotemporal saliency in dynamic scenes. IEEE Trans Pattern Anal Mach Intell. 2010;32(1):171–7.

  18. Sevilmis T, Bastan M, Gudukbay U, Ulusoy O. Automatic detection of salient objects and spatial relations in videos for a video database system. Image Vis Comput. 2008;26(10):1384–96.

  19. Li LJ, Socher R, Fei-Fei L. Towards total scene understanding: classification, annotation and segmentation in an automatic framework. Comput Vis Pattern Recogn (CVPR); 2009.

  20. Liu Y, Yao H, Gao W, Chen X, Zhao D. Nonparametric background generation. J Vis Commun Image Represent. 2007;18:253–63.

  21. Wang H, Suter D. A novel robust statistical method for background initialization and visual surveillance. ACCV 2006, LNCS. 2006;3851:328–37.

  22. Cotsaces C, Nikolaidis N, Pitas I. Video shot boundary detection and condensed representation: a review. IEEE Signal Process Mag 2006;23(2):28–37.

  23. Lu S, King I, Lyu MR. Video summarization by video structure analysis and graph optimization. In: IEEE International Conference on Multimedia and Expo (ICME ’04). 2004;3:1959–62.

  24. Money AG, Agius H. Video summarization: a conceptual framework and survey of the state of the art. J Vis Commun Image Represent. 2008;19:121–43.

  25. Carmi R, Itti L. The role of memory in guiding attention during natural vision. J Vis. 2006;6(9):898–914.

  26. Pinson M, Wolf S. Comparing subjective video quality testing methodologies. In: Proceedings of SPIE, VCIP, Lugano, Switzerland; 2003.

  27. Zivkovic Z. Improved adaptive Gaussian mixture model for background subtraction. Proc Int Conf Pattern Recogn; 2004.

  28. Stauffer C, Grimson WEL. Adaptive background mixture models for real-time tracking. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’99), vol 2. 1999; p. 2246.

  29. Elgammal A, Duraiswami R, Harwood D, Davis LS. Background and foreground modeling using nonparametric kernel density for visual surveillance. Proc IEEE. 2002;90(7):1151–63.

  30. Sidibé D, Strauss O. A fast and automatic background generation method from a video based on QCH. J Visual Commun Image Represent; 2009.

  31. Cucchiara R, Grana C, Piccardi M, Prati A. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans Pattern Anal Mach Intell. 2003;25(10):1337–42.

  32. Lipton AJ, Haering N, Almen MC, Venetianer PL, Slowe TE, Zhang Z. Video scene background maintenance using statistical pixel modeling. United States Patent Application Publication. Pub. No.: US 2004/0126014 A1; 2004.

  33. Liu T, Sun J, Zheng NN, Tang X, Shum HY. Learning to detect a salient object. Proc IEEE, CVPR. 2007; p. 1–8.

  34. Rutishauser U, Walther D, Koch C, Perona P. Is bottom-up attention useful for object recognition? In: Proceedings of IEEE CVPR; 2004. p. 37–44.

  35. Tseng PH, Carmi R, Cameron IGM, Munoz DP, Itti L. Quantifying center bias of observers in free viewing of dynamic natural scenes. J Vis. 2009;9(7):1–16.

  36. Gao D, Mahadevan V, Vasconcelos N. On the plausibility of the discriminant center-surround hypothesis for visual saliency. J Vis. 2008;8(7):1–18.

  37. Desimone R, Albright TD, Gross CG, Bruce C. Stimulus selective properties of inferior temporal neurons in the macaque. J Neurosci. 1984;4:2051–62.

  38. Walther D, Koch C. Modeling attention to salient proto-objects. Neural Netw. 2006;19:1395–407.

  39. Tampere Image Database (TID). 2008. Available at: http://www.ponomarenko.info/tid2008.htm.

  40. Ma YF, Zhang HJ. A new perceived motion based shot content representation. Proc IEEE, ICIP. 2001;3:426–9.

  41. Jacobson N, Lee YL, Mahadevan V, Vasconcelos N, Nguyen TQ. Motion vector refinement for FRUC using saliency and segmentation. IEEE Trans Image Process (in press); 2010.

  42. Belardinelli A, Pirri F, Carbone A. Motion saliency maps from spatiotemporal filtering. In: Attention in Cognitive Systems: 5th International Workshop on Attention in Cognitive Systems (WAPCV 2008), Fira, Santorini, Greece. Lecture Notes in Artificial Intelligence; 2008. p. 112–23.

  43. Ma Q, Zhang L. Saliency-based image quality assessment criterion. Proceedings of ICIC 2008, LNCS 5226, p. 1124–33; 2008.

  44. Stalder S, Grabner H, Van Gool L. Beyond semi-supervised tracking: tracking should be as simple as detection, but not simpler than recognition. In: Proceedings of the ICCV’09 Workshop on On-line Learning for Computer Vision; 2009.

  45. Babenko B, Yang M, Belongie S. Visual tracking with online multiple instance learning. IEEE conference on computer vision and pattern recognition (CVPR), Miami; 2009.

  46. Avidan S. Ensemble tracking. PAMI. 2007;29(2):261–71.

  47. Collins R, Liu Y, Leordeanu M. Online selection of discriminative tracking features. PAMI. 2005;27(10):1631–43.

  48. Jost T, Ouerhani N, von Wartburg R, Müri R, Hügli H. Assessing the contribution of color in visual attention. Comput Vis Image Underst. 2005;100(1):107–23.

  49. Harel J, Koch C, Perona P. Graph-based visual saliency. Proceedings of NIPS, p. 545–52; 2007.

  50. Itti L. Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. Vis Cogn. 2005;12(6):1093–123.

  51. Peters RJ, Iyer A, Itti L, Koch C. Components of bottom-up gaze map allocation in natural images. Vis Res. 2005; 45:2397–416.

  52. Marat S, Phuoc T, Granjon L, Guyader N, Pellerin D, Guerin-Dugue A. Modelling spatiotemporal saliency to predict gaze direction for short videos. Int J Comput Vis. 2009;82:231–43.

  53. Tong Y, Konik H, Cheikh FA, Tremeau A. Multi-feature based visual saliency detection in surveillance video. IEEE, VCIP; 2010 (accepted).

Acknowledgments

This work was supported by the Région Rhône-Alpes via the LIMA project in the context of the cluster ISLE (see http://cluster-isle.grenoble-inp.fr/).

Author information

Correspondence to Alain Trémeau.

About this article

Cite this article

Yubing, T., Cheikh, F.A., Guraya, F.F.E. et al. A Spatiotemporal Saliency Model for Video Surveillance. Cogn Comput 3, 241–263 (2011). https://doi.org/10.1007/s12559-010-9094-8
