Abstract
A video sequence is more than a sequence of still images: it contains strong spatial–temporal correlation between the regions of consecutive frames. The most important characteristic of video is the perceived motion of foreground objects across frames. This motion dramatically changes the importance of the objects in a scene and leads to a different saliency map of the frame representing the scene, which makes the saliency analysis of videos much more complicated than that of still images. In this paper, we investigate saliency in video sequences and propose a novel spatiotemporal saliency model devoted to video surveillance applications. Compared to classical saliency models based on still images, such as Itti’s model, and to space–time saliency models, the proposed model correlates better with visual saliency perception of surveillance videos. Both bottom-up and top-down attention mechanisms are involved in this model, and stationary saliency and motion saliency are analyzed separately. First, a new method for background subtraction and foreground extraction is developed, based on content analysis of the scene in the video surveillance domain. Then, a stationary saliency model is set up using multiple features computed from the foreground. Each feature is analyzed with a multi-scale Gaussian pyramid, and the conspicuity maps of all features are combined using different weights. The stationary model integrates faces as a supplementary feature to other low-level features such as color, intensity and orientation. Second, a motion saliency map is calculated using the statistics of the motion vector field. Third, the motion saliency map and the stationary saliency map are merged in a center-surround framework defined by an approximated Gaussian function. The video saliency maps computed by our model have been compared to gaze maps obtained from subjective experiments with an SMI eye tracker on surveillance video sequences.
The results show a strong correlation between the output of the proposed spatiotemporal saliency model and the experimental gaze maps.
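The pipeline summarized above (per-feature center-surround conspicuity followed by a Gaussian-weighted fusion of stationary and motion maps) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the blur sigmas, the fusion weight `alpha`, and the center-bias width `bias_sigma` are assumptions chosen for the sketch.

```python
import numpy as np

def _gauss_kernel(sigma):
    """1-D normalized Gaussian kernel with radius 3*sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur with reflect padding."""
    k = _gauss_kernel(sigma)
    pad = len(k) // 2
    out = np.pad(img, ((pad, pad), (0, 0)), mode="reflect")
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, out)
    out = np.pad(out, ((0, 0), (pad, pad)), mode="reflect")
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, out)

def center_surround(feature, center_sigma=1.0, surround_sigma=4.0):
    """Itti-style conspicuity map: |fine scale - coarse scale|, scaled to [0, 1]."""
    cs = np.abs(blur(feature, center_sigma) - blur(feature, surround_sigma))
    rng = cs.max() - cs.min()
    return (cs - cs.min()) / rng if rng > 0 else cs

def merge_saliency(stationary, motion, alpha=0.6, bias_sigma=0.25):
    """Fuse stationary and motion maps under an approximated-Gaussian
    center-bias window; alpha weights motion against stationary saliency."""
    h, w = stationary.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    g = np.exp(-(((y - cy) / (bias_sigma * h)) ** 2 +
                 ((x - cx) / (bias_sigma * w)) ** 2) / 2.0)
    return g * (alpha * motion + (1.0 - alpha) * stationary)
```

A single bright spot on a dark background yields its conspicuity peak at the spot, and in the fused map a uniform response is strongly attenuated toward the image borders by the center-bias window.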
References
Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans PAMI. 1998;20(11):1254–9.
Rajashekar U, van der Linde I, Bovik AC, Cormack LK. GAFFE: a gaze-attentive fixation finding engine. IEEE Trans Image Process. 2008; 17(4):564–73.
Achanta R, Hemami S, Estrada F, Susstrunk S. Frequency-tuned saliency detection model. Proc IEEE CVPR; 2009. p. 1597–604.
Guo CL, Ma Q, Zhang LM. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. Proc IEEE CVPR; 2008. p. 1–8.
Cerf M, Paxon Frady E, Koch C. Faces and text attract gaze independent of the task: Experimental data and computer model. J Vis. 2009;9(12):1–15.
Cerf M, Harel J, Einhäuser W, Koch C. Predicting human gaze using low-level saliency combined with face detection. In Platt JC, Koller D, Singer Y, Roweis S, editors. Adv Neural Inf Process Syst 2007;20.
Li L-J, Fei-Fei L. What, where and who? Classifying event by scene and object recognition. IEEE Int Conf Comput Vis (ICCV); 2007.
Scassellati B. Theory of mind for a humanoid robot. Autonom Robots. 2002;12(1):13–24.
Marat S, Ho Phuoc T. Spatio-temporal saliency model to predict eye movements in video free viewing. 16th European Signal Processing Conference (EUSIPCO-2008), Lausanne, Switzerland; 2008.
Ma Y, Zhang H. A model of motion attention for video skimming. Proceedings of IEEE, ICIP, Vol. 1, pp. 22–25; 2002.
Shan L, Lee MC. Fast visual tracking using motion saliency in video. Proceedings of IEEE, ICASSP. Vol. 1, pp. 1073–1076; 2007.
Peters RJ, Itti L. Beyond bottom-up: incorporating task-dependent influences into a computational model of spatial attention. In: Proceedings of IEEE, CVPR; 2007, p. 1–8.
Schütz AC, Braun DI, Gegenfurtner KR. Object recognition during foveating eye movement. Vis Res. 2009;49:2241–53.
Zhang L, Tong M, Cottrell G. SUNDAy: Saliency using natural statistics for dynamic analysis of scenes. In: Proceedings of the 31st annual cognitive science conference. Amsterdam, Netherlands; 2009.
Itti L, Baldi P. Bayesian surprise attracts human attention. Vis Res. 2009;49(10):1295–306.
Seo HJ, Milanfar P. Static and space-time visual saliency detection by self-resemblance. J Vis. 2009;9(12):1–27.
Mahadevan V, Vasconcelos N. Spatiotemporal saliency in dynamic scenes. IEEE Trans Pattern Anal Mach Intell. 2010;32(1):171–7.
Sevilmis T, Bastan M, Gudukbay U, Ulusoy O. Automatic detection of salient objects and spatial relations in videos for a video database system. Image Vis Comput. 2008;26(10):1384–96.
Li LJ, Socher R, Fei-Fei L. Towards total scene understanding: classification, annotation and segmentation in an automatic framework. Comput Vis Pattern Recogn (CVPR); 2009.
Liu Y, Yao H, Gao W, Chen X, Zhao D. Nonparametric background generation. J Vis Commun Image Represent. 2007;18:253–63.
Wang H, Suter D. A novel robust statistical method for background initialization, visual surveillance. ACCV 2006, LNCS. 2006;3851:328–37.
Cotsaces C, Nikolaidis N, Pitas I. Video shot boundary detection and condensed representation: a review. IEEE Signal Process Mag 2006;23(2):28–37.
Lu S, King I, Lyu MR. Video summarization by video structure analysis and graph optimization. IEEE International Conference on Multimedia and Expo (ICME ’04). 2004;3:1959–62.
Money AG, Agius H. Video summarization: a conceptual framework and survey of the state of the art. J Vis Commun Image R 2008;19:121–43.
Carmi R, Itti L. The role of memory in guiding attention during natural vision. J Vis. 2006;6(9):898–914.
Pinson M, Wolf S. Comparing subjective video quality testing methodologies. In: Proceedings of SPIE, VCIP, Lugano, Switzerland; 2003.
Zivkovic Z. Improved adaptive Gaussian mixture model for background subtraction. Proc Int Conf Pattern Recogn; 2004.
Stauffer C, Grimson WEL. Adaptive background mixture models for real-time tracking. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’99), vol 2. 1999; p. 2246.
Elgammal A, Duraiswami R, Harwood D, Davis LS. Background and foreground modeling using nonparametric kernel density for visual surveillance. Proc IEEE. 2002;90(7):1151–63.
Sidibé D, Strauss O. A fast and automatic background generation method from a video based on QCH. J Vis Commun Image Represent; 2009.
Cucchiara R, Grana C, Piccardi M, Prati A. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans Pattern Anal Mach Intell. 2003;25(10):1337–42.
Lipton AJ, Haering N, Almen MC, Venetianer PL, Slowe TE, Zhang Z. Video scene background maintenance using statistical pixel modeling. United States Patent Application Publication. Pub. No.: US 2004/0126014 A1; 2004.
Liu T, Sun J, Zheng NN, Tang X, Shum HY. Learning to detect a salient object. Proc IEEE, CVPR. 2007; p. 1–8.
Rutishauser U, Walther D, Koch C, Perona P. Is bottom-up attention useful for object recognition?. Proc IEEE, CVPR. 2004; p. 37–44.
Tseng PH, Carmi R, Cameron IGM, Munoz DP, Itti L. Quantifying center bias of observers in free viewing of dynamic natural scenes. J Vis. 2009;9(7):1–16.
Gao D, Mahadevan V, Vasconcelos N. On the plausibility of the discriminant center-surround hypothesis for visual saliency. J Vis. 2008;8(7):1–18.
Desimone R, Albright TD, Gross CG, Bruce C. Stimulus selective properties of inferior temporal neurons in the macaque. J Neurosci. 1984;4:2051–62.
Walther D, Koch C. Modeling attention to salient proto-objects. Neural Netw. 2006;19:1395–407.
Tampere Image Database (TID). 2008. Page: http://www.ponomarenko.info/tid2008.htm.
Ma YF, Zhang HJ. A new perceived motion based shot content representation. Proc IEEE, ICIP. 2001;3:426–9.
Jacobson N, Lee YL, Mahadevan V, Vasconcelos N, Nguyen TQ. Motion vector refinement for FRUC using saliency and segmentation. IEEE Trans Image Process (in press); 2010.
Belardinelli A, Pirri F, Carbone A. Motion saliency maps from spatiotemporal filtering. In: Attention in cognitive systems: 5th international workshop on attention in cognitive systems (WAPCV 2008), Fira, Santorini, Greece. Lecture Notes in Artificial Intelligence; 2008. p. 112–23.
Ma Q, Zhang L. Saliency-based image quality assessment criterion. Proceedings of ICIC 2008, LNCS 5226, p. 1124–33; 2008.
Stalder S, Grabner H, Van Gool L. Beyond semi-supervised tracking: tracking should be as simple as detection, but not simpler than recognition. In: Proceedings ICCV’09 WS on on-line learning for computer vision; 2009.
Babenko B, Yang M, Belongie S. Visual tracking with online multiple instance learning. IEEE conference on computer vision and pattern recognition (CVPR), Miami; 2009.
Avidan S. Ensemble tracking. PAMI. 2007;29(2):261–71.
Collins R, Liu Y, Leordeanu M. Online selection of discriminative tracking features. PAMI. 2005;27(10):1631–43.
Jost T, Ouerhani N, von Wartburg R, Müri R, Hügli H. Assessing the contribution of color in visual attention. Comput Vis Image Underst. 2005;100(1):107–23.
Harel J, Koch C, Perona P. Graph-based visual saliency. Proceedings of NIPS, p. 545–52; 2007.
Itti L. Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. Vis Cogn. 2005;12(6):1093–123.
Peters RJ, Iyer A, Itti L, Koch C. Components of bottom-up gaze map allocation in natural images. Vis Res. 2005; 45:2397–416.
Marat S, Phuoc T, Granjon L, Guyader N, Pellerin D, Guerin-Dugue A. Modelling spatiotemporal saliency to predict gaze direction for short videos. Int J Comput Vis. 2009;82:231–43.
Tong Y, Konik H, Cheikh FA, Tremeau A. Multi-feature based visual saliency detection in surveillance video. IEEE, VCIP; 2010 (accepted).
Acknowledgments
This work was supported by the Région Rhône-Alpes via the LIMA project in the context of the cluster ISLE (see http://cluster-isle.grenoble-inp.fr/).
Cite this article
Yubing, T., Cheikh, F.A., Guraya, F.F.E. et al. A Spatiotemporal Saliency Model for Video Surveillance. Cogn Comput 3, 241–263 (2011). https://doi.org/10.1007/s12559-010-9094-8