Abstract
A video sequence is more than a sequence of still images: it contains strong spatial–temporal correlation between the regions of consecutive frames. The most important characteristic of video is the perceived motion of foreground objects across frames. This motion dramatically changes the importance of the objects in a scene and leads to a different saliency map of the frame representing the scene, which makes the saliency analysis of videos much more complicated than that of still images. In this paper, we investigate saliency in video sequences and propose a novel spatiotemporal saliency model devoted to video surveillance applications. Compared to classical saliency models based on still images, such as Itti’s model, and to space–time saliency models, the proposed model correlates better with visual saliency perception of surveillance videos. Both bottom-up and top-down attention mechanisms are involved in this model, and stationary saliency and motion saliency are analyzed separately. First, a new method for background subtraction and foreground extraction is developed, based on content analysis of the scene in the video surveillance domain. Then, a stationary saliency model is set up using multiple features computed from the foreground. Each feature is analyzed with a multi-scale Gaussian pyramid, and the conspicuity maps of all features are combined using different weights. The stationary model integrates faces as a supplementary feature to other low-level features such as color, intensity and orientation. Second, a motion saliency map is calculated using the statistics of the motion vector field. Third, the motion saliency map and the stationary saliency map are merged in a center-surround framework defined by an approximated Gaussian function. The video saliency maps computed by our model have been compared to gaze maps obtained from subjective experiments with an SMI eye tracker on surveillance video sequences.
The results show a strong correlation between the output of the proposed spatiotemporal saliency model and the experimental gaze maps.
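The pipeline summarized above (per-feature center-surround conspicuity followed by a Gaussian-weighted fusion of stationary and motion maps) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the blur sigmas, the fusion weight `alpha`, and the center-bias width `bias_sigma` are assumptions chosen for the sketch.

```python
import numpy as np

def _gauss_kernel(sigma):
    """1-D normalized Gaussian kernel with radius 3*sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur with reflect padding."""
    k = _gauss_kernel(sigma)
    pad = len(k) // 2
    out = np.pad(img, ((pad, pad), (0, 0)), mode="reflect")
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, out)
    out = np.pad(out, ((0, 0), (pad, pad)), mode="reflect")
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, out)

def center_surround(feature, center_sigma=1.0, surround_sigma=4.0):
    """Itti-style conspicuity map: |fine scale - coarse scale|, scaled to [0, 1]."""
    cs = np.abs(blur(feature, center_sigma) - blur(feature, surround_sigma))
    rng = cs.max() - cs.min()
    return (cs - cs.min()) / rng if rng > 0 else cs

def merge_saliency(stationary, motion, alpha=0.6, bias_sigma=0.25):
    """Fuse stationary and motion maps under an approximated-Gaussian
    center-bias window; alpha weights motion against stationary saliency."""
    h, w = stationary.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    g = np.exp(-(((y - cy) / (bias_sigma * h)) ** 2 +
                 ((x - cx) / (bias_sigma * w)) ** 2) / 2.0)
    return g * (alpha * motion + (1.0 - alpha) * stationary)
```

A single bright spot on a dark background yields its conspicuity peak at the spot, and in the fused map a uniform response is strongly attenuated toward the image borders by the center-bias window.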
References
Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans PAMI. 1998;20(11):1254–9.
Rajashekar U, van der Linde I, Bovik AC, Cormack LK. GAFFE: a gaze-attentive fixation finding engine. IEEE Trans Image Process. 2008; 17(4):564–73.
Achanta R, Hemami S, Estrada F, Susstrunk S. Frequency-tuned saliency detection model. Proc IEEE CVPR; 2009. p. 1597–604.
Guo CL, Ma Q, Zhang LM. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. Proc IEEE CVPR; 2008. p. 1–8.
Cerf M, Paxon Frady E, Koch C. Faces and text attract gaze independent of the task: Experimental data and computer model. J Vis. 2009;9(12):1–15.
Cerf M, Harel J, Einhäuser W, Koch C. Predicting human gaze using low-level saliency combined with face detection. In Platt JC, Koller D, Singer Y, Roweis S, editors. Adv Neural Inf Process Syst 2007;20.
Li L-J, Fei-Fei L. What, where and who? Classifying event by scene and object recognition. IEEE Int Conf Comput Vis (ICCV); 2007.
Scassellati B. Theory of mind for a humanoid robot. Autonom Robots. 2002;12(1):13–24.
Marat S, Ho Phuoc T. Spatio-temporal saliency model to predict eye movements in video free viewing. 16th European Signal Processing Conference (EUSIPCO-2008), Lausanne, Switzerland; 2008.
Ma Y, Zhang H. A model of motion attention for video skimming. Proceedings of IEEE, ICIP, Vol. 1, pp. 22–25; 2002.
Shan L, Lee MC. Fast visual tracking using motion saliency in video. Proceedings of IEEE, ICASSP. Vol. 1, pp. 1073–1076; 2007.
Peters RJ, Itti L. Beyond bottom-up: incorporating task-dependent influences into a computational model of spatial attention. In: Proceedings of IEEE, CVPR; 2007, p. 1–8.
Schütz AC, Braun DI, Gegenfurtner KR. Object recognition during foveating eye movement. Vis Res. 2009;49:2241–53.
Zhang L, Tong M, Cottrell G. SUNDAy: Saliency using natural statistics for dynamic analysis of scenes. In: Proceedings of the 31st annual cognitive science conference. Amsterdam, Netherlands; 2009.
Itti L, Baldi P. Bayesian surprise attracts human attention. Vis Res. 2009;49(10):1295–306.
Seo HJ, Milanfar P. Static and space-time visual saliency detection by self-resemblance. J Vis. 2009;9(12):1–27.
Mahadevan V, Vasconcelos N. Spatiotemporal saliency in dynamic scenes. IEEE Trans Pattern Anal Mach Intell. 2010;32(1):171–7.
Sevilmis T, Bastan M, Gudukbay U, Ulusoy O. Automatic detection of salient objects and spatial relations in videos for a video database system. Image Vis Comput. 2008;26(10):1384–96.
Li LJ, Socher R, Fei-Fei L. Towards total scene understanding: classification, annotation and segmentation in an automatic framework. Comput Vis Pattern Recogn (CVPR); 2009.
Liu Y, Yao H, Gao W, Chen X, Zhao D. Nonparametric background generation. J Vis Commun Image Represent. 2007;18:253–63.
Wang H, Suter D. A novel robust statistical method for background initialization, visual surveillance. ACCV 2006, LNCS. 2006;3851:328–37.
Cotsaces C, Nikolaidis N, Pitas I. Video shot boundary detection and condensed representation: a review. IEEE Signal Process Mag 2006;23(2):28–37.
Lu S, King I, Lyu MR. Video summarization by video structure analysis and graph optimization. IEEE International Conference on Multimedia and Expo (ICME ’04). 2004;3:1959–62.
Money AG, Agius H. Video summarization: a conceptual framework and survey of the state of the art. J Vis Commun Image R 2008;19:121–43.
Carmi R, Itti L. The role of memory in guiding attention during natural vision. J Vis. 2006;6(9):898–914.
Pinson M, Wolf S. Comparing subjective video quality testing methodologies. In: Proceedings of SPIE, VCIP, Lugano, Switzerland; 2003.
Zivkovic Z. Improved adaptive Gaussian mixture model for background subtraction. Proc Int Conf Pattern Recogn; 2004.
Stauffer C, Grimson WEL. Adaptive background mixture models for real-time tracking. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’99), vol 2. 1999; p. 2246.
Elgammal A, Duraiswami R, Harwood D, Davis LS. Background and foreground modeling using nonparametric kernel density for visual surveillance. Proc IEEE. 2002;90(7):1151–63.
Sidibé D, Strauss O. A fast and automatic background generation method from a video based on QCH. J Vis Commun Image Represent; 2009.
Cucchiara R, Grana C, Piccardi M, Prati A. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans Pattern Anal Mach Intell. 2003;25(10):1337–42.
Lipton AJ, Haering N, Almen MC, Venetianer PL, Slowe TE, Zhang Z. Video scene background maintenance using statistical pixel modeling. United States Patent Application Publication. Pub. No.: US 2004/0126014 A1; 2004.
Liu T, Sun J, Zheng NN, Tang X, Shum HY. Learning to detect a salient object. Proc IEEE, CVPR. 2007; p. 1–8.
Rutishauser U, Walther D, Koch C, Perona P. Is bottom-up attention useful for object recognition?. Proc IEEE, CVPR. 2004; p. 37–44.
Tseng PH, Carmi R, Cameron IGM, Munoz DP, Itti L. Quantifying center bias of observers in free viewing of dynamic natural scenes. J Vis. 2009;9(7):1–16.
Gao D, Mahadevan V, Vasconcelos N. On the plausibility of the discriminant center-surround hypothesis for visual saliency. J Vis. 2008;8(7):1–18.
Desimone R, Albright TD, Gross CG, Bruce C. Stimulus selective properties of inferior temporal neurons in the macaque. J Neurosci. 1984;4:2051–62.
Walther D, Koch C. Modeling attention to salient proto-objects. Neural Netw. 2006;19:1395–407.
Tampere Image Database (TID). 2008. Page: http://www.ponomarenko.info/tid2008.htm.
Ma YF, Zhang HJ. A new perceived motion based shot content representation. Proc IEEE, ICIP. 2001;3:426–9.
Jacobson N, Lee YL, Mahadevan V, Vasconcelos N, Nguyen TQ. Motion vector refinement for FRUC using saliency and segmentation. IEEE Trans Image Process (in press); 2010.
Belardinelli A, Pirri F, Carbone A. Motion saliency maps from spatiotemporal filtering. In: Attention in cognitive systems: 5th international workshop on attention in cognitive systems (WAPCV 2008), Fira, Santorini, Greece. Lecture Notes in Artificial Intelligence; 2008. p. 112–23.
Ma Q, Zhang L. Saliency-based image quality assessment criterion. Proceedings of ICIC 2008, LNCS 5226, p. 1124–33; 2008.
Stalder S, Grabner H, Van Gool L. Beyond semi-supervised tracking: tracking should be as simple as detection, but not simpler than recognition. In: Proceedings ICCV’09 WS on on-line learning for computer vision; 2009.
Babenko B, Yang M, Belongie S. Visual tracking with online multiple instance learning. IEEE conference on computer vision and pattern recognition (CVPR), Miami; 2009.
Avidan S. Ensemble tracking. PAMI. 2007;29(2):261–71.
Collins R, Liu Y, Leordeanu M. Online selection of discriminative tracking features. PAMI. 2005;27(10):1631–43.
Jost T, Ouerhani N, von Wartburg R, Müri R, Hügli H. Assessing the contribution of color in visual attention. Comput Vis Image Underst. 2005;100(1):107–23.
Harel J, Koch C, Perona P. Graph-based visual saliency. Proceedings of NIPS, p. 545–52; 2007.
Itti L. Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. Vis Cogn. 2005;12(6):1093–123.
Peters RJ, Iyer A, Itti L, Koch C. Components of bottom-up gaze map allocation in natural images. Vis Res. 2005; 45:2397–416.
Marat S, Phuoc T, Granjon L, Guyader N, Pellerin D, Guerin-Dugue A. Modelling spatiotemporal saliency to predict gaze direction for short videos. Int J Comput Vis. 2009;82:231–43.
Tong Y, Konik H, Cheikh FA, Tremeau A. Multi-feature based visual saliency detection in surveillance video. IEEE, VCIP; 2010 (accepted).
Acknowledgments
This work was supported by the Région Rhône-Alpes via the LIMA project in the context of the cluster ISLE (see http://cluster-isle.grenoble-inp.fr/).
Cite this article
Yubing, T., Cheikh, F.A., Guraya, F.F.E. et al. A Spatiotemporal Saliency Model for Video Surveillance. Cogn Comput 3, 241–263 (2011). https://doi.org/10.1007/s12559-010-9094-8