Abstract
Saliency prediction models provide a probabilistic map of the relative likelihood that regions of an image or video will attract the attention of the human visual system. Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Since the human visual system evolved in a natural 3D environment, it is natural to design visual attention models for 3D content as well. Existing monocular saliency models cannot accurately predict the attended regions when applied to 3D image/video content, as they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, and high-level cues such as faces, people, vehicles, animals, text, and the horizon. Our model starts with a rough segmentation and quantifies several intuitive observations, such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, and the size and compactness of salient regions, as well as the tendency to emphasize only a few salient objects in a scene. A new fovea-based model of spatial distance between image regions is adopted for local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method into a single saliency map that is highly correlated with eye-fixation data, a random-forest-based algorithm is utilized. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment involving 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database and the eye-tracking data are publicly available along with this paper. Experimental results show that the proposed saliency prediction method achieves competitive performance compared to state-of-the-art approaches.
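The fusion step described above can be pictured as supervised regression over per-pixel conspicuity features. The sketch below is a minimal illustration under assumed conventions (conspicuity maps as 2D arrays, a fixation-density map as the regression target, scikit-learn's RandomForestRegressor); the function names and data layout are illustrative, not the authors' implementation, whose features, sampling, and training details are given in the paper itself.

```python
# Minimal sketch of learning-based conspicuity-map fusion with a random
# forest, assuming per-pixel regression against fixation-density targets.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def stack_maps(maps):
    """Stack K conspicuity maps of shape (H, W) into an (H*W, K) feature matrix."""
    return np.stack([m.ravel() for m in maps], axis=1)

def train_fusion(train_frames, seed=0):
    """train_frames: list of (conspicuity_maps, fixation_density_map) pairs."""
    X = np.vstack([stack_maps(maps) for maps, _ in train_frames])
    y = np.concatenate([fix.ravel() for _, fix in train_frames])
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    return model

def fuse(model, maps):
    """Fuse a frame's conspicuity maps into one normalized saliency map."""
    h, w = maps[0].shape
    s = model.predict(stack_maps(maps)).reshape(h, w)
    s -= s.min()
    return s / (s.max() + 1e-12)  # normalize to [0, 1]
```

In practice, one would subsample pixels or regions for tractability and train across many frames; the forest then learns a data-driven weighting of the feature channels rather than a fixed linear combination.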
Funding
This work was partly supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant STPGP 447339-13 and by the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.
Cite this article
Banitalebi-Dehkordi, A., Pourazad, M.T. & Nasiopoulos, P. A learning-based visual saliency prediction model for stereoscopic 3D video (LBVS-3D). Multimed Tools Appl 76, 23859–23890 (2017). https://doi.org/10.1007/s11042-016-4155-y