Abstract
Saliency prediction models provide a probabilistic map of the relative likelihood that regions of an image or video will attract the attention of the human visual system. Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Since the human visual system evolved in a natural 3D environment, it is natural to design visual attention models for 3D content as well. Existing monocular saliency models cannot accurately predict the attended regions when applied to 3D image/video content, as they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, and high-level cues such as faces, people, vehicles, animals, text, and the horizon. Our model starts with a rough segmentation and quantifies several intuitive observations, such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, and the size and compactness of salient regions, as well as the tendency to emphasize only a few salient objects in a scene. A new fovea-based model of spatial distance between image regions is adopted for local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method into a single saliency map that is highly correlated with eye-fixation data, a random-forest-based algorithm is utilized. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment involving 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database and the eye-tracking data are publicly available along with this paper. Experimental results show that the proposed saliency prediction method achieves competitive performance compared to state-of-the-art approaches.
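The fusion step described above can be pictured as supervised regression over per-pixel conspicuity features. The sketch below is a minimal illustration under assumed conventions (conspicuity maps as 2D arrays, a fixation-density map as the regression target, scikit-learn's RandomForestRegressor); the function names and data layout are illustrative, not the authors' implementation, whose features, sampling, and training details are given in the paper itself.

```python
# Minimal sketch of learning-based conspicuity-map fusion with a random
# forest, assuming per-pixel regression against fixation-density targets.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def stack_maps(maps):
    """Stack K conspicuity maps of shape (H, W) into an (H*W, K) feature matrix."""
    return np.stack([m.ravel() for m in maps], axis=1)

def train_fusion(train_frames, seed=0):
    """train_frames: list of (conspicuity_maps, fixation_density_map) pairs."""
    X = np.vstack([stack_maps(maps) for maps, _ in train_frames])
    y = np.concatenate([fix.ravel() for _, fix in train_frames])
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    return model

def fuse(model, maps):
    """Fuse a frame's conspicuity maps into one normalized saliency map."""
    h, w = maps[0].shape
    s = model.predict(stack_maps(maps)).reshape(h, w)
    s -= s.min()
    return s / (s.max() + 1e-12)  # normalize to [0, 1]
```

In practice, one would subsample pixels or regions for tractability and train across many frames; the forest then learns a data-driven weighting of the feature channels rather than a fixed linear combination.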
Funding
This work was partly supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant STPGP 447339-13 and by the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.
Cite this article
Banitalebi-Dehkordi, A., Pourazad, M.T. & Nasiopoulos, P. A learning-based visual saliency prediction model for stereoscopic 3D video (LBVS-3D). Multimed Tools Appl 76, 23859–23890 (2017). https://doi.org/10.1007/s11042-016-4155-y