
A learning-based visual saliency prediction model for stereoscopic 3D video (LBVS-3D)

Multimedia Tools and Applications

Abstract

Saliency prediction models provide a probabilistic map of the relative likelihood that regions of an image or video will attract the attention of the human visual system. Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Since the human visual system evolved in a natural 3D environment, it is natural to design visual attention models for 3D content as well. Existing monocular saliency models cannot accurately predict the attended regions when applied to 3D image/video content because they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, and high-level cues such as faces, people, vehicles, animals, text, and the horizon. Our model starts with a rough segmentation and quantifies several intuitive observations, such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, and the size and compactness of salient regions, while emphasizing only a few salient objects per scene. A new fovea-based model of spatial distance between image regions is adopted for local and global feature calculations. To efficiently fuse the conspicuity maps generated by our method into a single saliency map that is highly correlated with eye-fixation data, a random-forest-based algorithm is used. The performance of the proposed saliency model is evaluated against the results of an eye-tracking experiment involving 24 subjects and an in-house database of 61 captured stereoscopic videos. Our stereo video database and the eye-tracking data are publicly available along with this paper. Experimental results show that the proposed saliency prediction method achieves competitive performance compared to state-of-the-art approaches.
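As a rough illustration of the fusion step described in the abstract, the sketch below trains a random forest to map per-pixel conspicuity values to fixation density and then fuses the conspicuity maps of a frame into one saliency map. It is a minimal sketch, not the authors' implementation: scikit-learn's RandomForestRegressor, the per-pixel feature encoding, and the use of smoothed fixation maps as training targets are all assumptions made for illustration.

```python
# Minimal sketch of random-forest fusion of conspicuity maps into one
# saliency map, in the spirit of the approach described in the abstract.
# Assumptions (not from the paper): scikit-learn's RandomForestRegressor,
# per-pixel features, and smoothed fixation maps as regression targets.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def stack_features(conspicuity_maps):
    """Flatten a list of HxW conspicuity maps (e.g., color, motion,
    depth, face) into an (H*W, n_maps) feature matrix."""
    return np.stack([m.ravel() for m in conspicuity_maps], axis=1)

def train_fusion(train_maps, fixation_maps, n_trees=100):
    """Fit a random forest that maps per-pixel conspicuity values to
    the eye-fixation density at that pixel."""
    X = np.vstack([stack_features(maps) for maps in train_maps])
    y = np.concatenate([f.ravel() for f in fixation_maps])
    rf = RandomForestRegressor(n_estimators=n_trees, n_jobs=-1)
    rf.fit(X, y)
    return rf

def predict_saliency(rf, conspicuity_maps):
    """Fuse the conspicuity maps of one frame into a single saliency map."""
    h, w = conspicuity_maps[0].shape
    s = rf.predict(stack_features(conspicuity_maps)).reshape(h, w)
    # Normalize to [0, 1] for comparison with fixation data.
    s -= s.min()
    return s / (s.max() + 1e-12)
```

In the paper the features are computed over segmented regions and the forest is trained on the eye-tracking data of the accompanying database; the per-pixel formulation above is only meant to show the overall flow of learned fusion.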



Author information


Corresponding author

Correspondence to Amin Banitalebi-Dehkordi.

Ethics declarations

Funding

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant STPGP 447339-13 and by the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.


About this article

Cite this article

Banitalebi-Dehkordi, A., Pourazad, M.T. & Nasiopoulos, P. A learning-based visual saliency prediction model for stereoscopic 3D video (LBVS-3D). Multimed Tools Appl 76, 23859–23890 (2017). https://doi.org/10.1007/s11042-016-4155-y

