2018 | OriginalPaper | Chapter

AFSnet: Fixation Prediction in Movie Scenes with Auxiliary Facial Saliency

Authors: Ziqi Zhou, Meijun Sun, Jinchang Ren, Zheng Wang

Published in: Advances in Brain Inspired Cognitive Systems

Publisher: Springer International Publishing


Abstract

While data-driven methods for image saliency detection have matured considerably, video saliency detection, which must additionally handle inter-frame motion and temporal information, still needs further exploration. Unlike images, video data contains not only rich semantic information but also a large amount of contextual information and motion features, and saliency tends to behave differently across scene types. In movie scenes, faces provide the strongest visual stimulus to the viewer. Targeting this specific setting, we propose an efficient and novel video attention prediction model with auxiliary facial saliency (AFSnet) to predict human eye fixation locations in movie scenes. The proposed model takes an FCN as its basic structure and improves prediction by adaptively combining facial saliency hints. We present qualitative and quantitative experiments to demonstrate the validity of the model.
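The abstract describes combining an FCN saliency prediction with an auxiliary facial saliency signal. As a rough illustration of the general idea (not the paper's actual network), the sketch below builds a facial saliency prior from hypothetical face-detector boxes and fuses it with a dense saliency map using a simple adaptive weight; the weighting scheme and the Gaussian prior are assumptions for illustration only.

```python
import numpy as np

def face_prior(shape, boxes):
    """Facial saliency prior: one Gaussian blob per detected face box.
    `boxes` are (x, y, w, h) tuples from a hypothetical face detector."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    prior = np.zeros(shape)
    for (bx, by, bw, bh) in boxes:
        cx, cy = bx + bw / 2.0, by + bh / 2.0        # box centre
        sx, sy = max(bw / 2.0, 1.0), max(bh / 2.0, 1.0)  # spread ~ box size
        prior += np.exp(-(((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2) / 2.0)
    return prior / prior.max() if prior.max() > 0 else prior

def fuse(fcn_saliency, face_saliency):
    """Adaptively weight the facial prior by its mean energy, so frames
    without faces fall back to the dense prediction alone (a crude
    confidence proxy, assumed here for illustration)."""
    alpha = face_saliency.mean()
    fused = (1.0 - alpha) * fcn_saliency + alpha * face_saliency
    return fused / fused.max() if fused.max() > 0 else fused
```

When no faces are detected, the prior is all zeros, `alpha` is 0, and the fused map reduces to the normalized dense prediction, which mirrors the intuition that the facial cue should only contribute where faces actually appear.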
Metadata
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-00563-4_25
