
2017 | OriginalPaper | Chapter

Deep Saliency: Prediction of Interestingness in Video with CNN

Authors : Souad Chaabouni, Jenny Benois-Pineau, Akka Zemmari, Chokri Ben Amar

Published in: Visual Content Indexing and Retrieval with Psycho-Visual Models

Publisher: Springer International Publishing


Abstract

Deep Neural Networks have become the winners in indexing of visual information. They have enabled better performance in the fundamental tasks of visual information indexing and retrieval, such as image classification and object recognition. In fine-grained indexing tasks, namely object recognition in visual scenes, CNN classifiers have to evaluate multiple "object proposals", that is, windows of different sizes and locations in the image plane. Hence the problem of recognition is coupled with the problem of localization. In this chapter, a model for prediction of Areas-of-Interest in video on the basis of deep CNNs is proposed. A deep CNN architecture is designed to classify windows as salient or non-salient. Dense saliency maps are then built from the classification scores. Exploiting the known sensitivity of the human visual system (HVS) to residual motion, the usual primary features, such as pixel colour values, are complemented with residual motion features. The experiments show that the choice of input features for the deep CNN depends on the visual task: for interest in dynamic content, the proposed model with residual motion is more efficient.
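The pipeline the abstract describes, scoring overlapping windows as salient or not and aggregating the scores into a dense per-pixel map, can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the window size, stride, and `score_fn` are hypothetical, with `score_fn` standing in for the trained deep CNN window classifier.

```python
import numpy as np

def dense_saliency_map(frame, score_fn, win=16, stride=8):
    """Average per-window saliency scores into a dense pixel map.

    score_fn(patch) -> score in [0, 1]; in the chapter's model this
    role is played by the trained deep CNN window classifier.
    """
    h, w = frame.shape[:2]
    acc = np.zeros((h, w))   # accumulated scores per pixel
    cnt = np.zeros((h, w))   # number of windows covering each pixel
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            s = score_fn(frame[y:y + win, x:x + win])
            acc[y:y + win, x:x + win] += s
            cnt[y:y + win, x:x + win] += 1.0
    cnt[cnt == 0] = 1.0      # uncovered border pixels keep score 0
    sal = acc / cnt
    lo, hi = sal.min(), sal.max()
    return (sal - lo) / (hi - lo) if hi > lo else np.zeros_like(sal)

# Toy stand-in for the CNN score: mean intensity of the window.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
smap = dense_saliency_map(frame, lambda p: float(p.mean()))
```

Averaging overlapping window scores (rather than taking the maximum) yields a smooth map whose resolution is governed by the stride; a real system would also feed residual-motion channels to the classifier alongside colour, as the chapter proposes.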


Metadata

Title: Deep Saliency: Prediction of Interestingness in Video with CNN
Authors: Souad Chaabouni, Jenny Benois-Pineau, Akka Zemmari, Chokri Ben Amar
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-57687-9_3
