
2021 | OriginalPaper | Chapter

ATSal: An Attention Based Architecture for Saliency Prediction in 360° Videos

Authors : Yasser Dahou, Marouane Tliba, Kevin McGuinness, Noel O’Connor

Published in: Pattern Recognition. ICPR International Workshops and Challenges

Publisher: Springer International Publishing


Abstract

The spherical domain representation of 360° video/images presents many challenges related to the storage, processing, transmission, and rendering of omnidirectional video (ODV). Models of human visual attention can be used so that only a single viewport is rendered at a time, which is important when developing systems that allow users to explore ODV with head-mounted displays (HMDs). Accordingly, researchers have proposed various saliency models for 360° video/images. This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360° videos. The attention mechanism explicitly encodes global static visual attention, allowing expert models to focus on learning saliency on local patches across consecutive frames. We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking. Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
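The global-attention-plus-local-experts scheme described in the abstract can be sketched as a toy pipeline. This is an illustrative assumption, not the ATSal implementation: `global_attention` stands in for the learned static attention stream (here a simple equator prior over the equirectangular frame, reflecting the known viewer bias toward the equator in 360° content), and `expert_saliency` stands in for a learned local expert (here a centre-biased Gaussian). All function names and details are hypothetical.

```python
import numpy as np

def global_attention(frame):
    # Stand-in for the learned global static attention stream:
    # a cosine-squared equator prior over the equirectangular frame.
    h, w = frame.shape[:2]
    rows = np.cos(np.linspace(-np.pi / 2, np.pi / 2, h)) ** 2
    return np.tile(rows[:, None], (1, w))

def expert_saliency(patch):
    # Stand-in for a learned local expert: a centre-biased Gaussian.
    h, w = patch.shape[:2]
    y, x = np.mgrid[0:h, 0:w]
    return np.exp(-(((y - h / 2) ** 2) / (2 * (h / 4) ** 2)
                    + ((x - w / 2) ** 2) / (2 * (w / 4) ** 2)))

def fuse(frame, n_patches=4):
    # Local expert predictions on vertical patches, modulated by the
    # global attention map, then normalised to [0, 1].
    att = global_attention(frame)
    h, w = frame.shape[:2]
    out = np.zeros((h, w))
    pw = w // n_patches
    for i in range(n_patches):
        cols = slice(i * pw, (i + 1) * pw)
        out[:, cols] = expert_saliency(frame[:, cols]) * att[:, cols]
    m = out.max()
    return out / m if m > 0 else out

frame = np.random.rand(64, 128)   # dummy equirectangular frame
sal = fuse(frame)                 # saliency map, same shape as frame
```

The point of the sketch is the factorisation: the global map encodes where attention tends to concentrate on the sphere, so each local expert only has to model saliency within its own patch.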


Metadata
Title
ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos
Authors
Yasser Dahou
Marouane Tliba
Kevin McGuinness
Noel O’Connor
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-68796-0_22
