ABSTRACT
Head movement prediction is a key enabler for emerging 360-degree videos, since it can improve both streaming and rendering efficiency. Accurate head movement prediction in turn requires understanding users' visual attention on 360-degree videos viewed through head-mounted displays (HMDs). Despite the rich history of saliency detection research, we observe that traditional models are designed for regular images/videos with a single fixed viewport, and introduce problems such as central bias and multi-object confusion when applied to multi-viewport 360-degree videos whose viewport is switched by user interaction. To fill this gap, this paper shifts from the single-viewport saliency models that have been studied for decades to panoramic saliency detection specifically tailored for 360-degree videos, thereby maximizing head movement prediction performance. The proposed head movement prediction framework is built on a newly created dataset for 360-degree video saliency, a panoramic saliency detection model, and an integration of saliency with head tracking history for the final head movement prediction. Experimental results demonstrate measurable gains of both the proposed panoramic saliency detection and head movement prediction over traditional models designed for regular images/videos.
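The final stage the abstract describes, combining panoramic saliency with head-tracking history to predict the next head position, can be illustrated with a minimal sketch. The function name, the linear-extrapolation motion model, and the fixed blending weight `alpha` are illustrative assumptions, not the paper's actual (learned) method:

```python
import numpy as np

def predict_head_position(history, saliency_map, alpha=0.7):
    """Hypothetical fusion of head-tracking history and panoramic saliency.

    history: (T, 2) array of past (yaw, pitch) angles in degrees,
             yaw in [-180, 180), pitch in [-90, 90].
    saliency_map: (H, W) equirectangular panoramic saliency scores.
    alpha: weight on the motion-extrapolation term (assumed fixed here).
    """
    history = np.asarray(history, dtype=float)
    # Momentum term: linear extrapolation from the two most recent samples.
    velocity = history[-1] - history[-2]
    extrapolated = history[-1] + velocity

    # Attention term: most salient point on the panorama, mapped to angles.
    h, w = saliency_map.shape
    row, col = np.unravel_index(np.argmax(saliency_map), saliency_map.shape)
    salient_yaw = (col + 0.5) / w * 360.0 - 180.0
    salient_pitch = 90.0 - (row + 0.5) / h * 180.0

    # Blend: follow head momentum while being pulled toward saliency.
    pred = alpha * extrapolated + (1 - alpha) * np.array(
        [salient_yaw, salient_pitch])
    # Wrap yaw into [-180, 180) and clamp pitch to the valid range.
    pred[0] = (pred[0] + 180.0) % 360.0 - 180.0
    pred[1] = np.clip(pred[1], -90.0, 90.0)
    return pred
```

In the paper's framework this fusion is learned from data rather than fixed; the hand-set blend above only conveys the intuition that the predictor follows head momentum while being attracted toward salient regions of the panorama.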
Index Terms
- Your Attention is Unique: Detecting 360-Degree Video Saliency in Head-Mounted Display for Head Movement Prediction