ABSTRACT
Head movement prediction is a key enabler for emerging 360-degree videos, since it can improve both streaming and rendering efficiency. Accurate head movement prediction in turn requires understanding users' visual attention on 360-degree videos viewed through head-mounted displays (HMDs). Despite the rich history of saliency detection research, we observe that traditional models are designed for regular images/videos with a single fixed viewport, and introduce problems such as central bias and multi-object confusion when applied to multi-viewport 360-degree videos whose viewport is switched by user interaction. To fill this gap, this paper shifts from the single-viewport saliency models that have been studied for decades to panoramic saliency detection specifically tailored for 360-degree videos, thereby maximizing head movement prediction performance. The proposed head movement prediction framework is built on a newly created dataset for 360-degree video saliency, a panoramic saliency detection model, and an integration of saliency with head tracking history for the final head movement prediction. Experimental results demonstrate measurable gains of both the proposed panoramic saliency detection and head movement prediction over traditional models designed for regular images/videos.
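The final stage the abstract describes, combining panoramic saliency with head-tracking history to predict the next head position, can be illustrated with a minimal sketch. The function name, the linear-extrapolation motion model, and the fixed blending weight `alpha` are illustrative assumptions, not the paper's actual (learned) method:

```python
import numpy as np

def predict_head_position(history, saliency_map, alpha=0.7):
    """Hypothetical fusion of head-tracking history and panoramic saliency.

    history: (T, 2) array of past (yaw, pitch) angles in degrees,
             yaw in [-180, 180), pitch in [-90, 90].
    saliency_map: (H, W) equirectangular panoramic saliency scores.
    alpha: weight on the motion-extrapolation term (assumed fixed here).
    """
    history = np.asarray(history, dtype=float)
    # Momentum term: linear extrapolation from the two most recent samples.
    velocity = history[-1] - history[-2]
    extrapolated = history[-1] + velocity

    # Attention term: most salient point on the panorama, mapped to angles.
    h, w = saliency_map.shape
    row, col = np.unravel_index(np.argmax(saliency_map), saliency_map.shape)
    salient_yaw = (col + 0.5) / w * 360.0 - 180.0
    salient_pitch = 90.0 - (row + 0.5) / h * 180.0

    # Blend: follow head momentum while being pulled toward saliency.
    pred = alpha * extrapolated + (1 - alpha) * np.array(
        [salient_yaw, salient_pitch])
    # Wrap yaw into [-180, 180) and clamp pitch to the valid range.
    pred[0] = (pred[0] + 180.0) % 360.0 - 180.0
    pred[1] = np.clip(pred[1], -90.0, 90.0)
    return pred
```

In the paper's framework this fusion is learned from data rather than fixed; the hand-set blend above only conveys the intuition that the predictor follows head momentum while being attracted toward salient regions of the panorama.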
Index Terms
- Your Attention is Unique: Detecting 360-Degree Video Saliency in Head-Mounted Display for Head Movement Prediction