ABSTRACT
We study the problem of predicting the Fields of View (FoVs) of viewers watching 360° videos on commodity Head-Mounted Displays (HMDs). Existing solutions either use the viewer's current orientation to approximate future FoVs, or extrapolate future FoVs from historical orientations using dead-reckoning algorithms. In this paper, we develop fixation prediction networks that concurrently leverage sensor- and content-related features to predict the viewer's future fixations. Unlike existing solutions, which rely on orientations alone, our networks also exploit the video content. The sensor-related features include HMD orientations, while the content-related features include image saliency maps and motion maps. We build a testbed for streaming 360° videos to HMDs and recruit twenty-five viewers to watch ten 360° videos. We then train and validate two design alternatives of our proposed networks, which allows us to identify the better-performing design and its optimal parameter settings. Trace-driven simulation results show the merits of our proposed fixation prediction networks over existing solutions, including: (i) lower bandwidth consumption, (ii) shorter initial buffering time, and (iii) shorter running time.
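To make the contrast between orientation-only prediction and feature fusion concrete, the sketch below implements a dead-reckoning baseline (linear extrapolation of the two most recent yaw/pitch samples) and a toy per-tile scorer that combines the extrapolated orientation (a sensor-related feature) with per-tile saliency and motion values (content-related features). It is not the authors' fixation prediction network. The (timestamp, yaw, pitch) sample format, the tile layout, the hand-picked weights and bias, and the 90° proximity falloff are all hypothetical stand-ins for what a trained network would learn from viewer traces.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sample format: (timestamp_s, yaw_deg, pitch_deg).
Sample = Tuple[float, float, float]
Orientation = Tuple[float, float]  # (yaw_deg, pitch_deg)


def wrap_yaw(yaw: float) -> float:
    """Wrap a yaw angle into the range [-180, 180)."""
    return (yaw + 180.0) % 360.0 - 180.0


def dead_reckoning(history: Sequence[Sample], horizon_s: float) -> Orientation:
    """Baseline: linearly extrapolate the two latest samples horizon_s seconds ahead."""
    (t0, y0, p0), (t1, y1, p1) = history[-2], history[-1]
    dt = t1 - t0
    yaw_rate = wrap_yaw(y1 - y0) / dt  # shortest signed yaw change, in deg/s
    pitch_rate = (p1 - p0) / dt
    yaw = wrap_yaw(y1 + yaw_rate * horizon_s)
    pitch = max(-90.0, min(90.0, p1 + pitch_rate * horizon_s))  # clamp at the poles
    return yaw, pitch


def angular_distance(a: Orientation, b: Orientation) -> float:
    """Great-circle angle in degrees between two (yaw, pitch) viewing directions."""
    y1, p1 = map(math.radians, a)
    y2, p2 = map(math.radians, b)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(y1 - y2)
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def tile_scores(
    predicted: Orientation,
    tile_centers: Sequence[Orientation],
    saliency: Sequence[float],  # hypothetical per-tile saliency in [0, 1]
    motion: Sequence[float],  # hypothetical per-tile motion magnitude in [0, 1]
    weights: Tuple[float, float, float] = (4.0, 2.0, 1.0),  # hand-picked, not trained
    bias: float = -3.0,
) -> List[float]:
    """Fuse a sensor feature (proximity to the predicted orientation) with
    content features (saliency, motion) into a per-tile viewing probability."""
    scores = []
    for center, s, m in zip(tile_centers, saliency, motion):
        # Proximity falls linearly from 1 at the predicted center to 0 at 90 degrees.
        proximity = max(0.0, 1.0 - angular_distance(predicted, center) / 90.0)
        z = weights[0] * proximity + weights[1] * s + weights[2] * m + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))  # logistic squashing
    return scores


# Usage: the viewer turns across the 180-degree seam while tilting upward.
history = [(0.0, 170.0, 0.0), (0.1, -170.0, 5.0)]
pred = dead_reckoning(history, horizon_s=0.1)  # about (-150.0, 10.0)
centers = [(-150.0, 0.0), (30.0, 0.0)]  # two hypothetical tile centers
scores = tile_scores(pred, centers, saliency=[0.5, 0.5], motion=[0.2, 0.2])
fetch = [i for i, p in enumerate(scores) if p >= 0.5]
print(fetch)  # -> [0]
```

Here the dead-reckoning step alone decides where the viewer will look, and the content features only shift each tile's score. In a learned model, the relative weighting of orientation, saliency, and motion would come from training on viewer traces rather than from fixed constants, and a streaming client could fetch only the tiles whose scores exceed a threshold.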
Index Terms
- Fixation Prediction for 360° Video Streaming in Head-Mounted Virtual Reality