Abstract
Designing a fully integrated 360° video camera supporting 6DoF head motion parallax requires overcoming many technical hurdles, including camera placement, optical design, sensor resolution, system calibration, real-time video capture, depth reconstruction, and real-time novel view synthesis. While there is a large body of work describing various system components, such as multi-view depth estimation, our paper is the first to describe a complete, reproducible system that considers the challenges arising when designing, building, and deploying a full end-to-end 6DoF video camera and playback environment. Our system includes a computational imaging software pipeline supporting online markerless calibration, high-quality reconstruction, and real-time streaming and rendering. Most of our exposition is based on a professional 16-camera configuration, which will be commercially available to film producers. However, our software pipeline is generic and can handle a variety of camera geometries and configurations. The entire calibration and reconstruction software pipeline, along with example datasets, is open-sourced to encourage follow-up research in high-quality 6DoF video reconstruction and rendering.
Index Terms
- An integrated 6DoF video camera and system design