research-article

Deep blending for free-viewpoint image-based rendering

Published: 04 December 2018

Abstract

Free-viewpoint image-based rendering (IBR) is a standing challenge. IBR methods combine warped versions of input photos to synthesize a novel view. The image quality of this combination is directly affected by geometric inaccuracies of multi-view stereo (MVS) reconstruction and by view- and image-dependent effects that produce artifacts when contributions from different input views are blended. We present a new deep learning approach to blending for IBR, in which we use held-out real image data to learn blending weights to combine input photo contributions. Our Deep Blending method requires us to address several challenges to achieve our goal of interactive free-viewpoint IBR navigation. We first need to provide sufficiently accurate geometry so the Convolutional Neural Network (CNN) can succeed in finding correct blending weights. We do this by combining two different MVS reconstructions with complementary accuracy vs. completeness tradeoffs. To tightly integrate learning in an interactive IBR system, we need to adapt our rendering algorithm to produce a fixed number of input layers that can then be blended by the CNN. We generate training data with a variety of captured scenes, using each input photo as ground truth in a held-out approach. We also design the network architecture and the training loss to provide high quality novel view synthesis, while reducing temporal flickering artifacts. Our results demonstrate free-viewpoint IBR in a wide variety of scenes, clearly surpassing previous methods in visual quality, especially when moving far from the input cameras.
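The core idea of the abstract, combining warped input photos with learned per-pixel blending weights, can be illustrated with a minimal sketch. The CNN itself is not reproduced here; this simply shows how a fixed number of warped input layers might be combined once per-pixel weight logits are available (the function name, shapes, and softmax normalization are assumptions for illustration, not the paper's actual implementation).

```python
import numpy as np

def blend_layers(layers, logits):
    """Blend N warped input layers into one novel view.

    layers: array of shape (N, H, W, 3), the warped input-photo layers.
    logits: array of shape (N, H, W), unnormalized per-pixel weights
            (in the paper these would come from the CNN).

    The logits are softmax-normalized per pixel so that the N
    contributions sum to 1 before the weighted average is taken.
    """
    # Subtract the per-pixel max for numerical stability, then softmax.
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    # Weighted sum over the layer axis; broadcast weights over RGB.
    return (w[..., None] * layers).sum(axis=0)
```

With equal logits this reduces to a plain average of the layers, which is one way to see why learned, content-dependent weights can do better: they can down-weight layers whose warped geometry is inaccurate at a given pixel.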



• Published in

  ACM Transactions on Graphics, Volume 37, Issue 6
  December 2018
  1401 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/3272127

          Copyright © 2018 ACM

          Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

