Abstract
Free-viewpoint image-based rendering (IBR) is a long-standing challenge. IBR methods combine warped versions of input photos to synthesize a novel view. The image quality of this combination is directly affected by geometric inaccuracies of multi-view stereo (MVS) reconstruction and by view- and image-dependent effects that produce artifacts when contributions from different input views are blended. We present a new deep learning approach to blending for IBR, in which we use held-out real image data to learn blending weights that combine input photo contributions. Our Deep Blending method requires us to address several challenges to achieve our goal of interactive free-viewpoint IBR navigation. We first need to provide sufficiently accurate geometry so the Convolutional Neural Network (CNN) can succeed in finding correct blending weights. We do this by combining two different MVS reconstructions with complementary accuracy vs. completeness tradeoffs. To tightly integrate learning into an interactive IBR system, we adapt our rendering algorithm to produce a fixed number of input layers that can then be blended by the CNN. We generate training data from a variety of captured scenes, using each input photo as ground truth in a held-out approach. We also design the network architecture and the training loss to provide high-quality novel view synthesis while reducing temporal flickering artifacts. Our results demonstrate free-viewpoint IBR in a wide variety of scenes, clearly surpassing previous methods in visual quality, especially when moving far from the input cameras.
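The core operation the abstract describes is blending a fixed number of warped input layers with per-pixel weights predicted by a CNN. As a minimal illustration (not the paper's implementation), the sketch below blends candidate colors for a single pixel using softmax-normalized weights; `blend_pixel` and its arbitrary logit values are hypothetical stand-ins for what the trained network would predict.

```python
import math

def blend_pixel(contributions, logits):
    """Blend K candidate colors for one pixel with softmax weights.

    contributions: list of K (r, g, b) colors, one per warped input layer.
    logits: list of K unnormalized scores. In Deep Blending these would be
            produced by the CNN; here they are arbitrary stand-in values.
    Returns the blended (r, g, b) color.
    """
    m = max(logits)                       # subtract max for numerical stability
    w = [math.exp(l - m) for l in logits]
    s = sum(w)
    w = [x / s for x in w]                # weights are positive and sum to 1
    return tuple(
        sum(wi * color[ch] for wi, color in zip(w, contributions))
        for ch in range(3)
    )

# Two candidate layers; the second is strongly preferred by the (stand-in) network.
blended = blend_pixel([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.0, 2.0])
```

Because the weights sum to one, the output stays within the convex hull of the input colors, which is what prevents over- or under-exposed seams when contributions from different views are mixed.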