research-article

Deep blending for free-viewpoint image-based rendering

Published: 04 December 2018

Abstract

Free-viewpoint image-based rendering (IBR) is a standing challenge. IBR methods combine warped versions of input photos to synthesize a novel view. The image quality of this combination is directly affected by geometric inaccuracies of multi-view stereo (MVS) reconstruction and by view- and image-dependent effects that produce artifacts when contributions from different input views are blended. We present a new deep learning approach to blending for IBR, in which we use held-out real image data to learn blending weights to combine input photo contributions. Our Deep Blending method requires us to address several challenges to achieve our goal of interactive free-viewpoint IBR navigation. We first need to provide sufficiently accurate geometry so the Convolutional Neural Network (CNN) can succeed in finding correct blending weights. We do this by combining two different MVS reconstructions with complementary accuracy vs. completeness tradeoffs. To tightly integrate learning in an interactive IBR system, we need to adapt our rendering algorithm to produce a fixed number of input layers that can then be blended by the CNN. We generate training data with a variety of captured scenes, using each input photo as ground truth in a held-out approach. We also design the network architecture and the training loss to provide high quality novel view synthesis, while reducing temporal flickering artifacts. Our results demonstrate free-viewpoint IBR in a wide variety of scenes, clearly surpassing previous methods in visual quality, especially when moving far from the input cameras.
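The core idea of the abstract, combining warped input photos with learned per-pixel blending weights, can be illustrated with a minimal sketch. The CNN itself is not reproduced here; this simply shows how a fixed number of warped input layers might be combined once per-pixel weight logits are available (the function name, shapes, and softmax normalization are assumptions for illustration, not the paper's actual implementation).

```python
import numpy as np

def blend_layers(layers, logits):
    """Blend N warped input layers into one novel view.

    layers: array of shape (N, H, W, 3), the warped input-photo layers.
    logits: array of shape (N, H, W), unnormalized per-pixel weights
            (in the paper these would come from the CNN).

    The logits are softmax-normalized per pixel so that the N
    contributions sum to 1 before the weighted average is taken.
    """
    # Subtract the per-pixel max for numerical stability, then softmax.
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    # Weighted sum over the layer axis; broadcast weights over RGB.
    return (w[..., None] * layers).sum(axis=0)
```

With equal logits this reduces to a plain average of the layers, which is one way to see why learned, content-dependent weights can do better: they can down-weight layers whose warped geometry is inaccurate at a given pixel.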



• Published in

  ACM Transactions on Graphics, Volume 37, Issue 6
  December 2018
  1401 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/3272127

          Copyright © 2018 ACM

          Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

