Abstract
Modern camera calibration and multiview stereo techniques enable users to smoothly navigate between different views of a scene captured using standard cameras. The underlying automatic 3D reconstruction methods work well for buildings and regular structures but often fail on vegetation, vehicles, and other complex geometry present in everyday urban scenes. Consequently, missing depth information makes Image-Based Rendering (IBR) for such scenes very challenging. Our goal is to provide plausible free-viewpoint navigation for such datasets. To do this, we introduce a new IBR algorithm that is robust to missing or unreliable geometry, providing plausible novel views even in regions quite far from the input camera positions. We first oversegment the input images, creating superpixels of homogeneous color content, which tend to preserve depth discontinuities. We then introduce a depth synthesis approach for poorly reconstructed regions, based on a graph structure defined on the oversegmentation and an appropriate traversal of that graph. The superpixels augmented with synthesized depth allow us to define a local shape-preserving warp which compensates for inaccurate depth. Our rendering algorithm blends the warped images and generates plausible image-based novel views for our challenging target scenes. Our results demonstrate novel view synthesis in real time for multiple challenging scenes with significant depth complexity, providing a convincing immersive navigation experience.
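The depth synthesis step can be illustrated with a small sketch. The abstract does not specify the graph's edge weights or the traversal, so everything below is an assumption made for illustration: superpixels are graph nodes, spatially adjacent superpixels are connected, edge cost is the Euclidean distance between mean colors, and a superpixel without depth takes the median depth of the `k` cheapest-to-reach superpixels that do have depth (found with Dijkstra's algorithm). The function name `synthesize_depth` and its parameters are hypothetical and are not taken from the paper.

```python
import heapq
from statistics import median


def synthesize_depth(adjacency, color, depth, k=3):
    """Fill missing superpixel depths (illustrative assumption, not the paper's method).

    adjacency: dict mapping each superpixel id to its adjacent superpixel ids
    color:     dict mapping each superpixel id to its mean (r, g, b) color
    depth:     dict mapping each superpixel id to a depth value, or None if missing
    k:         number of depth-bearing superpixels to collect per missing one
    """
    def edge_cost(a, b):
        # Assumed edge weight: Euclidean distance between mean colors.
        return sum((x - y) ** 2 for x, y in zip(color[a], color[b])) ** 0.5

    filled = dict(depth)
    for target, known in depth.items():
        if known is not None:
            continue
        # Dijkstra from the target superpixel over the adjacency graph.
        best = {target: 0.0}
        heap = [(0.0, target)]
        found = []
        while heap and len(found) < k:
            cost, node = heapq.heappop(heap)
            if cost > best.get(node, float("inf")):
                continue  # stale heap entry
            if depth[node] is not None:
                found.append(depth[node])  # only originally known depths are used
            for neighbor in adjacency[node]:
                new_cost = cost + edge_cost(node, neighbor)
                if new_cost < best.get(neighbor, float("inf")):
                    best[neighbor] = new_cost
                    heapq.heappush(heap, (new_cost, neighbor))
        if found:
            filled[target] = median(found)
    return filled


# Toy chain of four superpixels: green foliage (0, 1) next to a grey wall (2, 3).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
color = {0: (0, 200, 0), 1: (0, 200, 0), 2: (128, 128, 128), 3: (128, 128, 128)}
depth = {0: 2.0, 1: None, 2: None, 3: 10.0}
result = synthesize_depth(adjacency, color, depth, k=1)
```

In this toy example each superpixel with missing depth inherits the depth of the similarly colored superpixel beside it, so the synthesized depth changes at the color boundary instead of being smoothed across it.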
Supplemental Material
Supplemental movie and image files for "Depth synthesis and local warps for plausible image-based navigation"