Published in: International Journal of Computer Vision 8/2023

10.05.2023

Single-View View Synthesis with Self-rectified Pseudo-Stereo

Authors: Yang Zhou, Hanjie Wu, Wenxi Liu, Zheng Xiong, Jing Qin, Shengfeng He

Abstract

Synthesizing novel views from a single-view image is a highly ill-posed problem. We discover an effective solution that reduces the learning ambiguity by expanding the single-view view synthesis problem into a multi-view setting. Specifically, we leverage the reliable and explicit stereo prior to generate a pseudo-stereo viewpoint, which serves as an auxiliary input for constructing the 3D space. In this way, the challenging novel view synthesis process is decoupled into two simpler problems: stereo synthesis and 3D reconstruction. To synthesize a structurally correct and detail-preserving stereo image, we propose a self-rectified stereo synthesis scheme that amends erroneous regions in an identify-and-rectify manner. Hard-to-train and incorrectly warped samples are first discovered by two strategies: (1) pruning the network to reveal low-confidence predictions, and (2) bidirectionally matching between the stereo views to discover improper mappings. These regions are then inpainted to form the final pseudo-stereo view. With the aid of this extra input, a preferable 3D reconstruction can be easily obtained, and our method works with arbitrary 3D representations. Extensive experiments show that our method outperforms state-of-the-art single-view view synthesis methods and stereo synthesis methods.
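The second discovery strategy in the abstract (bidirectional matching between the two views) can be made concrete with a small sketch. The following Python/NumPy example is a minimal illustration under assumptions, not the authors' implementation: the disparity maps and the inpainting model (inpaint_fn) are external stand-ins, and the pruning-based confidence strategy is omitted entirely.

import numpy as np

def backward_warp(img, disp):
    # Sample img at x + disp(x) along each row. Rectified-stereo convention
    # assumed: a point at column x in the right view appears at x + d in the left view.
    h, w = disp.shape
    xs = np.arange(w, dtype=np.float32)[None, :] + np.zeros((h, 1), dtype=np.float32)
    src_x = np.clip(np.rint(xs + disp).astype(int), 0, w - 1)
    return img[np.arange(h)[:, None], src_x]

def improper_mapping_mask(disp_left, disp_right, tol=1.0):
    # Bidirectional matching: flag right-view pixels whose matched left-view
    # disparity disagrees with their own disparity (a left-right consistency check).
    disp_l_at_r = backward_warp(disp_left, disp_right)
    return np.abs(disp_l_at_r - disp_right) > tol

def build_pseudo_right(img_left, disp_left, disp_right, inpaint_fn):
    # Warp the left image into a pseudo-right view, then repair flagged regions
    # with an external inpainting model (inpaint_fn is a hypothetical placeholder).
    pseudo_right = backward_warp(img_left, disp_right)
    mask = improper_mapping_mask(disp_left, disp_right)
    return inpaint_fn(pseudo_right, mask)

In this reading, a right-view pixel counts as an improper mapping when the left-view disparity sampled at its matched location deviates from its own disparity beyond a tolerance; the masked pixels are then filled by the inpainter to form the final pseudo-stereo view.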


Metadata
Title
Single-View View Synthesis with Self-rectified Pseudo-Stereo
Authors
Yang Zhou
Hanjie Wu
Wenxi Liu
Zheng Xiong
Jing Qin
Shengfeng He
Publication date
10.05.2023
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 8/2023
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-023-01803-z
