Published in: International Journal of Computer Vision | Issue 12/2020

15 July 2020

Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images

Authors: Haozhe Xie, Hongxun Yao, Shengping Zhang, Shangchen Zhou, Wenxiu Sun



Abstract

Recovering the 3D shape of an object from single or multiple images with deep neural networks has attracted increasing attention in the past few years. Mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse feature maps of the input images. However, RNN-based approaches cannot produce consistent reconstruction results when the same input images are given in different orders. Moreover, RNNs may forget important features from early input images due to long-term memory loss. To address these issues, we propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes, yielding a fused 3D volume. To further correct incorrectly recovered parts in the fused 3D volume, a refiner is adopted to generate the final output. Experimental results on the ShapeNet, Pix3D, and Things3D benchmarks show that Pix2Vox++ performs favorably against state-of-the-art methods in terms of both accuracy and efficiency.
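The key property of the fusion module described above is that it combines per-view coarse volumes with per-voxel weights rather than a sequential RNN update, which makes the result independent of the order of the input views. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the scoring network that would produce the context scores is omitted, and `fuse_views` simply takes precomputed scores as input.

```python
import numpy as np

def fuse_views(coarse_volumes, scores):
    """Order-invariant fusion of per-view coarse volumes.

    coarse_volumes, scores: arrays of shape (n_views, D, H, W).
    A softmax over the view axis turns the context scores into per-voxel
    weights, so each voxel adaptively favors the view(s) that
    reconstructed that part best.
    """
    # Numerically stable softmax over the view axis (axis 0).
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)
    # Weighted sum over views collapses (n_views, D, H, W) -> (D, H, W).
    return (weights * coarse_volumes).sum(axis=0)

rng = np.random.default_rng(0)
vols = rng.random((3, 4, 4, 4))     # 3 views, 4^3 coarse volumes
scores = rng.random((3, 4, 4, 4))   # stand-in for learned context scores
fused = fuse_views(vols, scores)
print(fused.shape)  # (4, 4, 4)

# Unlike an RNN, permuting the view order leaves the result unchanged:
perm = [2, 0, 1]
fused_perm = fuse_views(vols[perm], scores[perm])
print(np.allclose(fused, fused_perm))  # True
```

Because the softmax weights permute together with the volumes, any reordering of the views yields exactly the same fused volume, which is the consistency property the abstract contrasts with RNN-based fusion.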


References

Barron, J. T., & Malik, J. (2015). Shape, illumination, and reflectance from shading. TPAMI, 37(8), 1670–1687.
Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., et al. (2016). Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Transactions on Robotics, 32(6), 1309–1332.
Chang, A. X., Funkhouser, T. A., Guibas, L. J., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., & Yu, F. (2015). ShapeNet: An information-rich 3D model repository. arXiv:1512.03012
Chen, Z., & Zhang, H. (2019). Learning implicit fields for generative shape modeling. In CVPR
Choy, C. B., Xu, D., Gwak, J., Chen, K., & Savarese, S. (2016). 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In ECCV
Dibra, E., Jain, H., Öztireli, A. C., Ziegler, R., & Gross, M. H. (2017). Human shape from silhouettes using generative HKS descriptors and cross-modal neural networks. In CVPR
Fan, H., Su, H., & Guibas, L. J. (2017). A point set generation network for 3D object reconstruction from a single image. In CVPR
Fuentes-Pacheco, J., Ascencio, J. R., & Rendón-Mancha, J. M. (2015). Visual simultaneous localization and mapping: A survey. Artificial Intelligence Review, 43(1), 55–81.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. C., & Bengio, Y. (2014). Generative adversarial nets. In NIPS
Groueix, T., Fisher, M., Kim, V. G., Russell, B. C., & Aubry, M. (2018). A papier-mâché approach to learning 3D surface generation. In CVPR
Hartley, R., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd ed.). Cambridge: Cambridge University Press.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR
Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In CVPR
Huang, P., Matzen, K., Kopf, J., Ahuja, N., & Huang, J. (2018). DeepMVS: Learning multi-view stereopsis. In CVPR
Hwang, K., & Sung, W. (2015). Single stream parallelization of generalized LSTM-like RNNs on a GPU. In ICASSP
Kar, A., Häne, C., & Malik, J. (2017). Learning a multi-view stereo machine. In NIPS
Kato, H., & Harada, T. (2019). Learning view priors for single-view 3D reconstruction. In CVPR
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In ICLR
Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. In ICLR
Lin, C., Kong, C., & Lucey, S. (2018). Learning efficient point cloud generation for dense 3D object reconstruction. In AAAI
Lin, C., Wang, O., Russell, B. C., Shechtman, E., Kim, V. G., Fisher, M., & Lucey, S. (2019). Photometric mesh optimization for video-aligned 3D object reconstruction. In CVPR
Lorensen, W. E., & Cline, H. E. (1987). Marching cubes: A high resolution 3D surface construction algorithm. In SIGGRAPH
Mescheder, L. M., Oechsle, M., Niemeyer, M., Nowozin, S., & Geiger, A. (2019). Occupancy networks: Learning 3D reconstruction in function space. In CVPR
Mo, K., Guerrero, P., Yi, L., Su, H., Wonka, P., Mitra, N. J., et al. (2019a). StructureNet: Hierarchical graph networks for 3D shape generation. ACM Transactions on Graphics, 38(6), 242:1–242:19.
Mo, K., Zhu, S., Chang, A. X., Yi, L., Tripathi, S., Guibas, L. J., & Su, H. (2019b). PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In CVPR
Özyeşil, O., Voroninski, V., Basri, R., & Singer, A. (2017). A survey of structure from motion. Acta Numerica, 26, 305–364.
Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In ICML
Paschalidou, D., Ulusoy, A. O., Schmitt, C., Gool, L. V., & Geiger, A. (2018). RayNet: Learning volumetric 3D reconstruction with ray potentials. In CVPR
Paschalidou, D., Gool, L. V., & Geiger, A. (2020). Learning unsupervised hierarchical part decomposition of 3D objects from a single RGB image. In CVPR
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In NeurIPS
Richter, S. R., & Roth, S. (2015). Discriminative shape from shading in uncalibrated illumination. In CVPR
Richter, S. R., & Roth, S. (2018). Matryoshka networks: Predicting 3D geometry via nested shape layers. In CVPR
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In MICCAI
Shin, D., Fowlkes, C. C., & Hoiem, D. (2018). Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction. In CVPR
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR
Song, S., Yu, F., Zeng, A., Chang, A. X., Savva, M., & Funkhouser, T. A. (2017). Semantic scene completion from a single depth image. In CVPR
Su, H., Qi, C. R., Li, Y., & Guibas, L. J. (2015). Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views. In ICCV
Sun, X., Wu, J., Zhang, X., Zhang, Z., Zhang, C., Xue, T., Tenenbaum, J. B., & Freeman, W. T. (2018). Pix3D: Dataset and methods for single-image 3D shape modeling. In CVPR
Tatarchenko, M., Dosovitskiy, A., & Brox, T. (2017). Octree generating networks: Efficient convolutional architectures for high-resolution 3D outputs. In ICCV
Tatarchenko, M., Richter, S. R., Ranftl, R., Li, Z., Koltun, V., & Brox, T. (2019). What do single-view 3D reconstruction networks learn? In CVPR
Vinyals, O., Bengio, S., & Kudlur, M. (2016). Order matters: Sequence to sequence for sets. In ICLR
Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., & Jiang, Y. (2018). Pixel2Mesh: Generating 3D mesh models from single RGB images. In ECCV
Wen, C., Zhang, Y., Li, Z., & Fu, Y. (2019). Pixel2Mesh++: Multi-view 3D mesh generation via deformation. In ICCV
Witkin, A. P. (1981). Recovering surface shape and orientation from texture. Artificial Intelligence, 17(1–3), 17–45.
Wu, J., Zhang, C., Xue, T., Freeman, B., & Tenenbaum, J. (2016). Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In NIPS
Wu, J., Wang, Y., Xue, T., Sun, X., Freeman, B., & Tenenbaum, J. (2017). MarrNet: 3D shape reconstruction via 2.5D sketches. In NIPS
Wu, J., Zhang, C., Zhang, X., Zhang, Z., Freeman, W. T., & Tenenbaum, J. B. (2018). Learning shape priors for single-view 3D completion and reconstruction. In ECCV
Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3D ShapeNets: A deep representation for volumetric shapes. In CVPR
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In CVPR
Xiao, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2012). Recognizing scene viewpoint using panoramic place representation. In CVPR
Xie, H., Yao, H., Sun, X., Zhou, S., & Zhang, S. (2019). Pix2Vox: Context-aware 3D reconstruction from single and multi-view images. In ICCV
Xu, Q., Wang, W., Ceylan, D., Mech, R., & Neumann, U. (2019). DISN: Deep implicit surface network for high-quality single-view 3D reconstruction. In NeurIPS
Yang, B., Rosa, S., Markham, A., Trigoni, N., & Wen, H. (2019). Dense 3D object reconstruction from a single depth view. TPAMI, 41(12), 2820–2834.
Yang, B., Wang, S., Markham, A., & Trigoni, N. (2020). Attentional aggregation of deep feature sets for multi-view 3D reconstruction. IJCV, 128(1), 53–73.
Zhang, Y., Liu, Z., Liu, T., Peng, B., & Li, X. (2019). RealPoint3D: An efficient generation network for 3D object reconstruction from a single image. IEEE Access, 7, 57539–57549.
Zhu, C., Xu, K., Chaudhuri, S., Yi, R., & Zhang, H. (2018). SCORES: Shape composition with recursive substructure priors. ACM Transactions on Graphics, 37(6), 211:1–211:14.
Metadata
Title
Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images
Authors
Haozhe Xie
Hongxun Yao
Shengping Zhang
Shangchen Zhou
Wenxiu Sun
Publication date
15 July 2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 12/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01347-6
