Published in: International Journal of Computer Vision | Issue 12/2020

15 July 2020

Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images

Authors: Haozhe Xie, Hongxun Yao, Shengping Zhang, Shangchen Zhou, Wenxiu Sun



Abstract

Recovering the 3D shape of an object from single or multiple images with deep neural networks has attracted increasing attention in the past few years. Mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse feature maps of the input images. However, RNN-based approaches cannot produce consistent reconstruction results when the same input images are given in different orders. Moreover, RNNs may forget important features from early input images due to long-term memory loss. To address these issues, we propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes, yielding a fused 3D volume. To further correct incorrectly recovered parts in the fused 3D volume, a refiner is adopted to generate the final output. Experimental results on the ShapeNet, Pix3D, and Things3D benchmarks show that Pix2Vox++ performs favorably against state-of-the-art methods in terms of both accuracy and efficiency.
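The key property of the fusion module described above is that it combines per-view coarse volumes with per-voxel weights rather than a sequential RNN update, which makes the result independent of the order of the input views. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the scoring network that would produce the context scores is omitted, and `fuse_views` simply takes precomputed scores as input.

```python
import numpy as np

def fuse_views(coarse_volumes, scores):
    """Order-invariant fusion of per-view coarse volumes.

    coarse_volumes, scores: arrays of shape (n_views, D, H, W).
    A softmax over the view axis turns the context scores into per-voxel
    weights, so each voxel adaptively favors the view(s) that
    reconstructed that part best.
    """
    # Numerically stable softmax over the view axis (axis 0).
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)
    # Weighted sum over views collapses (n_views, D, H, W) -> (D, H, W).
    return (weights * coarse_volumes).sum(axis=0)

rng = np.random.default_rng(0)
vols = rng.random((3, 4, 4, 4))     # 3 views, 4^3 coarse volumes
scores = rng.random((3, 4, 4, 4))   # stand-in for learned context scores
fused = fuse_views(vols, scores)
print(fused.shape)  # (4, 4, 4)

# Unlike an RNN, permuting the view order leaves the result unchanged:
perm = [2, 0, 1]
fused_perm = fuse_views(vols[perm], scores[perm])
print(np.allclose(fused, fused_perm))  # True
```

Because the softmax weights permute together with the volumes, any reordering of the views yields exactly the same fused volume, which is the consistency property the abstract contrasts with RNN-based fusion.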


References

Barron, J. T., & Malik, J. (2015). Shape, illumination, and reflectance from shading. TPAMI, 37(8), 1670–1687.
Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., et al. (2016). Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Transactions on Robotics, 32(6), 1309–1332.
Chang, A. X., Funkhouser, T. A., Guibas, L. J., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., & Yu, F. (2015). ShapeNet: An information-rich 3D model repository. arXiv:1512.03012
Chen, Z., & Zhang, H. (2019). Learning implicit fields for generative shape modeling. In CVPR
Choy, C. B., Xu, D., Gwak, J., Chen, K., & Savarese, S. (2016). 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In ECCV
Dibra, E., Jain, H., Öztireli, A. C., Ziegler, R., & Gross, M. H. (2017). Human shape from silhouettes using generative HKS descriptors and cross-modal neural networks. In CVPR
Fan, H., Su, H., & Guibas, L. J. (2017). A point set generation network for 3D object reconstruction from a single image. In CVPR
Fuentes-Pacheco, J., Ascencio, J. R., & Rendón-Mancha, J. M. (2015). Visual simultaneous localization and mapping: A survey. Artificial Intelligence Review, 43(1), 55–81.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. C., & Bengio, Y. (2014). Generative adversarial nets. In NIPS
Groueix, T., Fisher, M., Kim, V. G., Russell, B. C., & Aubry, M. (2018). A papier-mâché approach to learning 3D surface generation. In CVPR
Hartley, R., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd ed.). Cambridge: Cambridge University Press.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR
Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In CVPR
Huang, P., Matzen, K., Kopf, J., Ahuja, N., & Huang, J. (2018). DeepMVS: Learning multi-view stereopsis. In CVPR
Hwang, K., & Sung, W. (2015). Single stream parallelization of generalized LSTM-like RNNs on a GPU. In ICASSP
Kar, A., Häne, C., & Malik, J. (2017). Learning a multi-view stereo machine. In NIPS
Kato, H., & Harada, T. (2019). Learning view priors for single-view 3D reconstruction. In CVPR
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In ICLR
Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. In ICLR
Lin, C., Kong, C., & Lucey, S. (2018). Learning efficient point cloud generation for dense 3D object reconstruction. In AAAI
Lin, C., Wang, O., Russell, B. C., Shechtman, E., Kim, V. G., Fisher, M., & Lucey, S. (2019). Photometric mesh optimization for video-aligned 3D object reconstruction. In CVPR
Lorensen, W. E., & Cline, H. E. (1987). Marching cubes: A high resolution 3D surface construction algorithm. In SIGGRAPH
Mescheder, L. M., Oechsle, M., Niemeyer, M., Nowozin, S., & Geiger, A. (2019). Occupancy networks: Learning 3D reconstruction in function space. In CVPR
Mo, K., Guerrero, P., Yi, L., Su, H., Wonka, P., Mitra, N. J., et al. (2019a). StructureNet: Hierarchical graph networks for 3D shape generation. ACM Transactions on Graphics, 38(6), 242:1–242:19.
Mo, K., Zhu, S., Chang, A. X., Yi, L., Tripathi, S., Guibas, L. J., & Su, H. (2019b). PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In CVPR
Özyeşil, O., Voroninski, V., Basri, R., & Singer, A. (2017). A survey of structure from motion. Acta Numerica, 26, 305–364.
Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In ICML
Paschalidou, D., Ulusoy, A. O., Schmitt, C., Gool, L. V., & Geiger, A. (2018). RayNet: Learning volumetric 3D reconstruction with ray potentials. In CVPR
Paschalidou, D., Gool, L. V., & Geiger, A. (2020). Learning unsupervised hierarchical part decomposition of 3D objects from a single RGB image. In CVPR
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In NeurIPS
Richter, S. R., & Roth, S. (2015). Discriminative shape from shading in uncalibrated illumination. In CVPR
Richter, S. R., & Roth, S. (2018). Matryoshka networks: Predicting 3D geometry via nested shape layers. In CVPR
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In MICCAI
Shin, D., Fowlkes, C. C., & Hoiem, D. (2018). Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction. In CVPR
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR
Song, S., Yu, F., Zeng, A., Chang, A. X., Savva, M., & Funkhouser, T. A. (2017). Semantic scene completion from a single depth image. In CVPR
Su, H., Qi, C. R., Li, Y., & Guibas, L. J. (2015). Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views. In ICCV
Sun, X., Wu, J., Zhang, X., Zhang, Z., Zhang, C., Xue, T., Tenenbaum, J. B., & Freeman, W. T. (2018). Pix3D: Dataset and methods for single-image 3D shape modeling. In CVPR
Tatarchenko, M., Dosovitskiy, A., & Brox, T. (2017). Octree generating networks: Efficient convolutional architectures for high-resolution 3D outputs. In ICCV
Tatarchenko, M., Richter, S. R., Ranftl, R., Li, Z., Koltun, V., & Brox, T. (2019). What do single-view 3D reconstruction networks learn? In CVPR
Vinyals, O., Bengio, S., & Kudlur, M. (2016). Order matters: Sequence to sequence for sets. In ICLR
Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., & Jiang, Y. (2018). Pixel2Mesh: Generating 3D mesh models from single RGB images. In ECCV
Wen, C., Zhang, Y., Li, Z., & Fu, Y. (2019). Pixel2Mesh++: Multi-view 3D mesh generation via deformation. In ICCV
Witkin, A. P. (1981). Recovering surface shape and orientation from texture. Artificial Intelligence, 17(1–3), 17–45.
Wu, J., Zhang, C., Xue, T., Freeman, B., & Tenenbaum, J. (2016). Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In NIPS
Wu, J., Wang, Y., Xue, T., Sun, X., Freeman, B., & Tenenbaum, J. (2017). MarrNet: 3D shape reconstruction via 2.5D sketches. In NIPS
Wu, J., Zhang, C., Zhang, X., Zhang, Z., Freeman, W. T., & Tenenbaum, J. B. (2018). Learning shape priors for single-view 3D completion and reconstruction. In ECCV
Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3D ShapeNets: A deep representation for volumetric shapes. In CVPR
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In CVPR
Xiao, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2012). Recognizing scene viewpoint using panoramic place representation. In CVPR
Xie, H., Yao, H., Sun, X., Zhou, S., & Zhang, S. (2019). Pix2Vox: Context-aware 3D reconstruction from single and multi-view images. In ICCV
Xu, Q., Wang, W., Ceylan, D., Mech, R., & Neumann, U. (2019). DISN: Deep implicit surface network for high-quality single-view 3D reconstruction. In NeurIPS
Yang, B., Rosa, S., Markham, A., Trigoni, N., & Wen, H. (2019). Dense 3D object reconstruction from a single depth view. TPAMI, 41(12), 2820–2834.
Yang, B., Wang, S., Markham, A., & Trigoni, N. (2020). Attentional aggregation of deep feature sets for multi-view 3D reconstruction. IJCV, 128(1), 53–73.
Zhang, Y., Liu, Z., Liu, T., Peng, B., & Li, X. (2019). RealPoint3D: An efficient generation network for 3D object reconstruction from a single image. IEEE Access, 7, 57539–57549.
Zhu, C., Xu, K., Chaudhuri, S., Yi, R., & Zhang, H. (2018). SCORES: Shape composition with recursive substructure priors. ACM Transactions on Graphics, 37(6), 211:1–211:14.
Metadata
Title
Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images
Authors
Haozhe Xie
Hongxun Yao
Shengping Zhang
Shangchen Zhou
Wenxiu Sun
Publication date
15 July 2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 12/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01347-6
