2016 | OriginalPaper | Book chapter

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

Written by: Christopher B. Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing


Abstract

Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data [13]. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most previous work, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework (i) outperforms the state-of-the-art methods for single-view reconstruction, and (ii) enables the 3D reconstruction of objects in situations where traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
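
The input/output contract described in the abstract (one or more views in, a single 3D occupancy grid out) can be illustrated with a toy recurrent loop. This is only a sketch under stated assumptions and not the authors' architecture: the names `GRID`, `encode`, `update` and `reconstruct`, the mean-intensity "encoder", the per-voxel gated update and the random weights are all hypothetical stand-ins chosen to keep the example small and runnable.

```python
import math
import random

GRID = 4  # hypothetical grid resolution, chosen only to keep the example small


def encode(image):
    """Toy stand-in for an image encoder: the mean pixel intensity."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def update(hidden, feature, weights):
    """One gated recurrent step over a flat list of per-voxel states."""
    new_hidden = []
    for h, w in zip(hidden, weights):
        gate = 1.0 / (1.0 + math.exp(-w * feature))  # how much to overwrite
        candidate = math.tanh(w * feature)           # evidence from this view
        new_hidden.append((1.0 - gate) * h + gate * candidate)
    return new_hidden


def reconstruct(images, weights, threshold=0.5):
    """Fold any number of views into one GRID x GRID x GRID occupancy grid."""
    hidden = [0.0] * GRID ** 3
    for image in images:  # the same code path handles one view or many
        hidden = update(hidden, encode(image), weights)
    probs = [1.0 / (1.0 + math.exp(-h)) for h in hidden]
    flat = [1 if p > threshold else 0 for p in probs]
    return [[[flat[(x * GRID + y) * GRID + z] for z in range(GRID)]
             for y in range(GRID)] for x in range(GRID)]


rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(GRID ** 3)]  # untrained, random
view = [[0.2, 0.8], [0.5, 0.1]]  # a made-up 2x2 grayscale "image"

single_view = reconstruct([view], weights)
multi_view = reconstruct([view, view, view], weights)
print(len(single_view), len(single_view[0]), len(single_view[0][0]))
```

The point of the sketch is the shape of the interface: the hidden state has one entry per voxel, each new view refines it, and the output is thresholded into binary occupancy regardless of how many views were supplied. No class label or image annotation appears anywhere in the call.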

References
3.
Dosovitskiy, A., Springenberg, J.T., Brox, T.: Learning to generate chairs with convolutional neural networks. In: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
4.
Agarwal, S., Snavely, N., Simon, I., Seitz, S.M., Szeliski, R.: Building Rome in a day. In: 2009 IEEE 12th International Conference on Computer Vision. IEEE (2009)
5.
Anwar, Z., Ferrie, F.: Towards robust voxel-coloring: handling camera calibration errors and partial emptiness of surface voxels. In: Proceedings of the 18th International Conference on Pattern Recognition, ICPR 2006, vol. 1. IEEE Computer Society, Washington, DC, USA (2006). doi:10.1109/ICPR.2006.1129
6.
Bao, Y., Chandraker, M., Lin, Y., Savarese, S.: Dense object reconstruction using semantic priors. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (2013)
7.
Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
8.
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010
9.
Bhat, D.N., Nayar, S.K.: Ordinal measures for image correspondence. IEEE Trans. Pattern Anal. Mach. Intell. 20(4), 415–423 (1998)
10.
Blanz, V., Vetter, T.: Face recognition based on fitting a 3D morphable model. IEEE Trans. Pattern Anal. Mach. Intell. 25(9), 1063–1074 (2003)
11.
Bongsoo Choy, C., Stark, M., Corbett-Davies, S., Savarese, S.: Enriching object detection with 2D–3D registration and continuous viewpoint estimation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015
12.
Broadhurst, A., Drummond, T.W., Cipolla, R.: A probabilistic framework for space carving. In: Eighth IEEE International Conference on Computer Vision, ICCV 2001, Proceedings, vol. 1. IEEE (2001)
13.
Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., Yu, F.: ShapeNet: an information-rich 3D model repository. Technical report, Stanford University, Princeton University, Toyota Technological Institute at Chicago (2015)
14.
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. ArXiv e-prints arXiv:1406.1078 (2014)
15.
16.
Dame, A., Prisacariu, V.A., Ren, C.Y., Reid, I.: Dense reconstruction using 3D object shape priors. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
17.
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inf. Process. Syst. 27, 11:1–11:15 (2014)
18.
Engel, J., Schöps, T., Cremers, D.: LSD-SLAM: large-scale direct monocular SLAM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part II. LNCS, vol. 8690, pp. 834–849. Springer, Heidelberg (2014)
19.
Everingham, M., Van Gool, L., Williams, C., Winn, J., Zisserman, A.: The PASCAL visual object classes challenge 2012 (2011)
20.
Firman, M., Mac Aodha, O., Julier, S., Brostow, G.J.: Structured prediction of unobserved voxels from a single depth image. In: CVPR (2016)
21.
Fitzgibbon, A., Zisserman, A.: Automatic 3D model acquisition and generation of new images from video sequences. In: 9th European Signal Processing Conference (EUSIPCO 1998). IEEE (1998)
22.
Fuentes-Pacheco, J., Ruiz-Ascencio, J., Rendón-Mancha, J.M.: Visual simultaneous localization and mapping: a survey. Artif. Intell. Rev. 43(1), 55–81 (2015)
23.
Slabaugh, G.G., Culbertson, W.B., Malzbender, T., Stevens, M.R., Schafer, R.W.: Methods for volumetric reconstruction of visual scenes. Int. J. Comput. Vis. 57(3), 179–199 (2004)
24.
Häming, K., Peters, G.: The structure-from-motion reconstruction pipeline – a survey with focus on short image sequences. Kybernetika 46(5), 926–937 (2010)
25.
Häne, C., Savinov, N., Pollefeys, M.: Class specific 3D object shape priors using surface normals. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
26.
27.
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
28.
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
29.
Hoiem, D., Efros, A.A., Hebert, M.: Automatic photo pop-up. ACM Trans. Graph. (TOG) 24(3), 577–584 (2005)
30.
Kar, A., Tulsiani, S., Carreira, J., Malik, J.: Category-specific object reconstruction from a single image. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2015)
31.
Kemelmacher-Shlizerman, I., Basri, R.: 3D face reconstruction from a single image using a single reference face shape. IEEE Trans. Pattern Anal. Mach. Intell. 33(2), 394–405 (2011)
33.
Kutulakos, K.N., Seitz, S.M.: A theory of shape by space carving. Int. J. Comput. Vis. 38(3), 199–218 (2000)
34.
Roberts, L.G.: Machine perception of three-dimensional solids. Ph.D. thesis, Massachusetts Institute of Technology (1963)
35.
Lhuillier, M., Quan, L.: A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Trans. Pattern Anal. Mach. Intell. 27(3), 418–433 (2005)
36.
Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: Proceedings of IEEE Conference Computer Vision and Pattern Recognition (2015)
37.
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
38.
Matthews, I., Xiao, J., Baker, S.: 2D vs. 3D deformable face models: representational power, construction, and real-time fitting. Int. J. Comput. Vis. 75(1), 93–113 (2007)
39.
Nevatia, R., Binford, T.O.: Description and recognition of curved objects. Artif. Intell. 8(1), 77–98 (1977)
40.
Prisacariu, V.A., Segal, A.V., Reid, I.: Simultaneous monocular 2D segmentation, 3D pose recovery and 3D reconstruction. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part I. LNCS, vol. 7724, pp. 593–606. Springer, Heidelberg (2013)
41.
Rock, J., Gupta, T., Thorsen, J., Gwak, J., Shin, D., Hoiem, D.: Completing 3D object shape from one depth image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
42.
Sandhu, R., Dambreville, S., Yezzi, A., Tannenbaum, A.: A nonrigid kernel-based framework for 2D–3D pose estimation and 2D image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 33(6), 1098–1115 (2011)
43.
Saponaro, P., Sorensen, S., Rhein, S., Mahoney, A.R., Kambhamettu, C.: Reconstruction of textureless regions using structure from motion and image-based interpolation. In: 2014 IEEE International Conference on Image Processing (ICIP). IEEE (2014)
44.
Saxena, A., Sun, M., Ng, A.Y.: Make3D: learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 824–840 (2009)
45.
Seitz, S.M., Dyer, C.R.: Photorealistic scene reconstruction by voxel coloring. Int. J. Comput. Vis. 35(2), 151–173 (1999)
46.
Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. ArXiv e-prints arXiv:1511.06452 (2015)
48.
Sundermeyer, M., Schlüter, R., Ney, H.: LSTM neural networks for language modeling. In: INTERSPEECH (2012)
49.
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems (2014)
50.
Vicente, S., Carreira, J., Agapito, L., Batista, J.: Reconstructing PASCAL VOC. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
51.
Xiang, Y., Mottaghi, R., Savarese, S.: Beyond PASCAL: a benchmark for 3D object detection in the wild. In: 2014 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE (2014)
52.
Zia, M.Z., Stark, M., Schiele, B., Schindler, K.: Detailed 3D representations for object modeling and recognition. In: TPAMI (2013)
Metadata
Title
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Written by
Christopher B. Choy
Danfei Xu
JunYoung Gwak
Kevin Chen
Silvio Savarese
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-46484-8_38
