2018 | Original Paper | Book Chapter

MVSNet: Depth Inference for Unstructured Multi-view Stereo

Authors: Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, Long Quan

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing

Abstract

We present an end-to-end deep learning architecture for depth map inference from multi-view images. In the network, we first extract deep visual image features, and then build a 3D cost volume on the reference camera frustum via differentiable homography warping. Next, we apply 3D convolutions to regularize and regress the initial depth map, which is then refined with the reference image to generate the final output. Our framework flexibly adapts to arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature. The proposed MVSNet is demonstrated on the large-scale indoor DTU dataset. With simple post-processing, our method not only significantly outperforms previous state-of-the-art methods, but is also several times faster at runtime. We also evaluate MVSNet on the complex outdoor Tanks and Temples dataset, where our method ranks first before April 18, 2018 without any fine-tuning, showing the strong generalization ability of MVSNet.
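The variance-based cost metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each view contributes one feature vector for the same sampled 3D location and that the cost is the element-wise population variance across views; the function name `variance_cost` and the flat-list representation are hypothetical choices for illustration, not the authors' implementation.

```python
from statistics import pvariance


def variance_cost(view_features):
    """Map N per-view feature vectors to one cost vector.

    Assumption: the cost is the element-wise population variance
    (sum of squared deviations from the mean, divided by N).
    """
    if len(view_features) < 2:
        raise ValueError("need at least two views")
    length = len(view_features[0])
    if any(len(feature) != length for feature in view_features):
        raise ValueError("all views must have the same feature length")
    # zip(*...) groups the k-th channel of every view together.
    return [pvariance(channel) for channel in zip(*view_features)]


# Three views that agree exactly: every channel has zero cost.
agree = [[0.5, 1.0, -2.0], [0.5, 1.0, -2.0], [0.5, 1.0, -2.0]]
cost_agree = variance_cost(agree)

# Three views that disagree in the first channel only.
disagree = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
cost_disagree = variance_cost(disagree)

# The output length depends on the feature length, not on the view count.
cost_five_views = variance_cost([[float(i), 1.0] for i in range(5)])
```

Because the variance is taken over the view axis, the result has the same size for any number of input views, which is one way to read the claim that the framework adapts to arbitrary N-view inputs.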

Footnotes
1
Validation set: scans \(\{3, 5, 17, 21, 28, 35, 37, 38, 40, 43, 56, 59, 66, 67, 82, 86, 106, 117\}\). Evaluation set: scans \(\{1, 4, 9, 10, 11, 12, 13, 15, 23, 24, 29, 32, 33, 34, 48, 49, 62, 75, 77, 110, 114, 118\}\). Training set: the other 79 scans.
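The split in this footnote can be written down as a small lookup, sketched below. The constant names and the helper `split_of` are hypothetical; the helper assumes that the given scan ID is a valid scan of the dataset, since the footnote does not enumerate the training scan IDs.

```python
# Scan IDs copied from the footnote.
VALIDATION_SCANS = frozenset(
    {3, 5, 17, 21, 28, 35, 37, 38, 40, 43, 56, 59, 66, 67, 82, 86, 106, 117}
)
EVALUATION_SCANS = frozenset(
    {1, 4, 9, 10, 11, 12, 13, 15, 23, 24, 29, 32, 33, 34,
     48, 49, 62, 75, 77, 110, 114, 118}
)
NUM_TRAINING_SCANS = 79  # "the other 79 scans"

# Total number of scans implied by the three subsets.
total_scans = len(VALIDATION_SCANS) + len(EVALUATION_SCANS) + NUM_TRAINING_SCANS


def split_of(scan_id):
    """Return the split of a scan (assumes scan_id is a valid scan ID)."""
    if scan_id in VALIDATION_SCANS:
        return "validation"
    if scan_id in EVALUATION_SCANS:
        return "evaluation"
    return "training"
```

With 18 validation scans and 22 evaluation scans, the footnote implies 119 scans in total.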
 
Metadata
Title
MVSNet: Depth Inference for Unstructured Multi-view Stereo
Authors
Yao Yao
Zixin Luo
Shiwei Li
Tian Fang
Long Quan
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01237-3_47
