
2018 | Original Paper | Book Chapter

Monocular Depth Estimation Using Whole Strip Masking and Reliability-Based Refinement

Authors: Minhyeok Heo, Jaehan Lee, Kyung-Rae Kim, Han-Ul Kim, Chang-Su Kim

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

We propose a monocular depth estimation algorithm based on whole strip masking (WSM) and reliability-based refinement. First, we develop a convolutional neural network (CNN) tailored for depth estimation. Specifically, we design a novel filter, called WSM, to exploit the tendency of a scene to have similar depths in the horizontal or vertical direction. The proposed CNN combines WSM upsampling blocks with a ResNet encoder. Second, we measure the reliability of an estimated depth by appending additional layers to the main CNN. Using the reliability information, we perform conditional random field (CRF) optimization to refine the estimated depth map. Experimental results demonstrate that the proposed algorithm achieves state-of-the-art depth estimation performance.
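The core idea behind a WSM filter — a kernel that spans an entire vertical (or horizontal) strip of the feature map, so that every position in a column aggregates information from the whole column — can be illustrated with a small conceptual sketch. This is not the authors' trained CNN; it is a NumPy toy under our own assumptions (uniform strip weights standing in for learned kernel values, and a hypothetical `wsm_vertical` helper name):

```python
import numpy as np

def wsm_vertical(feature, weights=None):
    """Conceptual whole-strip response along the vertical axis.

    A vertical WSM filter has kernel size H x 1, covering the entire
    height of the feature map, so every output in a column shares one
    whole-strip response. We model this as a weighted sum over the
    height axis, broadcast back to the full spatial size. `weights`
    is a per-row weight vector (uniform by default) standing in for
    learned kernel values.
    """
    c, h, w = feature.shape
    if weights is None:
        weights = np.full(h, 1.0 / h)          # uniform strip weights
    # weighted sum over the height axis -> one response per column: (c, w)
    strip = np.tensordot(weights, feature, axes=([0], [1]))
    # broadcast the strip response back to every row of the column
    return np.broadcast_to(strip[:, None, :], (c, h, w)).copy()

# Example: with uniform weights, every row of the output equals the
# column-wise mean of the input feature map.
feat = np.arange(24, dtype=float).reshape(1, 4, 6)
out = wsm_vertical(feat)
```

A horizontal WSM filter is the mirror image (kernel size 1 x W, reducing over the width axis); in the paper these strip-shaped filters are learned and combined with conventional convolutions inside the upsampling blocks, which this sketch does not attempt to reproduce.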


Metadata
Title
Monocular Depth Estimation Using Whole Strip Masking and Reliability-Based Refinement
Authors
Minhyeok Heo
Jaehan Lee
Kyung-Rae Kim
Han-Ul Kim
Chang-Su Kim
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01225-0_3