Skip to main content

2018 | OriginalPaper | Buchkapitel

ActiveStereoNet: End-to-End Self-supervised Learning for Active Stereo Systems

verfasst von : Yinda Zhang, Sameh Khamis, Christoph Rhemann, Julien Valentin, Adarsh Kowdle, Vladimir Tankovich, Michael Schoenberg, Shahram Izadi, Thomas Funkhouser, Sean Fanello

Erschienen in: Computer Vision – ECCV 2018

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems. Due to the lack of ground truth, our method is fully self-supervised, yet it produces precise depth with a subpixel precision of 1 / 30th of a pixel; it does not suffer from the common over-smoothing issues; it preserves the edges; and it explicitly handles occlusions. We introduce a novel reconstruction loss that is more robust to noise and texture-less patches, and is invariant to illumination changes. The proposed loss is optimized using a window-based cost aggregation with an adaptive support weight scheme. This cost aggregation is edge-preserving and smooths the loss function, which is key to allow the network to reach compelling results. Finally we show how the task of predicting invalid regions, such as occlusions, can be trained end-to-end without ground-truth. This component is crucial to reduce blur and particularly improves predictions along depth discontinuities. Extensive quantitatively and qualitatively evaluations on real and synthetic data demonstrate state of the art results in many challenging scenes.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
3.
Zurück zum Zitat Besse, F., Rother, C., Fitzgibbon, A., Kautz, J.: PMBP: patchmatch belief propagation for correspondence field estimation. Int. J. Comput. Vis. 110(1), 2–13 (2014)CrossRef Besse, F., Rother, C., Fitzgibbon, A., Kautz, J.: PMBP: patchmatch belief propagation for correspondence field estimation. Int. J. Comput. Vis. 110(1), 2–13 (2014)CrossRef
4.
Zurück zum Zitat Bhandari, A., Feigin, M., Izadi, S., Rhemann, C., Schmidt, M., Raskar, R.: Resolving multipath interference in Kinect: an inverse problem approach. IEEE Sens. 16, 3419–3427 (2014) Bhandari, A., Feigin, M., Izadi, S., Rhemann, C., Schmidt, M., Raskar, R.: Resolving multipath interference in Kinect: an inverse problem approach. IEEE Sens. 16, 3419–3427 (2014)
5.
Zurück zum Zitat Bhandari, A., et al.: Resolving multi-path interference in time-of-flight imaging via modulation frequency diversity and sparse regularization. CoRR (2014) Bhandari, A., et al.: Resolving multi-path interference in time-of-flight imaging via modulation frequency diversity and sparse regularization. CoRR (2014)
6.
Zurück zum Zitat Bleyer, M., Gelautz, M.: Simple but effective tree structures for dynamic programming-based stereo matching. In: VISAPP, no. 2, pp. 415–422 (2008) Bleyer, M., Gelautz, M.: Simple but effective tree structures for dynamic programming-based stereo matching. In: VISAPP, no. 2, pp. 415–422 (2008)
7.
Zurück zum Zitat Bleyer, M., Rhemann, C., Rother, C.: Patchmatch stereo-stereo matching with slanted support windows. In: BMVC, vol. 11, pp. 1–11 (2011) Bleyer, M., Rhemann, C., Rother, C.: Patchmatch stereo-stereo matching with slanted support windows. In: BMVC, vol. 11, pp. 1–11 (2011)
8.
Zurück zum Zitat Dou, M., et al.: Motion2fusion: real-time volumetric performance capture. In: SIGGRAPH Asia (2017) Dou, M., et al.: Motion2fusion: real-time volumetric performance capture. In: SIGGRAPH Asia (2017)
9.
Zurück zum Zitat Dou, M., et al.: Fusion4D: real-time performance capture of challenging scenes. In: SIGGRAPH (2016)CrossRef Dou, M., et al.: Fusion4D: real-time performance capture of challenging scenes. In: SIGGRAPH (2016)CrossRef
10.
Zurück zum Zitat Fanello, S.R., Gori, I., Metta, G., Odone, F.: Keep it simple and sparse: real-time action recognition. JMLR 14, 2617–2640 (2013) Fanello, S.R., Gori, I., Metta, G., Odone, F.: Keep it simple and sparse: real-time action recognition. JMLR 14, 2617–2640 (2013)
11.
Zurück zum Zitat Fanello, S.R.: Learning to be a depth camera for close-range human capture and interaction. ACM SIGGRAPH Trans. Graph. 33, 86 (2014)MATH Fanello, S.R.: Learning to be a depth camera for close-range human capture and interaction. ACM SIGGRAPH Trans. Graph. 33, 86 (2014)MATH
12.
Zurück zum Zitat Fanello, S.R., et al.: HyperDepth: learning depth from structured light without matching. In: CVPR (2016) Fanello, S.R., et al.: HyperDepth: learning depth from structured light without matching. In: CVPR (2016)
13.
Zurück zum Zitat Fanello, S.R., et al.: Low compute and fully parallel computer vision with HashMatch (2017) Fanello, S.R., et al.: Low compute and fully parallel computer vision with HashMatch (2017)
14.
Zurück zum Zitat Fanello, S.R., et al.: Ultrastereo: efficient learning-based matching for active stereo systems. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6535–6544. IEEE (2017) Fanello, S.R., et al.: Ultrastereo: efficient learning-based matching for active stereo systems. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6535–6544. IEEE (2017)
16.
Zurück zum Zitat Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient belief propagation for early vision. Int. J. Comput. Vis. 70(1), 41–54 (2006)CrossRef Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient belief propagation for early vision. Int. J. Comput. Vis. 70(1), 41–54 (2006)CrossRef
17.
Zurück zum Zitat Foi, A., Trimeche, M., Katkovnik, V., Egiazarian, K.: Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data. IEEE Trans. Image Process. 17, 1737–1754 (2008)MathSciNetCrossRef Foi, A., Trimeche, M., Katkovnik, V., Egiazarian, K.: Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data. IEEE Trans. Image Process. 17, 1737–1754 (2008)MathSciNetCrossRef
19.
Zurück zum Zitat Gidaris, S., Komodakis, N.: Detect, replace, refine: deep structured prediction for pixel wise labeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5248–5257 (2017) Gidaris, S., Komodakis, N.: Detect, replace, refine: deep structured prediction for pixel wise labeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5248–5257 (2017)
20.
Zurück zum Zitat Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: CVPR, vol. 2, p. 7 (2017) Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: CVPR, vol. 2, p. 7 (2017)
21.
Zurück zum Zitat Hamzah, R.A., Ibrahim, H.: Literature survey on stereo vision disparity map algorithms. J. Sens. 2016, 23 (2016)CrossRef Hamzah, R.A., Ibrahim, H.: Literature survey on stereo vision disparity map algorithms. J. Sens. 2016, 23 (2016)CrossRef
22.
Zurück zum Zitat Hazan, E., Levy, K.Y., Shalev-Shwartz, S.: On graduated optimization for stochastic non-convex problems. In: ICML (2016) Hazan, E., Levy, K.Y., Shalev-Shwartz, S.: On graduated optimization for stochastic non-convex problems. In: ICML (2016)
23.
Zurück zum Zitat Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)CrossRef Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)CrossRef
24.
Zurück zum Zitat Hosni, A., Rhemann, C., Bleyer, M., Rother, C., Gelautz, M.: Fast cost-volume filtering for visual correspondence and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 35(2), 504–511 (2013)CrossRef Hosni, A., Rhemann, C., Bleyer, M., Rother, C., Gelautz, M.: Fast cost-volume filtering for visual correspondence and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 35(2), 504–511 (2013)CrossRef
25.
Zurück zum Zitat Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2 (2017) Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2 (2017)
26.
Zurück zum Zitat Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: NIPS (2015) Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: NIPS (2015)
27.
Zurück zum Zitat Johnson, J., Alahi, A., Li, F.: Perceptual losses for real-time style transfer and super-resolution. CoRR (2016) Johnson, J., Alahi, A., Li, F.: Perceptual losses for real-time style transfer and super-resolution. CoRR (2016)
28.
Zurück zum Zitat Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. CoRR, vol. abs/1703.04309 (2017) Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. CoRR, vol. abs/1703.04309 (2017)
29.
Zurück zum Zitat Keselman, L., Iselin Woodfill, J., Grunnet-Jepsen, A., Bhowmik, A.: Intel realsense stereoscopic depth cameras. In: CVPR Workshops (2017) Keselman, L., Iselin Woodfill, J., Grunnet-Jepsen, A., Bhowmik, A.: Intel realsense stereoscopic depth cameras. In: CVPR Workshops (2017)
30.
Zurück zum Zitat Khamis, S., Fanello, S., Rhemann, C., Valentin, J., Kowdle, A., Izadi, S.: StereoNet: guided hierarchical refinement for edge-aware depth prediction. In: ECCV (2018) Khamis, S., Fanello, S., Rhemann, C., Valentin, J., Kowdle, A., Izadi, S.: StereoNet: guided hierarchical refinement for edge-aware depth prediction. In: ECCV (2018)
31.
Zurück zum Zitat Klaus, A., Sormann, M., Karner, K.: Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 3, pp. 15–18. IEEE (2006) Klaus, A., Sormann, M., Karner, K.: Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 3, pp. 15–18. IEEE (2006)
32.
Zurück zum Zitat Kolmogorov, V., Zabih, R.: Computing visual correspondence with occlusions using graph cuts. In: Eighth IEEE International Conference on Computer Vision, ICCV 2001 Proceedings, vol. 2, pp. 508–515. IEEE (2001) Kolmogorov, V., Zabih, R.: Computing visual correspondence with occlusions using graph cuts. In: Eighth IEEE International Conference on Computer Vision, ICCV 2001 Proceedings, vol. 2, pp. 508–515. IEEE (2001)
33.
Zurück zum Zitat Konolige, K.: Projected texture stereo. In: ICRA (2010) Konolige, K.: Projected texture stereo. In: ICRA (2010)
34.
Zurück zum Zitat Kowalczuk, J., Psota, E.T., Perez, L.C.: Real-time stereo matching on CUDA using an iterative refinement method for adaptive support-weight correspondences. IEEE Trans. Circuits Syst. Video Technol. 23, 94–104 (2013)CrossRef Kowalczuk, J., Psota, E.T., Perez, L.C.: Real-time stereo matching on CUDA using an iterative refinement method for adaptive support-weight correspondences. IEEE Trans. Circuits Syst. Video Technol. 23, 94–104 (2013)CrossRef
35.
Zurück zum Zitat Kuznietsov, Y., Stückler, J., Leibe, B.: Semi-supervised deep learning for monocular depth map prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6647–6655 (2017) Kuznietsov, Y., Stückler, J., Leibe, B.: Semi-supervised deep learning for monocular depth map prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6647–6655 (2017)
36.
Zurück zum Zitat Liang, Z., et al.: Learning deep correspondence through prior and posterior feature constancy. arXiv preprint arXiv:1712.01039 (2017) Liang, Z., et al.: Learning deep correspondence through prior and posterior feature constancy. arXiv preprint arXiv:​1712.​01039 (2017)
37.
Zurück zum Zitat Luo, W., Schwing, A.G., Urtasun, R.: Efficient deep learning for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5695–5703 (2016) Luo, W., Schwing, A.G., Urtasun, R.: Efficient deep learning for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5695–5703 (2016)
38.
Zurück zum Zitat Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4040–4048 (2016) Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4040–4048 (2016)
39.
Zurück zum Zitat Naik, N., Kadambi, A., Rhemann, C., Izadi, S., Raskar, R., Kang, S.: A light transport model for mitigating multipath interference in TOF sensors. In: CVPR (2015) Naik, N., Kadambi, A., Rhemann, C., Izadi, S., Raskar, R., Kang, S.: A light transport model for mitigating multipath interference in TOF sensors. In: CVPR (2015)
40.
Zurück zum Zitat Neil, T., Tim, C.: Multi-resolution methods and graduated non-convexity. In: Vision Through Optimization (1997) Neil, T., Tim, C.: Multi-resolution methods and graduated non-convexity. In: Vision Through Optimization (1997)
41.
Zurück zum Zitat Nishihara, H.K.: PRISM: a practical mealtime imaging stereo matcher. In: Intelligent Robots: 3rd International Conference on Robot Vision and Sensory Controls, vol. 449, pp. 134–143. International Society for Optics and Photonics (1984) Nishihara, H.K.: PRISM: a practical mealtime imaging stereo matcher. In: Intelligent Robots: 3rd International Conference on Robot Vision and Sensory Controls, vol. 449, pp. 134–143. International Society for Optics and Photonics (1984)
42.
Zurück zum Zitat Pang, J., Sun, W., Ren, J., Yang, C., Yan, Q.: Cascade residual learning: a two-stage convolutional neural network for stereo matching. In: International Conference on Computer Vision-Workshop on Geometry Meets Deep Learning (ICCVW 2017), vol. 3 (2017) Pang, J., Sun, W., Ren, J., Yang, C., Yan, Q.: Cascade residual learning: a two-stage convolutional neural network for stereo matching. In: International Conference on Computer Vision-Workshop on Geometry Meets Deep Learning (ICCVW 2017), vol. 3 (2017)
43.
Zurück zum Zitat Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47(1–3), 7–42 (2002)CrossRef Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47(1–3), 7–42 (2002)CrossRef
44.
Zurück zum Zitat Shaked, A., Wolf, L.: Improved stereo matching with constant highway networks and reflective confidence learning. CoRR, vol. abs/1701.00165 (2017) Shaked, A., Wolf, L.: Improved stereo matching with constant highway networks and reflective confidence learning. CoRR, vol. abs/1701.00165 (2017)
46.
Zurück zum Zitat Tankovich, V., et al.: Sos: Stereo matching in o(1) with slanted support windows. In: IROS (2018) Tankovich, V., et al.: Sos: Stereo matching in o(1) with slanted support windows. In: IROS (2018)
47.
Zurück zum Zitat Taylor, J., et al.: Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences. In: SIGGRAPH (2016) Taylor, J., et al.: Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences. In: SIGGRAPH (2016)
48.
Zurück zum Zitat Taylor, J., et al.: Articulated distance fields for ultra-fast tracking of hands interacting. In: SIGGRAPH Asia (2017) Taylor, J., et al.: Articulated distance fields for ultra-fast tracking of hands interacting. In: SIGGRAPH Asia (2017)
49.
Zurück zum Zitat Tieleman, T., Hinton, G.: Lecture 6.5-RMSprop: divide the gradient by a running average of its recent magnitude. In: COURSERA: Neural Networks for Machine Learning (2012) Tieleman, T., Hinton, G.: Lecture 6.5-RMSprop: divide the gradient by a running average of its recent magnitude. In: COURSERA: Neural Networks for Machine Learning (2012)
50.
Zurück zum Zitat Wang, S., Fanello, S.R., Rhemann, C., Izadi, S., Kohli, P.: The global patch collider. In: CVPR (2016) Wang, S., Fanello, S.R., Rhemann, C., Izadi, S., Kohli, P.: The global patch collider. In: CVPR (2016)
52.
Zurück zum Zitat Yoon, K.J., Kweon, I.S.: Locally adaptive support-weight approach for visual correspondence search. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 2, pp. 924–931. IEEE (2005) Yoon, K.J., Kweon, I.S.: Locally adaptive support-weight approach for visual correspondence search. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 2, pp. 924–931. IEEE (2005)
53.
Zurück zum Zitat Yoon, K.J., Kweon, I.S.: Adaptive support-weight approach for correspondence search. PAMI 28, 650–656 (2006)CrossRef Yoon, K.J., Kweon, I.S.: Adaptive support-weight approach for correspondence search. PAMI 28, 650–656 (2006)CrossRef
54.
Zurück zum Zitat Zagoruyko, S., Komodakis, N.: Learning to compare image patches via convolutional neural networks. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4353–4361. IEEE (2015) Zagoruyko, S., Komodakis, N.: Learning to compare image patches via convolutional neural networks. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4353–4361. IEEE (2015)
55.
Zurück zum Zitat Zbontar, J., LeCun, Y.: Computing the stereo matching cost with a convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1592–1599 (2015) Zbontar, J., LeCun, Y.: Computing the stereo matching cost with a convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1592–1599 (2015)
56.
Zurück zum Zitat Zbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17(1–32), 2 (2016)MATH Zbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17(1–32), 2 (2016)MATH
57.
Zurück zum Zitat Zhao, H., Gallo, O., Frosio, I., Kautz, J.: Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 3, 47–57 (2017)CrossRef Zhao, H., Gallo, O., Frosio, I., Kautz, J.: Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 3, 47–57 (2017)CrossRef
58.
Zurück zum Zitat Zhong, Y., Dai, Y., Li, H.: Self-supervised learning for stereo matching with self-improving ability. arXiv preprint arXiv:1709.00930 (2017) Zhong, Y., Dai, Y., Li, H.: Self-supervised learning for stereo matching with self-improving ability. arXiv preprint arXiv:​1709.​00930 (2017)
59.
Zurück zum Zitat Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. In: CVPR, vol. 2, p. 7 (2017) Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. In: CVPR, vol. 2, p. 7 (2017)
Metadaten
Titel
ActiveStereoNet: End-to-End Self-supervised Learning for Active Stereo Systems
verfasst von
Yinda Zhang
Sameh Khamis
Christoph Rhemann
Julien Valentin
Adarsh Kowdle
Vladimir Tankovich
Michael Schoenberg
Shahram Izadi
Thomas Funkhouser
Sean Fanello
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-030-01237-3_48