
2018 | Original Paper | Book Chapter

ActiveStereoNet: End-to-End Self-supervised Learning for Active Stereo Systems

Written by: Yinda Zhang, Sameh Khamis, Christoph Rhemann, Julien Valentin, Adarsh Kowdle, Vladimir Tankovich, Michael Schoenberg, Shahram Izadi, Thomas Funkhouser, Sean Fanello

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing

Abstract

In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems. Due to the lack of ground truth, our method is fully self-supervised, yet it produces precise depth with a subpixel precision of 1/30th of a pixel; it does not suffer from the common over-smoothing issues; it preserves edges; and it explicitly handles occlusions. We introduce a novel reconstruction loss that is more robust to noise and textureless patches and is invariant to illumination changes. The proposed loss is optimized using window-based cost aggregation with an adaptive support weight scheme. This cost aggregation is edge-preserving and smooths the loss function, which is key to allowing the network to reach compelling results. Finally, we show how the task of predicting invalid regions, such as occlusions, can be trained end-to-end without ground truth. This component is crucial for reducing blur and particularly improves predictions along depth discontinuities. Extensive quantitative and qualitative evaluations on real and synthetic data demonstrate state-of-the-art results in many challenging scenes.
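The abstract does not give the details of the cost aggregation, so the following is only a rough sketch of window-based aggregation with adaptive support weights in general, not the authors' implementation. It assumes a per-pixel cost map and a grayscale guide image of the same shape; the function name `asw_aggregate`, the exponential weight on intensity difference, and the parameters `radius` and `sigma` are illustrative assumptions.

```python
import numpy as np

def asw_aggregate(cost, guide, radius=2, sigma=2.0):
    """Aggregate a per-pixel cost map with adaptive support weights.

    Hypothetical sketch: each output value is a weighted mean of the costs
    in a (2*radius+1)^2 window. A neighbour's weight decays exponentially
    with its intensity difference from the window centre in `guide`, so
    costs are not averaged across strong intensity edges.
    """
    cost = np.asarray(cost, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    h, w = cost.shape
    # Replicate border pixels so every window is fully defined.
    cost_p = np.pad(cost, radius, mode="edge")
    guide_p = np.pad(guide, radius, mode="edge")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            # Shifted views give the neighbour at offset (dy-radius, dx-radius).
            g = guide_p[dy:dy + h, dx:dx + w]
            c = cost_p[dy:dy + h, dx:dx + w]
            wgt = np.exp(-np.abs(g - guide) / sigma)
            num += wgt * c
            den += wgt
    # The centre pixel always contributes weight 1, so den >= 1.
    return num / den

# Usage: a step edge in the guide image keeps the two cost regions separate.
guide = np.zeros((6, 8))
guide[:, 4:] = 100.0
cost = np.where(guide > 0, 5.0, 1.0)
aggregated = asw_aggregate(cost, guide)
```

With a constant guide image the weights are all equal and the function reduces to a plain box filter, which blurs costs across the edge; the intensity-dependent weights are what make the aggregation edge-preserving.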

Title: ActiveStereoNet: End-to-End Self-supervised Learning for Active Stereo Systems
Written by: Yinda Zhang
Sameh Khamis
Christoph Rhemann
Julien Valentin
Adarsh Kowdle
Vladimir Tankovich
Michael Schoenberg
Shahram Izadi
Thomas Funkhouser
Sean Fanello
Publisher: Springer International Publishing
Book: Computer Vision – ECCV 2018
Print ISBN: 978-3-030-01236-6

Electronic ISBN: 978-3-030-01237-3

Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-030-01237-3_48
