
2016 | Original Paper | Book Chapter

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

Authors: Ravi Garg, Vijay Kumar B.G., Gustavo Carneiro, Ian Reid

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

A significant weakness of most current deep Convolutional Neural Networks is the need to train them using vast amounts of manually labelled data. In this work we propose an unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground-truth depths. We achieve this by training the network in a manner analogous to an autoencoder. At training time we consider a pair of images, source and target, with small, known camera motion between the two, such as a stereo pair. We train the convolutional encoder for the task of predicting the depth map for the source image. To do so, we explicitly generate an inverse warp of the target image using the predicted depth and known inter-view displacement, to reconstruct the source image; the photometric error in the reconstruction is the reconstruction loss for the encoder. The acquisition of this training data is considerably simpler than for equivalent systems, requiring no manual annotation, nor calibration of a depth sensor to the camera. We show that our network trained on less than half of the KITTI dataset gives comparable performance to that of the state-of-the-art supervised methods for single view depth estimation.
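The training signal described above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the function name, the grayscale shapes, and the plain L2 photometric penalty are assumptions. The key idea it shows is that, for a rectified stereo pair, a predicted depth induces a horizontal disparity, which is used to inverse-warp the target image and compare it against the source:

```python
import numpy as np

def photometric_loss(src, tgt, depth, f, B):
    """L2 photometric loss between src and tgt inverse-warped by depth.

    src, tgt : (H, W) grayscale images of a rectified stereo pair
    depth    : (H, W) predicted depth for src
    f, B     : focal length (pixels) and stereo baseline (metres)
    """
    H, W = src.shape
    disparity = f * B / depth                     # horizontal shift, pixels
    xs = np.arange(W)[None, :] - disparity        # sample locations in tgt
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    a = np.clip(xs - x0, 0.0, 1.0)
    rows = np.arange(H)[:, None]
    # Linear interpolation keeps the warp (sub-)differentiable w.r.t. depth,
    # so the reconstruction error can be backpropagated into the encoder.
    warped = (1 - a) * tgt[rows, x0] + a * tgt[rows, x0 + 1]
    return np.mean((src - warped) ** 2)
```

In the paper this scalar loss (with additional smoothness regularisation) is minimised over the encoder's weights; here it is shown only as a standalone function of images and depth.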


Appendix
Accessible with authorization only
Footnotes
1
For simplicity, all training images are assumed to be taken with a fixed, rectified stereo setup, as is the case in KITTI; our method generalizes to instances captured with different calibrated stereo rigs.
 
2
We have dropped the training instance index i for simplicity.
 
3
A \(5 \times 18\) convolution can be used instead to increase network capacity and replicate the effect of the fully connected layer of [19].
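The footnote's point, that a convolution whose kernel spans the entire feature map acts as a fully connected layer, can be checked numerically. The shapes below are illustrative, not the network's actual dimensions:

```python
import numpy as np

# A "valid" convolution with a kernel the full size of the input has exactly
# one spatial position, so its output is a single K-vector.
rng = np.random.default_rng(0)
C, H, W, K = 8, 5, 18, 4                  # channels, height, width, outputs
x = rng.standard_normal((C, H, W))        # input feature map
w = rng.standard_normal((K, C, H, W))     # K full-extent kernels

conv_out = np.einsum('kchw,chw->k', w, x)

# The same computation as a fully connected layer on the flattened input.
fc_out = w.reshape(K, -1) @ x.ravel()

assert np.allclose(conv_out, fc_out)
```

Keeping it as a convolution rather than an explicit fully connected layer preserves the network's fully convolutional structure.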
 
4
AlexNet uses uneven padding for some convolutions, leading to a change in the aspect ratio and the image size.
 
5
Much like log depth, the inverse depth parametrization is less prone to large depth errors at very distant points and is used successfully in many stereo [13] and SLAM frameworks [27].
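The benefit of inverse depth is easy to see from the rectified-stereo relation \(Z = fB/d\). The numbers below are illustrative (KITTI-like intrinsics), not values from the paper:

```python
# For a rectified stereo rig, depth Z and disparity d satisfy Z = f * B / d,
# so inverse depth is proportional to disparity and stays bounded as points
# recede, while depth itself grows without bound.
f, B = 721.5, 0.54                        # focal length (px), baseline (m)

def depth_from_disparity(d):
    return f * B / d

# A fixed 0.25 px disparity error costs little near the camera but blows up
# at distance, which is why regressing inverse depth is better conditioned.
for d in (40.0, 4.0, 1.0):
    z_true = depth_from_disparity(d)
    z_noisy = depth_from_disparity(d - 0.25)
    print(f"d = {d:4.1f} px: Z = {z_true:7.2f} m, "
          f"depth error = {z_noisy - z_true:7.2f} m")
```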
 
References
1. Agrawal, P., Carreira, J., Malik, J.: Learning to see by moving. In: IEEE International Conference on Computer Vision (ICCV) (2015)
2. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 25–36. Springer, Heidelberg (2004)
3. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. In: International Conference on Learning Representations (ICLR) (2015)
4. Chen, Z., Sun, X., Wang, L., Yu, Y., Huang, C.: A deep visual correspondence embedding model for stereo matching costs. In: IEEE International Conference on Computer Vision (ICCV) (2015)
5. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset for semantic urban scene understanding. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
6. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
7. Dosovitskiy, A., Springenberg, J.T., Riedmiller, M., Brox, T.: Discriminative unsupervised feature learning with convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS) (2014)
8. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems (NIPS) (2014)
9. Fan, X., Zheng, K., Lin, Y., Wang, S.: Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
10. Flynn, J., Neulander, I., Philbin, J., Snavely, N.: DeepStereo: learning to predict new views from the world's imagery (2016)
11. Garg, R., Pizarro, L., Rueckert, D., Agapito, L.: Dense multi-frame optic flow for non-rigid objects using subspace constraints. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010, Part IV. LNCS, vol. 6495, pp. 460–473. Springer, Heidelberg (2011)
12. Garg, R., Roussos, A., Agapito, L.: Dense variational reconstruction of non-rigid surfaces from monocular video. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
13. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. (IJRR) 32, 1229–1235 (2013)
14. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
15. Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S., Cipolla, R.: SceneNet: understanding real world indoor scenes with synthetic data. arXiv preprint (2015). arXiv:1511.07041
16. Hirschmuller, H.: Accurate and efficient stereo processing by semi-global matching and mutual information. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
17. Horn, B.K., Schunck, B.G.: Determining optical flow. In: 1981 Technical Symposium East, pp. 319–331. International Society for Optics and Photonics (1981)
18. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Neural Information Processing Systems (NIPS) (2011)
19. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (NIPS) (2012)
20. Vijay Kumar, B.G., Carneiro, G., Reid, I.: Learning local image descriptors with deep siamese and triplet convolutional networks by minimising global loss functions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
21. Ladicky, L., Shi, J., Pollefeys, M.: Pulling things out of perspective. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
22. Li, B., Shen, C., Dai, Y., van den Hengel, A., He, M.: Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
23. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part V. LNCS, vol. 8693, pp. 740–755. Springer, Heidelberg (2014)
24. Liu, F., Shen, C., Lin, G., Reid, I.: Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. (2016)
26. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
27. Newcombe, R.A., Lovegrove, S.J., Davison, A.J.: DTAM: dense tracking and mapping in real-time. In: IEEE International Conference on Computer Vision (ICCV) (2011)
28. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: IEEE International Conference on Computer Vision (ICCV) (2015)
29. Saxena, A., Sun, M., Ng, A.: Make3D: learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 31, 824–840 (2009)
30. Steinbrücker, F., Pock, T., Cremers, D.: Large displacement optical flow computation without warping. In: IEEE International Conference on Computer Vision (ICCV) (2009)
31. Vedaldi, A., Lenc, K.: MatConvNet: convolutional neural networks for MATLAB (2015)
32. Xie, J., Girshick, R., Farhadi, A.: Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks. arXiv preprint (2016). arXiv:1604.03650
33. Zach, C., Pock, T., Bischof, H.: A duality based approach for realtime TV-L\(^{1}\) optical flow. In: Hamprecht, F.A., Schnörr, C., Jähne, B. (eds.) Pattern Recognition, vol. 4713, pp. 214–223. Springer, Heidelberg (2007)
34. Zbontar, J., LeCun, Y.: Computing the stereo matching cost with a convolutional neural network. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
35. Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.H.S.: Conditional random fields as recurrent neural networks. In: IEEE International Conference on Computer Vision (ICCV) (2015)
Metadata
Title
Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue
Authors
Ravi Garg
Vijay Kumar B.G.
Gustavo Carneiro
Ian Reid
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-46484-8_45