2016 | Original Paper | Book Chapter

Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks

Authors: Junyuan Xie, Ross Girshick, Ali Farhadi

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

As 3D movie viewing becomes mainstream and the Virtual Reality (VR) market emerges, the demand for 3D content is growing rapidly. Producing 3D videos, however, remains challenging. In this paper we propose to use deep neural networks to automatically convert 2D videos and images to a stereoscopic 3D format. In contrast to previous automatic 2D-to-3D conversion algorithms, which have separate stages and require ground-truth depth maps as supervision, our approach is trained end-to-end directly on stereo pairs extracted from existing 3D movies. This novel training scheme makes it possible to exploit orders of magnitude more data and significantly increases performance. Indeed, Deep3D outperforms baselines in both quantitative and human-subject evaluations.
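
The end-to-end training described above hinges on making view synthesis differentiable: instead of regressing an explicit depth map, Deep3D predicts a per-pixel probability distribution over discrete disparity values and renders the right view as a probability-weighted blend of horizontally shifted copies of the left frame, so a photometric loss against the ground-truth right frame can be backpropagated through the renderer. Below is a minimal NumPy sketch of this selection idea; the function name, array shapes, disparity range, and wrap-around border handling are illustrative assumptions, not the authors' implementation.

import numpy as np

def render_right_view(left, disparity_probs):
    """Synthesize the right view as a probability-weighted blend of
    horizontally shifted copies of the left image.

    left:            (H, W, 3) left-eye image.
    disparity_probs: (D, H, W) per-pixel softmax over D candidate
                     disparities, summing to 1 along the first axis.
    """
    right = np.zeros_like(left, dtype=np.float64)
    for d in range(disparity_probs.shape[0]):
        # Shift the left image d pixels to the left; np.roll wraps around
        # at the image border, a simplification over proper edge padding.
        shifted = np.roll(left, shift=-d, axis=1)
        # Weight each shifted copy by its per-pixel disparity probability.
        right += disparity_probs[d][..., None] * shifted
    return right

# Toy usage: uniform probabilities over 4 candidate disparities simply
# average the 4 shifted copies of the left image.
H, W, D = 32, 64, 4
left = np.random.rand(H, W, 3)
probs = np.full((D, H, W), 1.0 / D)
right = render_right_view(left, probs)
assert right.shape == left.shape

Because every operation in this blend is differentiable with respect to the predicted probabilities, the whole pipeline can be trained purely from stereo pairs, which is what lets the approach dispense with ground-truth depth supervision.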

Footnotes
1
Forced perspective is an optical illusion technique that makes objects appear larger or smaller than they really are. It breaks down when viewed from another angle, which prevents stereo filming.
 
2
[10] without Oracle and Deep3D + Oracle are left out due to annotator budget. Note that a change in average scene depth only pushes a scene further away or pulls it closer and usually doesn't affect the perception of depth variation in the scene.
 
References
1.
Motion Picture Association of America: Theatrical market statistics (2014)
2.
Fehn, C.: Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV. In: Electronic Imaging 2004, International Society for Optics and Photonics, pp. 93–104 (2004)
3.
Zhuo, S., Sim, T.: On the recovery of depth from a single defocused image. In: Jiang, X., Petkov, N. (eds.) CAIP 2009. LNCS, vol. 5702, pp. 889–897. Springer, Heidelberg (2009)
4.
Cozman, F., Krotkov, E.: Depth from scattering. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 801–806. IEEE (1997)
5.
Zhang, L., Vázquez, C., Knorr, S.: 3D-TV content creation: automatic 2D-to-3D video conversion. IEEE Trans. Broadcast. 57(2), 372–383 (2011)
6.
Konrad, J., Wang, M., Ishwar, P., Wu, C., Mukherjee, D.: Learning-based, automatic 2D-to-3D image and video conversion. IEEE Trans. Image Process. 22(9), 3485–3496 (2013)
7.
Appia, V., Batur, U.: Fully automatic 2D to 3D conversion with aid of high-level image features. In: IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, p. 90110W (2014)
8.
Saxena, A., Sun, M., Ng, A.Y.: Make3D: learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 824–840 (2009)
9.
Baig, M.H., Jagadeesh, V., Piramuthu, R., Bhardwaj, A., Di, W., Sundaresan, N.: Im2Depth: scalable exemplar based depth transfer. In: 2014 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 145–152. IEEE (2014)
10.
Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658 (2015)
11.
Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5162–5170 (2015)
12.
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part V. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012)
13.
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Rob. Res. (IJRR) 32, 1231–1237 (2013)
14.
Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.H.: Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1529–1537 (2015)
15.
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. arXiv preprint arXiv:1504.00702 (2015)
16.
Flynn, J., Neulander, I., Philbin, J., Snavely, N.: DeepStereo: learning to predict new views from the world's imagery. arXiv preprint arXiv:1506.06825 (2015)
17.
Fischer, P., Dosovitskiy, A., Ilg, E., Häusser, P., Hazırbaş, C., Golkov, V., van der Smagt, P., Cremers, D., Brox, T.: FlowNet: learning optical flow with convolutional networks. arXiv preprint arXiv:1504.06852 (2015)
18.
Mathieu, M., Couprie, C., LeCun, Y.: Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440 (2015)
19.
Wang, L., Xiong, Y., Wang, Z., Qiao, Y.: Towards good practices for very deep two-stream convnets. arXiv preprint arXiv:1507.02159 (2015)
20.
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
21.
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
22.
Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., Zhang, Z.: MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 (2015)
23.
Hirschmüller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)
Metadata
Title
Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks
Authors
Junyuan Xie
Ross Girshick
Ali Farhadi
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-46493-0_51