Published in: 3D Research 3/2018

01.09.2018 | 3DR Express

Automatic Depth Prediction from 2D Videos Based on Non-parametric Learning and Bi-directional Depth Propagation

Authors: Huihui Xu, Mingyan Jiang


Abstract

With progress in 3D video technology, 2D-to-3D conversion has drawn great attention in recent years. Predicting perceptually reasonable depth information from conventional monocular videos is a challenging step in this conversion. To generate convincing depth maps, an efficient depth estimation algorithm based on non-parametric learning and bi-directional depth propagation is proposed. First, global depth maps for key frames are generated from gradient samples using a gradient reconstruction method. Then, foreground objects are extracted and used to refine the global depth maps, recovering more local depth detail. Next, the depth information of key frames is propagated via bi-directional motion compensation to recover forward and backward depth estimates for each non-key frame. Finally, a weighted fusion strategy integrates the forward and backward depths to predict the depth of each non-key frame. The quality of the estimated depth maps is assessed with both objective and subjective evaluation criteria. The experimental results verify that the proposed depth prediction framework outperforms several existing depth estimation methods and is efficient at producing 3D views.
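The final fusion step described above can be illustrated with a minimal sketch. The paper's actual weighting scheme is not given in this abstract, so the following assumes a simple, hypothetical choice: each non-key frame blends its forward-propagated depth (from the previous key frame) and backward-propagated depth (from the next key frame) with weights proportional to temporal proximity. The function name and the linear weighting are illustrative assumptions, not the authors' formula.

```python
def fuse_depth(depth_fwd, depth_bwd, t, t_key_prev, t_key_next):
    """Blend forward- and backward-propagated depth maps for frame t.

    Hypothetical linear weighting: a non-key frame close to the previous
    key frame trusts the forward-propagated depth more, and vice versa.
    Depth maps are flat lists of per-pixel depth values of equal length.
    """
    span = t_key_next - t_key_prev
    w_fwd = (t_key_next - t) / span   # closeness to the previous key frame
    w_bwd = 1.0 - w_fwd               # closeness to the next key frame
    return [w_fwd * f + w_bwd * b for f, b in zip(depth_fwd, depth_bwd)]

# A frame midway between key frames weighs both propagated depths equally.
fused = fuse_depth([10.0, 20.0], [30.0, 40.0], t=5, t_key_prev=0, t_key_next=10)
```

In practice the fusion would also account for motion-compensation reliability (e.g. occlusions), which this distance-only sketch omits.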


Metadata
Title
Automatic Depth Prediction from 2D Videos Based on Non-parametric Learning and Bi-directional Depth Propagation
Authors
Huihui Xu
Mingyan Jiang
Publication date
01.09.2018
Publisher
3D Display Research Center
Published in
3D Research / Issue 3/2018
Electronic ISSN: 2092-6731
DOI
https://doi.org/10.1007/s13319-018-0184-9
