
2018 | Original Paper | Book Chapter

DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs

Authors: Shi Yan, Chenglei Wu, Lizhen Wang, Feng Xu, Liang An, Kaiwen Guo, Yebin Liu

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

Consumer depth sensors are increasingly popular and have entered our daily lives, most recently through their integration in the iPhone X. However, they still suffer from heavy noise, which limits their applications. Although much progress has been made in reducing noise and recovering geometric detail, the problem remains far from solved due to its inherently ill-posed nature and the real-time requirement. We propose a cascaded Depth Denoising and Refinement Network (DDRNet) that tackles this problem by leveraging multi-frame fused geometry and the accompanying high-quality color image through a joint training strategy. The rendering equation is exploited in our network in an unsupervised manner: we impose an unsupervised loss based on light transport to extract high-frequency geometry. Experimental results show that our network achieves real-time single-depth-map enhancement on various categories of scenes. Thanks to the effective decoupling of low- and high-frequency information in the cascaded network, we achieve superior performance over state-of-the-art techniques.
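The shading-based unsupervised loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a first-order spherical-harmonics lighting model and a constant albedo (both assumptions of this illustration), fits the lighting coefficients to the intensity image by least squares, renders the resulting shading from the normal map, and penalizes the difference to the observed intensity image I.

```python
import numpy as np

def sh_basis(normals):
    """First-order spherical-harmonics basis (4 terms) per pixel.

    normals: (H, W, 3) unit normal map.
    Returns an (H, W, 4) array of basis values [1, nx, ny, nz]."""
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    return np.stack([np.ones_like(nx), nx, ny, nz], axis=-1)

def shading_loss(normals, intensity, albedo=1.0):
    """Illustrative unsupervised shading loss.

    Solves for SH lighting coefficients in closed form (least squares),
    renders the shading image from the normals, and returns the mean
    squared difference to the observed intensity image."""
    B = sh_basis(normals).reshape(-1, 4)   # (H*W, 4) per-pixel basis
    I = intensity.reshape(-1)              # (H*W,) observed intensities
    coeffs, *_ = np.linalg.lstsq(B, I / albedo, rcond=None)
    rendered = (B @ coeffs) * albedo       # re-rendered shading
    return np.mean((rendered - I) ** 2)
```

In a network, the rendered shading would depend on normals computed from the refined depth map, so gradients of this loss push the depth toward surfaces whose shading explains the high-frequency content of the color/intensity image without ground-truth depth.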


Footnotes
1
The intensity image I plays the same role as \(C_{in}\); we study I for simplicity.
DOI: https://doi.org/10.1007/978-3-030-01249-6_10
