
Real-Time Geometry, Albedo, and Motion Reconstruction Using a Single RGB-D Camera

Published: 01 June 2017
Abstract


This article proposes a real-time method that uses a single-view RGB-D input (a depth sensor integrated with a color camera) to simultaneously reconstruct a casual scene's detailed geometry, surface albedo, per-frame non-rigid motion, and per-frame low-frequency lighting, without requiring any template or motion priors. The key observation is that accurate scene motion allows temporal information to be integrated to recover precise appearance, whereas the intrinsic appearance helps establish true correspondences in the temporal domain to recover motion. Based on this observation, we first propose a shading-based scheme that leverages appearance information for motion estimation. Then, using the reconstructed motion, we propose a volumetric albedo fusion scheme that completes and refines the intrinsic appearance of the scene by incorporating information from multiple frames. Because the two schemes are applied iteratively during recording, the reconstructed appearance and motion become increasingly accurate. Since geometry, appearance, and motion are all reconstructed by our technique, our experiments also demonstrate additional applications such as relighting, albedo editing, and free-viewpoint rendering of dynamic scenes.
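The abstract couples two steps: estimating per-frame low-frequency lighting, and fusing per-frame albedo observations into a volume once motion has mapped each observation to the canonical scene. The following is a minimal sketch of one plausible form of these two steps, not the authors' implementation. It makes several assumptions. Shading is treated as Lambertian and grayscale. Lighting is represented by 9 real second-order spherical harmonic coefficients, a common choice for low-frequency lighting that the abstract itself does not name. The non-rigid warp is assumed to have already assigned each observed surface point a canonical voxel index. The function names `sh_basis`, `estimate_lighting`, and `fuse_albedo`, and the `max_weight` cap, are hypothetical.

```python
import numpy as np


def sh_basis(normals):
    """Real 2nd-order (9-term) spherical harmonics at unit normals; returns shape (N, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ], axis=1)


def estimate_lighting(intensity, albedo, normals):
    """Least-squares SH lighting l minimizing ||albedo * (B(n) @ l) - intensity||^2."""
    A = albedo[:, None] * sh_basis(normals)
    coeffs, *_ = np.linalg.lstsq(A, intensity, rcond=None)
    return coeffs


def fuse_albedo(voxel_albedo, voxel_weight, voxel_ids, intensity, normals, lighting,
                eps=1e-6, max_weight=100.0):
    """Fuse per-point albedo observations (intensity / shading) into voxels
    with a capped weighted running average, in the style of TSDF fusion.
    voxel_ids are assumed to come from an upstream non-rigid warp (hypothetical)."""
    shading = sh_basis(normals) @ lighting
    valid = shading > eps  # skip points that receive (near) zero shading
    observations = intensity[valid] / shading[valid]
    for vid, a in zip(voxel_ids[valid], observations):
        w = voxel_weight[vid]
        voxel_albedo[vid] = (w * voxel_albedo[vid] + a) / (w + 1.0)
        voxel_weight[vid] = min(w + 1.0, max_weight)
    return voxel_albedo, voxel_weight
```

In a full pipeline these two steps would alternate with motion estimation for every frame. When neither albedo nor lighting is known, the two are only determined up to a shared scale factor, so some normalization, such as fixing the overall albedo scale, would be needed.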


Supplemental Material (290.3 MB)







      • Published in

        ACM Transactions on Graphics, Volume 36, Issue 3
        June 2017
        165 pages

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 June 2017
        • Accepted: 1 March 2017
        • Revised: 1 February 2017
        • Received: 1 November 2016




        • research-article
        • Research
        • Refereed

