Abstract
This article proposes a real-time method that takes single-view RGB-D input (from a depth sensor integrated with a color camera) and simultaneously reconstructs a casual scene as a detailed geometry model, surface albedo, per-frame non-rigid motion, and per-frame low-frequency lighting, without requiring any template or motion priors. The key observation is that accurate scene motion makes it possible to integrate information over time and recover precise appearance, whereas the intrinsic appearance helps to establish true temporal correspondences and thus recover motion. Based on this observation, we first propose a shading-based scheme that leverages appearance information for motion estimation. Then, using the reconstructed motion, we propose a volumetric albedo fusion scheme that completes and refines the intrinsic appearance of the scene by incorporating information from multiple frames. Because the two schemes are applied iteratively during recording, the reconstructed appearance and motion become increasingly accurate. Beyond the reconstruction results, our experiments show that the technique enables further applications, such as relighting, albedo editing, and free-viewpoint rendering of a dynamic scene, since geometry, appearance, and motion are all reconstructed.
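The interplay between lighting, albedo, and multi-frame fusion can be illustrated with a small sketch. The abstract does not state the lighting model or the fusion rule, so everything below is an assumption made for illustration: lighting is represented by nine second-order spherical-harmonic coefficients, albedo is a single grayscale channel, and fusion is a per-voxel weighted running average. The function names `sh_basis`, `estimate_albedo`, and `fuse_albedo` are hypothetical and are not taken from the article.

```python
import numpy as np

def sh_basis(normals):
    """Real spherical-harmonic basis (bands 0-2) for unit normals: (N, 3) -> (N, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ], axis=1)

def estimate_albedo(intensity, normals, light, eps=1e-6):
    """Assumed model: intensity = albedo * shading, with shading = SH basis . light."""
    shading = sh_basis(normals) @ light
    # Clamp the shading so dark or back-facing samples do not divide by zero.
    return intensity / np.maximum(shading, eps)

def fuse_albedo(albedo, weight, new_albedo, new_weight, max_weight=64.0):
    """Assumed fusion rule: per-voxel weighted running average with a weight cap."""
    total = weight + new_weight
    averaged = (albedo * weight + new_albedo * new_weight) / np.maximum(total, 1e-12)
    # Voxels with no accumulated weight keep their previous value.
    fused = np.where(total > 0, averaged, albedo)
    return fused, np.minimum(total, max_weight)

# Usage: one voxel facing the camera, observed in two consecutive frames.
normals = np.array([[0.0, 0.0, 1.0]])
light = np.zeros(9)
light[0] = 1.0 / 0.282095  # constant (ambient-only) lighting with unit shading

albedo = np.zeros(1)
weight = np.zeros(1)
for observed in (np.array([0.5]), np.array([1.0])):
    sample = estimate_albedo(observed, normals, light)
    albedo, weight = fuse_albedo(albedo, weight, sample, np.ones(1))
```

Capping the accumulated weight is one common way to let a running average keep adapting to new observations; whether the article's scheme does this is not stated in the abstract.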
Supplemental Material
Supplemental movie, appendix, image, and software files for "Real-Time Geometry, Albedo, and Motion Reconstruction Using a Single RGB-D Camera".