Abstract
We present a fast, automatic method for accurately capturing full-body motion data using a single depth camera. At the core of our system lies a realtime registration process that accurately reconstructs 3D human poses from single monocular depth images, even in the case of significant occlusions. The idea is to formulate the registration problem in a Maximum A Posteriori (MAP) framework and iteratively register a 3D articulated human body model with monocular depth cues via linear system solvers. We integrate depth data, silhouette information, full-body geometry, temporal pose priors, and occlusion reasoning into a unified MAP estimation framework. Our 3D tracking process, however, requires manual initialization and recovery from failures. We address this challenge by combining 3D tracking with 3D pose detection. This combination not only automates the whole process but also significantly improves the robustness and accuracy of the system. Our whole algorithm is highly parallel and is therefore easily implemented on a GPU. We demonstrate the power of our approach by capturing a wide range of human movements in real time and achieve state-of-the-art accuracy in our comparison against alternative systems such as Kinect [2012].
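The phrase "iteratively register ... via linear system solvers" can be illustrated with a toy sketch. The code below is not the system described in the abstract: it assumes a planar two-link chain instead of a full-body model, uses noiseless 2D joint positions as a stand-in for depth cues, and keeps only a single quadratic temporal prior. All names (`forward`, `jacobian`, `map_register`, `prior_weight`) are hypothetical. It shows only the general pattern: the negative log-posterior is a sum of squared data and prior residuals, and each iteration linearizes it and solves a small linear system (a Gauss-Newton step).

```python
import math


def forward(q, l1=1.0, l2=1.0):
    """Return [elbow_x, elbow_y, hand_x, hand_y] for a planar two-link chain."""
    q1, q2 = q
    ex, ey = l1 * math.cos(q1), l1 * math.sin(q1)
    hx, hy = ex + l2 * math.cos(q1 + q2), ey + l2 * math.sin(q1 + q2)
    return [ex, ey, hx, hy]


def jacobian(q, l1=1.0, l2=1.0):
    """Return the 4x2 Jacobian of forward() with respect to (q1, q2)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [
        [-l1 * s1, 0.0],
        [l1 * c1, 0.0],
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ]


def map_register(observed, q_prev, prior_weight=0.01, iters=20):
    """Minimize |forward(q) - observed|^2 + prior_weight * |q - q_prev|^2.

    This is a toy negative log-posterior: a Gaussian data term plus a
    Gaussian temporal prior centered on the previous pose q_prev.
    """
    q = list(q_prev)
    for _ in range(iters):
        residual = [f - o for f, o in zip(forward(q), observed)]
        J = jacobian(q)

        # Normal equations A * dq = -g, starting from the prior's contribution.
        A = [[prior_weight, 0.0], [0.0, prior_weight]]
        g = [prior_weight * (q[i] - q_prev[i]) for i in range(2)]
        for row, r in zip(J, residual):
            for i in range(2):
                g[i] += row[i] * r
                for j in range(2):
                    A[i][j] += row[i] * row[j]

        # Solve the 2x2 linear system with Cramer's rule.
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        dq0 = (-g[0] * A[1][1] + g[1] * A[0][1]) / det
        dq1 = (-A[0][0] * g[1] + A[1][0] * g[0]) / det
        q = [q[0] + dq0, q[1] + dq1]
    return q


if __name__ == "__main__":
    observed = forward([0.6, 0.4])          # synthetic "measurement"
    estimate = map_register(observed, [0.5, 0.3], prior_weight=1e-3)
    print(estimate)
```

With a small `prior_weight` the estimate is driven by the data term; with a very large one it stays close to `q_prev`. That trade-off between observations and pose priors is what a MAP formulation makes explicit.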
References
- Amit, Y., and Geman, D. 1997. Shape quantization and recognition with randomized trees. Neural Computation. 9(7):1545--1588.
- Baak, A., Müller, M., Bharaj, G., Seidel, H.-P., and Theobalt, C. 2011. A data-driven approach for real-time full body pose reconstruction from a depth camera. In IEEE 13th International Conference on Computer Vision (ICCV), 1092--1099.
- Baker, S., and Matthews, I. 2004. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision. 56(3):221--255.
- Bregler, C., Malik, J., and Pullen, K. 2004. Twist based acquisition and tracking of animal and human kinematics. International Journal of Computer Vision. 56(3):179--194.
- Chai, J., and Hodgins, J. 2005. Performance animation from low-dimensional control signals. In ACM Transactions on Graphics. 24(3):686--696.
- Ganapathi, V., Plagemann, C., Koller, D., and Thrun, S. 2010. Real time motion capture using a single time-of-flight camera. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 755--762.
- Girshick, R., Shotton, J., Kohli, P., Criminisi, A., and Fitzgibbon, A. 2011. Efficient regression of general-activity human poses from depth images. In Proceedings of IEEE 13th International Conference on Computer Vision, 415--422.
- Grest, D., Kruger, V., and Koch, R. 2007. Single view motion tracking by depth and silhouette information. In Proceedings of the 15th Scandinavian Conference on Image Analysis (SCIA), 719--729.
- Kinect, 2012. Microsoft Kinect for Xbox 360.
- Knoop, S., Vacek, S., and Dillmann, R. 2006. Sensor fusion for 3D human body tracking with an articulated 3D body model. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1686--1691.
- Lepetit, V., and Fua, P. 2006. Keypoint recognition using randomized trees. IEEE Transactions on Pattern Analysis and Machine Intelligence. 28(9):1465--1479.
- Liu, H., Wei, X., Chai, J., Ha, I., and Rhee, T. 2011. Realtime human motion control with a small number of inertial sensors. In Symposium on Interactive 3D Graphics and Games, ACM, I3D '11, 133--140.
- Microsoft Kinect API for Windows, 2012. http://www.microsoft.com/en-us/kinectforwindows/.
- Moeslund, T. B., Hilton, A., and Kruger, V. 2006. A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding. 104:90--126.
- Plagemann, C., Ganapathi, V., Koller, D., and Thrun, S. 2010. Realtime identification and localization of body parts from depth images. In Proceedings of International Conferences on Robotics and Automation (ICRA 2010), 3108--3113.
- Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A. 2011. Real-time human pose recognition in parts from a single depth image. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1297--1304.
- Siddiqui, M., and Medioni, G. 2010. Human pose estimation from a single view point, real-time range sensor. In CVCG at CVPR.
- Slyper, R., and Hodgins, J. 2008. Action capture with accelerometers. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 193--199.
- Tautges, J., Zinke, A., Krüger, B., Baumann, J., Weber, A., Helten, T., Müller, M., Seidel, H.-P., and Eberhardt, B. 2011. Motion reconstruction using sparse accelerometer data. ACM Transactions on Graphics. 30(3):18:1--18:12.
- Vicon Systems, 2011. http://www.vicon.com.
- Ye, M., Wang, X., Yang, R., Ren, L., and Pollefeys, M. 2011. Accurate 3D pose estimation from a single depth image. In Proceedings of IEEE 13th International Conference on Computer Vision, 731--738.
Index Terms
- Accurate realtime full-body motion capture using a single depth camera