Abstract
We propose a new method to quickly and accurately predict human pose (the 3D positions of body joints) from a single depth image, without depending on information from preceding frames. Our approach is strongly rooted in current object recognition strategies. By designing an intermediate representation in terms of body parts, we transform the difficult pose estimation problem into a simpler per-pixel classification problem, for which efficient machine learning techniques exist. By using computer graphics to synthesize a very large dataset of training image pairs, we can train a classifier that estimates body part labels from test images invariant to pose, body shape, clothing, and other irrelevant factors. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.
The system runs in under 5 ms on the Xbox 360. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
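As a rough illustration of the final step described in the abstract (reprojecting the per-pixel classification result and finding local modes), the following is a minimal Python sketch and not the authors' implementation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a per-pixel probability for one body part that some classifier has already produced, and a Gaussian-kernel mean shift with an arbitrary bandwidth. The function names, the depth-squared pixel weighting, and the choice of starting point are all assumptions made for illustration.

```python
import math

def reproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres to camera space (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def density(points, weights, center, bandwidth):
    """Weighted Gaussian kernel density at `center`; used here as a confidence score."""
    total = 0.0
    for p, w in zip(points, weights):
        d2 = sum((a - b) ** 2 for a, b in zip(p, center))
        total += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total

def mean_shift_mode(points, weights, start, bandwidth=0.06, iters=50, tol=1e-6):
    """Move `start` to the kernel-weighted mean of the points until it stops moving."""
    mode = start
    for _ in range(iters):
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(p, mode))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            den += k
            for i in range(3):
                num[i] += k * p[i]
        if den == 0.0:
            break  # no support near the current estimate
        new_mode = tuple(n / den for n in num)
        moved = math.dist(new_mode, mode)
        mode = new_mode
        if moved < tol:
            break
    return mode

def propose_joint(pixels, fx, fy, cx, cy, bandwidth=0.06):
    """pixels: non-empty list of (u, v, depth, part_probability) for one body part.

    Returns (mode, confidence): a single 3D proposal and its density score.
    """
    points = [reproject(u, v, d, fx, fy, cx, cy) for u, v, d, _ in pixels]
    # Assumption: weight by probability times depth squared, so that distant
    # pixels (which cover more surface area) are not under-counted.
    weights = [p * d * d for _, _, d, p in pixels]
    # Assumption: start the search from the highest-weighted pixel.
    start = points[max(range(len(points)), key=weights.__getitem__)]
    mode = mean_shift_mode(points, weights, start, bandwidth)
    return mode, density(points, weights, mode, bandwidth)
```

Calling `propose_joint` once per body part yields one confidence-scored 3D proposal per part. Finding several local modes per part would mean running the mean shift from multiple starting points and merging the results, which this sketch omits.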