
Real-Time Human Pose Recognition in Parts from Single Depth Images

  • Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 411)

Abstract

This chapter describes a method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result into world space and finding local modes of a 3D non-parametric density. The system runs at around 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
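The final step of the abstract (reprojecting classified pixels into world space and finding local modes of a 3D density) can be illustrated with a small sketch. This is not the chapter's implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the Gaussian kernel, the bandwidth, the uniform weights, and the function names `reproject` and `mean_shift_mode` are all assumptions made for illustration. In a real system the weights would presumably come from the per-pixel classifier output.

```python
import math
import random


def reproject(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth in metres to 3D camera space.

    Uses a simple pinhole model; the intrinsics are assumed values.
    """
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def mean_shift_mode(points, weights, start, bandwidth, iters=50, tol=1e-9):
    """Climb from `start` to a local mode of a weighted Gaussian kernel density."""
    x = tuple(start)
    for _ in range(iters):
        total = 0.0
        acc = [0.0, 0.0, 0.0]
        for p, w in zip(points, weights):
            d2 = sum((pi - xi) ** 2 for pi, xi in zip(p, x))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            total += k
            for i in range(3):
                acc[i] += k * p[i]
        if total == 0.0:
            break  # no kernel support near x; keep the current estimate
        new_x = tuple(a / total for a in acc)
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return x


# Synthetic pixels assumed to share one body-part label (hypothetical data).
random.seed(0)
pixels = [
    (320 + random.gauss(0, 5), 240 + random.gauss(0, 5), 2.0 + random.gauss(0, 0.02))
    for _ in range(200)
]
points = [reproject(u, v, d, 575.0, 575.0, 320.0, 240.0) for u, v, d in pixels]
weights = [1.0] * len(points)  # stand-in for per-pixel class probabilities
joint = mean_shift_mode(points, weights, points[0], bandwidth=0.06)
print(joint)
```

Starting the climb from several different points and keeping the distinct end points would give multiple proposals per body part; the summed kernel weight at each mode is one possible confidence score.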




Copyright information

© 2013 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Shotton, J. et al. (2013). Real-Time Human Pose Recognition in Parts from Single Depth Images. In: Cipolla, R., Battiato, S., Farinella, G. (eds) Machine Learning for Computer Vision. Studies in Computational Intelligence, vol 411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28661-2_5


  • DOI: https://doi.org/10.1007/978-3-642-28661-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28660-5

  • Online ISBN: 978-3-642-28661-2

  • eBook Packages: Engineering, Engineering (R0)
