International Journal of Computer Vision

Published in:

01-07-2015

Metric Regression Forests for Correspondence Estimation

Authors: Gerard Pons-Moll, Jonathan Taylor, Jamie Shotton, Aaron Hertzmann, Andrew Fitzgibbon

Published in: International Journal of Computer Vision | Issue 3/2015

Abstract

We present a new method for inferring dense data-to-model correspondences, focusing on the application of human pose estimation from depth images. Recent work proposed the use of regression forests to quickly predict correspondences between depth pixels and points on a 3D human mesh model. That work, however, used a proxy forest training objective based on the classification of depth pixels to body parts. In contrast, we introduce Metric Space Information Gain (MSIG), a new decision forest training objective designed to directly minimize the entropy of distributions in a metric space. When applied to a model surface, viewed as a metric space defined by geodesic distances, MSIG aims to minimize image-to-model correspondence uncertainty. A naïve implementation of MSIG would scale quadratically with the number of training examples. As this is intractable for large datasets, we propose a method to compute MSIG in linear time. Our method is a principled generalization of the proxy classification objective, and does not require an extrinsic isometric embedding of the model surface in Euclidean space. Our experiments demonstrate that this leads to correspondences that are considerably more accurate than the state of the art, using far fewer training images.
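The quadratic-versus-linear contrast in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the estimator, so the kernel density entropy estimate, the Gaussian kernel, the finite point set with a precomputed distance table `dist`, and all function names below are assumptions made for illustration. The first function evaluates the kernel over all pairs of training samples; the second bins samples on the finite point set first, so its cost grows linearly with the number of samples for a fixed point set.

```python
import math
from collections import Counter


def gaussian_kernel(d, bandwidth):
    # Unnormalized Gaussian kernel on a distance (assumed choice of kernel).
    return math.exp(-0.5 * (d / bandwidth) ** 2)


def entropy_quadratic(samples, dist, bandwidth):
    # Resubstitution kernel entropy estimate: one kernel evaluation per
    # pair of samples, so O(N^2) in the number of samples N.
    n = len(samples)
    total = 0.0
    for i in samples:
        density = sum(gaussian_kernel(dist[i][j], bandwidth) for j in samples) / n
        total += math.log(density)
    return -total / n


def entropy_histogram(samples, dist, bandwidth):
    # Same estimate, computed from a histogram over the finite point set:
    # one pass over the samples, then work over occupied points only.
    n = len(samples)
    counts = Counter(samples)
    total = 0.0
    for i, c_i in counts.items():
        density = sum(
            c_j * gaussian_kernel(dist[i][j], bandwidth) for j, c_j in counts.items()
        ) / n
        total += c_i * math.log(density)
    return -total / n


def information_gain(parent, left, right, dist, bandwidth, entropy=entropy_histogram):
    # Entropy of the parent minus the size-weighted entropy of the children.
    n = len(parent)
    children = sum(
        len(s) / n * entropy(s, dist, bandwidth) for s in (left, right) if s
    )
    return entropy(parent, dist, bandwidth) - children


# Toy metric space (hypothetical): 6 points on a ring, shortest-path distance.
M = 6
ring = [[min(abs(i - j), M - abs(i - j)) for j in range(M)] for i in range(M)]
parent = [0, 0, 1, 1, 3, 4, 4, 3]
good_gain = information_gain(parent, [0, 0, 1, 1], [3, 4, 4, 3], ring, 1.0)
poor_gain = information_gain(parent, [0, 1, 3, 4], [0, 1, 4, 3], ring, 1.0)
```

In this toy setting the split that separates nearby points from distant ones scores a higher gain than the split that mixes them. Because the kernel is left unnormalized, each entropy value is shifted by the same additive constant, which cancels in the gain.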

Note that this is an extended version of Pons-Moll et al. (2013). Some portions of Taylor et al. (2012) have been included for clarity.

Distinct subscripts indicate whether \(p\) and \(l\) refer to vertices or spheres.

Baak, A., Müller, M., Bharaj, G., Seidel, H., & Theobalt, C. (2011). A data-driven approach for real-time full body pose reconstruction from a depth camera. In: IEEE international conference on computer vision, pp. 1092–1099.

Balan, A., Sigal, L., Black, M., Davis, J., & Haussecker, H. (2007). Detailed human shape and pose from images. In: IEEE conference on computer vision and pattern recognition.

Bentley, J. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), 509–517.

Besl, P., & McKay, N. (1992). A method for registration of 3d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 239–256.

Black, M., & Rangarajan, A. (1996). On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. International Journal of Computer Vision, 19(1), 57–91.

Bo, L., & Sminchisescu, C. (2010). Twin gaussian processes for structured prediction. International Journal of Computer Vision, 87, 28–52.

Bregler, C., Malik, J., & Pullen, K. (2004). Twist based acquisition and tracking of animal and human kinematics. International Journal of Computer Vision, 56(3), 179–194.

Breiman, L. (1999). Random forests. Berkeley: UC. (Technical Report TR567).

Brubaker, M., Fleet, D., & Hertzmann, A. (2010). Physics-based person tracking using the anthropomorphic walker. International Journal of Computer Vision.

Buntine, W., & Niblett, T. (1992). A further comparison of splitting rules for decision-tree induction. Machine Learning, 8(1), 75–85.

Criminisi, A., & Shotton, J. (2013). Decision forests for computer vision and medical image analysis. London: Springer.

Deutscher, J., & Reid, I. (2005). Articulated body motion capture by stochastic search. International Journal of Computer Vision, 61(2), 185–205.

Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269–271.

Gall, J., Rosenhahn, B., Brox, T., & Seidel, H. P. (2010). Optimization and filtering for human motion capture. International Journal of Computer Vision, 87, 75–92.

Gall, J., Yao, A., Razavi, N., Van Gool, L., & Lempitsky, V. (2011). Hough forests for object detection, tracking, and action recognition. PAMI, 33(11), 2188–2202.

Ganapathi, V., Plagemann, C., Koller, D., & Thrun, S. (2012). Real-time human pose tracking from range data. In: European conference on computer vision.

Ganapathi, V., Plagemann, C., Thrun, S., & Koller, D. (2010). Real time motion capture using a time-of-flight camera. In: Conference in computer vision and pattern recognition.

Girshick, R., Shotton, J., Kohli, P., Criminisi, A., & Fitzgibbon, A. (2011). Efficient regression of general-activity human poses from depth images. In: IEEE international conference on computer vision, pp. 415–422.

Kabsch, W. (1976). A solution for the best rotation to relate two sets of vectors. Acta Crystallographica, 32(5), 922–923.

Lee, C., & Elgammal, A. (2010). Coupled visual and kinematic manifold models for tracking. International Journal of Computer Vision, 87, 118–139.

Liu, W., & White, A. (1994). The importance of attribute selection measures in decision tree induction. Machine Learning, 15(1), 25–41.

Memisevic, R., Sigal, L., & Fleet, D. J. (2012). Shared kernel information embedding for discriminative inference. PAMI, 34(4), 778–790.

Nowozin, S. (2012). Improved information gain estimates for decision tree induction. In: ICML.

Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3), 1065–1076.

Pons-Moll, G., Baak, A., Gall, J., Leal-Taixe, L., Mueller, M., Seidel, H., & Rosenhahn, B. (2011). Outdoor human motion capture using inverse kinematics and von mises-fisher sampling. In: International conference on computer vision.

Pons-Moll, G., Leal-Taixé, L., Truong, T., & Rosenhahn, B. (2011). Efficient and robust shape matching for model based human motion capture. In: DAGM.

Pons-Moll, G., & Rosenhahn, B. (2011). Model-based pose estimation. In Visual analysis of humans (pp. 139–170). London: Springer.

Pons-Moll, G., Taylor, J., Shotton, J., Hertzmann, A., & Fitzgibbon, A. (2013). Metric regression forests for human pose estimation. In: British machine vision conference (BMVC).

Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In: IEEE conference in computer vision and pattern recognition, pp. 1297–1304.

Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., & Fitzgibbon, A. (2013). Scene coordinate regression forests for camera relocalization in RGB-D images. In: Conference in computer vision and pattern recognition.

Silverman, B. (1986). Density estimation for statistics and data analysis (Vol. 26). London: CRC press.

Sminchisescu, C., Bo, L., Ionescu, C., & Kanaujia, A. (2011). Feature-based pose estimation. In Visual analysis of humans (pp. 225–251). London: Springer.

Stoll, C., Hasler, N., Gall, J., Seidel, H., & Theobalt, C. (2011). Fast articulated motion tracking using a sums of gaussians body model. In: IEEE international conference on computer vision, pp. 951–958.

Taylor, J., Shotton, J., Sharp, T., & Fitzgibbon, A. (2012). The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation. In: Conference in computer vision and pattern recognition.

Urtasun, R., & Darrell, T. (2008). Sparse probabilistic regression for activity-independent human pose inference. In: IEEE conference in computer vision and pattern recognition, pp. 1–8.

Title: Metric Regression Forests for Correspondence Estimation
Authors: Gerard Pons-Moll, Jonathan Taylor, Jamie Shotton, Aaron Hertzmann, Andrew Fitzgibbon
Publication date: 01-07-2015
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 3/2015
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-015-0818-9
