2016 | Original Paper | Book Chapter

Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

Authors: Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, Michael J. Black

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occlusion, clothing, lighting, and the inherent ambiguity in inferring 3D from 2D. To solve this, we first use a recently published CNN-based method, DeepCut, to predict (bottom-up) the 2D body joint locations. We then fit (top-down) a recently published statistical body shape model, called SMPL, to the 2D joints. We do so by minimizing an objective function that penalizes the error between the projected 3D model joints and detected 2D joints. Because SMPL captures correlations in human shape across the population, we are able to robustly fit it to very little data. We further leverage the 3D model to prevent solutions that cause interpenetration. We evaluate our method, SMPLify, on the Leeds Sports, HumanEva, and Human3.6M datasets, showing superior pose accuracy with respect to the state of the art.
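The objective described in the abstract, which penalizes the distance between projected 3D model joints and detected 2D joints, can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole camera, the plain squared-error penalty, the focal length, and the function names `project` and `joint_reprojection_error` are all assumptions made for illustration, and the SMPL model, its priors, and the interpenetration term are omitted.

```python
import numpy as np


def project(joints_3d, focal, center):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels.

    Assumes every point lies in front of the camera (positive depth).
    """
    joints_3d = np.asarray(joints_3d, dtype=float)
    xy = joints_3d[:, :2] / joints_3d[:, 2:3]  # perspective divide by depth
    return focal * xy + np.asarray(center, dtype=float)


def joint_reprojection_error(joints_3d, joints_2d, focal=1000.0, center=(0.0, 0.0)):
    """Sum of squared pixel distances between projected and detected joints."""
    residual = project(joints_3d, focal, center) - np.asarray(joints_2d, dtype=float)
    return float(np.sum(residual ** 2))


# Toy data (hypothetical): two joints at a depth of 2 units.
model_joints = np.array([[0.0, 0.0, 2.0],
                         [0.2, -0.4, 2.0]])
detected = project(model_joints, 1000.0, (0.0, 0.0))

# A model that matches the detections has zero error.
perfect = joint_reprojection_error(model_joints, detected)

# Shifting the model sideways moves every projected joint and raises the error.
shifted = joint_reprojection_error(model_joints + np.array([0.1, 0.0, 0.0]), detected)
```

In a fitting procedure of the kind the abstract outlines, a term like this would be minimized over the body model's pose and shape parameters rather than evaluated on fixed joints as here.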

http://smplify.is.tue.mpg.de

http://chumpy.org

http://mocap.cs.cmu.edu

Title: Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
Authors: Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, Michael J. Black
Publisher: Springer International Publishing
Book: Computer Vision – ECCV 2016
Print ISBN: 978-3-319-46453-4
Electronic ISBN: 978-3-319-46454-1
Copyright year: 2016
DOI: https://doi.org/10.1007/978-3-319-46454-1_34
