nach oben

Erschienen in:

2018 | OriginalPaper | Buchkapitel

Deeply Learned Compositional Models for Human Pose Estimation

verfasst von : Wei Tang, Pei Yu, Ying Wu

Erschienen in: Computer Vision – ECCV 2018

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Compositional models represent patterns with hierarchies of meaningful parts and subparts. Their ability to characterize high-order relationships among body parts helps resolve low-level ambiguities in human pose estimation (HPE). However, prior compositional models make unrealistic assumptions on subpart-part relationships, making them incapable to characterize complex compositional patterns. Moreover, state spaces of their higher-level parts can be exponentially large, complicating both inference and learning. To address these issues, this paper introduces a novel framework, termed as Deeply Learned Compositional Model (DLCM), for HPE. It exploits deep neural networks to learn the compositionality of human bodies. This results in a novel network with a hierarchical compositional architecture and bottom-up/top-down inference stages. In addition, we propose a novel bone-based part representation. It not only compactly encodes orientations, scales and shapes of parts, but also avoids their potentially large state spaces. With significantly lower complexities, our approach outperforms state-of-the-art methods on three benchmark datasets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel Multimodal Unsupervised Image-to-Image Translation

Nächstes Kapitel Unsupervised Video Object Segmentation with Motion-Based Bilateral Networks

Nur mit Berechtigung zugänglich

We focus on multilevel compositional models in this paper.

Each entry of a score map evaluates the goodness of a part being at a certain state, e.g., location and type.

We do not need Or-nodes [13, 14] here as part variations have been explicitly modeled by the state variables of And-nodes.

In practice, we find repeated ends can be removed without deteriorating performance.

http://www.ece.northwestern.edu/~wtt450/project/ECCV18_DLCM.html.

Sarafianos, N., Boteanu, B., Ionescu, B., Kakadiaris, I.A.: 3D human pose estimation: a review of the literature and analysis of covariates. Comput. Vis. Image Underst. 152, 1–20 (2016)CrossRef

Fukushima, K., Miyake, S.: Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. In: Amari, S., Arbib, M.A. (eds.) Competition and Cooperation in Neural Nets, pp. 267–285. Springer, Heidelberg (1982). https://doi.org/10.1007/978-3-642-46466-9_18CrossRef

LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)CrossRef

LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)CrossRef

Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_29CrossRef

Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4732 (2016)

Yang, W., Ouyang, W., Li, H., Wang, X.: End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3073–3082 (2016)

Bulat, A., Tzimiropoulos, G.: Human pose estimation via convolutional part heatmap regression. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 717–732. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_44CrossRef

Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X.: Multi-context attention for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5669–5678 (2017)

10.

Geman, S., Potter, D.F., Chi, Z.: Composition systems. Q. Appl. Math. 60(4), 707–736 (2002)MathSciNetCrossRef

11.

Bienenstock, E., Geman, S., Potter, D.: Compositionality, MDL priors, and object recognition. In: Advances in Neural Information Processing Systems, pp. 838–844 (1997)

12.

Tian, Y., Zitnick, C.L., Narasimhan, S.G.: Exploring the spatial hierarchy of mixture models for human pose estimation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 256–269. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_19CrossRef

13.

Zhu, S.C., Mumford, D., et al.: A Stochastic Grammar of Images, vol. 2. Now Publishers, Inc., Hanover (2007). https://dl.acm.org/citation.cfm?id=1315337

14.

Zhu, L.L., Chen, Y., Yuille, A.: Recursive compositional models for vision: description and review of recent work. J. Math. Imaging Vis. 41(1–2), 122 (2011)MathSciNetCrossRef

15.

Wang, Y., Tran, D., Liao, Z.: Learning hierarchical poselets for human parsing. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1705–1712 (2011)

16.

Rothrock, B., Park, S., Zhu, S.C.: Integrating grammar and segmentation for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3214–3221 (2013)

17.

Sun, M., Savarese, S.: Articulated part-based model for joint object detection and pose estimation. In: IEEE International Conference on Computer Vision, pp. 723–730 (2011)

18.

Park, S., Zhu, S.C.: Attributed grammars for joint estimation of human attributes, part and pose. In: IEEE International Conference on Computer Vision, pp. 2372–2380 (2015)

19.

Park, S., Nie, B.X., Zhu, S.C.: Attribute and-or grammar for joint parsing of human pose, parts and attributes. IEEE Trans. Pattern Anal. Mach. Intell. 40(7), 1555–1569 (2018)CrossRef

20.

Felzenszwalb, P.F., Huttenlocher, D.P.: Distance transforms of sampled functions. Theory Comput. 8(1), 415–428 (2012)MathSciNetCrossRef

21.

Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3686–3693 (2014)

22.

Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1465–1472 (2011)

23.

Tran, D., Forsyth, D.: Improved human parsing with a full relational model. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 227–240. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_17CrossRef

24.

Jin, Y., Geman, S.: Context and hierarchy in a probabilistic image model. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2145–2152 (2006)

25.

Tang, W., Yu, P., Zhou, J., Wu, Y.: Towards a unified compositional model for visual pattern modeling. In: IEEE International Conference on Computer Vision, pp. 2803–2812 (2017)

26.

Duan, K., Batra, D., Crandall, D.J.: A multi-layer composite model for human pose estimation. In: British Machine Vision Conference (2012)

27.

Wang, J., Yuille, A.L.: Semantic part segmentation using compositional model combining shape and appearance. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1788–1797 (2015)

28.

Zhu, L., Chen, Y., Torralba, A., Freeman, W., Yuille, A.: Part and appearance sharing: recursive compositional models for multi-view. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1919–1926 (2010)

29.

Hu, P., Ramanan, D.: Bottom-up and top-down reasoning with hierarchical rectified Gaussians. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5600–5609 (2016)

30.

Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: IEEE Conference o Computer Vision and Pattern Recognitionn, pp. 1385–1392 (2011)

31.

Sun, X., Shang, J., Liang, S., Wei, Y.: Compositional human pose regression. In: IEEE International Conference on Computer Vision, pp. 2621–2630 (2017)

32.

Ai, B., Zhou, Y., Yu, Y., Du, S.: Human pose estimation using deep structure guided learning. In: IEEE Winter Conference on Applications of Computer Vision, pp. 1224–1231 (2017)

33.

Belagiannis, V., Zisserman, A.: Recurrent human pose estimation. In: IEEE International Conference on Automatic Face Gesture Recognition, pp. 468–475 (2017)

34.

Boureau, Y.L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: International Conference on Machine Learning, pp. 111–118 (2010)

35.

Wan, L., Eigen, D., Fergus, R.: End-to-end integration of a convolution network, deformable parts model and non-maximum suppression. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 851–859 (2015)

36.

Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)CrossRef

37.

Tompson, J.J., Jain, A., LeCun, Y., Bregler, C.: Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, pp. 1799–1807 (2014)

38.

Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)

39.

Sapp, B., Taskar, B.: MODEC: multimodal decomposable models for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3674–3681 (2013)

40.

Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: British Machine Vision Conference (2010)

41.

Pishchulin, L., et al.: DeepCut: joint subset partition and labeling for multi person pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4929–4937 (2016)

42.

Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 648–656 (2015)

43.

Chen, X., Yuille, A.L.: Articulated pose estimation by a graphical model with image dependent pairwise relations. In: Advances in Neural Information Processing Systems, pp. 1736–1744 (2014)

44.

Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., Schiele, B.: DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 34–50. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_3CrossRef

45.

Lifshitz, I., Fetaya, E., Ullman, S.: Human pose estimation using deep consensus voting. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 246–260. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_16CrossRef

46.

Yu, X., Zhou, F., Chandraker, M.: Deep deformation network for object landmark localization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 52–70. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_4CrossRef

47.

Chen, Y., Shen, C., Wei, X.S., Liu, L., Yang, J.: Adversarial PoseNet: a structure-aware convolutional network for human pose estimation. In: IEEE International Conference on Computer Vision, pp. 1221–1230 (2017)

48.

Sun, K., Lan, C., Xing, J., Zeng, W., Liu, D., Wang, J.: Human pose estimation using global and local normalization. In: IEEE International Conference on Computer Vision, pp. 5600–5608 (2017)

49.

Yang, W., Li, S., Ouyang, W., Li, H., Wang, X.: Learning feature pyramids for human pose estimation. In: The IEEE International Conference on Computer Vision, pp. 1290–1299 (2017)

50.

Forsyth, D.A., Ponce, J.: Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, Upper Saddle River (2002)

51.

Collobert, R., Kavukcuoglu, K., Farabet, C.: Torch7: a Matlab-like environment for machine learning. In: NIPS Workshop (2011)

52.

Tieleman, T., Hinton, G.: Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4(2), 26–31 (2012)

53.

Gkioxari, G., Toshev, A., Jaitly, N.: Chained predictions using convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 728–743. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_44CrossRef

Titel: Deeply Learned Compositional Models for Human Pose Estimation
verfasst von: Wei Tang
Pei Yu
Ying Wu
Verlag: Springer International Publishing
Buch: Computer Vision – ECCV 2018
Print ISBN: 978-3-030-01218-2

Electronic ISBN: 978-3-030-01219-9

Copyright-Jahr: 2018
DOI: https://doi.org/10.1007/978-3-030-01219-9_12

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"