Skip to main content
Top
Published in: Neural Processing Letters 7/2023

21-03-2023

Improving Human Pose Estimation Based on Stacked Hourglass Network

Authors: Xuelian Zou, Xiaojun Bi, Changdong Yu

Published in: Neural Processing Letters | Issue 7/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The performance of multi-person pose estimation has been greatly improved due to the rapid development of deep learning. However, the problems of self-occlusion, mutual occlusion and complex background not only have not been effectively solved. In order to further effectively solve these problems, in this paper, we design a novel Global and Local Content-aware Feature Boosting Network (GLCFBNet) that includes Intra-layer Feature Residual-like Module (IFRM), Input Feature Aggregation Module (IFAM), Spatial and Channel Feature Hourglass Attention Module (SCFHAM). We propose a novel IFRM that can expand receptive field of each convolution layer through aggregation feature. The IRAM can fully extract the edge information of the input image,and effectively solve the problem of negative background impact. The SCFHAM can accurately determine the location of the occluded keypoints, judge the global information of the reasonable keypoints, and extract the effective features for joint node positioning from the redundant feature information. We evaluate the effectiveness of our proposed method on the MSCOCO keypoint detection dataset, the MPII Human Pose dataset and the CrowdPose dataset.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Luo Y, Xu Z, Liu P, Du Y, Guo J-M (2018) Multi-person pose estimation via multi-layer fractal network and joints kinship pattern. IEEE Trans Image Process 28(1):142–155MathSciNetCrossRefMATH Luo Y, Xu Z, Liu P, Du Y, Guo J-M (2018) Multi-person pose estimation via multi-layer fractal network and joints kinship pattern. IEEE Trans Image Process 28(1):142–155MathSciNetCrossRefMATH
2.
go back to reference Majd M, Safabakhsh R (2019) A motion-aware convLSTM network for action recognition. Appl Intell 49(7):2515–2521CrossRef Majd M, Safabakhsh R (2019) A motion-aware convLSTM network for action recognition. Appl Intell 49(7):2515–2521CrossRef
3.
go back to reference Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 466–481 Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 466–481
4.
go back to reference Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: European conference on computer vision, Springer, pp 483–499 Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: European conference on computer vision, Springer, pp 483–499
5.
go back to reference Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19 Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19
6.
go back to reference Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision, Springer, pp 740–755 Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision, Springer, pp 740–755
7.
go back to reference Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2d human pose estimation: New benchmark and state of the art analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3686–3693 Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2d human pose estimation: New benchmark and state of the art analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3686–3693
8.
go back to reference Li J, Wang C, Zhu H, Mao Y, Fang H-S, Lu C (2019) Crowdpose: Efficient crowded scenes pose estimation and a new benchmark. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10863–10872 Li J, Wang C, Zhu H, Mao Y, Fang H-S, Lu C (2019) Crowdpose: Efficient crowded scenes pose estimation and a new benchmark. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10863–10872
9.
go back to reference Yang Y, Ramanan D (2011) Articulated pose estimation with flexible mixtures-of-parts. In: CVPR 2011, IEEE, pp 1385–1392 Yang Y, Ramanan D (2011) Articulated pose estimation with flexible mixtures-of-parts. In: CVPR 2011, IEEE, pp 1385–1392
10.
go back to reference Pishchulin L, Andriluka M, Gehler P, Schiele B (2013) Poselet conditioned pictorial structures. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 588–595 Pishchulin L, Andriluka M, Gehler P, Schiele B (2013) Poselet conditioned pictorial structures. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 588–595
11.
go back to reference Tompson JJ, Jain A, LeCun Y, Bregler C (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in neural information processing systems, vol 27 Tompson JJ, Jain A, LeCun Y, Bregler C (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in neural information processing systems, vol 27
12.
go back to reference Yang W, Ouyang W, Li H, Wang X (2016) End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3073–3082 Yang W, Ouyang W, Li H, Wang X (2016) End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3073–3082
13.
go back to reference Song J, Wang L, Van Gool L, Hilliges O (2017) Thin-slicing network: a deep structured model for pose estimation in videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4220–4229 Song J, Wang L, Van Gool L, Hilliges O (2017) Thin-slicing network: a deep structured model for pose estimation in videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4220–4229
14.
15.
go back to reference Kong D, Ma H, Chen Y, Xie X (2020) Rotation-invariant mixed graphical model network for 2d hand pose estimation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1546–1555 Kong D, Ma H, Chen Y, Xie X (2020) Rotation-invariant mixed graphical model network for 2d hand pose estimation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1546–1555
16.
go back to reference Tang W, Yu P, Wu Y (2018) Deeply learned compositional models for human pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 190–206 Tang W, Yu P, Wu Y (2018) Deeply learned compositional models for human pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 190–206
17.
go back to reference Chen Y, Shen C, Wei XS, Liu L, Yang J (2017) Adversarial posenet: a structure-aware convolutional network for human pose estimation. In: IEEE computer society Chen Y, Shen C, Wei XS, Liu L, Yang J (2017) Adversarial posenet: a structure-aware convolutional network for human pose estimation. In: IEEE computer society
18.
go back to reference Chou C-J, Chien J-T, Chen H-T (2018) Self adversarial training for human pose estimation. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE, pp 17–30 Chou C-J, Chien J-T, Chen H-T (2018) Self adversarial training for human pose estimation. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE, pp 17–30
19.
go back to reference Chu X, Yang W, Ouyang W, Ma C, Yuille AL, Wang X (2017) Multi- context attention for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1831–1840 Chu X, Yang W, Ouyang W, Ma C, Yuille AL, Wang X (2017) Multi- context attention for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1831–1840
20.
go back to reference He K, Gkioxari G, Dollar P, Girshick R (2017) Mask r-cnn. In: International conference on computer vision He K, Gkioxari G, Dollar P, Girshick R (2017) Mask r-cnn. In: International conference on computer vision
21.
go back to reference Papandreou G, Zhu T, Kanazawa N, Toshev A, Tompson J, Bregler C, Murphy K (2017) Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4903–4911 Papandreou G, Zhu T, Kanazawa N, Toshev A, Tompson J, Bregler C, Murphy K (2017) Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4903–4911
22.
go back to reference Fang H-S, Xie S, Tai Y-W, Lu C (2017) Rmpe: regional multi-person pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 2334–2343 Fang H-S, Xie S, Tai Y-W, Lu C (2017) Rmpe: regional multi-person pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 2334–2343
23.
go back to reference Huang S, Gong M, Tao D (2017) A coarse-fine network for keypoint localization. In: 2017 IEEE international conference on computer vision (ICCV) Huang S, Gong M, Tao D (2017) A coarse-fine network for keypoint localization. In: 2017 IEEE international conference on computer vision (ICCV)
24.
go back to reference Yang W, Li S, Ouyang W, Li H, Wang X (2017) Learning feature pyramids for human pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 1281–1290 Yang W, Li S, Ouyang W, Li H, Wang X (2017) Learning feature pyramids for human pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp 1281–1290
25.
go back to reference Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multi-person pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7103–7112 Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multi-person pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7103–7112
26.
go back to reference Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5693–5703 Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5693–5703
27.
go back to reference Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler PV, Schiele B (2016) Deepcut: joint subset partition and labeling for multi person pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4929–4937 Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler PV, Schiele B (2016) Deepcut: joint subset partition and labeling for multi person pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4929–4937
28.
go back to reference Newell A, Huang Z, Deng J (2017) Associative embedding: end-to-end learning for joint detection and grouping. In: Advances in neural information processing systems, vol 30 Newell A, Huang Z, Deng J (2017) Associative embedding: end-to-end learning for joint detection and grouping. In: Advances in neural information processing systems, vol 30
29.
30.
go back to reference Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, vol 25 Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, vol 25
31.
go back to reference He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778 He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
32.
go back to reference Wang R, Geng F, Wang X (2022) Mtpose: Human pose estimation with high-resolution multi-scale transformers. Neural Process Lett 54(5):3941–3964CrossRef Wang R, Geng F, Wang X (2022) Mtpose: Human pose estimation with high-resolution multi-scale transformers. Neural Process Lett 54(5):3941–3964CrossRef
33.
go back to reference Juan Lyu, Sai Ho, Ling (2018) Using multi-level convolutional neural network for classification of lung nodules on CT images. In: Conference proceedings : annual international conference of the IEEE engineering in medicine and biology society. IEEE Engineering in Medicine and Biology Society. Annual Conference, vol 2018, pp 686–689 Juan Lyu, Sai Ho, Ling (2018) Using multi-level convolutional neural network for classification of lung nodules on CT images. In: Conference proceedings : annual international conference of the IEEE engineering in medicine and biology society. IEEE Engineering in Medicine and Biology Society. Annual Conference, vol 2018, pp 686–689
34.
go back to reference Yu F, Wang D, Shelhamer E, Darrell T (2018) Deep layer aggregation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2403–2412 Yu F, Wang D, Shelhamer E, Darrell T (2018) Deep layer aggregation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2403–2412
35.
go back to reference Gao S-H, Cheng M-M, Zhao K, Zhang X-Y, Yang M-H, Torr P (2019) Res2net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell 43(2):652–662CrossRef Gao S-H, Cheng M-M, Zhao K, Zhang X-Y, Yang M-H, Torr P (2019) Res2net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell 43(2):652–662CrossRef
36.
go back to reference Ke L, Chang M-C, Qi H, Lyu S (2018) Multi-scale structure-aware network for human pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 713–728 Ke L, Chang M-C, Qi H, Lyu S (2018) Multi-scale structure-aware network for human pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 713–728
37.
go back to reference Zhao H, Kong X, He J, Qiao Y, Dong C (2020) Efficient image super-resolution using pixel attention. In: European conference on computer vision, Springer, pp 56–72 Zhao H, Kong X, He J, Qiao Y, Dong C (2020) Efficient image super-resolution using pixel attention. In: European conference on computer vision, Springer, pp 56–72
39.
go back to reference Zhang J, Su Q, Wang C, Gu H (2020) Monocular 3d vehicle detection with multi-instance depth and geometry reasoning for autonomous driving. Neurocomputing 403:182–192CrossRef Zhang J, Su Q, Wang C, Gu H (2020) Monocular 3d vehicle detection with multi-instance depth and geometry reasoning for autonomous driving. Neurocomputing 403:182–192CrossRef
40.
go back to reference Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141 Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
41.
go back to reference Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3146–3154 Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3146–3154
42.
go back to reference Wang Q, Wu B, Zhu P, Li P, Hu Q (2020) Eca-net: Efficient channel attention for deep convolutional neural networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR) Wang Q, Wu B, Zhu P, Li P, Hu Q (2020) Eca-net: Efficient channel attention for deep convolutional neural networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR)
43.
go back to reference Roy Abhijit, Guha Navab, Wachinger Nassir, Christian (2019) Recalibrating fully convolutional networks with spatial and channel squeeze and excitation blocks. IEEE Trans Med Imaging 38(2):540–549CrossRef Roy Abhijit, Guha Navab, Wachinger Nassir, Christian (2019) Recalibrating fully convolutional networks with spatial and channel squeeze and excitation blocks. IEEE Trans Med Imaging 38(2):540–549CrossRef
44.
go back to reference Hu Y, Li J, Huang Y, Gao X (2019) Channel-wise and spatial feature modulation network for single image super-resolution. IEEE Trans Circ Syst Vid Technol 30(11):3911–3927CrossRef Hu Y, Li J, Huang Y, Gao X (2019) Channel-wise and spatial feature modulation network for single image super-resolution. IEEE Trans Circ Syst Vid Technol 30(11):3911–3927CrossRef
45.
go back to reference Wang X, Tong J, Wang R (2021) Attention refined network for human pose estimation. Neural Process Lett 53(4):2853–2872CrossRef Wang X, Tong J, Wang R (2021) Attention refined network for human pose estimation. Neural Process Lett 53(4):2853–2872CrossRef
46.
go back to reference Su K, Yu D, Xu Z, Geng X, Wang C (2019) Multi-person pose estimation with enhanced channel-wise and spatial information. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR) Su K, Yu D, Xu Z, Geng X, Wang C (2019) Multi-person pose estimation with enhanced channel-wise and spatial information. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR)
47.
go back to reference Kingma D, Ba J (2014) Adam: A method for stochastic optimization. Computer Science Kingma D, Ba J (2014) Adam: A method for stochastic optimization. Computer Science
48.
go back to reference Luvizon DC, Tabia H, Picard D (2019) Human pose regression by combining indirect part detection and contextual information. Compute Graph 85:15–22CrossRef Luvizon DC, Tabia H, Picard D (2019) Human pose regression by combining indirect part detection and contextual information. Compute Graph 85:15–22CrossRef
Metadata
Title
Improving Human Pose Estimation Based on Stacked Hourglass Network
Authors
Xuelian Zou
Xiaojun Bi
Changdong Yu
Publication date
21-03-2023
Publisher
Springer US
Published in
Neural Processing Letters / Issue 7/2023
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-023-11212-5

Other articles of this Issue 7/2023

Neural Processing Letters 7/2023 Go to the issue