Published in: The Journal of Supercomputing 9/2023

30.01.2023

SA-GCN: structure-aware graph convolutional networks for crowd pose estimation

Authors: Jia Wang, Yanmin Luo


Abstract

In this paper, we aim to capture the structural dependencies among human joints and to improve the localization accuracy of invisible joints. We propose a novel framework, the Structure-aware Graph Convolutional Network (SA-GCN), for crowd pose estimation. It consists of two components: Sample Pose Net and Refined Pose Net. First, Sample Pose Net includes a multi-scale feature fusion module, which uses multi-scale features to capture small-scale persons and to extract the global "rough" pose as fully as possible. Second, channel and spatial attention are injected into the multi-scale feature fusion module to strengthen the features of small-scale persons. Finally, in Refined Pose Net, graph convolution is carried out by several disentangled, parallel sub-graph convolution modules. The global and structural properties of graph convolution help predict the difficult joints in the sample pose. In addition, SA-GCN has fewer parameters than popular pose estimation networks. In this way, SA-GCN produces feature maps for pose proposal and pose refinement, respectively. Comprehensive experiments demonstrate that the proposed method achieves superior pose estimation results on two benchmark datasets, CrowdPose and MSCOCO. Moreover, SA-GCN significantly outperforms state-of-the-art methods on CrowdPose and almost always generates plausible human pose predictions.
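To make the role of graph convolution in the refinement stage more concrete, the following is a minimal sketch of a single, generic graph convolution layer over a skeleton graph. It is not the authors' implementation: the abstract does not specify how the sub-graphs are partitioned or how the layers are parameterized. The 5-joint chain, the toy features (x, y, confidence), the identity weight matrix, and all function names are assumptions chosen only for illustration. The sketch uses the common symmetric normalization D^{-1/2}(A + I)D^{-1/2}.

```python
import math


def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps part of its own feature
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def graph_conv(features, adj_norm, weight):
    """One layer: ReLU(adj_norm @ features @ weight)."""
    mixed = matmul(adj_norm, features)  # each joint aggregates its neighbours
    out = matmul(mixed, weight)         # shared linear map over feature channels
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical 5-joint chain (wrist, elbow, shoulder, neck, head); not from the paper.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
A_NORM = normalized_adjacency(5, EDGES)

# Toy per-joint features (x, y, confidence). Joint 1 stands in for an
# invisible joint: the rough pose gives it no usable evidence.
FEATURES = [
    [0.10, 0.80, 0.9],
    [0.00, 0.00, 0.0],
    [0.30, 0.40, 0.8],
    [0.35, 0.25, 0.9],
    [0.36, 0.10, 0.9],
]
# Identity weights (an assumption) so the effect of neighbour mixing is visible.
WEIGHT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

REFINED = graph_conv(FEATURES, A_NORM, WEIGHT)
```

After one layer, the feature of the occluded joint is no longer zero, because it has been filled in from its visible neighbours along the skeleton. This is the intuition behind using graph structure to predict difficult joints; a trained network would learn the weights and, per the abstract, would run several such sub-graph modules in parallel.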


Metadata
Title
SA-GCN: structure-aware graph convolutional networks for crowd pose estimation
Authors
Jia Wang
Yanmin Luo
Publication date
30.01.2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 9/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05055-z
