01-02-2023 | Theoretical Advances

MSRT: multi-scale representation transformer for regression-based human pose estimation

Authors: Beiguang Shan, Qingxuan Shi, Fang Yang

Published in: Pattern Analysis and Applications | Issue 2/2023

Abstract

This paper addresses human pose estimation, with a focus on leveraging discriminative pose features. Recent pose estimation methods concentrate on extracting high-level features but ignore low-level details, which reduces prediction accuracy. To mitigate this issue, we propose an end-to-end method called the multi-scale representation transformer network (MSRT). The network consists of two key components: a feature aggregation module (FAM) and transformers. The FAM splits and stacks feature maps of different scales, then fuses them to achieve multi-scale representation learning, which compensates for the lack of detailed information in the high-level features. Furthermore, we use transformers to model long-range interactions among feature maps and to capture implicit body structure information, which allows the network to refine the locations of terminal and occluded joints. Compared with existing regression-based methods, MSRT achieves superior results on the COCO2017 and MPII datasets.
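The abstract does not specify how the FAM splits, stacks, and fuses feature maps, or how the transformers are configured. The following is therefore only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "split and stack" can be approximated by a space-to-depth rearrangement, that fusion is plain channel concatenation, and that attention is single-head with no learned projections. All function names (`space_to_depth`, `fuse`, `self_attention`) are hypothetical.

```python
import math


def space_to_depth(fmap, block):
    """Split an H x W x C map into block x block tiles and stack each tile's
    pixels along the channel axis: (H/block) x (W/block) x (C * block^2).
    Assumption: this stands in for the FAM's "split and stack" step."""
    h, w = len(fmap), len(fmap[0])
    assert h % block == 0 and w % block == 0
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            stacked = []
            for di in range(block):
                for dj in range(block):
                    stacked.extend(fmap[i + di][j + dj])
            row.append(stacked)
        out.append(row)
    return out


def fuse(high_res, low_res, block):
    """Bring the high-resolution map onto the low-resolution grid, then
    concatenate channels per position (a simple stand-in for fusion)."""
    stacked = space_to_depth(high_res, block)
    assert len(stacked) == len(low_res) and len(stacked[0]) == len(low_res[0])
    return [[s + l for s, l in zip(srow, lrow)]
            for srow, lrow in zip(stacked, low_res)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity
    projections: every token attends to every token, so distant positions
    can interact directly."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens))
                    for c in range(d)])
    return out


# Toy inputs: a 4x4x1 "low-level" map and a 2x2x2 "high-level" map.
high = [[[float(4 * i + j) / 16] for j in range(4)] for i in range(4)]
low = [[[0.5, -0.5] for _ in range(2)] for _ in range(2)]

fused = fuse(high, low, block=2)              # 2 x 2 x 6 multi-scale map
tokens = [px for row in fused for px in row]  # 4 tokens of dimension 6
refined = self_attention(tokens)              # 4 tokens of dimension 6
```

In this toy setup, the fused map keeps the fine-grained values of the high-resolution input next to the coarse features at each position, and the attention step mixes information across all positions. A real model would use learned convolutions, learned query/key/value projections, and positional encodings, none of which are described in the abstract.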

Title: MSRT: multi-scale representation transformer for regression-based human pose estimation
Authors: Beiguang Shan, Qingxuan Shi, Fang Yang
Publication date: 01-02-2023
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 2/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-023-01130-6
