Multimedia Systems

01-02-2024 | Regular Paper

Learning scale-aware relationships via Laplacian decomposition-based transformer for 3D human pose estimation

Authors: Jeonghwan Kim, Hyukmin Kwon, Seong Yong Lim, Wonjun Kim

Published in: Multimedia Systems | Issue 1/2024

Abstract

This paper presents a method for 3D human pose estimation built on a Laplacian decomposition-based transformer. Transformer-based approaches effectively model the non-local interactions between the 3D mesh vertices of the whole body, and graph models have recently been embedded into the transformer to account for neighborhood interactions in the kinematic topology. Although this combination has brought remarkable progress in 3D human pose estimation, scale-aware relationships between body parts have not been sufficiently explored in the literature. To address this gap, we apply a Laplacian pyramid module to the transformer, which decomposes the encoded features into Laplacian residuals at different scales. Furthermore, we compute self-attention separately for each body part to generate more natural human poses. Experimental results on benchmark datasets show that the proposed method improves the performance of 3D human pose estimation. The code and model are publicly available at: https://github.com/DCVL-3D/Laphormer_release.
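As a rough illustration of what decomposing features into Laplacian residuals at different scales can look like, the sketch below builds a Laplacian pyramid over a one-dimensional sequence of feature values. It is not the authors' implementation: the function names (`downsample`, `upsample`, `laplacian_pyramid`, `reconstruct`), the pairwise-average low-pass filter, and the nearest-neighbour upsampling are all assumptions chosen to keep the example self-contained.

```python
# Illustrative sketch only; names and filter choices are assumptions,
# not taken from the paper or its released code.

def downsample(x):
    """Average adjacent pairs: a simple low-pass filter plus decimation."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x, n):
    """Nearest-neighbour upsampling of x back to length n."""
    return [x[min(i // 2, len(x) - 1)] for i in range(n)]


def laplacian_pyramid(x, levels):
    """Split x into per-scale residuals plus a coarsest approximation.

    Each residual holds the detail lost when moving to the next coarser scale.
    """
    residuals = []
    current = list(x)
    for _ in range(levels):
        if len(current) < 2:
            break  # nothing left to coarsen
        coarse = downsample(current)
        approx = upsample(coarse, len(current))
        residuals.append([c - a for c, a in zip(current, approx)])
        current = coarse
    return residuals, current


def reconstruct(residuals, coarsest):
    """Invert the decomposition by adding residuals back, coarse to fine."""
    current = list(coarsest)
    for res in reversed(residuals):
        up = upsample(current, len(res))
        current = [u + r for u, r in zip(up, res)]
    return current


# Hypothetical feature values along the token axis.
features = [1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 0.0, 8.0]
residuals, coarsest = laplacian_pyramid(features, levels=2)
restored = reconstruct(residuals, coarsest)
```

In this sketch the finest residual captures differences between neighbouring values, while the coarser levels summarize progressively larger groups. Adding the residuals back to the upsampled coarse signal recovers the input, so the decomposition itself loses no information.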

Title: Learning scale-aware relationships via Laplacian decomposition-based transformer for 3D human pose estimation
Authors: Jeonghwan Kim, Hyukmin Kwon, Seong Yong Lim, Wonjun Kim
Publication date: 01-02-2024
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems / Issue 1/2024
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-023-01216-5
