Published in: Multimedia Systems 1/2024

01-02-2024 | Regular Paper

Learning scale-aware relationships via Laplacian decomposition-based transformer for 3D human pose estimation

Authors: Jeonghwan Kim, Hyukmin Kwon, Seong Yong Lim, Wonjun Kim

Abstract

This paper presents a method for 3D human pose estimation based on a Laplacian decomposition-based transformer. Transformer-based approaches effectively model non-local interactions between the 3D mesh vertices of the whole body, and graph models have recently been embedded into the transformer to account for neighborhood interactions in the kinematic topology. Although this combination has yielded remarkable progress in 3D human pose estimation, scale-aware relationships between body parts have not been sufficiently explored in the literature. To address this gap, we propose applying a Laplacian pyramid module to the transformer, which decomposes encoded features into Laplacian residuals in different scale spaces. Furthermore, we compute self-attention separately for each body part to generate more natural human poses. Experimental results on benchmark datasets show that the proposed method improves the performance of 3D human pose estimation. The code and model are publicly available at: https://github.com/DCVL-3D/Laphormer_release.
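The two ideas in the abstract, splitting encoded features into Laplacian residuals at different scales and restricting self-attention to tokens of the same body part, can be illustrated with a minimal sketch. It assumes the decomposition runs along the token (vertex/joint) axis using pair-averaging for downsampling and nearest-neighbour repetition for upsampling, and it assumes part-wise attention is a hard mask with identity Q/K/V projections. All function names and the two-part grouping are hypothetical; this is not the Laphormer code from the linked repository.

```python
import math
from typing import List, Tuple

Tokens = List[List[float]]  # shape: (num_tokens, channels)


def downsample(x: Tokens) -> Tokens:
    """Halve the token count by averaging adjacent pairs (assumed pooling)."""
    return [[(a + b) / 2.0 for a, b in zip(x[i], x[i + 1])]
            for i in range(0, len(x), 2)]


def upsample(x: Tokens) -> Tokens:
    """Double the token count by repeating each token (nearest neighbour)."""
    return [list(v) for v in x for _ in range(2)]


def laplacian_decompose(x: Tokens, levels: int) -> Tuple[List[Tokens], Tokens]:
    """Split x into `levels` residuals (fine to coarse) plus a coarse base."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("token count must be divisible by 2**levels")
    residuals: List[Tokens] = []
    current = x
    for _ in range(levels):
        coarse = downsample(current)
        up = upsample(coarse)
        # Residual = detail lost when moving one scale coarser.
        residuals.append([[c - u for c, u in zip(cv, uv)]
                          for cv, uv in zip(current, up)])
        current = coarse
    return residuals, current


def laplacian_reconstruct(residuals: List[Tokens], base: Tokens) -> Tokens:
    """Invert the decomposition: upsample and add residuals, coarse to fine."""
    current = base
    for res in reversed(residuals):
        up = upsample(current)
        current = [[r + u for r, u in zip(rv, uv)] for rv, uv in zip(res, up)]
    return current


def part_self_attention(x: Tokens, parts: List[int]) -> Tokens:
    """Scaled dot-product self-attention in which each token attends only to
    tokens sharing its body-part label. Q, K and V are the raw features
    (identity projections), a simplification for illustration only."""
    d = len(x[0])
    out: Tokens = []
    for i, q in enumerate(x):
        idx = [j for j, p in enumerate(parts) if p == parts[i]]
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d) for j in idx]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * x[j][c] for w, j in zip(weights, idx))
                    for c in range(d)])
    return out


if __name__ == "__main__":
    feats = [[float(i), float(i % 3)] for i in range(8)]  # 8 tokens, 2 channels
    res, base = laplacian_decompose(feats, levels=2)
    print(len(res[0]), len(res[1]), len(base))  # 8 4 2
    rec = laplacian_reconstruct(res, base)
    print(all(abs(a - b) < 1e-9 for ra, rb in zip(rec, feats) for a, b in zip(ra, rb)))  # True
    parts = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical two-part grouping
    print(part_self_attention(feats, parts)[0])
```

Because the decomposition is exactly invertible, each residual carries only the detail specific to its scale. In a learned model one would presumably apply trainable layers to each residual before recombining, which this sketch omits.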

Metadata
Title
Learning scale-aware relationships via Laplacian decomposition-based transformer for 3D human pose estimation
Authors
Jeonghwan Kim
Hyukmin Kwon
Seong Yong Lim
Wonjun Kim
Publication date
01-02-2024
Publisher
Springer Berlin Heidelberg
Published in
Multimedia Systems / Issue 1/2024
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-023-01216-5