Pattern Analysis and Applications

24 January 2022 | Theoretical Advances

3D hand pose estimation from a single RGB image through semantic decomposition of VAE latent space

Authors: Xinru Guo, Song Xu, Xiangbo Lin, Yi Sun, Xiaohong Ma

Published in: Pattern Analysis and Applications | Issue 1/2022

Abstract

Based on disentangled representation learning theory and the cross-modal variational autoencoder (VAE) model, we derive a “Single Input Multiple Output” (SIMO) disentangled model, \(\text{cmSIMO-}\beta\,\text{VAE}\). Guided by this derived model, we design a new VAE network, named da-VAE, for the challenging task of 3D hand pose estimation from a single RGB image. The da-VAE network has a multi-head encoder with attention modules. Combined with specific supervision, the latent space is decomposed into subspaces with explicit semantics that correspond to the generative factors of hand pose, shape, appearance and others. The performance of the proposed da-VAE network is evaluated on the RHD and STB datasets. The experimental results show accuracy competitive with state-of-the-art methods.
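To make the idea of a semantically decomposed latent space more concrete, the following is a minimal sketch in plain Python. It is not the authors' da-VAE: the subspace sizes, the function names and the value of beta are all hypothetical, and the loss is the generic beta-VAE form (reconstruction error plus a beta-weighted KL term), not the objective derived in the paper. It only illustrates how one flat latent vector can be partitioned into named subspaces for pose, shape, appearance and other factors.

```python
import math
import random

# Hypothetical subspace layout; the sizes are illustrative, not from the paper.
SUBSPACES = {"pose": 4, "shape": 2, "appearance": 2, "other": 2}


def split_latent(z, layout=SUBSPACES):
    """Slice a flat latent vector into named subspaces, in layout order."""
    if len(z) != sum(layout.values()):
        raise ValueError("latent size does not match the subspace layout")
    parts, start = {}, 0
    for name, size in layout.items():
        parts[name] = z[start:start + size]
        start += size
    return parts


def sample_latent(mu, log_var, rng=random):
    """Reparameterisation trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def beta_vae_loss(recon_error, mu, log_var, beta=4.0):
    """Generic beta-VAE objective: reconstruction error + beta * KL."""
    return recon_error + beta * kl_to_standard_normal(mu, log_var)
```

In a setup of this kind, each named slice would be passed to its own decoder head and supervised with the matching target (for example, the pose slice with 3D joint annotations). How da-VAE actually assigns supervision to its subspaces is described in the full paper, not in this abstract.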

Title: 3D hand pose estimation from a single RGB image through semantic decomposition of VAE latent space
Authors: Xinru Guo, Song Xu, Xiangbo Lin, Yi Sun, Xiaohong Ma
Publication date: 24 January 2022
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 1/2022
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-021-01048-x
