Published in: Multimedia Systems 5/2022

09-01-2022 | Letter to the Editor

HandO: a hybrid 3D hand–object reconstruction model for unknown objects

Authors: Hang Yu, Chilam Cheang, Yanwei Fu, Xiangyang Xue

Abstract

Reconstructing 3D meshes of hands and objects from single RGB images is of great significance in many multimedia applications. Mesh-based methods mainly resort to mesh displacements by estimating the relative positions between hands and objects, but the estimated distances may be inaccurate. Methods based on the signed distance function (SDF) learn relative positions by concurrently sampling hand and object meshes; unfortunately, these methods have very limited capability to reconstruct smooth surfaces with rich details. For example, SDF-based methods are inclined to lose the topologies. To the best of our knowledge, only limited works can simultaneously reconstruct hands and objects with smooth surfaces and accurate relative positions. To this end, we present a novel hybrid model, the Hand–Object Model (HandO), which enables hand–object 3D reconstruction with smooth surfaces and accurate positions. Critically, our model is the first to employ a hybrid 3D representation for this task, bringing meshes, SDFs, and parametric models together. A feature extractor is employed to extract image features, and SDF sample points are projected onto these features to extract the local features of each sampled point. Essentially, our model can be naturally extended to reconstruct a whole body holding an object via the new hybrid representation. Additionally, to overcome the lack of training data, we contribute a synthetic body-holding dataset to the community, facilitating research on reconstructing hands and objects. It contains 31,763 images spanning over 50 object categories. Extensive experiments demonstrate that our model achieves better performance than the competitors on benchmark datasets.
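The abstract's key mechanism — projecting SDF sample points onto an image feature map to gather per-point local features — follows the pixel-aligned feature sampling popularized by PiFu and DISN. The sketch below is purely illustrative and is not the paper's implementation: the function names, the toy pinhole intrinsics `K`, and the tiny feature map are all assumptions introduced here, and a real model would use a CNN feature map and learned layers.

```python
import numpy as np

def project_points(points, K):
    """Project (N, 3) camera-space points to pixel coordinates
    with a 3x3 pinhole intrinsics matrix K."""
    uvw = points @ K.T                    # homogeneous pixel coords, (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]       # divide by depth -> (N, 2)

def sample_local_features(feat_map, pix):
    """Bilinearly sample an (H, W, C) feature map at (N, 2) pixel
    locations, yielding one local feature vector per sample point."""
    H, W, _ = feat_map.shape
    x = np.clip(pix[:, 0], 0, W - 1 - 1e-6)
    y = np.clip(pix[:, 1], 0, H - 1 - 1e-6)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1               # in-bounds because of the clip above
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    return ((1 - wx) * (1 - wy) * feat_map[y0, x0]
            + wx * (1 - wy) * feat_map[y0, x1]
            + (1 - wx) * wy * feat_map[y1, x0]
            + wx * wy * feat_map[y1, x1])

# Toy example: a 4x4 feature map with 2 channels and two SDF sample points.
K = np.array([[2.0, 0.0, 1.5],
              [0.0, 2.0, 1.5],
              [0.0, 0.0, 1.0]])
feat = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
pts = np.array([[0.0, 0.0, 1.0],          # camera-space SDF sample points
                [0.25, -0.25, 1.0]])
local = sample_local_features(feat, project_points(pts, K))
print(local.shape)  # (2, 2): one 2-channel local feature per point
```

In an SDF network, each sampled point's local feature would then be concatenated with the point's coordinates and fed to an MLP that regresses the signed distance.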


Metadata
Title
HandO: a hybrid 3D hand–object reconstruction model for unknown objects
Authors
Hang Yu
Chilam Cheang
Yanwei Fu
Xiangyang Xue
Publication date
09-01-2022
Publisher
Springer Berlin Heidelberg
Published in
Multimedia Systems / Issue 5/2022
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-021-00874-7
