Published in: Multimedia Systems 5/2022

09-01-2022 | Letter to the Editor

HandO: a hybrid 3D hand–object reconstruction model for unknown objects

Authors: Hang Yu, Chilam Cheang, Yanwei Fu, Xiangyang Xue

Abstract

Reconstructing 3D meshes of hands and objects from single RGB images is of great significance in many multimedia applications. Mesh-based methods mainly resort to mesh displacements by estimating the relative positions between hands and objects, but the estimated distances may be inaccurate. Methods based on the signed distance function (SDF) learn relative positions by concurrently sampling hand and object meshes; unfortunately, these methods have very limited capability to reconstruct smooth surfaces with rich details. For example, SDF-based methods are inclined to lose the topologies. To the best of our knowledge, only limited works can simultaneously reconstruct hands and objects with smooth surfaces and accurate relative positions. To this end, we present a novel hybrid model, the Hand–Object Model (HandO), which enables hand–object 3D reconstruction with smooth surfaces and accurate positions. Critically, our model is the first to employ a hybrid 3D representation for this task, bringing meshes, SDFs, and parametric models together. A feature extractor is employed to extract image features, and SDF sample points are projected onto these features to extract the local features of each sampled point. Essentially, our model can be naturally extended to reconstruct a whole body holding an object via the new hybrid representation. Additionally, to overcome the lack of training data, we contribute a synthetic body-holding dataset to the community, facilitating research on reconstructing hands and objects. It contains 31,763 images spanning over 50 object categories. Extensive experiments demonstrate that our model achieves better performance than the competitors on benchmark datasets.
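The abstract's key mechanism — projecting SDF sample points onto an image feature map to gather per-point local features — follows the pixel-aligned feature sampling popularized by PiFu and DISN. The sketch below is purely illustrative and is not the paper's implementation: the function names, the toy pinhole intrinsics `K`, and the tiny feature map are all assumptions introduced here, and a real model would use a CNN feature map and learned layers.

```python
import numpy as np

def project_points(points, K):
    """Project (N, 3) camera-space points to pixel coordinates
    with a 3x3 pinhole intrinsics matrix K."""
    uvw = points @ K.T                    # homogeneous pixel coords, (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]       # divide by depth -> (N, 2)

def sample_local_features(feat_map, pix):
    """Bilinearly sample an (H, W, C) feature map at (N, 2) pixel
    locations, yielding one local feature vector per sample point."""
    H, W, _ = feat_map.shape
    x = np.clip(pix[:, 0], 0, W - 1 - 1e-6)
    y = np.clip(pix[:, 1], 0, H - 1 - 1e-6)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1               # in-bounds because of the clip above
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    return ((1 - wx) * (1 - wy) * feat_map[y0, x0]
            + wx * (1 - wy) * feat_map[y0, x1]
            + (1 - wx) * wy * feat_map[y1, x0]
            + wx * wy * feat_map[y1, x1])

# Toy example: a 4x4 feature map with 2 channels and two SDF sample points.
K = np.array([[2.0, 0.0, 1.5],
              [0.0, 2.0, 1.5],
              [0.0, 0.0, 1.0]])
feat = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
pts = np.array([[0.0, 0.0, 1.0],          # camera-space SDF sample points
                [0.25, -0.25, 1.0]])
local = sample_local_features(feat, project_points(pts, K))
print(local.shape)  # (2, 2): one 2-channel local feature per point
```

In an SDF network, each sampled point's local feature would then be concatenated with the point's coordinates and fed to an MLP that regresses the signed distance.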


Metadata
Title
HandO: a hybrid 3D hand–object reconstruction model for unknown objects
Authors
Hang Yu
Chilam Cheang
Yanwei Fu
Xiangyang Xue
Publication date
09-01-2022
Publisher
Springer Berlin Heidelberg
Published in
Multimedia Systems / Issue 5/2022
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-021-00874-7
