Published in: Neural Processing Letters 3/2020

18-02-2020

Selective Embedding with Gated Fusion for 6D Object Pose Estimation

Authors: Shantong Sun, Rongke Liu, Qiuchen Du, Shuqiao Sun


Abstract

Deep learning methods for 6D object pose estimation based on RGB and depth (RGB-D) images have been successfully applied to robotic grasping. The fusion of RGB and depth features is one of the key difficulties. In previous work, these two types of features are mostly concatenated without considering their different contributions to pose estimation. We propose a selective embedding with gated fusion structure, called SEGate, which adaptively adjusts the weights of RGB and depth features. Furthermore, we aggregate the local features of point clouds according to the distances between points: nearby points contribute strongly to a local feature, while distant points contribute little. Experiments show that our approach achieves state-of-the-art performance on both the LineMOD and YCB-Video datasets. Our approach is also more robust when estimating the poses of occluded objects.
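The abstract describes two ideas only at a high level: a gate that adaptively blends RGB and depth features, and a distance-weighted aggregation of point-cloud features. The NumPy sketch below illustrates both concepts in their simplest form; the function names, the per-channel sigmoid gate, and the Gaussian distance falloff are our own assumptions, not the paper's actual SEGate architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(rgb_feat, depth_feat, w_gate, b_gate):
    """Blend RGB and depth features with a learned sigmoid gate.

    A gate in (0, 1) is predicted per channel from the concatenated
    features, so the network can weight each modality adaptively
    instead of simply concatenating them.
    """
    concat = np.concatenate([rgb_feat, depth_feat], axis=-1)  # (N, 2C)
    gate = sigmoid(concat @ w_gate + b_gate)                  # (N, C), in (0, 1)
    return gate * rgb_feat + (1.0 - gate) * depth_feat        # convex blend

def distance_weighted_aggregation(points, features, center, sigma=0.1):
    """Aggregate local point features with a distance-based weight.

    Points close to `center` receive weights near 1, distant points
    near 0 (Gaussian falloff), matching the idea that nearby points
    should dominate the local feature.
    """
    d = np.linalg.norm(points - center, axis=1)   # (N,) distances
    w = np.exp(-(d ** 2) / (2.0 * sigma ** 2))    # close -> ~1, far -> ~0
    w = w / (w.sum() + 1e-8)                      # normalize
    return (w[:, None] * features).sum(axis=0)    # (C,) aggregated feature
```

Because the gate lies in (0, 1), the fused feature is an element-wise convex combination of the two modalities; in the actual method the gate parameters would be learned end-to-end rather than fixed.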

Metadata
Publisher: Springer US
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-020-10198-8
