Published in: International Journal of Computer Vision 3/2020

05.11.2019

DeepIM: Deep Iterative Matching for 6D Pose Estimation

Authors: Yi Li, Gu Wang, Xiangyang Ji, Yu Xiang, Dieter Fox


Abstract

Estimating 6D poses of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the input image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Given an initial pose estimation, our network is able to iteratively refine the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using a disentangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.
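The iterative matching the abstract describes can be sketched as a render-predict-compose loop. The NumPy sketch below is an illustration, not the authors' implementation: `render_fn`, `match_net`, and the focal lengths `fx`, `fy` are hypothetical placeholders, and the disentangled update (rotation about the object center, translation as image-plane offsets plus a log-depth scale) is a simplified reading of the paper's parameterization.

```python
import numpy as np

def compose_pose(R, t, dR, v, fx=572.4, fy=573.6):
    """Apply one disentangled relative-pose update (simplified sketch).

    dR rotates the object about its own center, decoupled from translation;
    v = (vx, vy, vz) shifts the object in the image plane (vx, vy, scaled by
    the focal lengths) and rescales its depth (vz, log-scale).
    """
    R, t, dR, v = map(np.asarray, (R, t, dR, v))
    tz_new = t[2] / np.exp(v[2])                 # depth update via scale factor
    tx_new = (v[0] / fx + t[0] / t[2]) * tz_new  # image-plane x offset
    ty_new = (v[1] / fy + t[1] / t[2]) * tz_new  # image-plane y offset
    return dR @ R, np.array([tx_new, ty_new, tz_new])

def refine_pose(R, t, observed, render_fn, match_net, iterations=4):
    """Iterative matching: render at the current estimate, predict a relative
    transform against the observed image, compose, and repeat."""
    for _ in range(iterations):
        rendered = render_fn(R, t)
        dR, v = match_net(rendered, observed)  # network output per iteration
        R, t = compose_pose(R, t, dR, v)
    return R, t
```

A zero relative update (identity rotation, zero translation vector) leaves the pose unchanged, which is the fixed point the refinement converges to once the rendered and observed images match.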


Metadata
Title
DeepIM: Deep Iterative Matching for 6D Pose Estimation
Authors
Yi Li
Gu Wang
Xiangyang Ji
Yu Xiang
Dieter Fox
Publication date
05.11.2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01250-9
