
2018 | Original Paper | Book Chapter

A Unified Framework for Multi-view Multi-class Object Pose Estimation

Authors: Chi Li, Jin Bai, Gregory D. Hager

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

One core challenge in object pose estimation is to ensure accurate and robust performance for large numbers of diverse foreground objects amidst complex background clutter. In this work, we present a scalable framework for accurately inferring six Degree-of-Freedom (6-DoF) pose for a large number of object classes from single or multiple views. To learn discriminative pose features, we integrate three new capabilities into a deep Convolutional Neural Network (CNN): an inference scheme that combines both classification and pose regression based on a uniform tessellation of the Special Euclidean group in three dimensions (SE(3)), the fusion of class priors into the training process via a tiled class map, and an additional regularization using deep supervision with an object mask. Further, an efficient multi-view framework is formulated to address single-view ambiguity. We show that this framework consistently improves the performance of the single-view network. We evaluate our method on three large-scale benchmarks: YCB-Video, JHUScene-50 and ObjectNet-3D. Our approach achieves competitive or superior performance over the current state-of-the-art methods.
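
Two of the ideas named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes one plausible reading of each idea: a "tiled class map" is taken to be a one-hot class vector repeated over every spatial location and concatenated to a feature map, and "classification plus regression" is taken to mean choosing the highest-scoring pose bin and adding a regressed offset to that bin's center. The function names, array shapes and the one-dimensional toy bins are all hypothetical, and the actual tessellation of SE(3) is not reproduced here.

```python
import numpy as np


def tile_class_map(features, class_id, num_classes):
    """Append a spatially tiled one-hot class vector to a feature map.

    features: array of shape (C, H, W).
    Returns an array of shape (C + num_classes, H, W).
    (Hypothetical helper; the layout is an assumption.)
    """
    _, height, width = features.shape
    one_hot = np.zeros(num_classes, dtype=features.dtype)
    one_hot[class_id] = 1.0
    # Repeat the one-hot vector at every spatial location.
    tiled = np.broadcast_to(one_hot[:, None, None], (num_classes, height, width))
    return np.concatenate([features, tiled], axis=0)


def decode_bin_and_delta(bin_scores, bin_deltas, bin_centers):
    """Pick the best-scoring bin and add its regressed offset.

    bin_scores: (K,) classification scores over K pose bins.
    bin_deltas: (K, D) regressed offsets, one per bin.
    bin_centers: (K, D) centers of the discretized pose bins.
    (Hypothetical helper; a toy stand-in for a real SE(3) tessellation.)
    """
    best = int(np.argmax(bin_scores))
    return bin_centers[best] + bin_deltas[best]


# Toy inputs, chosen only for illustration.
features = np.zeros((4, 2, 3), dtype=np.float32)
augmented = tile_class_map(features, class_id=1, num_classes=5)

bin_centers = np.array([[-0.5], [0.0], [0.5]])
bin_scores = np.array([0.1, 0.7, 0.2])
bin_deltas = np.array([[0.0], [0.1], [-0.2]])
pose = decode_bin_and_delta(bin_scores, bin_deltas, bin_centers)
```

In this toy setup the class channels make the same features class-aware without needing a separate network per class, and the bin-plus-offset decoding lets a coarse discrete choice be refined by a continuous correction.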


Footnotes
1
An object class may refer to either an object instance or an object category.
 
2
SO(3) is the Special Orthogonal group of rotations in three dimensions.
 
3
\(s_{min}\) and \(s_{max}\) may vary across different axes.
 
4
Please refer to Sect. 5 for more details on the mPCK metric.
 
5
The ratio of the number of pixels with correctly predicted mask label versus all.
 
7
We re-implement this method because the source code is not publicly available.
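
Footnote 5 above defines a mask accuracy as the ratio of pixels whose mask label is predicted correctly to all pixels. A minimal sketch of that ratio, assuming binary masks stored as NumPy arrays of equal shape (the function name and the toy masks are hypothetical):

```python
import numpy as np


def mask_pixel_accuracy(pred_mask, gt_mask):
    """Fraction of pixels whose predicted mask label matches the ground truth."""
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    if pred.shape != gt.shape:
        raise ValueError("pred_mask and gt_mask must have the same shape")
    # Count both foreground and background agreements over all pixels.
    return float(np.mean(pred == gt))


# Toy 2x2 masks: three of the four pixels agree.
pred = np.array([[1, 0], [1, 1]])
gt = np.array([[1, 0], [0, 1]])
acc = mask_pixel_accuracy(pred, gt)
```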
 
Metadata
Title
A Unified Framework for Multi-view Multi-class Object Pose Estimation
Authors
Chi Li
Jin Bai
Gregory D. Hager
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-01270-0_16
