nach oben

Erschienen in:

2018 | OriginalPaper | Buchkapitel

A Unified Framework for Multi-view Multi-class Object Pose Estimation

verfasst von : Chi Li, Jin Bai, Gregory D. Hager

Erschienen in: Computer Vision – ECCV 2018

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

One[NOSPACE] [NOSPACE][SPACE]core challenge in object pose estimation is to ensure accurate and robust performance for large numbers of diverse foreground objects amidst complex background clutter. In this work, we present a scalable framework for accurately inferring six Degree-of-Freedom (6-DoF) pose for a large number of object classes from single or multiple views. To learn discriminative pose features, we integrate three new capabilities into a deep Convolutional Neural Network (CNN): an inference scheme that combines both classification and pose regression based on a uniform tessellation of the Special Euclidean group in three dimensions (SE(3)), the fusion of class priors into the training process via a tiled class map, and an additional regularization using deep supervision with an object mask. Further, an efficient multi-view framework is formulated to address single-view ambiguity. We show that this framework consistently improves the performance of the single-view network. We evaluate our method on three large-scale benchmarks: YCB-Video, JHUScene-50 and ObjectNet-3D. Our approach achieves competitive or superior performance over the current state-of-the-art methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel HandMap: Robust Hand Pose Estimation via Intermediate Dense Guidance Map Supervision

Nächstes Kapitel Dynamic Task Prioritization for Multitask Learning

An object class may refer to either an object instance or an object category.

SO(3) is the Special Orthogonal group of rations in three dimensions.

\(s_{min}\) and \(s_{max}\) may vary across different axes.

Please refer to Sect. 5 for more details on the mPCK metric.

The ratio of the number of pixels with correctly predicted mask label versus all.

https://github.com/tum-mvp/ObjRecRANSAC.

We re-implement this method because the source code is not publicly available.

Balntas, V., Doumanoglou, A., Sahin, C., Sock, J., Kouskouridas, R., Kim, T.K.: Pose guided rgbd feature learning for 3d object pose estimation. In: CVPR (2017)

Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton, J., Rother, C.: Learning 6d object pose estimation using 3d object coordinates. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 536–551. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10605-2_35CrossRef

Brachmann, E., Michel, F., Krull, A., Ying Yang, M., Gumhold, S., et al.: Uncertainty-driven 6d pose estimation of objects and scenes from a single rgb image. In: CVPR (2016)

Chirikjian, G.S., Mahony, R., Ruan, S., Trumpf, J.: Pose changes from a different point of view. J. Mech. Robot. 10, 021008 (2018)CrossRef

Doumanoglou, A., Kouskouridas, R., Malassiotis, S., Kim, T.K.: Recovering 6d object pose and predicting next-best-view in the crowd. In: CVPR (2016)

Erkent, Ö., Shukla, D., Piater, J.: Integration of probabilistic pose estimates from multiple views. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 154–170. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_10CrossRef

F. Tombari, S.S., Stefano, L.D.: A combined texture-shape descriptor for enhanced 3d feature matching. In: ICIP (2011)

Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR (2012)

Girshick, R.: Fast r-cnn. arXiv preprint arXiv:1504.08083 (2015)

10.

He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV. IEEE (2017)

11.

Hinterstoisser, S., et al.: Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7724, pp. 548–562. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37331-2_42CrossRef

12.

Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: JMLR (2015)

13.

Izadi, S., et al.: Kinectfusion: real-time 3d reconstruction and interaction using a moving depth camera. In: ACM symposium on User interface software and technology. ACM (2011)

14.

Johns, E., Leutenegger, S., Davison, A.J.: Pairwise decomposition of image sequences for active multi-view recognition. In: CVPR. IEEE (2016)

15.

Kehl, W., Manhardt, F., Tombari, F., Ilic, S., Navab, N.: Ssd-6d: making rgb-based 3d detection and 6d pose estimation great again. In: CVPR (2017)

16.

Kehl, W., Milletari, F., Tombari, F., Ilic, S., Navab, N.: Deep learning of local RGB-D patches for 3d object detection and 6d pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 205–220. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_13CrossRef

17.

Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)

18.

Krull, A., Brachmann, E., Michel, F., Ying Yang, M., Gumhold, S., Rother, C.: Learning analysis-by-synthesis for 6d pose estimation in rgb-d images. In: ICCV (2015)

19.

Lai, K., Bo, L., Ren, X., Fox, D.: Detection-based object labeling in 3d scenes. In: ICRA. IEEE (2012)

20.

Levine, S., Pastor, P., Krizhevsky, A., Quillen, D.: Learning hand-eye coordination for robotic grasping with large-scale data collection. In: Kulić, D., Nakamura, Y., Khatib, O., Venture, G. (eds.) ISER 2016. SPAR, vol. 1, pp. 173–184. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-50115-4_16CrossRef

21.

Li, C., Boheren, J., Carlson, E., Hager, G.D.: Hierarchical semantic parsing for object pose estimation in densely cluttered scenes. In: ICRA (2016)

22.

Li, C., Xiao, H., Tateno, K., Tombari, F., Navab, N., Hager, G.D.: Incremental scene understanding on dense slam. In: IROS. IEEE (2016)

23.

Li, C., Zia, M.Z., Tran, Q.H., Yu, X., Hager, G.D., Chandraker, M.: Deep supervision with shape concepts for occlusion-aware 3d object parsing. In: CVPR (2017)

24.

Massa, F., Marlet, R., Aubry, M.: Crafting a multi-task cnn for viewpoint estimation. In: BMVC (2016)

25.

Michel, F., et al.: Global hypothesis generation for 6d object pose estimation. In: ICCV (2017)

26.

Mousavian, A., Anguelov, D., Flynn, J., Košecká, J.: 3d bounding box estimation using deep learning and geometry. In: CVPR. IEEE (2017)

27.

Papazov, C., Burschka, D.: An efficient RANSAC for 3d object recognition in noisy and occluded scenes. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6492, pp. 135–148. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19315-6_11CrossRef

28.

Pillai, S., Leonard, J.: Monocular slam supported object recognition. In: RSS (2015)

29.

Rad, M., Lepetit, V.: Bb8: a scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth. In: ICCV (2017)

30.

Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. In: NIPS (2015)

31.

Rusu, R.B.: Semantic 3d object maps for everyday manipulation in human living environments. KI-Künstliche Intelligenz 24, 345–348 (2010)CrossRef

32.

Salas-Moreno, R., Newcombe, R., Strasdat, H., Kelly, P., Davison, A.: Slam++: simultaneous localisation and mapping at the level of objects. In: CVPR (2013)

33.

Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3d shape recognition. In: CVPR, pp. 945–953 (2015)

34.

Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for CNN: viewpoint estimation in images using CNNs trained with Rendered 3D model views. In: ICCV (2015)

35.

Tateno, K., Tombari, F., Laina, I., Navab, N.: Cnn-slam: Real-time dense monocular slam with learned depth prediction. In: CVPR (2017)

36.

Tejani, A., Tang, D., Kouskouridas, R., Kim, T.-K.: Latent-class hough forests for 3d object detection and pose estimation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 462–477. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_30CrossRef

37.

Tekin, B., Sinha, S.N., Fua, P.: Real-time seamless single shot 6d object pose prediction. arXiv preprint arXiv:1711.08848 (2017)

38.

Tjaden, H., Schwanecke, U., Schömer, E.: Real-time monocular pose estimation of 3d objects using temporally consistent local color histograms. In: CVPR (2017)

39.

Qiu, W., et al.: Unrealcv: virtual worlds for computer vision. In: ACM Multimedia Open Source Software Competition (2017)

40.

Wohlhart, P., Lepetit, V.: Learning descriptors for object recognition and 3d pose estimation. In: CVPR (2015)

41.

Xiang, Y., et al.: Objectnet3d: a large scale database for 3d object recognition. In: ECCV (2016)

42.

Xiang, Y., Mottaghi, R., Savarese, S.: Beyond PASCAL: a benchmark for 3d object detection in the wild. In: WACV (2014)

43.

Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: PoseCNN: a convolutional neural network for 6d object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199 (2017)

44.

Yan, Y., Chirikjian, G.S.: Almost-uniform sampling of rotations for conformational searches in robotics and structural biology. In: ICRA (2012)

45.

Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: CVPR (2011)

46.

Zeng, A., et al.: Multi-view self-supervised deep learning for 6d pose estimation in the amazon picking challenge. In: ICRA. IEEE (2017)

Titel: A Unified Framework for Multi-view Multi-class Object Pose Estimation
verfasst von: Chi Li
Jin Bai
Gregory D. Hager
Verlag: Springer International Publishing
Buch: Computer Vision – ECCV 2018
Print ISBN: 978-3-030-01269-4

Electronic ISBN: 978-3-030-01270-0

Copyright-Jahr: 2018
DOI: https://doi.org/10.1007/978-3-030-01270-0_16

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner