Published in: Robotics Research

2018 | OriginalPaper | Book Chapter

Bridging the Robot Perception Gap with Mid-Level Vision

Authors: Chi Li, Jonathan Bohren, Gregory D. Hager

Publisher: Springer International Publishing

Abstract

The practical application of machine perception to support physical manipulation in unstructured environments remains a barrier to the development of intelligent robotic systems. Recently, the large-scale machine perception community has made great progress, but these methods have contributed little to applied robotic perception. This is in part because such large-scale systems are designed to recognize category labels for large numbers of objects from a single image, rather than to provide the highly accurate, efficient, and robust pose estimation needed in environments for which a robot has reliable prior knowledge. In this paper, we illustrate the potential for synergistic integration of modern computer vision methods into robotics by augmenting a RANSAC-based registration method with a state-of-the-art semantic segmentation algorithm. We detail a convolutional architecture for semantic labeling of the scene, modified to operate efficiently using integral images. We combine this labeling with two novel scene-parsing variants of RANSAC and show, on a new RGB-D dataset containing complex configurations of textureless and highly specular objects, that our method improves pose estimation over the unaugmented algorithms.
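As background for the integral-image trick mentioned in the abstract (this is a generic sketch, not the paper's implementation): once an integral image is precomputed, the sum of any rectangular region can be read with four lookups, which is what makes dense sliding-window feature pooling cheap.

```python
# Integral image: ii[y][x] = sum of img over the rectangle [0, y) x [0, x).
# Any rectangular-region sum then costs O(1) instead of O(area).

def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def region_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1][x0:x1] via four integral-image lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(region_sum(ii, 0, 0, 3, 3))  # 45: whole image
print(region_sum(ii, 1, 1, 3, 3))  # 28: bottom-right 2x2 block
```

Because every pooling window reduces to four lookups, the cost of labeling the scene densely no longer grows with the window size.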


Footnotes
1
See http://github.com/tum-mvp/ObjRecRANSAC.git for the reference implementation of [9].
 
2
For efficiency, raw point clouds are downsampled via an octree with a leaf size of 0.005 m.
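The footnote refers to PCL's octree-based downsampling; as an illustration of the same idea at the leaf level, here is a hypothetical pure-Python voxel-grid sketch that keeps one centroid per 0.005 m cell (not the paper's code).

```python
import math

def voxel_downsample(points, leaf=0.005):
    """Keep one centroid per (leaf x leaf x leaf) cell, like an
    octree filter evaluated at its leaf resolution."""
    cells = {}
    for p in points:
        key = tuple(math.floor(c / leaf) for c in p)  # cell index per axis
        cells.setdefault(key, []).append(p)
    # Average the points that fell into each cell.
    return [tuple(sum(c) / len(pts) for c in zip(*pts))
            for pts in cells.values()]

cloud = [(0.0, 0.0, 0.0), (0.001, 0.001, 0.0),  # same 5 mm cell
         (0.010, 0.0, 0.0)]                      # different cell
print(len(voxel_downsample(cloud)))  # 2
```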
 
3
We replace the soft encoder used in [12] with a hard encoder to speed up the computation.
 
4
In our implementation, a PrimeSense Carmine 1.08 depth sensor is used. We found no difference in performance between the default camera parameters and manual calibration.
 
5
The implementations of normal estimation and CSHOT come from the PCL library.
 
6
The F-measure is a joint measure computed from precision and recall as \(\frac{2\cdot precision \cdot recall}{precision+recall}\).
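The footnote's formula is the harmonic mean of precision and recall; a direct transcription:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F-measure)."""
    if precision + recall == 0:
        return 0.0  # convention when both are zero
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.5, 0.5))  # 0.5
```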
 
References
1. Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. In: ICRA (2011)
2. Singh, A., Sha, J., Narayan, K.S., Achim, T., Abbeel, P.: BigBIRD: a large-scale 3D database of object instances. In: ICRA (2014)
3. Macias, N., Wen, J.: Vision guided robotic block stacking. In: IROS (2014)
4. Niekum, S., Osentoski, S., Konidaris, G., Chitta, S., Marthi, B., Barto, A.G.: Learning grounded finite-state representations from unstructured demonstrations. In: IJRR (2014)
5. Lindsey, Q., Mellinger, D., Kumar, V.: Construction with quadrotor teams. Auton. Robot. 33(3), 323–336 (2012)
6. Bohren, J., Papazov, C., Burschka, D., Krieger, K., Parusel, S., Haddadin, S., Shepherdson, W.L., Hager, G.D., Whitcomb, L.L.: A pilot study in vision-based augmented telemanipulation for remote assembly over high-latency networks. In: ICRA (2013)
7. Pauwels, K., Ivan, V., Ros, E., Vijayakumar, S.: Real-time object pose recognition and tracking with an imprecisely calibrated moving RGB-D camera. In: IROS (2014)
8. Drost, B., Ulrich, M., Navab, N., Ilic, S.: Model globally, match locally: efficient and robust 3D object recognition. In: CVPR (2010)
9. Papazov, C., Burschka, D.: An efficient RANSAC for 3D object recognition in noisy and occluded scenes. In: ACCV (2010)
10. Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab, N.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: ACCV (2012)
11. Hager, G.D., Wegbreit, B.: Scene parsing using a prior world model. In: IJRR (2011)
12. Li, C., Reiter, A., Hager, G.D.: Beyond spatial pooling, fine-grained representation learning in multiple domains. In: CVPR (2015)
13. Knopp, J., Prasad, M., Willems, G., Timofte, R., Van Gool, L.: Hough transform and 3D SURF for robust three dimensional classification. In: ECCV (2010)
14. Aldoma, A., Tombari, F., Prankl, J., Richtsfeld, A., Di Stefano, L., Vincze, M.: Multimodal cue integration through hypotheses verification for RGB-D object recognition and 6DOF pose estimation. In: ICRA (2013)
15. Xie, Z., Singh, A., Uang, J., Narayan, K.S., Abbeel, P.: Multimodal blending for high-accuracy instance recognition. In: IROS (2013)
16. Tang, J., Miller, S., Singh, A., Abbeel, P.: A textured object recognition pipeline for color and depth image data. In: ICRA (2012)
17. Fischer, J., Bormann, R., Arbeiter, G., Verl, A.: A feature descriptor for texture-less object representation using 2D and 3D cues from RGB-D data. In: ICRA (2013)
18. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. In: IJCV (2004)
19. Tombari, F., Salti, S., Di Stefano, L.: A combined texture-shape descriptor for enhanced 3D feature matching. In: ICIP (2011)
20. Woodford, O.J., Pham, M.T., Maki, A., Perbet, F., Stenger, B.: Demisting the Hough transform for 3D shape recognition and registration. In: IJCV (2014)
21. Aldoma, A., Tombari, F., Stefano, L.D., Vincze, M.: A global hypotheses verification method for 3D object recognition. In: ECCV (2012)
22. Rusu, R.B., Bradski, G., Thibaux, R., Hsu, J.: Fast 3D recognition and pose using the viewpoint feature histogram. In: IROS (2010)
23. Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., Lepetit, V.: Gradient response maps for real-time detection of textureless objects. PAMI (2012)
24. Richtsfeld, A., Morwald, T., Prankl, J., Zillich, M., Vincze, M.: Segmentation of unknown objects in indoor environments. In: IROS (2012)
25. Uckermann, A., Haschke, R., Ritter, H.: Realtime 3D segmentation for human-robot interaction. In: IROS (2013)
26. Bo, L., Ren, X., Fox, D.: Unsupervised feature learning for RGB-D based object recognition. In: ISER (2013)
27. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
28. Socher, R., Huval, B., Bhat, B., Manning, C.D., Ng, A.Y.: Convolutional-recursive deep learning for 3D object classification. In: NIPS (2012)
29. Gupta, S., Girshick, R., Arbelez, P., Malik, J.: Learning rich features from RGB-D images for object detection and segmentation. In: ECCV (2014)
30. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
31. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: DeCAF: a deep convolutional activation feature for generic visual recognition. In: ICML (2014)
32. Viola, P., Jones, M.: Robust real-time object detection. In: IJCV (2001)
Metadata
Title: Bridging the Robot Perception Gap with Mid-Level Vision
Authors: Chi Li, Jonathan Bohren, Gregory D. Hager
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-60916-4_1