Published in: Robotics Research

2018 | OriginalPaper | Book Chapter

Bridging the Robot Perception Gap with Mid-Level Vision

Authors: Chi Li, Jonathan Bohren, Gregory D. Hager

Publisher: Springer International Publishing

Abstract

The practical application of machine perception to support physical manipulation in unstructured environments remains a barrier to the development of intelligent robotic systems. Recently, the large-scale machine perception community has made great progress, but these methods have contributed little to applied robotic perception. This is in part because such large-scale systems are designed to recognize category labels for large numbers of objects from a single image, rather than to provide the highly accurate, efficient, and robust pose estimation needed in environments for which a robot has reliable prior knowledge. In this paper, we illustrate the potential for synergistic integration of modern computer vision methods into robotics by augmenting a RANSAC-based registration method with a state-of-the-art semantic segmentation algorithm. We detail a convolutional architecture for semantic labeling of the scene, modified to operate efficiently using integral images. We combine this labeling with two novel scene-parsing variants of RANSAC and show, on a new RGB-D dataset containing complex configurations of textureless and highly specular objects, that our method improves pose estimation over the unaugmented algorithms.
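As background for the integral-image trick mentioned in the abstract (this is a generic sketch, not the paper's implementation): once an integral image is precomputed, the sum of any rectangular region can be read with four lookups, which is what makes dense sliding-window feature pooling cheap.

```python
# Integral image: ii[y][x] = sum of img over the rectangle [0, y) x [0, x).
# Any rectangular-region sum then costs O(1) instead of O(area).

def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def region_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1][x0:x1] via four integral-image lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(region_sum(ii, 0, 0, 3, 3))  # 45: whole image
print(region_sum(ii, 1, 1, 3, 3))  # 28: bottom-right 2x2 block
```

Because every pooling window reduces to four lookups, the cost of labeling the scene densely no longer grows with the window size.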


Footnotes
1
See http://github.com/tum-mvp/ObjRecRANSAC.git for the reference implementation of [9].
 
2
For efficiency, raw point clouds are downsampled via an octree with a leaf size of 0.005 m.
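The footnote refers to PCL's octree-based downsampling; as an illustration of the same idea at the leaf level, here is a hypothetical pure-Python voxel-grid sketch that keeps one centroid per 0.005 m cell (not the paper's code).

```python
import math

def voxel_downsample(points, leaf=0.005):
    """Keep one centroid per (leaf x leaf x leaf) cell, like an
    octree filter evaluated at its leaf resolution."""
    cells = {}
    for p in points:
        key = tuple(math.floor(c / leaf) for c in p)  # cell index per axis
        cells.setdefault(key, []).append(p)
    # Average the points that fell into each cell.
    return [tuple(sum(c) / len(pts) for c in zip(*pts))
            for pts in cells.values()]

cloud = [(0.0, 0.0, 0.0), (0.001, 0.001, 0.0),  # same 5 mm cell
         (0.010, 0.0, 0.0)]                      # different cell
print(len(voxel_downsample(cloud)))  # 2
```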
 
3
We replace the soft encoder used in [12] with a hard encoder to speed up the computation.
 
4
In our implementation, a PrimeSense Carmine 1.08 depth sensor is used. We found no difference in performance between the default camera parameters and manual calibration.
 
5
The implementations of normal estimation and CSHOT come from the PCL library.
 
6
The F-measure is a joint measure computed from precision and recall as \(\frac{2\cdot precision \cdot recall}{precision+recall}\).
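The footnote's formula is the harmonic mean of precision and recall; a direct transcription:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F-measure)."""
    if precision + recall == 0:
        return 0.0  # convention when both are zero
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.5, 0.5))  # 0.5
```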
 
References
1. Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. In: ICRA (2011)
2. Singh, A., Sha, J., Narayan, K.S., Achim, T., Abbeel, P.: BigBIRD: a large-scale 3D database of object instances. In: ICRA (2014)
3. Macias, N., Wen, J.: Vision guided robotic block stacking. In: IROS (2014)
4. Niekum, S., Osentoski, S., Konidaris, G., Chitta, S., Marthi, B., Barto, A.G.: Learning grounded finite-state representations from unstructured demonstrations. In: IJRR (2014)
5. Lindsey, Q., Mellinger, D., Kumar, V.: Construction with quadrotor teams. Auton. Robot. 33(3), 323–336 (2012)
6. Bohren, J., Papazov, C., Burschka, D., Krieger, K., Parusel, S., Haddadin, S., Shepherdson, W.L., Hager, G.D., Whitcomb, L.L.: A pilot study in vision-based augmented telemanipulation for remote assembly over high-latency networks. In: ICRA (2013)
7. Pauwels, K., Ivan, V., Ros, E., Vijayakumar, S.: Real-time object pose recognition and tracking with an imprecisely calibrated moving RGB-D camera. In: IROS (2014)
8. Drost, B., Ulrich, M., Navab, N., Ilic, S.: Model globally, match locally: efficient and robust 3D object recognition. In: CVPR (2010)
9. Papazov, C., Burschka, D.: An efficient RANSAC for 3D object recognition in noisy and occluded scenes. In: ACCV (2010)
10. Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab, N.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: ACCV (2012)
11. Hager, G.D., Wegbreit, B.: Scene parsing using a prior world model. In: IJRR (2011)
12. Li, C., Reiter, A., Hager, G.D.: Beyond spatial pooling, fine-grained representation learning in multiple domains. In: CVPR (2015)
13. Knopp, J., Prasad, M., Willems, G., Timofte, R., Van Gool, L.: Hough transform and 3D SURF for robust three dimensional classification. In: ECCV (2010)
14. Aldoma, A., Tombari, F., Prankl, J., Richtsfeld, A., Di Stefano, L., Vincze, M.: Multimodal cue integration through hypotheses verification for RGB-D object recognition and 6DOF pose estimation. In: ICRA (2013)
15. Xie, Z., Singh, A., Uang, J., Narayan, K.S., Abbeel, P.: Multimodal blending for high-accuracy instance recognition. In: IROS (2013)
16. Tang, J., Miller, S., Singh, A., Abbeel, P.: A textured object recognition pipeline for color and depth image data. In: ICRA (2012)
17. Fischer, J., Bormann, R., Arbeiter, G., Verl, A.: A feature descriptor for texture-less object representation using 2D and 3D cues from RGB-D data. In: ICRA (2013)
18. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. In: IJCV (2004)
19. Tombari, F., Salti, S., Di Stefano, L.: A combined texture-shape descriptor for enhanced 3D feature matching. In: ICIP (2011)
20. Woodford, O.J., Pham, M.T., Maki, A., Perbet, F., Stenger, B.: Demisting the Hough transform for 3D shape recognition and registration. In: IJCV (2014)
21. Aldoma, A., Tombari, F., Stefano, L.D., Vincze, M.: A global hypotheses verification method for 3D object recognition. In: ECCV (2012)
22. Rusu, R.B., Bradski, G., Thibaux, R., Hsu, J.: Fast 3D recognition and pose using the viewpoint feature histogram. In: IROS (2010)
23. Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., Lepetit, V.: Gradient response maps for real-time detection of textureless objects. PAMI (2012)
24. Richtsfeld, A., Morwald, T., Prankl, J., Zillich, M., Vincze, M.: Segmentation of unknown objects in indoor environments. In: IROS (2012)
25. Uckermann, A., Haschke, R., Ritter, H.: Realtime 3D segmentation for human-robot interaction. In: IROS (2013)
26. Bo, L., Ren, X., Fox, D.: Unsupervised feature learning for RGB-D based object recognition. In: ISER (2013)
27. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
28. Socher, R., Huval, B., Bhat, B., Manning, C.D., Ng, A.Y.: Convolutional-recursive deep learning for 3D object classification. In: NIPS (2012)
29. Gupta, S., Girshick, R., Arbelez, P., Malik, J.: Learning rich features from RGB-D images for object detection and segmentation. In: ECCV (2014)
30. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
31. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: DeCAF: a deep convolutional activation feature for generic visual recognition. In: ICML (2014)
32. Viola, P., Jones, M.: Robust real-time object detection. In: IJCV (2001)
Metadata
Title: Bridging the Robot Perception Gap with Mid-Level Vision
Authors: Chi Li, Jonathan Bohren, Gregory D. Hager
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-60916-4_1