Skip to main content

2016 | OriginalPaper | Buchkapitel

Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation

verfasst von : Wadim Kehl, Fausto Milletari, Federico Tombari, Slobodan Ilic, Nassir Navab

Erschienen in: Computer Vision – ECCV 2016

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We present a 3D object detection method that uses regressed descriptors of locally-sampled RGB-D patches for 6D vote casting. For regression, we employ a convolutional auto-encoder that has been trained on a large collection of random local patches. During testing, scene patch descriptors are matched against a database of synthetic model view patches and cast 6D object votes which are subsequently filtered to refined hypotheses. We evaluate on three datasets to show that our method generalizes well to previously unseen input data, delivers robust detection results that compete with and surpass the state-of-the-art while being scalable in the number of objects.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Due to computational constraints we took only 1 million patches for PCA training.
 
Literatur
1.
Zurück zum Zitat Aldoma, A., Tombari, F., Di Stefano, L., Vincze, M.: A global hypothesis verification framework for 3D object recognition in clutter. TPAMI 38(7), 1383–1396 (2015)CrossRef Aldoma, A., Tombari, F., Di Stefano, L., Vincze, M.: A global hypothesis verification framework for 3D object recognition in clutter. TPAMI 38(7), 1383–1396 (2015)CrossRef
2.
Zurück zum Zitat Aldoma, A., Tombari, F., Prankl, J., Richtsfeld, A., Di Stefano, L., Vincze, M.: Multimodal cue integration through hypotheses verification for RGB-D object recognition and 6DOF pose estimation. In: ICRA (2013) Aldoma, A., Tombari, F., Prankl, J., Richtsfeld, A., Di Stefano, L., Vincze, M.: Multimodal cue integration through hypotheses verification for RGB-D object recognition and 6DOF pose estimation. In: ICRA (2013)
3.
Zurück zum Zitat Aubry, M., Maturana, D., Efros, A., Russell, B., Sivic, J.: Seeing 3D chairs : exemplar part-based 2D–3D alignment using a large dataset of CAD models. In: CVPR (2014) Aubry, M., Maturana, D., Efros, A., Russell, B., Sivic, J.: Seeing 3D chairs : exemplar part-based 2D–3D alignment using a large dataset of CAD models. In: CVPR (2014)
4.
Zurück zum Zitat Bay, H., Tuytelaars, T., Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006). doi:10.1007/11744023_32 CrossRef Bay, H., Tuytelaars, T., Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006). doi:10.​1007/​11744023_​32 CrossRef
5.
Zurück zum Zitat Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton, J., Rother, C.: Learning 6D object pose estimation using 3D object coordinates. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 536–551. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10605-2_35 Brachmann, E., Krull, A., Michel, F., Gumhold, S., Shotton, J., Rother, C.: Learning 6D object pose estimation using 3D object coordinates. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 536–551. Springer, Heidelberg (2014). doi:10.​1007/​978-3-319-10605-2_​35
6.
Zurück zum Zitat Buch, A.G., Yang, Y., Krüger, N., Petersen, H.G. In search of inliers: 3D correspondence by local and global voting. In: CVPR (2014) Buch, A.G., Yang, Y., Krüger, N., Petersen, H.G. In search of inliers: 3D correspondence by local and global voting. In: CVPR (2014)
7.
Zurück zum Zitat Cai, H., Werner, T., Matas, J.: Fast detection of multiple textureless 3-D objects. In: Chen, M., Leibe, B., Neumann, B. (eds.) ICVS 2013. LNCS, vol. 7963, pp. 103–112. Springer, Heidelberg (2013). doi:10.1007/978-3-642-39402-7_11 CrossRef Cai, H., Werner, T., Matas, J.: Fast detection of multiple textureless 3-D objects. In: Chen, M., Leibe, B., Neumann, B. (eds.) ICVS 2013. LNCS, vol. 7963, pp. 103–112. Springer, Heidelberg (2013). doi:10.​1007/​978-3-642-39402-7_​11 CrossRef
8.
Zurück zum Zitat Gall, J., Yao, A., Razavi, N., Van Gool, L., Lempitsky, V.: Hough forests for object detection, tracking, and action recognition. TPAMI 33(11), 2188–2202 (2011)CrossRef Gall, J., Yao, A., Razavi, N., Van Gool, L., Lempitsky, V.: Hough forests for object detection, tracking, and action recognition. TPAMI 33(11), 2188–2202 (2011)CrossRef
9.
Zurück zum Zitat Girshick, R., Donahue, J., Darrell, T., Berkeley, U.C., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014) Girshick, R., Donahue, J., Darrell, T., Berkeley, U.C., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
10.
Zurück zum Zitat Gu, C., Ren, X.: Discriminative mixture-of-templates for viewpoint classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6315, pp. 408–421. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15555-0_30 CrossRef Gu, C., Ren, X.: Discriminative mixture-of-templates for viewpoint classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6315, pp. 408–421. Springer, Heidelberg (2010). doi:10.​1007/​978-3-642-15555-0_​30 CrossRef
11.
Zurück zum Zitat Gupta, S., Girshick, R., Arbeláez, P., Malik, J.: Learning rich features from RGB-D images for object detection and segmentation. In: CVPR (2014) Gupta, S., Girshick, R., Arbeláez, P., Malik, J.: Learning rich features from RGB-D images for object detection and segmentation. In: CVPR (2014)
12.
Zurück zum Zitat Gupta, S., Arbelaez, P., Girshick, R., Malik, J.: Aligning 3D models to RGB-D images of cluttered scenes. In: CVPR (2015) Gupta, S., Arbelaez, P., Girshick, R., Malik, J.: Aligning 3D models to RGB-D images of cluttered scenes. In: CVPR (2015)
13.
Zurück zum Zitat Hao, Q., Cai, R., Li, Z., Zhang, L., Pang, Y., Wu, F., Rui, Y.: Efficient 2D-to-3D correspondence filtering for scalable 3D object recognition. In: CVPR (2013) Hao, Q., Cai, R., Li, Z., Zhang, L., Pang, Y., Wu, F., Rui, Y.: Efficient 2D-to-3D correspondence filtering for scalable 3D object recognition. In: CVPR (2013)
14.
Zurück zum Zitat Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., Lepetit, V.: Gradient response maps for real-time detection of textureless objects. TPAMI 34(5), 879–888 (2012)CrossRef Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., Lepetit, V.: Gradient response maps for real-time detection of textureless objects. TPAMI 34(5), 879–888 (2012)CrossRef
15.
Zurück zum Zitat Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab, N.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7724, pp. 548–562. Springer, Heidelberg (2013). doi:10.1007/978-3-642-37331-2_42 Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab, N.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7724, pp. 548–562. Springer, Heidelberg (2013). doi:10.​1007/​978-3-642-37331-2_​42
16.
Zurück zum Zitat Hodan, T., Zabulis, X., Lourakis, M., Obdrzalek, S., Matas, J.: Detection and fine 3D pose estimation of textureless objects in RGB-D images. In: IROS (2015) Hodan, T., Zabulis, X., Lourakis, M., Obdrzalek, S., Matas, J.: Detection and fine 3D pose estimation of textureless objects in RGB-D images. In: IROS (2015)
17.
Zurück zum Zitat Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding, Technical report (2014). http://arxiv.org/abs/1408.5093 Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding, Technical report (2014). http://​arxiv.​org/​abs/​1408.​5093
18.
Zurück zum Zitat Kehl, W., Tombari, F., Navab, N., Ilic, S., Lepetit, V.: Hashmod: a hashing method for scalable 3D object detection. In: BMVC (2015) Kehl, W., Tombari, F., Navab, N., Ilic, S., Lepetit, V.: Hashmod: a hashing method for scalable 3D object detection. In: BMVC (2015)
19.
Zurück zum Zitat Lowe, D.G.: Local feature view clustering for 3D object recognition. In: CVPR (2001) Lowe, D.G.: Local feature view clustering for 3D object recognition. In: CVPR (2001)
20.
Zurück zum Zitat Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)CrossRef Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)CrossRef
21.
Zurück zum Zitat Malisiewicz, T., Gupta, A., Efros, A.: Ensemble of exemplar-SVMs for object detection and beyond. In: ICCV (2011) Malisiewicz, T., Gupta, A., Efros, A.: Ensemble of exemplar-SVMs for object detection and beyond. In: ICCV (2011)
22.
Zurück zum Zitat Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds.) ICANN 2011. LNCS, vol. 6791, pp. 52–59. Springer, Heidelberg (2011). doi:10.1007/978-3-642-21735-7_7 CrossRef Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds.) ICANN 2011. LNCS, vol. 6791, pp. 52–59. Springer, Heidelberg (2011). doi:10.​1007/​978-3-642-21735-7_​7 CrossRef
23.
Zurück zum Zitat Mian, A., Bennamoun, M., Owens, R.: On the repeatability and quality of keypoints for local feature-based 3D object retrieval from cluttered scenes. IJCV 89, 348–361 (2009)CrossRef Mian, A., Bennamoun, M., Owens, R.: On the repeatability and quality of keypoints for local feature-based 3D object retrieval from cluttered scenes. IJCV 89, 348–361 (2009)CrossRef
24.
Zurück zum Zitat Muja, M., Lowe, D.: Scalable nearest neighbour methods for high dimensional data. TPAMI 36(11), 2227–2240 (2014)CrossRef Muja, M., Lowe, D.: Scalable nearest neighbour methods for high dimensional data. TPAMI 36(11), 2227–2240 (2014)CrossRef
25.
Zurück zum Zitat Pauwels, K., Rubio, L., Diaz, J., Ros, E.: Real-time model-based rigid object pose estimation and tracking combining dense and sparse visual cues. In: CVPR (2013) Pauwels, K., Rubio, L., Diaz, J., Ros, E.: Real-time model-based rigid object pose estimation and tracking combining dense and sparse visual cues. In: CVPR (2013)
26.
Zurück zum Zitat Pepik, B., Gehler, P., Stark, M., Schiele, B.: 3D2Pm3D deformable part models. In: ECCV (2012) Pepik, B., Gehler, P., Stark, M., Schiele, B.: 3D2Pm3D deformable part models. In: ECCV (2012)
27.
Zurück zum Zitat Rios-Cabrera, R., Tuytelaars, T.: Discriminatively trained templates for 3D object detection: a real time scalable approach. In: ICCV (2013) Rios-Cabrera, R., Tuytelaars, T.: Discriminatively trained templates for 3D object detection: a real time scalable approach. In: ICCV (2013)
28.
Zurück zum Zitat Rusu, R.B., Holzbach, A., Blodow, N., Beetz, M.: Fast geometric point labeling using conditional random fields. In: IROS (2009) Rusu, R.B., Holzbach, A., Blodow, N., Beetz, M.: Fast geometric point labeling using conditional random fields. In: IROS (2009)
29.
Zurück zum Zitat Tang, J., Miller, S., Singh, A., Abbeel, P.: A textured object recognition pipeline for color and depth image data. In: ICRA (2011) Tang, J., Miller, S., Singh, A., Abbeel, P.: A textured object recognition pipeline for color and depth image data. In: ICRA (2011)
30.
Zurück zum Zitat Tejani, A., Tang, D., Kouskouridas, R., Kim, T.-K.: Latent-class hough forests for 3D object detection and pose estimation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 462–477. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10599-4_30 Tejani, A., Tang, D., Kouskouridas, R., Kim, T.-K.: Latent-class hough forests for 3D object detection and pose estimation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 462–477. Springer, Heidelberg (2014). doi:10.​1007/​978-3-319-10599-4_​30
31.
Zurück zum Zitat Tombari, F., Salti, S., Stefano, L.: Unique signatures of histograms for local surface description. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6313, pp. 356–369. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15558-1_26 CrossRef Tombari, F., Salti, S., Stefano, L.: Unique signatures of histograms for local surface description. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6313, pp. 356–369. Springer, Heidelberg (2010). doi:10.​1007/​978-3-642-15558-1_​26 CrossRef
32.
Zurück zum Zitat Wohlhart, P., Lepetit, V.: Learning descriptors for object recognition and 3D pose estimation. In: CVPR (2015) Wohlhart, P., Lepetit, V.: Learning descriptors for object recognition and 3D pose estimation. In: CVPR (2015)
33.
Zurück zum Zitat Xie, Z., Singh, A., Uang, J., Narayan, K.S., Abbeel, P.: Multimodal blending for high-accuracy instance recognition. In: IROS (2013) Xie, Z., Singh, A., Uang, J., Narayan, K.S., Abbeel, P.: Multimodal blending for high-accuracy instance recognition. In: IROS (2013)
Metadaten
Titel
Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation
verfasst von
Wadim Kehl
Fausto Milletari
Federico Tombari
Slobodan Ilic
Nassir Navab
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-46487-9_13

Premium Partner